Version: 2.0

Quickstart

A pipeline is an automated flow that pulls data from a source system — like an S3 bucket or a SharePoint site — and sends each record through an agent you've configured. Every record gets its own fresh agent session, and the agent decides what to do with it: index it into a corpus, extract structured information, route it somewhere else, or discard it. This gives you the full flexibility of an agent applied to bulk data, without writing any orchestration code yourself.

You define a pipeline with three things: where the data comes from (the source), when it should run (the trigger), and which agent processes each record (the transform). The pipeline handles the rest — fetching records from the source, creating an agent session per record, tracking progress, and retrying failures. In incremental mode (the default), pipelines track what's already been processed so subsequent runs only pick up new or changed records.

This guide walks through creating a pipeline that reads files from an S3 bucket every 6 hours and sends each one to an agent for processing. For a deeper explanation of sources, triggers, transforms, verification, and sync modes, see Concepts.

Prerequisites

  • An agent that knows how to process a file (for example, one that uses the structured_document_index tool to index documents into a corpus).
  • An S3 bucket with read permissions for the files you want to process.

Create a pipeline

Send a POST request to the pipelines endpoint with the source, trigger, and transform configuration:

CREATE A PIPELINE

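A minimal sketch of the request, assuming a placeholder base URL, a bearer-token auth header, and illustrative field names (`name`, the S3 credential keys, and the agent ID are not confirmed API details; only `source`, `trigger`, `transform`, and `sync_mode` come from this guide):

```shell
# Hypothetical sketch: BASE_URL, API_KEY, the agent ID, and the exact
# credential key names are placeholders, not confirmed API details.
BASE_URL="${BASE_URL:-https://api.example.com/v2}"

BODY='{
  "name": "s3-docs-pipeline",
  "source": {
    "type": "s3",
    "bucket": "my-bucket",
    "prefix": "docs/",
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "..."
  },
  "trigger": { "type": "cron", "cron": "0 */6 * * *" },
  "transform": { "type": "agent", "agent_id": "agent_123" },
  "sync_mode": "incremental"
}'

# Only send the request when an API key is actually configured.
if [ -n "${API_KEY:-}" ]; then
  curl -sS -X POST "$BASE_URL/pipelines" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

The cron expression `0 */6 * * *` fires at minute 0 of every sixth hour, matching the every-6-hours schedule this guide uses.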

The body fields:

  • source — where data comes from. The S3 credentials are encrypted at rest and never returned in API responses. See S3 for details.
  • trigger — when the pipeline runs. cron uses a standard 5-field cron expression in UTC.
  • transform — the agent that processes each file. Each source record creates a fresh agent session.
  • sync_mode — incremental only processes new or changed files since the last run; full_refresh processes everything on every run.

Trigger a run

The pipeline will start running on its cron schedule, but you can also trigger a run immediately:

TRIGGER A PIPELINE RUN

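A sketch of the manual-trigger call, assuming a `/pipelines/{id}/runs` subpath and a placeholder pipeline ID; substitute the ID returned when you created the pipeline:

```shell
# Hypothetical sketch: the runs subpath and the IDs below are assumptions.
BASE_URL="${BASE_URL:-https://api.example.com/v2}"
PIPELINE_ID="pl_123"   # placeholder; use the ID from the create response

# POSTing to the runs collection (assumed path) starts a run immediately.
RUN_URL="$BASE_URL/pipelines/$PIPELINE_ID/runs"

if [ -n "${API_KEY:-}" ]; then
  curl -sS -X POST "$RUN_URL" \
    -H "Authorization: Bearer $API_KEY"
fi
```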

Check progress

List the runs to see status and record counts:

LIST PIPELINE RUNS

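A sketch of the list call, assuming the same `/pipelines/{id}/runs` path as the trigger request (a GET instead of a POST):

```shell
# Hypothetical sketch: path and IDs are assumptions, not confirmed API details.
BASE_URL="${BASE_URL:-https://api.example.com/v2}"
PIPELINE_ID="pl_123"   # placeholder

RUNS_URL="$BASE_URL/pipelines/$PIPELINE_ID/runs"

if [ -n "${API_KEY:-}" ]; then
  curl -sS "$RUNS_URL" \
    -H "Authorization: Bearer $API_KEY"
fi
```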

The response includes each run's status and record counts:

PIPELINE RUNS RESPONSE

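An illustrative response shape; beyond the status and record counts this guide mentions, the field names and status values below are assumptions:

```json
{
  "runs": [
    {
      "id": "run_abc",
      "status": "succeeded",
      "sync_mode": "incremental",
      "records_processed": 42,
      "records_failed": 0,
      "started_at": "2025-01-01T00:00:00Z"
    }
  ]
}
```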

Next steps

  • Read pipeline concepts for a full explanation of sources, triggers, transforms, verification, and sync modes.
  • See the S3 source for all S3 configuration options and incremental sync behavior.
  • Learn how to handle failed records with the dead letter queue.