Quickstart
A pipeline is an automated flow that pulls data from a source system — like an S3 bucket or a SharePoint site — and sends each record through an agent you've configured. Every record gets its own fresh agent session, and the agent decides what to do with it: index it into a corpus, extract structured information, route it somewhere else, or discard it. This gives you the full flexibility of an agent applied to bulk data, without writing any orchestration code yourself.
You define a pipeline with three things: where the data comes from (the source), when it should run (the trigger), and which agent processes each record (the transform). The pipeline handles the rest — fetching records from the source, creating an agent session per record, tracking progress, and retrying failures. In incremental mode (the default), pipelines track what's already been processed so subsequent runs only pick up new or changed records.
This guide walks through creating a pipeline that reads files from an S3 bucket every 6 hours and sends each one to an agent for processing. For a deeper explanation of sources, triggers, transforms, verification, and sync modes, see Concepts.
Prerequisites
- An agent that knows how to process a file (for example, one that uses the `structured_document_index` tool to index documents into a corpus).
- An S3 bucket with read permissions for the files you want to process.
Create a pipeline
Send a POST request to the pipelines endpoint with the source, trigger, and transform configuration:
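The original code sample did not survive extraction. The request below is a sketch: the endpoint URL, header names, and nested field shapes are assumptions, while the top-level fields (`source`, `trigger`, `transform`, `sync_mode`) and the every-6-hours schedule come from this guide.

```shell
# Hypothetical endpoint and credential placeholders; only the
# top-level fields (source, trigger, transform, sync_mode) are
# documented in this guide.
curl -X POST "https://api.example.com/v1/pipelines" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "s3",
      "bucket": "my-documents",
      "aws_access_key_id": "...",
      "aws_secret_access_key": "..."
    },
    "trigger": {
      "type": "cron",
      "cron": "0 */6 * * *"
    },
    "transform": {
      "agent_id": "your-agent-id"
    },
    "sync_mode": "incremental"
  }'
```

The cron expression `0 */6 * * *` fires at minute 0 of every sixth hour (UTC), matching the 6-hour schedule used in this guide.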
The body fields:
- `source` — where data comes from. The S3 credentials are encrypted at rest and never returned in API responses. See S3 for details.
- `trigger` — when the pipeline runs. `cron` uses a standard 5-field cron expression in UTC.
- `transform` — the agent that processes each file. Each source record creates a fresh agent session.
- `sync_mode` — `incremental` only processes new or changed files since the last run; `full_refresh` processes everything on every run.
Trigger a run
The pipeline will start running on its cron schedule, but you can also trigger a run immediately:
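The original sample is missing here as well. A plausible sketch, assuming a hypothetical runs sub-endpoint and the pipeline ID returned when the pipeline was created:

```shell
# Hypothetical URL; substitute the pipeline ID returned at creation.
curl -X POST "https://api.example.com/v1/pipelines/$PIPELINE_ID/runs" \
  -H "Authorization: Bearer $API_KEY"
```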
Check progress
List the runs to see status and record counts:
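Again the original sample was lost; a sketch under the same assumed endpoint layout:

```shell
# Hypothetical URL; lists runs for one pipeline.
curl "https://api.example.com/v1/pipelines/$PIPELINE_ID/runs" \
  -H "Authorization: Bearer $API_KEY"
```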
The response includes each run's status and record counts:
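The original response sample is gone; the shape below is illustrative. The guide only states that each run includes a status and record counts, so every field name here is an assumption:

```json
{
  "runs": [
    {
      "id": "run_123",
      "status": "succeeded",
      "records_processed": 42,
      "records_failed": 0
    }
  ]
}
```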
Next steps
- Read pipeline concepts for a full explanation of sources, triggers, transforms, verification, and sync modes.
- See the S3 source for all S3 configuration options and incremental sync behavior.
- Learn how to handle failed records with the dead letter queue.