S3
The S3 source reads objects from Amazon S3 or any S3-compatible service (MinIO, Ceph, etc.). Each object in the bucket becomes a record in the pipeline — its contents are uploaded to a new agent session for processing.
Configuration
SOURCE FIELD (S3)
Code example with json syntax.1
Fields
| Field | Required | Description |
|---|---|---|
| bucket | Yes | The S3 bucket name. |
| region | Yes | The region of the S3-compatible service (e.g. us-east-1). |
| access_key_id | Yes | AWS access key ID. Encrypted at rest and never returned in responses. |
| secret_access_key | Yes | AWS secret access key. Encrypted at rest and never returned in responses. |
| prefix | No | Key prefix to scope ingestion to a subset of objects (e.g. legal/contracts/). |
| endpoint_url | No | Custom endpoint URL for S3-compatible services. If omitted, defaults to AWS S3. |
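The required and optional fields above can be checked locally before a configuration is submitted. The following is a minimal sketch, assuming the source configuration is a flat mapping of the field names in the table; the function name `check_s3_source`, the MinIO endpoint, and all values are hypothetical, and the real configuration may wrap these fields in a larger document.

```python
# Field names taken from the table above; the flat dict shape is an assumption.
REQUIRED_FIELDS = ("bucket", "region", "access_key_id", "secret_access_key")
OPTIONAL_FIELDS = ("prefix", "endpoint_url")


def check_s3_source(config: dict) -> list:
    """Return a list of problems found in an S3 source config (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        # A required field must be present and a non-empty string.
        if not isinstance(value, str) or not value:
            problems.append(f"missing required field: {name}")
    return problems


# Hypothetical configuration for an S3-compatible service; values are placeholders.
example_config = {
    "bucket": "example-bucket",
    "region": "us-east-1",
    "access_key_id": "EXAMPLE_KEY_ID",
    "secret_access_key": "EXAMPLE_SECRET",
    "prefix": "legal/contracts/",
    "endpoint_url": "http://localhost:9000",
}

print(check_s3_source(example_config))
print(check_s3_source({"bucket": "example-bucket"}))
```

Leaving out `prefix` and `endpoint_url` is valid in this sketch, matching the table: without `endpoint_url` the source defaults to AWS S3.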
Using with S3-compatible services
Set endpoint_url to point at your service. The rest of the configuration
works the same as with AWS S3.
SOURCE FIELD (S3-COMPATIBLE, MINIO)
Code example with json syntax.1
How records are fetched
Each run lists objects in the bucket using the S3 ListObjectsV2 API,
scoped by the optional prefix. Folder markers (zero-byte objects whose
keys end in /) are skipped. The pipeline paginates through the full
listing and processes objects concurrently in small batches.
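The listing behavior described above can be sketched as follows. This is an illustration, not the pipeline's actual code: it assumes a client object exposing `list_objects_v2` with the request and response shape used by boto3's S3 client (`Bucket`, `Prefix`, `ContinuationToken`, `Contents`, `IsTruncated`, `NextContinuationToken`). The helper names, the treatment of folder markers as zero-byte keys ending in `/`, and the batch size of 8 are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def iter_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[dict]:
    """Yield object entries from ListObjectsV2, following continuation tokens."""
    kwargs = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            # Skip folder markers: zero-byte objects whose key ends in "/".
            if obj["Key"].endswith("/") and obj.get("Size", 0) == 0:
                continue
            yield obj
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Split an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(objects: Iterable, handle: Callable, batch_size: int = 8) -> List:
    """Run `handle` on each object, one small concurrent batch at a time."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in batches(objects, batch_size):
            # Each batch finishes before the next one is pulled from the listing.
            results.extend(pool.map(handle, batch))
    return results
```

With boto3 installed, `client` could be created with `boto3.client("s3", ...)`; any object with a compatible `list_objects_v2` method works, which also makes the function easy to exercise with a stub.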
The pipeline captures an upper-bound timestamp at the start of each run
and only processes objects whose lastModified is at or before that
timestamp. This ensures that objects added to the bucket while a run is
in progress are left for the next run — they aren't partially processed.
Incremental sync
When sync_mode is incremental (the default), the pipeline tracks a
watermark based on each object's lastModified timestamp. After a
successful run, the watermark advances to the upper bound captured at the
start of that run.
On the next run, only objects with lastModified > stored_watermark are
processed. This ensures:
- New files are picked up.
- Modified files (S3 updates lastModified on overwrite) are reprocessed.
- Unchanged files are skipped, keeping costs low.
Deleted files are not explicitly tracked — they simply stop appearing in the listing.
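The interaction between the stored watermark and the per-run upper bound can be illustrated with a short sketch. It assumes each listed object is a dict with a `Key` and a timezone-aware `LastModified` datetime (the shape boto3 returns); the function names and the sample objects are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def select_for_run(
    objects: Iterable[dict],
    watermark: Optional[datetime],
    upper_bound: datetime,
) -> List[dict]:
    """Return objects with watermark < LastModified <= upper_bound."""
    selected = []
    for obj in objects:
        modified = obj["LastModified"]
        if modified > upper_bound:
            continue  # added or changed after the run started; left for the next run
        if watermark is not None and modified <= watermark:
            continue  # unchanged since the last successful run
        selected.append(obj)
    return selected


def day(n: int) -> datetime:
    """Helper for the example: a UTC timestamp on day n of January 2024."""
    return datetime(2024, 1, n, tzinfo=timezone.utc)


listing = [
    {"Key": "a.txt", "LastModified": day(1)},
    {"Key": "b.txt", "LastModified": day(3)},
    {"Key": "c.txt", "LastModified": day(5)},
]

# First run: no watermark yet, upper bound captured at the start of the run.
first = select_for_run(listing, watermark=None, upper_bound=day(4))
print([o["Key"] for o in first])   # ['a.txt', 'b.txt']

# After a successful run the watermark advances to that run's upper bound.
watermark = day(4)

# Second run: only objects modified after the watermark are picked up.
second = select_for_run(listing, watermark=watermark, upper_bound=day(6))
print([o["Key"] for o in second])  # ['c.txt']
```

Advancing the watermark to the upper bound, rather than to the newest lastModified seen, means an object written during a run is neither processed by that run nor skipped by the next one.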
Permissions
The credentials you provide need these S3 permissions on the bucket:
- s3:ListBucket: to enumerate objects in the bucket (scoped by prefix).
- s3:GetObject: to download each object's contents.
Example IAM policy:
MINIMAL IAM POLICY
Code example with json syntax.1
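A policy granting the two permissions above can also be generated programmatically. The sketch below builds a standard AWS IAM policy document as a Python dict and prints it as JSON; the function name, the bucket name `example-bucket`, and the prefix are hypothetical, and the optional `s3:prefix` condition is one common way to scope listing to a prefix rather than a requirement of the pipeline. S3-compatible services may use a different policy format.

```python
import json
from typing import Optional


def minimal_s3_policy(bucket: str, prefix: Optional[str] = None) -> dict:
    """Build an IAM policy allowing ListBucket and GetObject on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # s3:ListBucket applies to the bucket itself, not to its objects.
    list_statement = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": bucket_arn,
    }
    # s3:GetObject applies to object ARNs under the bucket.
    object_resource = f"{bucket_arn}/*"
    if prefix:
        # Optionally restrict listing and reads to keys under the prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
        object_resource = f"{bucket_arn}/{prefix}*"
    get_statement = {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": object_resource,
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, get_statement]}


print(json.dumps(minimal_s3_policy("example-bucket", "legal/contracts/"), indent=2))
```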