Version: 2.0

S3

The S3 source reads objects from Amazon S3 or any S3-compatible service (MinIO, Ceph, etc.). Each object in the bucket (or under the configured prefix) becomes one record in the pipeline, and its contents are uploaded to a new agent session for processing.

Configuration

SOURCE FIELD (S3)

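A minimal sketch of the source field, built as a Python dict and serialized to JSON. Only the fields documented in the table below are shown. Any wrapper the pipeline expects around them (for example, a `type` discriminator) is not specified here. The bucket name, prefix, and credential strings are hypothetical placeholders.

```python
import json

# Hypothetical values; only the documented field names are taken from this page.
s3_source = {
    "bucket": "my-documents",                      # placeholder bucket name
    "region": "us-east-1",
    "access_key_id": "YOUR_ACCESS_KEY_ID",         # placeholder; never commit real keys
    "secret_access_key": "YOUR_SECRET_ACCESS_KEY", # placeholder
    "prefix": "legal/contracts/",                  # optional: scope to a subset of keys
    # endpoint_url omitted, so the source targets AWS S3
}

print(json.dumps(s3_source, indent=2))
```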

Fields

Field | Required | Description
bucket | Yes | The S3 bucket name.
region | Yes | The region of the S3-compatible service (e.g. us-east-1).
access_key_id | Yes | AWS access key ID. Encrypted at rest and never returned in responses.
secret_access_key | Yes | AWS secret access key. Encrypted at rest and never returned in responses.
prefix | No | Key prefix to scope ingestion to a subset of objects (e.g. legal/contracts/).
endpoint_url | No | Custom endpoint URL for S3-compatible services. If omitted, defaults to AWS S3.
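The required/optional split in the table can be checked before a config is submitted. The helper below is a hypothetical sketch. It is not the pipeline's own validation, whose exact rules are not documented here. It rejects configs missing a required field, and it masks the two credential fields in the way the table describes them being withheld from responses.

```python
REQUIRED_FIELDS = ("bucket", "region", "access_key_id", "secret_access_key")
SECRET_FIELDS = ("access_key_id", "secret_access_key")


def validate_s3_source(config: dict) -> dict:
    """Hypothetical check: every required field must be present and non-empty."""
    missing = [name for name in REQUIRED_FIELDS if not config.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return config


def redact(config: dict) -> dict:
    """Return a copy with credentials masked, as if echoing the config back."""
    return {key: ("****" if key in SECRET_FIELDS else value) for key, value in config.items()}
```

Optional fields (`prefix`, `endpoint_url`) pass through untouched. When they are omitted, the source falls back to the whole bucket and to AWS S3 respectively.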

Using with S3-compatible services

Set endpoint_url to point at your service. The rest of the configuration works the same as with AWS S3.

SOURCE FIELD (S3-COMPATIBLE, MINIO)

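A sketch of a MinIO configuration. It uses the same documented fields plus `endpoint_url`, together with a hypothetical `resolve_endpoint` helper that shows the fallback when `endpoint_url` is omitted. The local address `http://localhost:9000` (MinIO's default API port) is an assumption. So is the AWS regional URL used as the fallback, since the page says only that the default is AWS S3.

```python
import json
from typing import Optional

# Hypothetical MinIO source: identical to the AWS case except for endpoint_url.
minio_source = {
    "bucket": "my-documents",
    "region": "us-east-1",                          # placeholder region string
    "access_key_id": "YOUR_ACCESS_KEY_ID",
    "secret_access_key": "YOUR_SECRET_ACCESS_KEY",
    "endpoint_url": "http://localhost:9000",        # assumption: local MinIO server
}


def resolve_endpoint(config: dict) -> str:
    """Use endpoint_url when set; otherwise assume the AWS S3 regional endpoint."""
    endpoint: Optional[str] = config.get("endpoint_url")
    if endpoint:
        return endpoint
    return f"https://s3.{config['region']}.amazonaws.com"


print(json.dumps(minio_source, indent=2))
```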

How records are fetched

Each run lists objects in the bucket with the S3 ListObjectsV2 API, scoped by the optional prefix, and paginates through the full listing. Folder markers (zero-byte objects whose keys end in /) are skipped. The remaining objects are processed concurrently in small batches.
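The listing step can be sketched against a boto3-style client whose `list_objects_v2` accepts `Bucket`, `Prefix`, and `ContinuationToken` and returns `Contents`, `IsTruncated`, and `NextContinuationToken`. The batch size of 4 and the thread-pool approach are assumptions, because the page says only "small batches". `handle` is a hypothetical stand-in for uploading one object to an agent session.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Iterator, List


def list_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield every object under prefix, following ListObjectsV2 continuation tokens."""
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            # Skip folder markers: zero-byte objects whose key ends in "/".
            if obj["Key"].endswith("/") and obj.get("Size", 0) == 0:
                continue
            yield obj
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def process_in_batches(objects: Iterable[Dict[str, Any]],
                       handle: Callable[[Dict[str, Any]], Any],
                       batch_size: int = 4) -> List[Any]:
    """Run handle() concurrently within each batch; batches run one after another."""
    results: List[Any] = []
    batch: List[Dict[str, Any]] = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for obj in objects:
            batch.append(obj)
            if len(batch) == batch_size:
                results.extend(pool.map(handle, batch))  # map preserves input order
                batch = []
        if batch:
            results.extend(pool.map(handle, batch))
    return results
```

With boto3 itself, `client.get_paginator("list_objects_v2")` handles the continuation tokens for you. The manual loop is shown here only to make the pagination visible.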

At the start of each run, the pipeline captures an upper-bound timestamp and processes only objects whose lastModified is at or before it. Objects added to the bucket while the run is in progress are therefore left for the next run rather than picked up partway through this one.
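A minimal sketch of the upper-bound rule, assuming timezone-aware UTC timestamps under the key `LastModified` (boto3's name for the field the page calls lastModified). The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def capture_upper_bound() -> datetime:
    """Record the run's cutoff once, before listing starts."""
    return datetime.now(timezone.utc)


def within_run(objects: Iterable[Dict[str, Any]], upper_bound: datetime) -> List[Dict[str, Any]]:
    """Keep objects modified at or before the cutoff; later ones wait for the next run."""
    return [obj for obj in objects if obj["LastModified"] <= upper_bound]
```

The cutoff is captured once, not re-read per page. Because of that, an object that lands while a long listing is still paginating is excluded consistently, whichever page it would have appeared on.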

Incremental sync

When sync_mode is incremental (the default), the pipeline tracks a watermark based on each object's lastModified timestamp. After a successful run, the watermark advances to the upper bound captured at the start of that run.

On the next run, only objects with lastModified > stored_watermark are processed. This ensures:

  • New files are picked up.
  • Modified files (S3 updates lastModified on overwrite) are reprocessed.
  • Unchanged files are skipped, keeping costs low.

Deleted files are not explicitly tracked — they simply stop appearing in the listing.
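The incremental rules above combine into a half-open window (watermark, upper_bound]. The sketch below is hypothetical. It assumes the caller persists the returned watermark only after the function returns normally. It also assumes that a first run with no stored watermark processes everything up to the upper bound.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Optional

# Assumption: with no stored watermark, every object older than the cutoff qualifies.
EPOCH = datetime.min.replace(tzinfo=timezone.utc)


def incremental_run(
    objects: Iterable[Dict[str, Any]],
    watermark: Optional[datetime],
    upper_bound: datetime,
    handle: Callable[[Dict[str, Any]], None],
) -> datetime:
    """Process objects with watermark < LastModified <= upper_bound; return the new watermark."""
    lower = watermark or EPOCH
    selected: List[Dict[str, Any]] = [
        obj for obj in objects if lower < obj["LastModified"] <= upper_bound
    ]
    for obj in selected:
        # If handle() raises, the exception propagates before a new watermark is
        # returned, so the caller keeps the old one and the next run retries.
        handle(obj)
    # Reached only when every selected object succeeded.
    return upper_bound
```

An overwritten object gets a newer lastModified and falls inside the next window. An unchanged object stays at or below the watermark and is skipped.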

Permissions

The credentials you provide need these S3 permissions on the bucket:

  • s3:ListBucket — to enumerate objects in the bucket (scoped by prefix).
  • s3:GetObject — to download each object's contents.

Example IAM policy:

MINIMAL IAM POLICY

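A sketch that generates a minimal policy for the two permissions above. `s3:ListBucket` is granted on the bucket ARN and `s3:GetObject` on object ARNs. When a prefix is configured, both are narrowed with the `s3:prefix` condition key. The bucket name and prefix are placeholders, and `minimal_policy` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def minimal_policy(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build a least-privilege IAM policy for listing and reading objects."""
    list_statement: Dict[str, Any] = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",  # ListBucket applies to the bucket ARN
    }
    if prefix:
        # Only allow listing requests whose Prefix falls under the configured prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            list_statement,
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",  # GetObject applies to object ARNs
            },
        ],
    }


print(json.dumps(minimal_policy("my-documents", "legal/contracts/"), indent=2))
```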