Session metadata
When a pipeline run processes a source record, it creates a new agent session and
attaches structured metadata to that session. The metadata records which pipeline
and run created the session, which source record it came from, the record's
processing status, and the record's source-system metadata and permissions. You
can read it from the session's metadata field.
Top-level shape
| Field | Type | Nullable | Description |
|---|---|---|---|
| pipeline_key | string | no | The pipeline that created the session. |
| run_id | string | no | Id of the pipeline run that created the session. |
| source_record_id | string | no | The source system's own id for the record (for example, a Drive file id), not a Vectara-issued id. |
| watermark | string | yes | Version marker of the source record at fetch time, such as a modified timestamp or etag. Used to decide whether an already-processed record needs reprocessing. See Sync mode. |
| type | enum | no | pipeline_transform (worker) or pipeline_judge (judge). See Session types. |
| worker_session_key | string | yes | For pipeline_judge sessions, the key of the worker session being verified. Null for transform sessions. |
| status | enum | no | pending at creation, then succeeded or failed after the record completes. See Status lifecycle. |
| error_message | string | yes | The failure message. Populated only when status is failed. |
| source_record_metadata | object | no | Source-system metadata for the record. Defaults to empty. See Source record metadata. |
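As an illustration of the table above, the following sketch builds one metadata object as a Python dict and checks the non-nullable fields. Only the field names and nullability come from the table; every value is a hypothetical placeholder, and `missing_required` is a helper invented for this example, not part of any SDK. It also assumes nullable fields may appear either as null or be absent.

```python
from typing import Any, Dict, List

# Field names and nullability are taken from the table above.
REQUIRED_FIELDS = (
    "pipeline_key",
    "run_id",
    "source_record_id",
    "type",
    "status",
    "source_record_metadata",
)

# Hypothetical values, for illustration only.
example_metadata: Dict[str, Any] = {
    "pipeline_key": "example-pipeline",
    "run_id": "run-0001",
    "source_record_id": "file-abc123",
    "watermark": "2024-01-01T00:00:00Z",
    "type": "pipeline_transform",
    "worker_session_key": None,   # null for transform sessions
    "status": "succeeded",
    "error_message": None,        # populated only when status is failed
    "source_record_metadata": {}, # defaults to empty
}


def missing_required(metadata: Dict[str, Any]) -> List[str]:
    """Return the non-nullable fields that are absent or None."""
    return [name for name in REQUIRED_FIELDS if metadata.get(name) is None]
```

An empty `source_record_metadata` object counts as present here, because the field defaults to empty rather than to null.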
Session types
Every record processed by a pipeline produces a transform session. If the pipeline configures verification with a judge agent, each transform session is additionally checked by a judge session.
- pipeline_transform: the worker session that fetches the source record, uploads it to the agent, and runs the transform. worker_session_key is null.
- pipeline_judge: the judge session that validates a worker session's output. It carries the same source_record_id, watermark, and source_record_metadata as the worker session it judges, and sets worker_session_key to that worker's session key.
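Because a judge session points back at its worker through worker_session_key, the two can be paired once you have the metadata objects in hand. The sketch below assumes you have already collected session metadata as a list of dicts; `judges_by_worker` is a hypothetical helper, and how you list sessions is outside its scope.

```python
from typing import Any, Dict, List


def judges_by_worker(sessions: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group judge-session metadata by the worker session key it verifies."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for meta in sessions:
        # Transform sessions have a null worker_session_key, so they are skipped.
        if meta.get("type") == "pipeline_judge" and meta.get("worker_session_key"):
            grouped.setdefault(meta["worker_session_key"], []).append(meta)
    return grouped
```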
Status lifecycle
A session is created with status set to pending. After the record finishes
processing, the status is updated to succeeded, or to failed with
error_message populated. Records that fail are added to the
dead letter queue.
On a later run of the same pipeline, a record whose latest transform session is
succeeded and whose watermark is unchanged is skipped as already processed. A
retry run reprocesses regardless.
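The skip rule can be sketched as a small predicate. This is an illustration of the rule as described here, not the pipeline's actual implementation. The function name and parameters are hypothetical, and the handling of a null watermark is an assumption: the text does not say how a missing watermark is compared, so this sketch reprocesses conservatively.

```python
from typing import Optional


def should_skip(
    latest_status: Optional[str],
    stored_watermark: Optional[str],
    current_watermark: Optional[str],
    is_retry: bool = False,
) -> bool:
    """Return True if a record looks already processed and can be skipped."""
    if is_retry:
        # A retry run reprocesses regardless of status or watermark.
        return False
    if stored_watermark is None or current_watermark is None:
        # Assumption: without a watermark we cannot show the record is unchanged.
        return False
    return latest_status == "succeeded" and stored_watermark == current_watermark
```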
Source record metadata
source_record_metadata splits a record's metadata into three parts, populated by
the source connector at fetch time. Each part defaults to empty when the source
provides nothing for it.
| Field | Type | Description |
|---|---|---|
| system_metadata | object | Metadata the source system manages, such as size, timestamps, etag, and parent path. |
| user_metadata | object | Metadata the owner of the source object attached, such as S3 object tags or SharePoint custom columns. |
| acl_metadata | object | Access-control information the source resolved at fetch time. See ACL metadata. |
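Since each part defaults to empty, a reader can normalize all three to dicts before using them. The helper below is a hypothetical sketch; it assumes a missing or null part should be treated the same as an empty one.

```python
from typing import Any, Dict, Tuple


def metadata_parts(
    session_metadata: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Return (system_metadata, user_metadata, acl_metadata), each as a dict."""
    record_meta = session_metadata.get("source_record_metadata") or {}
    return (
        record_meta.get("system_metadata") or {},
        record_meta.get("user_metadata") or {},
        record_meta.get("acl_metadata") or {},
    )
```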
ACL metadata
acl_metadata describes a record's effective permissions in a single,
source-independent shape. Every source connector resolves whatever access model it
has (Drive permissions, S3 object ACLs, and so on) down to these same fields, so
you can reason about access without knowing which connector produced the record.
Permissions are organized along two axes:
- Principal type: a direct user, a group, a whole organization or domain, or the public ("anyone").
- Role: none, reader, commenter, or editor, ordered from least to most permissive.
Each bucket name encodes both the principal type and the role: an entry in
group_readers, for example, is a group granted read access.
| Field | Type | Principal |
|---|---|---|
| owners | array | direct users |
| editors | array | direct users |
| commenters | array | direct users |
| readers | array | direct users |
| group_editors | array | groups |
| group_commenters | array | groups |
| group_readers | array | groups |
| public_access | enum | anyone |
| org_wide_access | enum | whole organization or domain |
Each entry in a bucket array is a principal identifier as the source reports it, such as an email address or a source-specific id (for example, an AWS canonical user id). Do not assume entries are always emails. See each source's own page for the exact identifiers and grants it emits.
public_access and org_wide_access collapse to a single level each. If multiple
grants exist (for example, several domains), the highest level wins.
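The "highest level wins" rule follows from the role ordering none, reader, commenter, editor. A minimal sketch, assuming the grants arrive as a list of role strings (the `highest_level` helper and its input shape are hypothetical, not a connector API):

```python
from typing import Iterable

# Least to most permissive, as listed under Role.
ROLE_ORDER = ("none", "reader", "commenter", "editor")


def highest_level(grants: Iterable[str]) -> str:
    """Collapse several grants to the single most permissive role."""
    # With no grants at all, the collapsed level is "none".
    return max(grants, key=ROLE_ORDER.index, default="none")
```

For example, a file shared with one domain as reader and another as editor would collapse to editor. This only covers sources where the concept applies; a source with no such concept reports null instead.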
Reading the ACL fields
Every ACL field distinguishes two states. Check for null before iterating a bucket.
- null means the concept does not apply to this source at all. A source that doesn't model groups leaves the group_* buckets null. A source with no notion of public sharing leaves public_access null.
- empty array (bucket fields) or none (public_access, org_wide_access) means the concept applies and the source was checked, but no grants were found.
A single connector is consistent across the sibling group_* buckets: it either
populates all three (using [] for empty buckets) or leaves all three null,
depending on whether the source models groups at all.
Owners are always direct users. There is no group_owners bucket, and no
owner-level public_access or org_wide_access.
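The null check can be sketched as a helper that classifies any ACL field into one of three outcomes before you iterate it. The function name and the returned labels are hypothetical, and treating a missing key the same as null is an assumption of this sketch.

```python
from typing import Any, Dict

# The two fields that hold a single level rather than an array.
ENUM_FIELDS = {"public_access", "org_wide_access"}


def field_state(acl: Dict[str, Any], field: str) -> str:
    """Classify an ACL field as not_applicable, no_grants, or granted."""
    value = acl.get(field)
    if value is None:
        # The source does not model this concept at all.
        return "not_applicable"
    if field in ENUM_FIELDS:
        return "no_grants" if value == "none" else "granted"
    # Bucket fields are arrays; an empty array means checked, nothing found.
    return "no_grants" if len(value) == 0 else "granted"
```

A caller would iterate a bucket only when the result is granted, and could treat not_applicable differently from no_grants when deciding whether access information is trustworthy for that source.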
For which buckets each source fills and what identifiers it emits, see the individual source pages, such as Google Drive and S3.