Version: 2.0

Session metadata

When a pipeline run processes a source record, it creates a new agent session and attaches structured metadata to that session. The metadata records which pipeline and run created the session, which source record it came from, the record's processing status, and the record's source-system metadata and permissions. You can read it from the session's metadata field.

Top-level shape

Field	Type	Nullable	Description
`pipeline_key`	string	No	The pipeline that created the session.
`run_id`	string	No	Id of the pipeline run that created the session.
`source_record_id`	string	No	The source system's own id for the record (for example, a Drive file id), not a Vectara-issued id.
`watermark`	string	Yes	Version marker of the source record at fetch time, such as a modified timestamp or `etag`. Used to decide whether an already-processed record needs reprocessing. See Sync mode.
`type`	enum	No	`pipeline_transform` (worker) or `pipeline_judge` (judge). See Session types.
`worker_session_key`	string	Yes	For `pipeline_judge` sessions, the key of the worker session being verified. Null for transform sessions.
`status`	enum	No	`pending` at creation, then `succeeded` or `failed` after the record completes. See Status lifecycle.
`error_message`	string	Yes	The failure message. Populated only when `status` is `failed`.
`source_record_metadata`	object	No	Source-system metadata for the record. Defaults to empty. See Source record metadata.
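For client code that reads this metadata, the table can be mirrored as a typed shape. The sketch below is a hypothetical client-side model: the field names come from the table, but the class and function names are illustrative and are not part of any Vectara SDK. The invariant check encodes the rules stated in the Description column.

```python
from typing import Literal, Optional, TypedDict

# Hypothetical types mirroring the documented fields; names are illustrative.
SessionType = Literal["pipeline_transform", "pipeline_judge"]
SessionStatus = Literal["pending", "succeeded", "failed"]


class SessionMetadata(TypedDict):
    pipeline_key: str
    run_id: str
    source_record_id: str
    watermark: Optional[str]
    type: SessionType
    worker_session_key: Optional[str]
    status: SessionStatus
    error_message: Optional[str]
    source_record_metadata: dict


def check_invariants(meta: SessionMetadata) -> list[str]:
    """Return a description of each documented rule the metadata violates."""
    problems = []
    # Transform sessions never reference a worker; judge sessions always do.
    if meta["type"] == "pipeline_transform" and meta["worker_session_key"] is not None:
        problems.append("transform session has a worker_session_key")
    if meta["type"] == "pipeline_judge" and meta["worker_session_key"] is None:
        problems.append("judge session is missing worker_session_key")
    # error_message is populated only for failed sessions.
    if meta["status"] != "failed" and meta["error_message"] is not None:
        problems.append("error_message set on a session that did not fail")
    return problems
```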

Session types

Every record processed by a pipeline produces a transform session. If the pipeline configures verification with a judge agent, each transform session is additionally checked by a judge session.

pipeline_transform: the worker session that fetches the source record, uploads it to the agent, and runs the transform. worker_session_key is null.
pipeline_judge: the judge session that validates a worker session's output. It carries the same source_record_id, watermark, and source_record_metadata as the worker session it judges, and sets worker_session_key to that worker's session key.
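Because a judge session points back at its worker through worker_session_key and copies the worker's record fields, the two can be paired on the client side. This is a minimal sketch, assuming you have already fetched the sessions into a dict keyed by session key; the helper names are hypothetical.

```python
# Fields a judge session copies from the worker session it verifies.
SHARED_FIELDS = ("source_record_id", "watermark", "source_record_metadata")


def pair_judges(sessions: dict[str, dict]) -> dict[str, list[str]]:
    """Map each worker session key to the keys of the judge sessions verifying it.

    `sessions` maps a session key to that session's metadata dict.
    """
    pairs: dict[str, list[str]] = {}
    for key, meta in sessions.items():
        if meta["type"] == "pipeline_transform":
            pairs.setdefault(key, [])
    for key, meta in sessions.items():
        if meta["type"] == "pipeline_judge":
            pairs.setdefault(meta["worker_session_key"], []).append(key)
    return pairs


def judge_matches_worker(judge: dict, worker_key: str, worker: dict) -> bool:
    """Check that a judge session references this worker and carries its record fields."""
    return (
        judge["type"] == "pipeline_judge"
        and worker["type"] == "pipeline_transform"
        and judge["worker_session_key"] == worker_key
        and all(judge[f] == worker[f] for f in SHARED_FIELDS)
    )
```

Workers without a judge (for example, in a pipeline that does not configure verification) map to an empty list.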

Status lifecycle

A session is created with status set to pending. After the record finishes processing, the status is updated to succeeded, or to failed with error_message populated. Records that fail are added to the dead letter queue.
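The lifecycle is a single transition out of pending. The sketch below models it on the client side only; it is not Vectara's implementation, and representing the dead letter queue as a plain list of source record ids is a hypothetical simplification.

```python
from typing import Optional


def finish_session(meta: dict, error: Optional[str], dead_letter_queue: list) -> dict:
    """Return a copy of `meta` moved out of `pending`.

    A None `error` means success; otherwise the session fails and its
    source record id is appended to the (hypothetical) dead letter queue.
    """
    if meta["status"] != "pending":
        raise ValueError(f"session already finished with status {meta['status']!r}")
    done = dict(meta)
    if error is None:
        done["status"] = "succeeded"
        done["error_message"] = None
    else:
        done["status"] = "failed"
        done["error_message"] = error
        dead_letter_queue.append(done["source_record_id"])
    return done
```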

On a later run of the same pipeline, a record is skipped as already processed if its latest transform session has status succeeded and its watermark is unchanged. A retry run reprocesses records regardless of these conditions.
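The skip rule can be written as a small predicate. This is a sketch of the documented decision, with one labeled assumption: the page does not say how a record without a watermark is treated, so here a missing watermark is taken to mean the record cannot be shown to be unchanged. The function name and parameters are hypothetical.

```python
from typing import Optional


def should_process(
    latest_transform: Optional[dict],
    current_watermark: Optional[str],
    is_retry: bool,
) -> bool:
    """Decide whether a run should (re)process a record."""
    if is_retry:
        return True  # retry runs reprocess regardless
    if latest_transform is None:
        return True  # never processed before
    if current_watermark is None:
        # Assumption: without a watermark the record cannot be shown unchanged.
        return True
    unchanged = latest_transform["watermark"] == current_watermark
    return not (latest_transform["status"] == "succeeded" and unchanged)
```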

Source record metadata

source_record_metadata splits a record's metadata into three parts, populated by the source connector at fetch time. Each part defaults to empty when the source provides nothing for it.

Field	Type	Description
`system_metadata`	object	Metadata the source system manages, such as size, timestamps, etag, and parent path.
`user_metadata`	object	Metadata the owner of the source object attached, such as S3 object tags or SharePoint custom columns.
`acl_metadata`	object	Access-control information the source resolved at fetch time. See ACL metadata.
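When you copy these parts into document metadata at index time, system and user keys can collide (for example, an S3 tag named size). The sketch below flattens both parts into one dict; the system. and user. prefixes are a hypothetical naming convention, not something Vectara requires, and ACL data is deliberately left out because it needs its own mapping.

```python
def flatten_for_indexing(source_record_metadata: dict) -> dict:
    """Flatten system and user metadata into one dict with prefixed keys.

    The "system." / "user." prefixes are a hypothetical convention that keeps
    owner-supplied keys from shadowing system-managed keys.
    """
    flat = {}
    for part, prefix in (("system_metadata", "system."), ("user_metadata", "user.")):
        # Each part defaults to empty; `or {}` also tolerates a missing part.
        for key, value in (source_record_metadata.get(part) or {}).items():
            flat[prefix + key] = value
    return flat
```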

ACL metadata

acl_metadata describes a record's effective permissions in a single, source-independent shape. Every source connector resolves whatever access model it has (Drive permissions, S3 object ACLs, and so on) down to these same fields, so you can reason about access without knowing which connector produced the record.

note

acl_metadata is descriptive metadata on the pipeline session. Vectara does not automatically enforce document-level access from it. At query time, results are restricted only when the query supplies a metadata_filter. To enforce these grants, map the record's ACL into document metadata at index time, and apply a matching query filter built from the caller's verified identity.
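The mapping step can be sketched as two functions: one that collects every principal with at least read access (to store as document metadata), and one that expresses the check a query filter would need to enforce. This is a hypothetical illustration, not a Vectara API; it does not produce a metadata_filter string, and the caller_in_org flag assumes you have already matched the caller against the organization or domain the source reports.

```python
# Buckets whose role is reader or higher (owners, editors, and commenters can all read).
READ_BUCKETS = (
    "owners", "editors", "commenters", "readers",
    "group_editors", "group_commenters", "group_readers",
)


def readable_principals(acl: dict) -> list[str]:
    """Collect every principal granted at least read access, skipping null buckets."""
    principals: set[str] = set()
    for bucket in READ_BUCKETS:
        principals.update(acl.get(bucket) or [])
    return sorted(principals)


def caller_may_read(acl: dict, caller_ids: set[str], caller_in_org: bool) -> bool:
    """Check a caller's verified user and group ids against the record's grants."""
    if acl.get("public_access") not in (None, "none"):
        return True
    if caller_in_org and acl.get("org_wide_access") not in (None, "none"):
        return True
    return bool(caller_ids & set(readable_principals(acl)))
```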

Permissions are organized along two axes:

Principal type: a direct user, a group, a whole organization or domain, or the public ("anyone").
Role: none, reader, commenter, or editor, ordered from least to most permissive.
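Because roles are ordered, an integer-backed enum makes "at least this role" comparisons direct. This sketch assumes the enum values are the lowercase strings used on this page (none, reader, commenter, editor); the helper names are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered from least to most permissive, so comparisons follow that order."""
    NONE = 0
    READER = 1
    COMMENTER = 2
    EDITOR = 3


def role_from_value(value: str) -> Role:
    # Assumes lowercase values such as "reader" or "none".
    return Role[value.upper()]


def grants_at_least(granted: str, required: str) -> bool:
    return role_from_value(granted) >= role_from_value(required)
```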

The bucket name encodes both the principal type and the role, so an entry in group_readers is a group granted read access.

Field	Type	Principal
`owners`	array	Direct users
`editors`	array	Direct users
`commenters`	array	Direct users
`readers`	array	Direct users
`group_editors`	array	Groups
`group_commenters`	array	Groups
`group_readers`	array	Groups
`public_access`	enum	Anyone
`org_wide_access`	enum	Whole organization or domain
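Since the bucket name carries both axes, the array buckets can be expanded into explicit (principal, principal type, role) tuples. In this sketch, owners is labeled with a separate "owner" value because it is not one of the four listed roles; the BUCKETS mapping and the grants function are hypothetical helpers.

```python
# Principal type and role implied by each array bucket name.
BUCKETS = {
    "owners": ("user", "owner"),  # owner is kept as its own label
    "editors": ("user", "editor"),
    "commenters": ("user", "commenter"),
    "readers": ("user", "reader"),
    "group_editors": ("group", "editor"),
    "group_commenters": ("group", "commenter"),
    "group_readers": ("group", "reader"),
}


def grants(acl: dict) -> list[tuple[str, str, str]]:
    """Expand bucket arrays into (principal, principal_type, role) tuples."""
    out = []
    for bucket, (ptype, role) in BUCKETS.items():
        # Null buckets (concept not applicable) contribute nothing.
        for principal in acl.get(bucket) or []:
            out.append((principal, ptype, role))
    return out
```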

Each entry in a bucket array is a principal identifier as the source reports it, such as an email address or a source-specific id (for example, an AWS canonical user id). Do not assume entries are always emails. See each source's own page for the exact identifiers and grants it emits.

public_access and org_wide_access each hold a single level. If the source reports multiple grants (for example, several domains), the most permissive level wins.
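The connector applies this collapse for you; the sketch below only illustrates the rule. It assumes the level values are the role strings used on this page and, as a labeled assumption, returns none when there are no grants at all.

```python
ROLE_ORDER = ["none", "reader", "commenter", "editor"]


def collapse_levels(levels: list[str]) -> str:
    """Reduce several grants (for example, one per domain) to the most permissive level."""
    if not levels:
        return "none"  # assumption: no grants collapses to none
    return max(levels, key=ROLE_ORDER.index)
```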

Reading the ACL fields

Every ACL field distinguishes between null and empty. Check for null before iterating over a bucket.

null means the concept does not apply to this source at all. A source that doesn't model groups leaves the group_* buckets null. A source with no notion of public sharing leaves public_access null.
empty array (bucket fields) or none (public_access, org_wide_access) means the concept applies and the source was checked, but no grants were found.
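The null-versus-empty distinction can be made explicit with two small classifiers, one for array buckets and one for the single-level fields. The state names returned here are hypothetical labels chosen for illustration.

```python
from typing import Optional


def bucket_state(value: Optional[list]) -> str:
    """Classify an array bucket: null, empty, or populated."""
    if value is None:
        return "not_applicable"  # the source does not model this concept
    return "has_grants" if value else "no_grants"


def level_state(value: Optional[str]) -> str:
    """Classify public_access or org_wide_access: null, none, or a granted level."""
    if value is None:
        return "not_applicable"
    return "no_grants" if value == "none" else "has_grants"
```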

A single connector is consistent across the sibling group_* buckets: it either populates all three (using [] for empty buckets) or leaves all three null, depending on whether the source models groups at all.
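This consistency rule means a single check tells you whether the source models groups at all. The sketch below, with a hypothetical function name, returns that answer and raises if the three group buckets disagree, which would indicate unexpected data.

```python
GROUP_BUCKETS = ("group_editors", "group_commenters", "group_readers")


def models_groups(acl: dict) -> bool:
    """Return whether the source models groups; raise if the group buckets disagree."""
    nulls = [acl.get(bucket) is None for bucket in GROUP_BUCKETS]
    if all(nulls):
        return False
    if not any(nulls):
        return True
    raise ValueError("group_* buckets must be all null or all arrays")
```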

Owners are always direct users. There is no group_owners bucket, and no owner-level public_access or org_wide_access.

See Sources for details on each available source.
