Version: 2.0

Session metadata

When a pipeline run processes a source record, it creates a new agent session and attaches structured metadata to that session. The metadata records which pipeline and run created the session, which source record it came from, the record's processing status, and the record's source-system metadata and permissions. You can read it from the session's metadata field.

Top-level shape

| Field | Type | Nullable | Description |
|---|---|---|---|
| pipeline_key | string | no | The pipeline that created the session. |
| run_id | string | no | Id of the pipeline run that created the session. |
| source_record_id | string | no | The source system's own id for the record (for example, a Drive file id), not a Vectara-issued id. |
| watermark | string | yes | Version marker of the source record at fetch time, such as a modified timestamp or etag. Used to decide whether an already-processed record needs reprocessing. See Sync mode. |
| type | enum | no | pipeline_transform (worker) or pipeline_judge (judge). See Session types. |
| worker_session_key | string | yes | For pipeline_judge sessions, the key of the worker session being verified. Null for transform sessions. |
| status | enum | no | pending at creation, then succeeded or failed after the record completes. See Status lifecycle. |
| error_message | string | yes | The failure message. Populated only when status is failed. |
| source_record_metadata | object | no | Source-system metadata for the record. Defaults to empty. See Source record metadata. |
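
As an illustration of the shape described in the table above, the following Python sketch models the fields as a dataclass and validates the two enum fields. The names SessionMetadata and parse_session_metadata are hypothetical, and the input is assumed to be the session's metadata field already decoded into a plain dict. This is not part of any official client library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Enum values as listed in the field table.
SESSION_TYPES = {"pipeline_transform", "pipeline_judge"}
STATUSES = {"pending", "succeeded", "failed"}


@dataclass
class SessionMetadata:
    """Hypothetical model of the session metadata fields."""
    # Non-nullable fields.
    pipeline_key: str
    run_id: str
    source_record_id: str
    type: str
    status: str
    # Nullable fields default to None.
    watermark: Optional[str] = None
    worker_session_key: Optional[str] = None
    error_message: Optional[str] = None
    # Documented as defaulting to empty.
    source_record_metadata: Dict[str, Any] = field(default_factory=dict)


def parse_session_metadata(raw: Dict[str, Any]) -> SessionMetadata:
    """Validate the enum fields and build a SessionMetadata from a plain dict."""
    if raw["type"] not in SESSION_TYPES:
        raise ValueError(f"unexpected session type: {raw['type']!r}")
    if raw["status"] not in STATUSES:
        raise ValueError(f"unexpected status: {raw['status']!r}")
    return SessionMetadata(
        pipeline_key=raw["pipeline_key"],
        run_id=raw["run_id"],
        source_record_id=raw["source_record_id"],
        type=raw["type"],
        status=raw["status"],
        watermark=raw.get("watermark"),
        worker_session_key=raw.get("worker_session_key"),
        error_message=raw.get("error_message"),
        source_record_metadata=raw.get("source_record_metadata") or {},
    )
```

Missing nullable keys are treated the same as explicit nulls here, which is a simplification chosen for the sketch.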

Session types

Every record processed by a pipeline produces a transform session. If the pipeline configures verification with a judge agent, each transform session is additionally checked by a judge session.

  • pipeline_transform: the worker session that fetches the source record, uploads it to the agent, and runs the transform. worker_session_key is null.
  • pipeline_judge: the judge session that validates a worker session's output. It carries the same source_record_id, watermark, and source_record_metadata as the worker session it judges, and sets worker_session_key to that worker's session key.
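
The link between the two session types can be followed through worker_session_key. The sketch below assumes you have already collected session metadata into a dict keyed by session key. How you list sessions is outside the scope of this page, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional


def pair_judges_with_workers(
    sessions: Dict[str, Dict[str, Any]],
) -> Dict[str, Optional[str]]:
    """Map each worker session key to its judge session key, or None.

    `sessions` maps a session key to that session's metadata dict
    (an assumed input shape, not an API response format).
    """
    # Start with every transform (worker) session unjudged.
    pairs: Dict[str, Optional[str]] = {
        key: None
        for key, meta in sessions.items()
        if meta["type"] == "pipeline_transform"
    }
    # Attach each judge session to the worker it verifies.
    for key, meta in sessions.items():
        if meta["type"] == "pipeline_judge":
            worker_key = meta.get("worker_session_key")
            if worker_key in pairs:
                pairs[worker_key] = key
    return pairs
```

A worker that maps to None was either processed by a pipeline without verification or has not been judged yet.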

Status lifecycle

A session is created with status set to pending. After the record finishes processing, the status is updated to succeeded, or to failed with error_message populated. Records that fail are added to the dead letter queue.

On a later run of the same pipeline, a record is skipped as already processed if its latest transform session has status succeeded and its watermark is unchanged. A retry run reprocesses the record regardless of this check.
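
The skip rule can be expressed as a small predicate. This is a minimal sketch of the logic described above, not the pipeline's actual implementation. The function name and parameters are hypothetical, and the handling of a null watermark (treated here as "cannot compare, so reprocess") is an assumption that this page does not confirm.

```python
from typing import Any, Dict, Optional


def should_skip(
    latest_transform: Optional[Dict[str, Any]],
    current_watermark: Optional[str],
    is_retry: bool = False,
) -> bool:
    """Return True if the record looks already processed.

    `latest_transform` is the metadata of the record's most recent
    pipeline_transform session, or None if there is none.
    """
    if is_retry:
        # A retry run reprocesses regardless.
        return False
    if latest_transform is None:
        # Never processed before.
        return False
    if latest_transform.get("status") != "succeeded":
        return False
    stored = latest_transform.get("watermark")
    if stored is None or current_watermark is None:
        # Assumption: without a watermark on both sides, reprocess.
        return False
    return stored == current_watermark
```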

Source record metadata

source_record_metadata splits a record's metadata into three parts, populated by the source connector at fetch time. Each part defaults to empty when the source provides nothing for it.

| Field | Type | Description |
|---|---|---|
| system_metadata | object | Metadata the source system manages, such as size, timestamps, etag, and parent path. |
| user_metadata | object | Metadata the owner of the source object attached, such as S3 object tags or SharePoint custom columns. |
| acl_metadata | object | Access-control information the source resolved at fetch time. See ACL metadata. |
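
Because each part defaults to empty, reading code can normalize the three parts up front. The helper below is a hypothetical sketch that assumes source_record_metadata has been decoded into a dict. It treats a missing or null part as an empty object.

```python
from typing import Any, Dict, Optional, Tuple

Part = Dict[str, Any]


def split_source_record_metadata(
    source_record_metadata: Optional[Dict[str, Any]],
) -> Tuple[Part, Part, Part]:
    """Return (system_metadata, user_metadata, acl_metadata) as dicts."""
    srm = source_record_metadata or {}
    return (
        srm.get("system_metadata") or {},
        srm.get("user_metadata") or {},
        srm.get("acl_metadata") or {},
    )
```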

ACL metadata

acl_metadata describes a record's effective permissions in a single, source-independent shape. Every source connector resolves whatever access model it has (Drive permissions, S3 object ACLs, and so on) down to these same fields, so you can reason about access without knowing which connector produced the record.

Permissions are organized along two axes:

  • Principal type: a direct user, a group, a whole organization or domain, or the public ("anyone").
  • Role: none, reader, commenter, or editor, ordered from least to most permissive.

The bucket name encodes both the principal type and the role, so an entry in group_readers is a group granted read access.

| Field | Type | Principal |
|---|---|---|
| owners | array | direct users |
| editors | array | direct users |
| commenters | array | direct users |
| readers | array | direct users |
| group_editors | array | groups |
| group_commenters | array | groups |
| group_readers | array | groups |
| public_access | enum | anyone |
| org_wide_access | enum | whole organization or domain |

Each entry in a bucket array is a principal identifier as the source reports it, such as an email address or a source-specific id (for example, an AWS canonical user id). Do not assume entries are always emails. See each source's own page for the exact identifiers and grants it emits.

public_access and org_wide_access collapse to a single level each. If multiple grants exist (for example, several domains), the highest level wins.
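
The "highest level wins" rule follows directly from the role ordering. As a minimal sketch, assuming the individual grant levels are available as a list of role strings (the function name is hypothetical):

```python
from typing import Iterable

# Roles ordered from least to most permissive.
ROLE_ORDER = ["none", "reader", "commenter", "editor"]


def collapse_access(levels: Iterable[str]) -> str:
    """Collapse several grant levels into the single most permissive one.

    Returns "none" when there are no grants. Raises ValueError if a
    level is not one of the known roles.
    """
    return max(levels, key=ROLE_ORDER.index, default="none")
```

For example, a record shared with one domain as reader and another as editor collapses to editor.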

Reading the ACL fields

Every ACL field distinguishes two states that can both look like "no access": null and empty. Check for null before iterating a bucket.

  • null means the concept does not apply to this source at all. A source that doesn't model groups leaves the group_* buckets null. A source with no notion of public sharing leaves public_access null.
  • empty array (bucket fields) or none (public_access, org_wide_access) means the concept applies and the source was checked, but no grants were found.

A single connector is consistent across the sibling group_* buckets: it either populates all three (using [] for empty buckets) or leaves all three null, depending on whether the source models groups at all.

Owners are always direct users. There is no group_owners bucket, and no owner-level public_access or org_wide_access.
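
The null-versus-empty distinction in this section can be made explicit with a small classifier. This is an illustrative sketch: the function name and the three state labels are hypothetical, acl_metadata is assumed to be a decoded dict, and a key missing from the dict is treated the same as null.

```python
from typing import Any, Dict


def acl_field_state(acl_metadata: Dict[str, Any], name: str) -> str:
    """Classify one ACL field as not_applicable, no_grants, or granted."""
    value = acl_metadata.get(name)
    if value is None:
        # The source does not model this concept at all.
        return "not_applicable"
    if value == [] or value == "none":
        # The source was checked, but no grants were found.
        return "no_grants"
    return "granted"


def group_readers_or_empty(acl_metadata: Dict[str, Any]) -> list:
    """Return group_readers as a list, checking for null before iterating."""
    bucket = acl_metadata.get("group_readers")
    return list(bucket) if bucket is not None else []
```

Keeping not_applicable separate from no_grants lets downstream code tell "this source has no groups" apart from "this record is shared with no groups".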

For which buckets each source fills and what identifiers it emits, see the individual source pages, such as Google Drive and S3.