Version: 2.0

Google Drive

The Google Drive source reads files from Google Workspace shared drives and user My Drives. Each file becomes a record in the pipeline, and its contents are uploaded to a new agent session for processing. Authentication uses a Google Cloud service account.

Configuration

A Drive source lists one or more scopes. Each scope is a starting point: a shared drive or a single user's My Drive, each of which can be narrowed to a folder within it.
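
As an illustration, a source with one shared-drive scope and one My Drive scope might be assembled as shown here. This is a sketch built only from the fields documented on this page: the helper name build_drive_source is hypothetical, and the folder ID, email addresses, and key material are placeholders.

```python
import json


def build_drive_source(client_email, private_key, scopes):
    """Assemble a google_drive source object from the documented fields.

    Hypothetical helper: it only mirrors the field names on this page.
    """
    if not scopes:
        raise ValueError("scopes must contain at least one entry")
    return {
        "type": "google_drive",
        "scopes": list(scopes),
        "client_email": client_email,
        "private_key": private_key,
    }


source = build_drive_source(
    # Placeholder service account email and key material.
    client_email="ingest@example-project.iam.gserviceaccount.com",
    private_key="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    scopes=[
        # Placeholder folder ID; use a real shared drive or folder URL.
        {"type": "shared", "url": "https://drive.google.com/drive/folders/FOLDER_ID"},
        # No url given, so the whole My Drive of this user is in scope.
        {"type": "my_drive", "subject_email": "user@example.com"},
    ],
)

# Serialize to JSON, the format the source definition is written in.
print(json.dumps(source, indent=2))
```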


Fields

| Field | Required | Description |
| --- | --- | --- |
| type | Yes | google_drive. |
| scopes | Yes | The Drive starting points to ingest; at least one is required. See Scopes. |
| client_email | Yes | The service account's email address (the client_email field of the service account JSON key). |
| private_key | Yes | The service account's PEM-formatted RSA private key (the private_key field of the JSON key, including the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- markers and embedded newlines). Encrypted at rest and never returned in responses. |
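
Both credential fields can be copied from the service account's JSON key file. The sketch below shows one way to do that, assuming the key file layout Google Cloud produces, with client_email and private_key at the top level. The function name credentials_from_key_file is hypothetical.

```python
import json


def credentials_from_key_file(path):
    """Return the two credential fields this source needs from a JSON key file."""
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)

    private_key = key["private_key"]
    # json.load turns the "\n" escapes in the key file into real newlines,
    # so the PEM markers and body arrive with their embedded newlines intact.
    if not private_key.startswith("-----BEGIN PRIVATE KEY-----"):
        raise ValueError("private_key is missing its BEGIN marker")
    if "-----END PRIVATE KEY-----" not in private_key:
        raise ValueError("private_key is missing its END marker")

    return {"client_email": key["client_email"], "private_key": private_key}
```

Passing the value through a JSON parser, rather than copying it by hand, avoids losing the newlines inside the key.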

Scopes

Each entry in scopes is an object whose type field selects one of the scope kinds below.

Shared drive (shared)

Ingest from a Google Workspace shared drive. No domain-wide delegation is required: the service account (client_email) only needs to be a member of the drive or folder.

| Field | Required | Description |
| --- | --- | --- |
| type | Yes | shared. |
| url | Yes | URL of the folder to ingest. Use a shared drive's root URL (https://drive.google.com/drive/folders/<drive_id>) to enumerate the entire drive, or any subfolder URL to scope ingestion to that subtree. |
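
To check a URL before putting it in a scope, the folder or drive ID can be pulled out of the path. This page does not describe how the connector parses URLs, so the sketch below is only an assumption based on the documented URL shape. The function name folder_id_from_url is hypothetical.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Extract the ID that follows /folders/ in a Drive folder URL."""
    parsed = urlparse(url)
    if parsed.netloc != "drive.google.com":
        raise ValueError(f"not a Drive URL: {url!r}")

    # Split the path into non-empty segments; query strings such as
    # ?usp=sharing are not part of the path and are ignored.
    parts = [p for p in parsed.path.split("/") if p]
    if "folders" in parts:
        idx = parts.index("folders")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(f"no folder ID found in {url!r}")
```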

My Drive (my_drive)

Ingest from a single user's My Drive via domain-wide delegation. The service account impersonates the named user.

| Field | Required | Description |
| --- | --- | --- |
| type | Yes | my_drive. |
| subject_email | Yes | The user whose My Drive the service account impersonates. |
| url | No | A folder URL within the user's My Drive to narrow ingestion to that subtree. If omitted, every accessible file in the user's My Drive is enumerated. |
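
The required fields of the two scope types can be checked before a source is submitted. The sketch below encodes only the Required columns of the tables above. The function name validate_scope is hypothetical, and the service's own validation may be stricter.

```python
# Required fields per scope type, taken from the tables on this page.
REQUIRED_FIELDS = {
    "shared": {"url"},
    "my_drive": {"subject_email"},  # url is optional for my_drive
}


def validate_scope(scope):
    """Return the scope type, or raise ValueError if a required field is missing."""
    kind = scope.get("type")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown scope type: {kind!r}")

    missing = REQUIRED_FIELDS[kind] - set(scope)
    if missing:
        raise ValueError(f"{kind} scope is missing: {sorted(missing)}")
    return kind
```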

How records are fetched

Each scope starts from its configured folder (or the shared drive root) and walks every descendant subfolder. Trashed files and shortcuts are skipped.
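
The traversal can be pictured with a small in-memory model. The sketch below is not the connector's implementation: it assumes a hypothetical children_of mapping from folder ID to child items, where each item carries the Drive fields id, name, mimeType, and trashed. Skipping a trashed folder together with its subtree is also an assumption.

```python
FOLDER = "application/vnd.google-apps.folder"
SHORTCUT = "application/vnd.google-apps.shortcut"


def walk(children_of, root_id):
    """Yield every non-trashed, non-shortcut file under root_id."""
    stack = [root_id]
    seen = set()  # guards against visiting a folder twice
    while stack:
        folder_id = stack.pop()
        if folder_id in seen:
            continue
        seen.add(folder_id)
        for item in children_of.get(folder_id, []):
            if item.get("trashed"):
                continue  # trashed items are skipped (folders: assumption)
            if item["mimeType"] == SHORTCUT:
                continue  # shortcuts are skipped, not followed
            if item["mimeType"] == FOLDER:
                stack.append(item["id"])  # descend into the subfolder
            else:
                yield item


tree = {
    "root": [
        {"id": "f1", "name": "a.txt", "mimeType": "text/plain"},
        {"id": "d1", "name": "sub", "mimeType": FOLDER},
        {"id": "s1", "name": "link", "mimeType": SHORTCUT},
        {"id": "f2", "name": "old.txt", "mimeType": "text/plain", "trashed": True},
    ],
    "d1": [{"id": "f3", "name": "b.pdf", "mimeType": "application/pdf"}],
}
names = sorted(f["name"] for f in walk(tree, "root"))
```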

Google Workspace files are exported on download: Docs, Sheets, and Slides become their Office equivalents (.docx, .xlsx, .pptx). Other Workspace types, such as Forms and Drawings, can't be exported and are sent to the dead letter queue.
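
The export rule can be written as a lookup on the Drive MIME type. The MIME type strings below are the standard Drive and Office Open XML identifiers. The function name plan_download and the three action labels are hypothetical and only illustrate the decision described above.

```python
OOXML = "application/vnd.openxmlformats-officedocument."

# Workspace type -> (export MIME type, file extension)
EXPORT_TARGETS = {
    "application/vnd.google-apps.document": (
        OOXML + "wordprocessingml.document", ".docx"),
    "application/vnd.google-apps.spreadsheet": (
        OOXML + "spreadsheetml.sheet", ".xlsx"),
    "application/vnd.google-apps.presentation": (
        OOXML + "presentationml.presentation", ".pptx"),
}
WORKSPACE_PREFIX = "application/vnd.google-apps."


def plan_download(mime_type):
    """Return (action, export_mime, extension) for a file's Drive MIME type."""
    if mime_type in EXPORT_TARGETS:
        export_mime, ext = EXPORT_TARGETS[mime_type]
        return ("export", export_mime, ext)
    if mime_type.startswith(WORKSPACE_PREFIX):
        # Forms, Drawings, and other Workspace types have no export here.
        return ("dead_letter", None, None)
    # Ordinary binary files are downloaded as-is.
    return ("download", None, None)
```

Folders and shortcuts also use the application/vnd.google-apps. prefix, so a function like this would only be applied to items that survive traversal.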

Source metadata

Each record carries source metadata that the connector resolves at fetch time.

system_metadata captures these Drive fields when present:

| Key | Description |
| --- | --- |
| name | The file name. |
| mime_type | The file's Drive MIME type. |
| size | File size in bytes. |
| modified_at | Last modified time (RFC 3339). |
| created_at | Creation time (RFC 3339). |
| md5_checksum | MD5 checksum, when Drive provides one. |
| web_view_link | Link to open the file in the Drive UI. |
| parents | IDs of the file's parent folders. |
| drive_id | ID of the shared drive, for shared-drive files. |
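
These keys line up with fields of the Drive API v3 file resource. The mapping below is an assumption inferred from the key names, not a statement of how the connector is implemented. The function name system_metadata is hypothetical, and converting size to an integer is a choice made for this sketch (the Drive API returns it as a string).

```python
# system_metadata key -> Drive API v3 file field (assumed correspondence)
FIELD_MAP = {
    "name": "name",
    "mime_type": "mimeType",
    "size": "size",
    "modified_at": "modifiedTime",
    "created_at": "createdTime",
    "md5_checksum": "md5Checksum",
    "web_view_link": "webViewLink",
    "parents": "parents",
    "drive_id": "driveId",
}


def system_metadata(drive_file):
    """Copy the Drive fields that are present into system_metadata keys."""
    meta = {}
    for key, api_field in FIELD_MAP.items():
        if api_field not in drive_file:
            continue  # "when present": absent fields are simply omitted
        value = drive_file[api_field]
        if key == "size":
            value = int(value)  # Drive reports size as a decimal string
        meta[key] = value
    return meta


# A Google Doc in My Drive: no size, checksum, or driveId is reported.
doc = {
    "name": "Plan",
    "mimeType": "application/vnd.google-apps.document",
    "modifiedTime": "2024-05-01T10:00:00.000Z",
    "parents": ["folder123"],
}
doc_meta = system_metadata(doc)
```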

user_metadata is empty for Drive.

acl_metadata holds the file's effective permissions in the source-independent ACL metadata shape. Drive models groups, so every group_* bucket is populated (empty arrays when there are no group grants). Permissions are cumulative down the folder tree, so a file's ACL includes grants inherited from its parent folders. Entries in the user and group buckets are email addresses. The buckets map to Drive roles as follows:

| Bucket | Drive grant |
| --- | --- |
| owners | the file's owner(s) |
| editors | writer, organizer, fileOrganizer roles ("Editor" in the UI) |
| commenters | commenter role |
| readers | reader role |
| group_editors, group_commenters, group_readers | group-email grants by role |
| public_access | the "anyone with the link" permission |
| org_wide_access | a workspace-domain grant |
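
The bucket mapping can be sketched over a list of Drive permission resources, each with type (user, group, domain, or anyone), role, and emailAddress. Several details here are assumptions, because the ACL metadata shape is defined elsewhere: public_access and org_wide_access are modeled as booleans, group grants with editor-level roles go to group_editors, and the function name acl_metadata is hypothetical. The input is taken to be the effective permission list, with inherited grants already included.

```python
ROLE_BUCKET = {
    "owner": "owners",
    "writer": "editors",
    "organizer": "editors",
    "fileOrganizer": "editors",
    "commenter": "commenters",
    "reader": "readers",
}


def acl_metadata(permissions):
    """Sort Drive permissions into ACL buckets (shape is an assumption)."""
    acl = {
        "owners": [], "editors": [], "commenters": [], "readers": [],
        # Group buckets are always present, empty when there are no grants.
        "group_editors": [], "group_commenters": [], "group_readers": [],
        "public_access": False,    # assumed boolean
        "org_wide_access": False,  # assumed boolean
    }
    for perm in permissions:
        kind, role = perm["type"], perm["role"]
        if kind == "anyone":
            acl["public_access"] = True
        elif kind == "domain":
            acl["org_wide_access"] = True
        elif kind in ("user", "group"):
            bucket = ROLE_BUCKET.get(role)
            if bucket is None:
                continue  # unrecognized role: ignored in this sketch
            if kind == "group":
                if bucket == "owners":
                    continue  # no group owners bucket is documented
                bucket = "group_" + bucket
            acl[bucket].append(perm["emailAddress"])
    return acl
```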

Incremental sync

When sync_mode is incremental (the default), the pipeline tracks a watermark based on each file's modified_at system metadata. On the next run, only files modified since the last successful run are reprocessed, and unchanged files are skipped. See Sync mode.
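
The watermark check amounts to comparing each file's modified_at against a stored timestamp. The sketch below is a simplified model, not the pipeline's code: the function names are hypothetical, and treating a file modified exactly at the watermark as unchanged is an assumption.

```python
from datetime import datetime


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp such as 2024-05-01T10:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def files_to_process(files, watermark):
    """Return files whose modified_at is later than the watermark.

    A watermark of None models a first run, where everything is processed.
    """
    if watermark is None:
        return list(files)
    cutoff = parse_rfc3339(watermark)
    return [f for f in files if parse_rfc3339(f["modified_at"]) > cutoff]


files = [
    {"name": "old.docx", "modified_at": "2024-04-30T09:00:00Z"},
    {"name": "new.docx", "modified_at": "2024-05-02T12:30:00.500Z"},
]
changed = files_to_process(files, "2024-05-01T00:00:00Z")
```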

Permissions

Set up access according to the scope types you use:

  • Shared drive scopes: add the service account email (client_email) as a member of the shared drive, or of the specific folder, with at least Viewer access.
  • My Drive scopes: a Workspace administrator must authorize the service account's client ID for domain-wide delegation with the https://www.googleapis.com/auth/drive.readonly OAuth scope. The service account then impersonates each subject_email.
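
For My Drive scopes, impersonation follows Google's OAuth 2.0 service account flow: the token request is a signed JWT whose sub claim names the user to impersonate. The sketch below builds only the unsigned claim set, to show where client_email, subject_email, and the scope end up. Signing with the private key (RS256) and the token exchange are omitted, and the function name delegation_claims is hypothetical.

```python
import time

TOKEN_URL = "https://oauth2.googleapis.com/token"
DRIVE_READONLY = "https://www.googleapis.com/auth/drive.readonly"


def delegation_claims(client_email, subject_email=None, now=None, lifetime=3600):
    """Build the JWT claim set for a service account token request."""
    issued = int(time.time() if now is None else now)
    claims = {
        "iss": client_email,       # the service account
        "scope": DRIVE_READONLY,   # must match the scope the admin authorized
        "aud": TOKEN_URL,
        "iat": issued,
        "exp": issued + lifetime,  # Google caps token lifetime at one hour
    }
    if subject_email is not None:
        # Domain-wide delegation: act as this user.
        claims["sub"] = subject_email
    return claims
```

If the administrator has not authorized the client ID for this scope, the token exchange for a claim set that includes sub is rejected, which is a common cause of My Drive scopes failing while shared drive scopes work.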