Version: 2.0

SharePoint

The SharePoint source reads files from a SharePoint document library via Microsoft Graph. Each file in the library becomes a record in the pipeline — its contents are uploaded to a new agent session for processing. Authentication uses app-only credentials from an Azure AD app registration.

Configuration

A SharePoint source points at a single site. By default it ingests the site's default document library; set drive_id to target a specific library, and folder_path to scope ingestion to a subfolder.

Fields

| Field | Required | Description |
| --- | --- | --- |
| type | Yes | sharepoint. |
| site_url | Yes | The full URL of the SharePoint site (e.g. https://contoso.sharepoint.com/sites/legal). |
| tenant_id | Yes | Azure AD directory (tenant) ID. |
| client_id | Yes | Azure AD application (client) ID for the app registration. |
| client_secret | Yes | Azure AD client secret. Encrypted at rest and never returned in responses. |
| drive_id | No | The ID of a specific document library. If omitted, the site's default document library is used. |
| folder_path | No | Folder path to scope ingestion to a subdirectory (e.g. /Contracts/2026). If omitted, the entire library is ingested. |
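
As an illustration of how these fields fit together, the sketch below builds a source definition as a Python dictionary and checks that the required fields are present. Only the field names come from the table above. Every value is a placeholder, the `missing_required` helper is hypothetical, and where this object sits inside a full pipeline definition is not shown here.

```python
import json

# Placeholder values throughout; only the field names come from the Fields table.
sharepoint_source = {
    "type": "sharepoint",
    "site_url": "https://contoso.sharepoint.com/sites/legal",
    "tenant_id": "00000000-0000-0000-0000-000000000000",
    "client_id": "11111111-1111-1111-1111-111111111111",
    "client_secret": "<client-secret>",
    # Optional: omit drive_id to use the site's default document library.
    "drive_id": "<drive-id>",
    # Optional: omit folder_path to ingest the entire library.
    "folder_path": "/Contracts/2026",
}

REQUIRED_FIELDS = ("type", "site_url", "tenant_id", "client_id", "client_secret")


def missing_required(source: dict) -> list:
    """Return the required field names that are absent or empty (hypothetical helper)."""
    return [name for name in REQUIRED_FIELDS if not source.get(name)]


if __name__ == "__main__":
    # Serialize to JSON, the format a source definition would be submitted in.
    print(json.dumps(sharepoint_source, indent=2))
    print("missing:", missing_required(sharepoint_source))
```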

How records are fetched

Each run enumerates drive items using Microsoft Graph's /delta change-tracking endpoint, which returns every item in the scoped library or folder. When drive_id is omitted, the connector resolves the site's default document library from site_url; when folder_path is set, enumeration is scoped to that folder and its descendants.
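
The enumeration described above can be sketched as follows. This is a minimal illustration and not the connector's actual code. The Graph URL shapes and the `@odata.nextLink` paging key are standard Microsoft Graph, but the function names and the injected `fetch_json` callable (which would perform an authenticated GET and return parsed JSON) are assumptions. How the connector applies `folder_path` is not specified here, so folder scoping is left out.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import quote, urlparse

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(site_url: str) -> str:
    """Graph URL that resolves a site from its hostname and server-relative path."""
    parsed = urlparse(site_url)
    path = parsed.path.rstrip("/")
    if not path:
        # Root site of the tenant hostname.
        return f"{GRAPH}/sites/{parsed.netloc}"
    return f"{GRAPH}/sites/{parsed.netloc}:{quote(path)}"


def delta_url(site_id: str, drive_id: Optional[str] = None) -> str:
    """Delta URL for a specific library, or for the site's default library."""
    if drive_id:
        return f"{GRAPH}/drives/{drive_id}/root/delta"
    return f"{GRAPH}/sites/{site_id}/drive/root/delta"


def enumerate_items(start_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every drive item, following @odata.nextLink until the last page."""
    url: Optional[str] = start_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        # The final page carries @odata.deltaLink instead of @odata.nextLink.
        url = page.get("@odata.nextLink")
```

Passing `fetch_json` in as a parameter keeps the paging logic separate from HTTP and authentication, so it can be exercised with canned pages.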

Only files are ingested. Folders are traversed but not themselves indexed, and SharePoint lists, pages, and OneNote notebooks are not indexed.
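
One way to express the files-only rule is to inspect the facets on each Graph drive item, as in the sketch below. Files carry a `file` facet, folders a `folder` facet, and OneNote notebooks appear as a `package`. The function name is hypothetical, and skipping entries with a `deleted` facet is an assumption rather than documented connector behavior.

```python
def is_ingestible_file(item: dict) -> bool:
    """True for plain files; False for folders, packages, and deleted entries."""
    if "deleted" in item:
        # Assumption: deletion markers from the delta feed are skipped.
        return False
    if "folder" in item or "package" in item:
        # Folders are traversed, not ingested; OneNote notebooks are packages.
        return False
    return "file" in item
```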

The pipeline captures an upper-bound timestamp at the start of each run and only processes items whose lastModifiedDateTime is at or before that timestamp. Items modified while a run is in progress are left for the next run, so they aren't partially processed.
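
The upper-bound check might look like the sketch below. It assumes timestamps arrive as ISO 8601 UTC strings such as `2026-01-15T09:30:00Z`. The helper names are illustrative and not part of the pipeline.

```python
from datetime import datetime, timezone


def parse_graph_time(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-01-15T09:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def within_upper_bound(item: dict, upper_bound: datetime) -> bool:
    """True when the item was last modified at or before the run's upper bound."""
    return parse_graph_time(item["lastModifiedDateTime"]) <= upper_bound


if __name__ == "__main__":
    # Captured once, before enumeration begins.
    run_upper_bound = datetime.now(timezone.utc)
    sample = {"name": "old.docx", "lastModifiedDateTime": "2020-01-01T00:00:00Z"}
    print(within_upper_bound(sample, run_upper_bound))
```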

Source metadata

Each record carries source metadata that the connector resolves at fetch time.

system_metadata captures these drive-item fields when present:

| Key | Description |
| --- | --- |
| name | The file name. |
| webUrl | Link to open the file in SharePoint. |
| size | File size in bytes. |
| eTag | The item's ETag. |
| createdDateTime | Creation time (ISO 8601). |
| lastModifiedDateTime | Last modified time (ISO 8601). |
| createdBy | Display name of the user who created the file. |
| lastModifiedBy | Display name of the user who last modified the file. |
| parentPath | Path of the item's parent folder. |
| mimeType | The file's MIME type. |
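
The sketch below shows how these keys could be derived from a Microsoft Graph driveItem. The nested locations (`createdBy.user.displayName`, `parentReference.path`, `file.mimeType`) are standard driveItem properties, but the mapping itself is an assumption about the connector, and `parentPath` in particular may be normalized differently in practice. Absent values are dropped, matching the "when present" wording.

```python
from typing import Any, Optional


def _display_name(identity_set: Optional[dict]) -> Optional[str]:
    """Pull user.displayName out of a Graph identitySet, if it is there."""
    return ((identity_set or {}).get("user") or {}).get("displayName")


def system_metadata(item: dict) -> dict:
    """Build system_metadata from a driveItem, keeping only fields that are present."""
    candidates: dict = {
        "name": item.get("name"),
        "webUrl": item.get("webUrl"),
        "size": item.get("size"),
        "eTag": item.get("eTag"),
        "createdDateTime": item.get("createdDateTime"),
        "lastModifiedDateTime": item.get("lastModifiedDateTime"),
        "createdBy": _display_name(item.get("createdBy")),
        "lastModifiedBy": _display_name(item.get("lastModifiedBy")),
        # Assumption: the raw parentReference.path is used as-is.
        "parentPath": (item.get("parentReference") or {}).get("path"),
        "mimeType": (item.get("file") or {}).get("mimeType"),
    }
    return {key: value for key, value in candidates.items() if value is not None}
```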

user_metadata contains the document library's custom columns (the SharePoint listItem fields) for the file, when the library defines any. These are fetched at download time, so they're only populated for files the pipeline actually processes.

The SharePoint source does not populate acl_metadata; the ACL metadata buckets are left null.

Incremental sync

When sync_mode is incremental (the default), the pipeline tracks a watermark based on each item's lastModifiedDateTime. On the next run, only items modified since the last successful run are reprocessed, and unchanged items are skipped. See Sync mode.
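
A watermark filter along these lines would produce the behavior described above. This is a sketch under stated assumptions: the strict "later than the watermark" comparison and the function names are illustrative, and the pipeline's exact boundary handling is not documented here.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def parse_graph_time(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-01-15T09:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_changed(
    items: Iterable[dict],
    watermark: Optional[datetime],
    upper_bound: datetime,
) -> List[dict]:
    """Keep items modified after the watermark and at or before the upper bound."""
    selected = []
    for item in items:
        modified = parse_graph_time(item["lastModifiedDateTime"])
        if modified > upper_bound:
            continue  # Modified during this run; left for the next run.
        if watermark is not None and modified <= watermark:
            continue  # Unchanged since the last successful run.
        selected.append(item)
    return selected
```

With no watermark (a first run), everything up to the upper bound is selected. After a successful run, the upper bound is a natural candidate for the next watermark.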

Deletes are not propagated in this version: a file removed from SharePoint stops appearing in new records but is not removed from any corpus it was indexed into.

Permissions

The connector authenticates to Microsoft Graph using app-only credentials via the Azure AD client-credentials flow — it acts as the registered application itself, not as a signed-in user. Set up access in the Azure portal:

  1. Register an application under Azure Active Directory → App registrations. Note its Application (client) ID and Directory (tenant) ID — these are the client_id and tenant_id you provide in the source configuration.
  2. Under Certificates & secrets, create a client secret. This is the client_secret you provide. Store it securely; it isn't shown again.
  3. Under API permissions, grant the application Microsoft Graph application permissions sufficient to read the target site and its files (for example, Sites.Read.All and Files.Read.All), then have a directory administrator grant admin consent.
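
To verify the registration from the steps above, you can request an app-only token yourself. The sketch below builds a client-credentials request against the Microsoft identity platform v2.0 token endpoint using only the Python standard library. The function names are illustrative, and the connector's own token handling may differ.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # .default requests the application permissions already consented to.
            "scope": "https://graph.microsoft.com/.default",
        }
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the bearer token from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

A successful response means the tenant ID, client ID, and secret are valid. It does not by itself confirm that admin consent covers the target site, so a follow-up read of the site is still worth doing.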

The connector only ever reads — it lists drive items and downloads file content. It never writes to SharePoint.

Note: The exact permission set depends on your organization's policies. Some tenants prefer to scope access to specific sites using Sites.Selected plus a per-site grant, rather than the tenant-wide Sites.Read.All. Grant the least privilege that lets the connector read the document libraries you intend to ingest.