SharePoint
The SharePoint source reads files from a SharePoint document library via Microsoft Graph. Each file in the library becomes a record in the pipeline — its contents are uploaded to a new agent session for processing. Authentication uses app-only credentials from an Azure AD app registration.
Configuration
A SharePoint source points at a single site. By default it ingests the site's
default document library; set drive_id to target a specific library, and
folder_path to scope ingestion to a subfolder.
Fields
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be sharepoint. |
| site_url | Yes | The full URL of the SharePoint site (e.g. https://contoso.sharepoint.com/sites/legal). |
| tenant_id | Yes | Azure AD directory (tenant) ID. |
| client_id | Yes | Azure AD application (client) ID for the app registration. |
| client_secret | Yes | Azure AD client secret. Encrypted at rest and never returned in responses. |
| drive_id | No | The ID of a specific document library. If omitted, the site's default document library is used. |
| folder_path | No | Folder path that scopes ingestion to a subfolder (e.g. /Contracts/2026). If omitted, the entire library is ingested. |
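As an illustration of the fields above, here is a minimal sketch that assembles a SharePoint source object and checks the required fields before serializing it to JSON. The helper name `build_sharepoint_source` and the placeholder values are hypothetical. How the object is nested inside a surrounding pipeline definition is not covered here.

```python
import json

# Field names come from the Fields table; everything else is illustrative.
REQUIRED = ("type", "site_url", "tenant_id", "client_id", "client_secret")
OPTIONAL = ("drive_id", "folder_path")


def build_sharepoint_source(**fields):
    """Return a source config dict, rejecting missing or unknown fields."""
    missing = [k for k in REQUIRED if not fields.get(k)]
    unknown = [k for k in fields if k not in REQUIRED + OPTIONAL]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    if fields["type"] != "sharepoint":
        raise ValueError("type must be 'sharepoint'")
    return dict(fields)


source = build_sharepoint_source(
    type="sharepoint",
    site_url="https://contoso.sharepoint.com/sites/legal",
    tenant_id="<tenant-id>",          # placeholder
    client_id="<client-id>",          # placeholder
    client_secret="<client-secret>",  # placeholder
    folder_path="/Contracts/2026",    # optional; drive_id omitted on purpose
)
print(json.dumps(source, indent=2))
```

Because drive_id is left out, this configuration would target the site's default document library and ingest only /Contracts/2026 and its subfolders.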
How records are fetched
Each run enumerates drive items using Microsoft Graph's /delta
change-tracking endpoint, which returns every item in the scoped library or
folder. When drive_id is omitted, the connector resolves the site's default
document library from site_url; when folder_path is set, enumeration is
scoped to that folder and its descendants.
Only files are ingested. Folders are traversed but not themselves indexed, and SharePoint lists, pages, and OneNote notebooks are not indexed.
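The scoping rules above can be sketched with a few pure functions. This is not the connector's implementation. It assumes Microsoft Graph v1.0 URL shapes (addressing a site by hostname and server-relative path, and a drive's root delta endpoint) and assumes filtering happens client-side on the `file` facet and `parentReference.path` of each drive item. The function names are hypothetical.

```python
from urllib.parse import urlparse

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(site_url):
    """Graph URL that addresses a site by hostname and server-relative path."""
    parts = urlparse(site_url)
    path = parts.path.rstrip("/")
    if path:
        return f"{GRAPH}/sites/{parts.netloc}:{path}"
    return f"{GRAPH}/sites/{parts.netloc}"


def delta_url(drive_id):
    """Delta endpoint for the root of a document library (drive)."""
    return f"{GRAPH}/drives/{drive_id}/root/delta"


def files_in_scope(items, folder_path=None):
    """Keep file items; if folder_path is set, keep only its descendants."""
    prefix = folder_path.rstrip("/") if folder_path else None
    kept = []
    for item in items:
        if "file" not in item:  # folders carry a "folder" facet instead
            continue
        parent = item.get("parentReference", {}).get("path", "")
        # parentReference.path looks like "/drives/<id>/root:/Contracts/2026"
        rel = parent.split("root:", 1)[-1] or "/"
        if prefix and not (rel == prefix or rel.startswith(prefix + "/")):
            continue
        kept.append(item)
    return kept
```

Matching on `prefix + "/"` rather than a bare prefix keeps a sibling folder such as /Contracts/20260 out of a /Contracts/2026 scope.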
The pipeline captures an upper-bound timestamp at the start of each run and only
processes items whose lastModifiedDateTime is at or before that timestamp.
Items modified while a run is in progress are left for the next run, so they
aren't partially processed.
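The upper-bound rule can be sketched as a simple partition. This assumes timestamps arrive as ISO 8601 strings with a trailing `Z`, as Microsoft Graph returns them. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp; map a trailing 'Z' to an explicit UTC offset."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_by_cutoff(items, cutoff):
    """Partition items into (process now, leave for the next run)."""
    ready, deferred = [], []
    for item in items:
        ts = parse_ts(item["lastModifiedDateTime"])
        (ready if ts <= cutoff else deferred).append(item)
    return ready, deferred


# The cutoff is captured once, before enumeration starts.
run_cutoff = datetime.now(timezone.utc)
```

An item whose lastModifiedDateTime equals the cutoff is processed in the current run; anything later is deferred.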
Source metadata
Each record carries source metadata that the connector resolves at fetch time.
system_metadata captures these drive-item fields when present:
| Key | Description |
|---|---|
| name | The file name. |
| webUrl | Link to open the file in SharePoint. |
| size | File size in bytes. |
| eTag | The item's ETag. |
| createdDateTime | Creation time (ISO 8601). |
| lastModifiedDateTime | Last modified time (ISO 8601). |
| createdBy | Display name of the user who created the file. |
| lastModifiedBy | Display name of the user who last modified the file. |
| parentPath | Path of the item's parent folder. |
| mimeType | The file's MIME type. |
user_metadata contains the document library's custom columns (the SharePoint
listItem fields) for the file, when the library defines any. These are fetched
at download time, so they're only populated for files the pipeline actually
processes.
acl_metadata is not populated by the SharePoint source. The ACL metadata buckets are left null.
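To make the system_metadata keys concrete, the following sketch maps a Microsoft Graph driveItem JSON object onto them. The source properties (`createdBy.user.displayName`, `parentReference.path`, `file.mimeType`) are standard driveItem fields, but treating them as the origin of each key is an assumption, and the function name is hypothetical.

```python
def system_metadata(item):
    """Build the system_metadata dict from a driveItem, omitting absent fields."""

    def display_name(identity_set):
        # createdBy / lastModifiedBy are identity sets: {"user": {"displayName": ...}}
        return (identity_set or {}).get("user", {}).get("displayName")

    candidates = {
        "name": item.get("name"),
        "webUrl": item.get("webUrl"),
        "size": item.get("size"),
        "eTag": item.get("eTag"),
        "createdDateTime": item.get("createdDateTime"),
        "lastModifiedDateTime": item.get("lastModifiedDateTime"),
        "createdBy": display_name(item.get("createdBy")),
        "lastModifiedBy": display_name(item.get("lastModifiedBy")),
        "parentPath": item.get("parentReference", {}).get("path"),
        "mimeType": item.get("file", {}).get("mimeType"),
    }
    # "When present": drop keys the drive item did not supply.
    return {key: value for key, value in candidates.items() if value is not None}
```

A drive item with no `createdBy` identity would simply produce a dict without the createdBy key, rather than a null entry.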
Incremental sync
When sync_mode is incremental (the default), the pipeline tracks a watermark
based on each item's lastModifiedDateTime. On the next run, only items modified
since the last successful run are reprocessed, and unchanged items are skipped.
See Sync mode.
Deletes are not propagated in this version: a file removed from SharePoint stops appearing in new records but is not removed from any corpus it was indexed into.
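One way to picture the incremental behavior is the sketch below. It assumes the watermark is the previous successful run's upper-bound timestamp, which the text above does not state explicitly, and it assumes deleted entries carry a `deleted` facet, as Microsoft Graph delta responses do. The function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp; map a trailing 'Z' to an explicit UTC offset."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def plan_incremental_run(items, watermark, cutoff):
    """Return (items to reprocess, watermark to store if the run succeeds)."""
    changed = []
    for item in items:
        if "deleted" in item:
            continue  # deletes are not propagated in this version
        ts = parse_ts(item["lastModifiedDateTime"])
        # No watermark yet means a first run: everything up to the cutoff qualifies.
        if (watermark is None or ts > watermark) and ts <= cutoff:
            changed.append(item)
    return changed, cutoff
```

The new watermark should be persisted only after the run succeeds, so a failed run is retried over the same window.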
Permissions
The connector authenticates to Microsoft Graph using app-only credentials via the Azure AD client-credentials flow — it acts as the registered application itself, not as a signed-in user. Set up access in the Azure portal:
- Register an application under Azure Active Directory → App registrations. Note its Application (client) ID and Directory (tenant) ID; these are the client_id and tenant_id you provide in the source configuration.
- Under Certificates & secrets, create a client secret. This is the client_secret you provide. Store it securely; it isn't shown again.
- Under API permissions, grant the application Microsoft Graph application permissions sufficient to read the target site and its files (for example, Sites.Read.All and Files.Read.All), then have a directory administrator grant admin consent.
The connector only ever reads — it lists drive items and downloads file content. It never writes to SharePoint.
The exact permission set depends on your organization's policies. Some tenants
prefer to scope access to specific sites using Sites.Selected plus a
per-site grant, rather than the tenant-wide Sites.Read.All. Grant the least
privilege that lets the connector read the document libraries you intend to
ingest.
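For reference, the client-credentials flow described in this section amounts to one POST to the Microsoft identity platform token endpoint. The sketch below only builds that request with the standard library and does not send it. The function name is hypothetical, and this is not necessarily how the connector itself obtains tokens.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an app-only token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the application permissions already consented to.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
```

Sending the request (for example with `urllib.request.urlopen(req)`) returns a JSON body whose `access_token` is then passed to Graph in an `Authorization: Bearer` header. A useful pre-flight check is to confirm that a token issued this way can list the target library before configuring the source.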