SharePoint and SharePoint Online Connectors
The SharePoint connector retrieves content and metadata from an on-premises SharePoint repository.
Platform versions
V1 connectors
The SharePoint V1 connectors were deprecated in Fusion 5.2. However:
- The SharePoint V1 connector can be used in Fusion 4.x and in Fusion 5.1 through Fusion 5.4.
- The SharePoint Online V1 connector can be used in Fusion 4.x and in Fusion 5.1 through Fusion 5.3.
V2 connectors
The SharePoint V2 connector can be used in Fusion 5.1 through Fusion 5.5. This connector is deprecated as of June 19, 2023. The scheduled date to remove the connector is January 31, 2024.
SharePoint (on-premises)
This connector can access a SharePoint repository running on the following platforms:
- Microsoft SharePoint 2013
- Microsoft SharePoint 2016
- Microsoft SharePoint 2019
Understanding incremental crawls
After your first successful crawl (one that completes with no errors), all subsequent crawls are "incremental crawls".
Incremental crawls use SharePoint’s Changes API. For each site collection, the connector uses the change token (timestamp) to get all additions, updates, and deletions since the full crawl was started.
If the Limit Documents > Fetch all site collections checkbox is selected, you are crawling an entire SharePoint Web application, and a site collection was deleted since the last crawl, then the incremental crawl removes it from your index.
| If you are filtering on fields, be sure to leave the lwfields in place.  These fields are required for successful incremental crawling. | 
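The change-token idea described above can be illustrated with a small in-memory model. This is only a sketch of the general technique, not the connector's implementation and not the SharePoint Changes API itself: the `Change` and `SiteCollectionState` classes, the integer token, and the `incremental_crawl` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One entry in a hypothetical change log (not a SharePoint type)."""
    token: int          # position in the change log; stands in for the change token
    kind: str           # "add", "update", or "delete"
    doc_id: str
    content: str = ""


@dataclass
class SiteCollectionState:
    """Per-site-collection crawl state: the index plus the last token seen."""
    index: dict = field(default_factory=dict)
    change_token: int = 0


def incremental_crawl(state, change_log):
    """Apply only the changes newer than the stored token, then advance it."""
    for change in sorted(change_log, key=lambda c: c.token):
        if change.token <= state.change_token:
            continue  # already processed in an earlier crawl
        if change.kind == "delete":
            state.index.pop(change.doc_id, None)
        else:  # "add" and "update" both write the latest content
            state.index[change.doc_id] = change.content
        state.change_token = change.token
    return state


# First crawl picks up two new documents.
state = SiteCollectionState()
log = [Change(1, "add", "a", "v1"), Change(2, "add", "b", "v1")]
incremental_crawl(state, log)

# Later crawl: only tokens 3 and 4 are applied; tokens 1 and 2 are skipped.
log += [Change(3, "update", "a", "v2"), Change(4, "delete", "b")]
incremental_crawl(state, log)
```

Storing the token per site collection is what lets each later crawl skip everything it has already seen, including deletions that would otherwise leave stale documents in the index.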
Throttling or rate limiting
SharePoint Online is a cloud API. As such, it necessarily has rate limiting policies, which can be an issue during crawling.
Ideally, you want to have a SharePoint Online crawl that runs as fast as possible. But practically, this is not always possible. The SharePoint Online documentation has some important information about this.
This section explains how to identify the errors that indicate that throttling is taking place, and how to adjust your connector’s configuration to help avoid it.
When SharePoint Online performs rate limiting, you may see one of two types of errors in the Log Viewer:
- 429. Too many requests: This is by far the most common rate limiting error you will see in the logs. This is SharePoint Online’s main mechanism to protect itself from service interruptions due to denial-of-service (DoS) attacks.
- 503. Server too busy: This error is less common, but the result is the same.
| See Avoid SharePoint Throttling for information about how to minimize or correct errors. | 
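For readers writing their own client code against a rate-limited API, the usual way to react to these two status codes is to wait and retry. The sketch below shows one way to do that, assuming the server may send a `Retry-After` header with a delay in seconds and falling back to exponential backoff with jitter when it does not. It does not describe how the Fusion connector handles throttling internally; `call_with_backoff`, its parameters, and the `send` callable are hypothetical names for this example.

```python
import random
import time

# Status codes treated as throttling signals, matching the two errors above.
THROTTLE_STATUSES = {429, 503}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it is not throttled or max_attempts is reached.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in THROTTLE_STATUSES:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # Honor the server's hint (only the seconds form is handled here).
            delay = float(retry_after)
        else:
            # Exponential backoff with jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay / 2)
        sleep(delay)
    raise RuntimeError(
        f"still throttled after {max_attempts} attempts (last status {status})"
    )


# Usage with canned responses and a recording sleep, so nothing really waits.
responses = iter([
    (429, {"Retry-After": "2"}, ""),
    (503, {}, ""),
    (200, {}, "ok"),
])
sleeps = []
result = call_with_backoff(lambda: next(responses), sleep=sleeps.append)
```

Passing `sleep` as a parameter keeps the function easy to test; in real use the default `time.sleep` applies.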
User permission configuration options
The SharePoint connectors provide a variety of configuration options for accessing SharePoint and SharePoint Online. Permissions settings should follow the principle of least privilege, as described in the Microsoft SharePoint docs:
Follow the principle of least-privileged: Users should have only the permission levels or individual permissions they must have to perform their assigned tasks.
SharePoint
| Account type | Account config | Description | 
|---|---|---|
| Active Directory Service Account | Account is set up as a Site Collection Auditor | Allows you to list all site collections. | 
| Active Directory Service Account | Account is set up with limited permissions | Does not allow you to list site collections in your SharePoint web application. You must list each site collection you want to crawl manually. Additionally, noindex tags are ignored. Sites will always be indexed regardless of their noindex settings. | 
SharePoint Online
| For the V2 connector: When the access to SharePoint Online is affected by a Conditional Access Policy (CAP), it’s recommended to set a proper user-agent value (depending on the CAP configuration) in the connector configuration (toggle advanced properties): Requests settings > User agent. | 
| Account type | Account config | Description | 
|---|---|---|
| Full Admin | Azure App Only | Allows you to list all site collections in tenant. | 
| Full Admin | OAuth App Only | Does not allow you to list site collections in your SharePoint web application. You must list each site collection you want to crawl manually. | 
| ADFS Account | Account is set up as a Site Collection Auditor | Allows you to list all site collections if the user is a tenant administrator. | 
| ADFS Account | Account is set up with limited permissions | Does not allow you to list site collections in your SharePoint web application. You must list each site collection you want to crawl manually. Use this option if your deployment requires the Lucidworks crawl account to have the fewest privileges possible. |