SharePoint Optimized V2 Connector Configuration Reference
The SharePoint Optimized V2 connector retrieves content and metadata from on-premises and cloud-based SharePoint repositories.
Verify your connector version
This connector depends on specific Fusion versions. See the following table for the required versions:
Fusion version | Connector version
Fusion 5.6.1 and later | v1.1.0 through v1.6.0
Fusion 5.9.0 | v1.6.0 or later
Fusion 5.9.1 and later | v2.0.0 and later
Note the following guidelines for using the SharePoint Optimized V2 connector:
- There is a pod limit. The SharePoint Optimized V2 connector does not support running multiple instances. Don’t run the connector on more than one pod.
- Watch for connector compatibility. Use the LDAP ACLs V2 connector with this connector.
To change the number of items to retrieve per page, set the value of apiQueryRowLimit. The default value is 5000.
To change the number of change events to retrieve per page, set the value of changeApiQueryRowLimit. The default value is 2000.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
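One way to read the escaping rule above is that the API expects the value as a JSON string, so a backslash typed in the UI must itself be escaped in the request body. The following is a minimal sketch under that assumption; the property name `delimiter` is hypothetical and is used only for illustration.

```python
import json

# The value as typed in the UI: a backslash followed by "t" (two characters).
ui_value = r"\t"

# "delimiter" is a hypothetical property name, not part of this connector's schema.
payload = json.dumps({"delimiter": ui_value})

# JSON encoding doubles the backslash, which yields the escaped API form.
print(payload)  # {"delimiter": "\\t"}
```

Building the request body with a JSON library, rather than by hand, avoids having to track the escaping manually.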
An Optimized Connector for SharePoint 2010, 2013, 2016, 2019 and SharePoint Online
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string, required
The Parser to use in the associated IndexPipeline.
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.
>= 1
<= 1
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1
Multiple of: 1
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can sometimes, but not always, help with overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
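The constraints listed above for id, pipeline, parserId, and description can be checked before a configuration is submitted. The following is a minimal sketch, assuming the configuration is held as a flat Python dictionary; the function name `validate_basics` and the error messages are illustrative and not part of the product.

```python
import re

# Same character class as the documented match pattern ^[a-zA-Z0-9_-]+$
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")
MAX_DESCRIPTION_LENGTH = 125


def validate_basics(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = []
    # id and pipeline share the same pattern and minimum length of 1.
    for key in ("id", "pipeline"):
        value = config.get(key) or ""
        if not NAME_PATTERN.fullmatch(value):
            errors.append(f"{key} must match ^[a-zA-Z0-9_-]+$ and be at least 1 character")
    # parserId is marked as required.
    if not config.get("parserId"):
        errors.append("parserId is required")
    # description is optional but limited to 125 characters.
    if len(config.get("description") or "") > MAX_DESCRIPTION_LENGTH:
        errors.append("description must be 125 characters or fewer")
    return errors
```

For example, `validate_basics({"id": "sharepoint-v2", "pipeline": "my_pipeline", "parserId": "my_parser"})` returns an empty list.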
properties - SharePoint properties
Plugin specific properties.
webApplication - Web application config
The SharePoint Web application to crawl.
webApplicationUrl - string
>= 1 characters
fetchSiteCollections - boolean
This feature requires site collection administrator rights on your SharePoint instance. If enabled, the SharePoint crawler will fetch all site collections from the web application automatically. If not enabled, you must explicitly list all site collections in the siteCollections parameter.
Default: true
forceFullCrawl - boolean
Enable this to force a full crawl each time you run this datasource.
Default: false
siteCollections - array[string]
A list of site collections to crawl. Because only site collection administrators or site collection auditors can list the site collections in a SharePoint web application, you can use this when you are crawling as a user that is not an admin/auditor. This allows you to explicitly list the site collections you want to crawl. Specify paths relative to the web application URL, such as /sites/site1.
Default:
includedFileExtensions - array[string]
Set of file extensions to be fetched. If specified, all non-matching files will be skipped.
Default:
excludedFileExtensions - array[string]
A set of all file extensions to be skipped from the fetch.
Default:
inclusiveRegexes - array[string]
Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
Default:
exclusiveRegexes - array[string]
Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.
Default:
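The way inclusiveRegexes and exclusiveRegexes narrow a crawl can be sketched as a simple predicate. This is an illustration only: it assumes that a URI is kept when it matches at least one inclusive pattern (or no inclusive patterns are set) and matches no exclusive pattern, and that patterns may match anywhere in the URI. The connector's actual matching mode is not stated in this reference, and the function name `uri_is_in_scope` is hypothetical.

```python
import re


def uri_is_in_scope(uri, inclusive_regexes, exclusive_regexes):
    """Return True if the URI passes both the include and the exclude filters."""
    # An empty include list is treated as "include everything" (assumption).
    included = not inclusive_regexes or any(
        re.search(pattern, uri) for pattern in inclusive_regexes
    )
    # Any single exclusive match removes the URI from scope.
    excluded = any(re.search(pattern, uri) for pattern in exclusive_regexes)
    return included and not excluded
```

Testing patterns this way against a handful of known URLs before a crawl can catch an overly broad exclusion early.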
includeContentsExtensions - array[string]
Only files with these file extensions will have their contents downloaded when indexing this item. For files with any other extension, the list item metadata will still be indexed but the file contents will not. The comparison is not case sensitive, and you do not have to specify the '.' but it still works if you do. For example "zip" and ".zip" are both acceptable. The whitespace will also be trimmed.
Default:
excludeContentsExtensions - array[string]
File extensions of files that will not have their contents downloaded when indexing this item. The list item metadata will still be indexed but the file contents will not. The comparison is not case sensitive, and you do not have to specify the '.' but it still works if you do. For example "zip" and ".zip" are both acceptable. The whitespace will also be trimmed.
Default:
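The matching rules described for excludeContentsExtensions (case-insensitive, optional leading '.', whitespace trimmed) can be sketched as follows. The helper names are hypothetical and the connector's own implementation may differ; this only illustrates which configured values are treated as equivalent.

```python
def normalize_extension(ext: str) -> str:
    """Trim whitespace, drop a leading '.', and lowercase the extension."""
    return ext.strip().lstrip(".").lower()


def should_download_contents(filename: str, exclude_contents_extensions) -> bool:
    """Return False when the file's extension is in the exclude list."""
    excluded = {normalize_extension(e) for e in exclude_contents_extensions}
    # rpartition returns an empty separator when the name has no '.'.
    _, dot, ext = filename.rpartition(".")
    file_ext = ext.lower() if dot else ""
    return file_ext not in excluded
```

With this sketch, `"zip"`, `".zip"`, and `" .ZIP "` all exclude the contents of `archive.zip`, while its list item metadata would still be indexed.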
restrictToSpecificItems - array[string]
Instead of specifying regular expressions to restrict the SharePoint items that are crawled, this allows you to specify the URLs of the specific SharePoint items to crawl. The crawl will then be restricted to only these SharePoint item URLs. You can specify list, sub-site, folder, and list item URLs.
Default:
apiQueryRowLimit - number
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 5000
Multiple of: 1
changeApiQueryRowLimit - number
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 2000
Multiple of: 1
aclCommitAfter - number
The commitWithin parameter to use when performing Solr updates to the ACL collection.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 60000
Multiple of: 1
siteCollectionDeletionThreshold - number
Site collections will be removed from the index after they are no longer available for this many hours. Set to 0 for immediate deletion. Default is 2 weeks.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 336
Multiple of: 1
solrSocketTimeout - number
Socket timeout when performing Solr operations.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 60000
Multiple of: 1
moderationStatusFilter - array[number]
If specified, only index items with the following moderation statuses specified. Valid values are: 0 = The list item is approved, 1 = The list item has been denied approval, 2 = The list item is pending approval, 3 = The list item is in the draft or checked out state, 4 = The list item is scheduled for automatic approval at a future date.
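Because moderationStatusFilter takes numeric codes, a named enumeration can make a configuration easier to read. The following is a minimal sketch; the enum and its member names are illustrative, and only the numeric values and their meanings come from the description above.

```python
from enum import IntEnum


class ModerationStatus(IntEnum):
    """Numeric moderation status codes as listed for moderationStatusFilter."""
    APPROVED = 0   # The list item is approved
    DENIED = 1     # The list item has been denied approval
    PENDING = 2    # The list item is pending approval
    DRAFT = 3      # The list item is in the draft or checked out state
    SCHEDULED = 4  # The list item is scheduled for automatic approval


# Example: index only approved items and items scheduled for approval.
moderation_status_filter = [
    int(ModerationStatus.APPROVED),
    int(ModerationStatus.SCHEDULED),
]
```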
fetchTaxonomies - boolean
Fetch taxonomy data from SharePoint.
Default: false
siteCollectionTaxonomyCacheSize - number
To make the connector faster, when the taxonomy terms for a site collection are needed, they are cached to avoid looking up from disk again. This is the size of that cache.
>= 1
<= 10000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10
Multiple of: 1
fetchACLs - boolean
Fetch Access Control Data
Default: true
asyncParsing - boolean
Enable only if Tika Async is configured in the Fusion environment. Note: To enable async-parsing, check Core Properties -> Fetch Settings -> Async Parsing (since Fusion 5.8.0)
Default: false
zkHosts - string
Solr ZooKeeper (zk) hosts string used for direct connections to Solr.
contentCommitAfter - number
The commitWithin parameter to use when performing Solr updates to the content collection.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 60000
Multiple of: 1
zkChroot - string
Solr ZooKeeper (zk) chroot string used for direct connections to Solr.
solrConnectionTimeout - number
Connection timeout when performing Solr operations.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 60000
Multiple of: 1
includedListBaseTypes - array[string]
If specified, the only SharePoint lists that will be fetched are the ones that match one of these base types. Accepts values (not case sensitive): [None, GenericList, DocumentLibrary, Unused, DiscussionBoard, Survey, Issue]
includedObjectTypes - array[string]
If specified, only fetch specific SharePoint objects. SharePoint object types that can be specified (not case sensitive): [Site, List, List_Item, Folder, Attachment]
proxyProperties - Proxy options
A set of options for configuring the proxy.
url - string
The proxy URL
>= 1 characters
username - string
Proxy username
>= 1 characters
password - string
Proxy password
>= 1 characters
ntlmProperties - NTLM Authentication settings
user - string
User
>= 1 characters
password - string
Password
>= 1 characters
domain - string
Domain
>= 1 characters
workstation - string
Workstation
>= 1 characters
sharepointOnlineAuthProperties - SharePoint Online Authentication
Settings relevant only when crawling SharePoint Online.
account - string
Your Microsoft SharePoint Online account name, which takes the form username@domain.com.
>= 1 characters
password - string
Password for your Microsoft SharePoint Online Account.
>= 1 characters
sessionExpirationMs - number
How long, in milliseconds, before new SharePoint Online authentication cookies should be fetched.
>= 1
<= 172800000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 7200000
Multiple of: 1
userAgent - string
The user agent header decorates the http traffic. This is important for preventing hard rate limiting by SharePoint Online.
Default: ISV|Lucidworks|Fusion/5.x
capUserAgent - string
When "O365 Conditional Access Policy (CAP) setting" is enabled, we need to use a compliant User-Agent that matches one of the supported devices when doing O365 STS authentication. For example if iOS is a supported platform, set this to 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) CriOS/60.0.3112.89 Mobile/14G60 Safari/602.1'
<= 4000 characters
>= 1 characters
appAuthClientId - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. The Azure client ID of your application.
<= 100 characters
>= 1 characters
appAuthPkcs12KeystoreBase64String - string
Applicable to SharePoint Online App-Auth only. This is the base64 string of your PKCS12 keystore loaded with the PFX certificate file supplied by Azure AD. To get this value, first take the yourcert.pfx file you received from Azure AD and convert it to PKCS12 keystore format (for example, "keytool -importkeystore -srckeystore yourcert.pfx -srcstoretype pkcs12 -destkeystore yourcert.p12 -deststoretype pkcs12"). Next, convert yourcert.p12 to a base64 string.
<= 10000 characters
>= 1 characters
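The final step described for appAuthPkcs12KeystoreBase64String, converting yourcert.p12 to a base64 string, can be done with any base64 tool. The following is a minimal sketch in Python; the function name is hypothetical, and the length check simply mirrors the 10000-character limit listed above.

```python
import base64
from pathlib import Path

MAX_LENGTH = 10000  # documented upper bound for this property


def keystore_to_base64(path: str) -> str:
    """Read a PKCS12 keystore file and return its contents as a base64 string."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    if len(encoded) > MAX_LENGTH:
        raise ValueError(
            f"encoded keystore is {len(encoded)} characters; the limit is {MAX_LENGTH}"
        )
    return encoded


# Usage (assumes yourcert.p12 exists in the current directory):
# print(keystore_to_base64("yourcert.p12"))
```

The output is a single line with no wrapping, which is convenient for pasting into a configuration field.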
appAuthPkcs12KeystorePassword - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. Password of the PKCS12 keystore.
<= 100 characters
>= 1 characters
appAuthClientSecret - string
Applicable to SharePoint Online OAuth App-Auth only. The Azure client secret of your application.
<= 100 characters
>= 1 characters
appAuthRefreshToken - string
Applicable to SharePoint Online OAuth App-Auth only. This is a refresh token which is reusable for up to 12 hours. You must obtain a new token using the OAuth login process if the token expires.
<= 1000 characters
>= 1 characters
appAuthTenant - string
Applicable to SharePoint Online App-Auth only. The Office365 tenant name to use when authenticating with Azure AD.
<= 2083 characters
>= 1 characters
appAuthAzureLoginEndpoint - string
Applicable to SharePoint Online App-Auth Public/Private Service Account. The Azure login endpoint to use when authenticating.
<= 2083 characters
>= 1 characters
Default: https://login.windows.net
jsAuthConfigJson - string
The JS Auth config JSON file, which contains a list of WebCredential entries used for the web driver login process.
jsAuthLoginUrl - string
JS Auth Login Url to use when doing the login process.
jsAuthSeleniumUrl - string
URL of the Selenium grid service to use when performing WebDriver authentication to SharePoint Online.
maximumItemLimitConfig - Item Count Limit
maxItems - number
Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
sizeLimitProperties - Item Size Limits
For documents that do not meet the maximum/minimum size limits, only metadata will be indexed, without the body. These documents indicate the reason why content was not indexed with the field '_lw_contents_excluded_s: file size'.
maxSizeBytes - number
Used for excluding items when the item size is larger than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
minSizeBytes - number
Used for excluding items when the item size is smaller than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1
Multiple of: 1
fetchRetryProperties - Retry Options
A set of options for configuring retry behavior.
maxDelayTimeMs - number
The maximum wait time between successive retries.
>= 1
<= 600000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
maxTimeLimitMs - number
This setting is used to limit the maximum amount of time spent on retries. Note: this will be ignored if "Maximum Retries" is specified.
>= 1
<= 28800000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600000
Multiple of: 1
errorExclusions - array[string]
Optional regex list that will be matched against the exception class and message of failed attempts. If any regex matches, do not retry this request. This is needed to prevent the retryer from retrying non-recoverable errors that were not already ignored by the connector implementation.
maxRetries - number
The retryer will retry failed operations in the case that they might succeed if attempted again. This parameter states the number of attempts to retry until giving up. This parameter, if specified, will override the "Stop retrying after time (milliseconds)" parameter.
<= 100
exclusiveMinimum: false
exclusiveMaximum: false
Default: 3
Multiple of: 1
delayFactor - number
The retryer will retry failed operations in the case that they might succeed if attempted again. The retryer will sleep an exponential amount of time after the first failed attempt and retry in exponentially incrementing amounts after each failed attempt up to the maximumTime. nextWaitTime = exponentialIncrement * multiplier.
>= 1
<= 9999
exclusiveMinimum: false
exclusiveMaximum: false
Default: 2
Multiple of: 1
delayMs - number
Sets the delay between retries, exponentially backing off to the maxDelayTimeMs and multiplying successive delays by the delayFactor
>= 1
<= 9223372036854776000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
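The interaction of delayMs, delayFactor, maxDelayTimeMs, and maxRetries can be illustrated with a small calculation. This is a sketch of one plausible reading of the descriptions above (each delay is the previous delay multiplied by delayFactor, capped at maxDelayTimeMs); the connector's exact formula is not given in this reference, and the function name is hypothetical.

```python
def retry_delays_ms(delay_ms=1000, delay_factor=2, max_delay_ms=300000, max_retries=3):
    """Return the assumed wait time in milliseconds before each retry attempt."""
    delays = []
    current = delay_ms
    for _ in range(max_retries):
        # Cap every delay at the configured maximum wait time.
        delays.append(min(current, max_delay_ms))
        # Grow the next delay exponentially.
        current *= delay_factor
    return delays


print(retry_delays_ms())  # [1000, 2000, 4000]
```

Under these assumptions and the default values, the cap of 300000 ms is first reached on the tenth retry, so it only matters when maxRetries is raised well above its default of 3.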
connections - Http client options
A set of options for configuring the http client.
maxConnections - number
The maximum number of connections
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 5000
Multiple of: 1
maxPerRoute - number
Defines a connection limit per HTTP route. In simple cases you can understand this as a per-target-host limit. Under the hood things are a bit more interesting: HttpClient maintains a couple of HttpRoute objects, each of which represents a chain of hosts, like proxy1 -> proxy2 -> targetHost. Connections are pooled on a per-route basis. In simple cases, when you're using the default route-building mechanism and provide no proxy support, your routes are likely to include the target host only, so the per-route connection pool limit effectively becomes a per-host limit.
>= 1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
ignoreSSLValidationExceptions - boolean
Do not attempt to do an SSL Handshake and do not verify the hostname of SSL certificates. Use this when accessing an https url with a self-signed or enterprise certificate authority that you do not want to put in the Java keystore.
Default: false
readTimeoutMs - number
>= -1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 60000
Multiple of: 1
connectTimeoutMs - number
>= -1
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
debug - Debug options
Special properties used for debugging the connector.
onlyFetchAcls - boolean
Do a full crawl where only ACLs are crawled. Also, when the ACLs are all fully indexed, clear any old ACL documents from previous crawl(s) for this datasource. This gives you fresh SharePoint ACLs without affecting the content.
Default: false
logThreadDumpEveryNSeconds - number
For diagnostic purposes, write a thread dump to logs every N seconds. If set <= 0, no dump is taken.
>= -1
<= 9999999
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
simulate429ErrorsEveryNRequests - number
If > 0, simulate a SharePoint 429 status (too-many-requests) error such that there will be one error per this many requests.
>= -1
<= 999999
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
preserveFullExportDb - boolean
The list* tables are normally cleared prior to saving the crawl database. This gives the option to leave these files for analysis. This parameter is ignored if using a persistent volume to store the crawl DB because the data will always be saved in that case.
Default: false
onlyFetchMetadata - boolean
For diagnostic purposes, do a dry run where the connector will only generate the metadata SharePoint export database and index the ACL records in the ACL collection, but will not fetch content.
Default: false
logAclInserts - boolean
For diagnostic purposes, log all documents inserted into the ACL collection.
Default: false
security -
collectionId - string
Id of the collection to be used for storing ACL records. If not specified, ACL collection name will be generated automatically using pattern '<datasource_id>_access_control_hierarchy'.
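The default naming rule for the ACL collection can be expressed as a one-line helper, which may be handy when scripting queries against that collection. The function name is hypothetical; only the '<datasource_id>_access_control_hierarchy' pattern comes from the description above.

```python
def acl_collection_name(datasource_id, collection_id=None):
    """Return the explicit collectionId if set, else the documented default pattern."""
    return collection_id or f"{datasource_id}_access_control_hierarchy"


print(acl_collection_name("sharepoint-v2"))  # sharepoint-v2_access_control_hierarchy
```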