Box.com V2 Connector Configuration Reference
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
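As a rough illustration of the escaping rule above, the following Python sketch shows how a value typed in the UI as \t looks once it is serialized into a JSON request body. The field name "delimiter" is hypothetical and is not part of this connector's schema; it is used only to carry the example value.

```python
import json

# Value as typed into the UI: two characters, a backslash followed by "t".
ui_value = r"\t"

# Serializing to JSON escapes the backslash, which yields the API form \\t.
# "delimiter" is a hypothetical field name used only for this illustration.
api_body = json.dumps({"delimiter": ui_value})

print(api_body)  # prints {"delimiter": "\\t"}
```

Decoding the body with json.loads returns the original two-character value, so the UI form and the API form describe the same setting.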
Connector for Box.com. This connector can work in one of two ways: 1) it can crawl a single user's files (and files shared with that user) using OAuth, or 2) it can use a JWT service account to crawl all users in an enterprise, using Box.com's "As-User" header to impersonate each user. For large distributed accounts, the JWT service account method is recommended. Otherwise, you must explicitly grant a single user access to every file you want to crawl.
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string, required
The Parser to use in the associated IndexPipeline.
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, help with overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will do fetching. This is useful for distributing load between different instances.
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 0
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection
Default: false
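The numeric fetch settings above all have inclusive bounds (exclusiveMinimum and exclusiveMaximum are false). A minimal sketch of a pre-submission check follows, using the bounds and defaults listed in this reference. The function name check_fetch_settings and the flat dictionary it takes are assumptions made for illustration; they are not part of the connector.

```python
# (minimum, maximum, default) as listed in this reference; None means no bound is listed.
FETCH_SETTING_BOUNDS = {
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "pluginInstances": (None, 500, 0),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
}


def check_fetch_settings(settings: dict) -> list:
    """Return a list of human-readable problems; empty if all values are in range."""
    problems = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_BOUNDS:
            continue  # Booleans and unknown keys are not range-checked here.
        low, high, _default = FETCH_SETTING_BOUNDS[name]
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value!r}")
        elif low is not None and value < low:
            problems.append(f"{name}: {value} is below the minimum {low}")
        elif high is not None and value > high:
            problems.append(f"{name}: {value} is above the maximum {high}")
    return problems
```

For example, check_fetch_settings({"indexingThreads": 11}) reports one problem, because the documented maximum for indexingThreads is 10.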
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
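Both id and pipeline must be at least one character long and match the pattern ^[a-zA-Z0-9_-]+$. The following sketch checks a candidate value against that pattern before it is submitted; the helper name is_valid_name is hypothetical.

```python
import re

# Pattern copied from this reference: letters, digits, underscore, and hyphen only.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(value: str) -> bool:
    """Return True if value is non-empty and matches the documented pattern."""
    return bool(NAME_PATTERN.fullmatch(value))
```

Under this check, a value such as box-datasource_1 is accepted, while a value containing a space or a dot is rejected.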
properties - Box properties
Plugin specific properties.
application - Box application
Box application to crawl.
boxItems - array[string]
includedFileExtensions - array[string]
Set of file extensions to be fetched. If specified, all non-matching files will be skipped.
Default:
excludedFileExtensions - array[string]
A set of all file extensions to be skipped from the fetch.
Default:
inclusiveRegexes - array[string]
Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
Default:
exclusiveRegexes - array[string]
Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.
Default:
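The four include and exclude settings above can be thought of as successive filters on each item. The sketch below is one plausible reading, not the connector's confirmed behavior. It assumes that extensions are compared case-insensitively without the leading dot, that a regular expression matches if it is found anywhere in the URI, and that exclusions are applied in addition to inclusions. The function name should_fetch is hypothetical.

```python
import re
from pathlib import PurePosixPath


def should_fetch(uri, included_exts=(), excluded_exts=(),
                 inclusive_regexes=(), exclusive_regexes=()):
    """Illustrative filter only; the connector's real matching rules may differ."""
    # Assumption: extensions are compared case-insensitively, without the dot.
    ext = PurePosixPath(uri).suffix.lstrip(".").lower()
    if included_exts and ext not in {e.lower() for e in included_exts}:
        return False  # An include list is set and this extension is not on it.
    if ext in {e.lower() for e in excluded_exts}:
        return False  # Extension is explicitly excluded.
    # Assumption: a pattern matches if it is found anywhere in the URI.
    if inclusive_regexes and not any(re.search(p, uri) for p in inclusive_regexes):
        return False
    if any(re.search(p, uri) for p in exclusive_regexes):
        return False
    return True
```

With included_exts=["pdf"], this sketch keeps /Reports/q1.PDF and skips /Reports/q1.docx.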
indexFolderMetadata - boolean
Enable indexing of folder metadata. Each folder will be represented by a document in the collection.
Default: false
authenticationProperties - Authentication settings
jwtProperties - JWT Authentication settings
appEntityId - string
The JWT App User ID or JWT App Enterprise ID with access to crawl.
publicKeyId - string
The public key prefix from the box.com public keys
privateKeyBase64 - string
Content of the private key. To get this value, open your key file and convert its entire content (including the first and last lines) to a Base64 string.
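One way to produce the privateKeyBase64 value is sketched below with Python's standard base64 module. The key text is a placeholder, not a real key, and the helper name encode_private_key is hypothetical.

```python
import base64


def encode_private_key(pem_text: str) -> str:
    """Base64-encode the full key file content, including the BEGIN/END lines."""
    return base64.b64encode(pem_text.encode("utf-8")).decode("ascii")


# Placeholder content for illustration only; this is not a real key.
example_pem = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----\n"
    "PLACEHOLDER\n"
    "-----END ENCRYPTED PRIVATE KEY-----\n"
)
private_key_base64 = encode_private_key(example_pem)
```

In practice the text would be read from the key file itself, for example with pathlib.Path("private_key.pem").read_text(), where the file name is an assumption.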
privateKeyPassword - string
The password you entered for the private key file.
encryptionAlgorithm - string
Encryption Algorithm.
Default: RSA_SHA_256
Allowed values: RSA_SHA_256, RSA_SHA_384, RSA_SHA_512
accountType - string
App account type.
Default: USER
Allowed values: USER, ENTERPRISE
tokenCacheEntries - number
Max Token Cache Entries
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 512
Multiple of: 1
oAuthProperties - OAuth Authentication settings
refreshToken - string
OAuth Refresh token
apiKey - string
The Box API key
apiSecret - string
The Box API Secret
userCacheSize - number
Size of the user cache that stores connections used for impersonation
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 2000
Multiple of: 1
userCacheExpirationTime - number
Time, in minutes, after an item was last read or written before it is removed from the cache
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 5
Multiple of: 1
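The userCacheSize and userCacheExpirationTime settings describe a bounded cache whose entries expire a fixed time after their last access. The class below is a minimal sketch of that idea, not the connector's implementation. It assumes that the least recently used entry is evicted when the cache is full, and it takes an injectable clock so the behavior can be tested. The class name AccessExpiringCache is hypothetical.

```python
import time
from collections import OrderedDict


class AccessExpiringCache:
    """Bounded cache whose entries expire a fixed time after their last read or write."""

    def __init__(self, max_size=2000, expiration_minutes=5, clock=time.monotonic):
        self._max_size = max_size
        self._ttl_seconds = expiration_minutes * 60
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, last_access_time)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key)
        # Assumption: when full, the least recently used entry is evicted.
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self._clock()
        if now - last_access >= self._ttl_seconds:
            del self._entries[key]  # Expired since its last access.
            return None
        # A successful read refreshes the access time.
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        return value
```

With the defaults shown, an entry that is read every four minutes stays cached indefinitely, while an entry left untouched for five minutes is dropped on its next lookup.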
proxyProperties - Proxy settings
proxyType - string
Type of proxy to use.
Allowed values: HTTP, SOCKS
proxyHost - string
The address to use when connecting through the proxy.
proxyPort - number
The port to use when connecting through the proxy.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Multiple of: 1
connectionsConfig - Connection settings
readTimeout - number
The Box API read timeout, in milliseconds.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 120000
Multiple of: 1
connectTimeout - number
The Box API connection timeout, in milliseconds.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 120000
Multiple of: 1
pageSize - number
Maximum number of items per paginated request
>= 1
<= 1000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
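To show what pageSize controls, the sketch below walks a paginated listing using an offset and a limit. The callable fetch_page(offset, limit) is hypothetical and stands in for whatever request the connector actually issues; the offset-based scheme itself is an assumption about how pages are requested.

```python
def iter_items(fetch_page, page_size=1000):
    """Yield every item by requesting pages of at most page_size entries.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of items; it stands in for whatever request the connector actually makes.
    """
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be between 1 and 1000")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # A short page means there is nothing left to fetch.
        offset += len(page)
```

A smaller pageSize means more requests for the same number of items, while the documented maximum of 1000 keeps the request count lowest.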
incrementalCrawlingConfig - Incremental crawling settings
cacheSize - number
Size of the folder collaborations cache
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
cacheExpirationTime - number
Time in minutes before removing a folder collaboration set
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 5
Multiple of: 1
maximumItemLimitConfig - Item Count Limit
maxItems - number
Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
sizeLimitProperties - Item Size Limits
Options for including or excluding items based on size, in bytes.
maxSizeBytes - number
Used for excluding items when the item size is larger than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
minSizeBytes - number
Used for excluding items when the item size is smaller than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1
Multiple of: 1
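The two size limits can be read as a simple range check on each item's size in bytes. The sketch below assumes that a negative maxSizeBytes (the default, -1) disables the upper bound; this reference states that meaning only for maxItems, so treat it as an assumption here. The function name passes_size_limits is hypothetical.

```python
def passes_size_limits(size_bytes, min_size_bytes=1, max_size_bytes=-1):
    """Return True if an item of size_bytes would be kept under these limits."""
    if size_bytes < min_size_bytes:
        return False  # Smaller than the minimum: excluded.
    # Assumption: a negative maximum (the default -1) disables the upper bound.
    if max_size_bytes >= 0 and size_bytes > max_size_bytes:
        return False  # Larger than the maximum: excluded.
    return True
```

With the defaults, empty (0-byte) items are skipped and no upper limit applies.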
itemRetryProperties - Item retry settings
Options to configure the retry operation for items.
maxRetries - number
The maximum number of attempts for a failed item
<= 20
exclusiveMinimum: false
exclusiveMaximum: false
Default: 3
Multiple of: 1
retryDelayInSeconds - number
The amount of time, in seconds, to wait before processing a failed item again
>= 1
<= 600
exclusiveMinimum: false
exclusiveMaximum: false
Default: 30
Multiple of: 1
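The retry settings above amount to a fixed-delay retry loop. The following sketch illustrates that pattern under one assumption that this reference does not settle: maxRetries is treated as the number of retries after the first failed attempt. The names process_with_retries and process_item are hypothetical, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import time


def process_with_retries(process_item, item, max_retries=3,
                         retry_delay_in_seconds=30, sleep=time.sleep):
    """Call process_item(item), retrying after failures with a fixed delay.

    Assumption: max_retries counts retries after the first failed attempt,
    so an item is tried at most max_retries + 1 times in total.
    """
    attempt = 0
    while True:
        try:
            return process_item(item)
        except Exception:
            if attempt >= max_retries:
                raise  # Out of retries: surface the last error.
            attempt += 1
            sleep(retry_delay_in_seconds)
```

With the defaults, an item that keeps failing is abandoned after roughly 90 seconds of accumulated delay.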
security - Security filtering configuration
enabled - boolean
Enable query-time security-trimming
Default: true
collectionId - string
ID of the collection to be used for storing ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
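The default naming rule for the ACL collection can be expressed in a few lines. This sketch assumes that an empty collectionId is treated the same as an unspecified one; the function name acl_collection_name is hypothetical.

```python
def acl_collection_name(datasource_id, collection_id=None):
    """Return the collection used for ACL records for a datasource."""
    if collection_id:
        return collection_id  # An explicit collectionId takes precedence.
    # Default pattern from this reference: <datasource_id>_access_control_hierarchy
    return f"{datasource_id}_access_control_hierarchy"
```

For a datasource with the ID box-ds and no collectionId, this yields box-ds_access_control_hierarchy.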