Legacy Product

Fusion 5.10

    Box.com V2 Connector Configuration Reference

    The Box connector retrieves data from a Box.com cloud-based data repository. To fetch content from multiple Box users, you must create a Box app that uses OAuth 2.0 with JWT server authentication. For limited testing using a single user account, you can create a Box app that uses Standard OAuth 2.0 authentication.

    Remote connectors

    V2 connectors support running remotely in Fusion versions 5.7.1 and later. Refer to Configure Remote V2 Connectors.

    Configuration

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
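
    The escaping rule above can be illustrated with a minimal sketch. It assumes the value in question is the two-character sequence \t as typed in the UI, and that the API request body is JSON. The field name "delimiter" is hypothetical and used only for illustration.

```python
import json

# The value as typed in the UI: two characters, a backslash followed by "t".
ui_value = r"\t"

# Serializing the value into a JSON request body escapes the backslash.
# "delimiter" is a hypothetical field name, not a documented connector property.
payload = json.dumps({"delimiter": ui_value})

print(payload)  # prints {"delimiter": "\\t"}
```

    Decoding the JSON body yields the original two-character value again, which is why the API form carries the extra backslash.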

    Connector for Box.com. This connector can work in one of two ways: 1) it can crawl a single user's files (and files shared with that user) using OAuth, or 2) it can use a JWT service account to crawl all users in an enterprise, using Box.com's "As-User" header to impersonate each user. For large distributed accounts, the JWT service account method is recommended. Otherwise, you must explicitly grant a single user access to every file you want to crawl.

    description - string

    Optional description

    <= 125 characters

    pipeline - string, required

    Name of the IndexPipeline used for processing output.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$

    diagnosticLogging - boolean

    Enable diagnostic logging; disabled by default

    Default: false

    parserId - string, required

    The Parser to use in the associated IndexPipeline.

    coreProperties - Core Properties

    Common behavior and performance settings.

    fetchSettings - Fetch Settings

    System level settings for controlling fetch behavior and performance.

    numFetchThreads - number

    Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can sometimes, but not always, improve overall fetch performance.

    >= 1

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 20

    Multiple of: 1

    indexingThreads - number

    Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.

    >= 1

    <= 10

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 4

    Multiple of: 1

    pluginInstances - number

    Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will perform fetching. This is useful for distributing load between different instances.

    <= 500

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 0

    Multiple of: 1

    fetchResponseScheduledTimeout - number

    The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.

    >= 1000

    <= 500000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 300000

    Multiple of: 1

    indexMetadata - boolean

    When enabled, the metadata of skipped items will be indexed to the content collection.

    Default: false

    indexingInactivityTimeout - number

    The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 86400

    Multiple of: 1

    pluginInactivityTimeout - number

    The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.

    >= 60

    <= 691200

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 600

    Multiple of: 1

    indexContentFields - boolean

    When enabled, content fields will be indexed to the crawl-db collection.

    Default: false

    asyncParsing - boolean

    When enabled, content will be indexed asynchronously.

    Default: false
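
    The numeric limits listed above can be checked before a configuration is submitted. The following is a minimal sketch, not part of Fusion: the names NUMERIC_BOUNDS and resolve_settings are hypothetical, and the bounds and defaults are copied from this reference.

```python
# (minimum, maximum, default) for each numeric setting, copied from this reference.
# None means the reference lists no bound on that side.
NUMERIC_BOUNDS = {
    "numFetchThreads": (1, 500, 20),
    "indexingThreads": (1, 10, 4),
    "pluginInstances": (None, 500, 0),
    "fetchResponseScheduledTimeout": (1000, 500000, 300000),
    "indexingInactivityTimeout": (60, 691200, 86400),
    "pluginInactivityTimeout": (60, 691200, 600),
}


def resolve_settings(settings: dict) -> dict:
    """Fill in defaults and raise ValueError for any out-of-range value."""
    resolved = {}
    for name, (minimum, maximum, default) in NUMERIC_BOUNDS.items():
        value = settings.get(name, default)
        if minimum is not None and value < minimum:
            raise ValueError(f"{name}={value} is below the minimum {minimum}")
        if maximum is not None and value > maximum:
            raise ValueError(f"{name}={value} is above the maximum {maximum}")
        resolved[name] = value
    return resolved


print(resolve_settings({"numFetchThreads": 50}))
```

    Bounds are inclusive here because the reference lists exclusiveMinimum and exclusiveMaximum as false for these settings.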

    id - string, required

    A unique identifier for this Configuration.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
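
    The pipeline and id values share the same match pattern and minimum length. A minimal sketch of checking a candidate value against them follows; the helper name is_valid_name is hypothetical.

```python
import re

# Pattern copied from this reference: letters, digits, underscore, and hyphen only.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(value: str) -> bool:
    """Return True when value has at least one character and matches the pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return len(value) >= 1 and NAME_PATTERN.fullmatch(value) is not None


print(is_valid_name("box-datasource_1"))
print(is_valid_name("box datasource"))
```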

    properties - Box properties

    Plugin specific properties.

    application - Box application

    Box application to crawl.

    boxItems - array[string]

    includedFileExtensions - array[string]

    Set of file extensions to be fetched. If specified, all non-matching files will be skipped.

    Default:

    excludedFileExtensions - array[string]

    A set of all file extensions to be skipped from the fetch.

    Default:

    inclusiveRegexes - array[string]

    Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.

    Default:

    exclusiveRegexes - array[string]

    Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.

    Default:
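
    The four filters above can be pictured with the following sketch. It is an illustration only: the order in which the connector applies the filters, whether extensions are compared case-insensitively or with a leading dot, and whether regular expressions must match the whole URI are all assumptions here. The function name should_fetch and the sample paths are hypothetical.

```python
import re
from pathlib import PurePosixPath


def _normalize(extensions):
    """Lower-case extensions and drop any leading dot (an assumption of this sketch)."""
    return {ext.lower().lstrip(".") for ext in extensions}


def should_fetch(uri, included_exts=(), excluded_exts=(),
                 inclusive_regexes=(), exclusive_regexes=()):
    """Return True when the URI passes every configured filter."""
    ext = PurePosixPath(uri).suffix.lstrip(".").lower()
    # includedFileExtensions: if specified, non-matching files are skipped.
    if included_exts and ext not in _normalize(included_exts):
        return False
    # excludedFileExtensions: matching files are skipped.
    if ext in _normalize(excluded_exts):
        return False
    # inclusiveRegexes: if specified, the URI must match at least one pattern.
    if inclusive_regexes and not any(re.search(p, uri) for p in inclusive_regexes):
        return False
    # exclusiveRegexes: the URI must not match any pattern.
    if any(re.search(p, uri) for p in exclusive_regexes):
        return False
    return True


print(should_fetch("/All Files/reports/q1.pdf", included_exts=["pdf"]))
```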

    indexFolderMetadata - boolean

    Enable indexing of folder metadata. Each folder will be represented by a document in the collection.

    Default: false

    authenticationProperties - Authentication settings

    jwtProperties - JWT Authentication settings

    appEntityId - string

    The JWT App User ID or JWT App Enterprise ID with access to crawl.

    publicKeyId - string

    The public key prefix from the box.com public keys

    privateKeyBase64 - string

    Content of the private key. To get this value, open your key file and convert its entire content, including the first and last lines, to a Base64 string.
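
    One way to produce the privateKeyBase64 value is sketched below. The helper name encode_private_key is hypothetical, and the sample text is a placeholder rather than a real key.

```python
import base64


def encode_private_key(key_text: str) -> str:
    """Base64-encode the full key file content, including its first and last lines."""
    return base64.b64encode(key_text.encode("utf-8")).decode("ascii")


# Placeholder content standing in for the text of a real key file.
sample_key = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----\n"
    "PLACEHOLDER\n"
    "-----END ENCRYPTED PRIVATE KEY-----\n"
)

encoded = encode_private_key(sample_key)
print(encoded)
```

    With a real key, read the whole file as text and pass it to the helper unchanged, so that the first and last lines are part of the encoded value.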

    privateKeyPassword - string

    The password you entered for the private key file.

    encryptionAlgorithm - string

    Encryption Algorithm.

    Default: RSA_SHA_256

    Allowed values: RSA_SHA_256, RSA_SHA_384, RSA_SHA_512

    accountType - string

    App account type.

    Default: USER

    Allowed values: USER, ENTERPRISE

    tokenCacheEntries - number

    Max Token Cache Entries

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 512

    Multiple of: 1

    oAuthProperties - OAuth Authentication settings

    refreshToken - string

    OAuth Refresh token

    apiKey - string

    The Box API key

    apiSecret - string

    The Box API Secret

    userCacheSize - number

    Size of the user cache to save connection for impersonation

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 2000

    Multiple of: 1

    userCacheExpirationTime - number

    Time, in minutes, after an item was last read or written before it is removed from the cache

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 5

    Multiple of: 1

    proxyProperties - Proxy settings

    proxyType - string

    Type of proxy to use.

    Allowed values: HTTP, SOCKS

    proxyHost - string

    The address to use when connecting through the proxy.

    proxyPort - number

    The port to use when connecting through the proxy.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Multiple of: 1

    connectionsConfig - Connection settings

    readTimeout - number

    The Box API read timeout, in milliseconds.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 120000

    Multiple of: 1

    connectTimeout - number

    The Box API connection timeout, in milliseconds.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 120000

    Multiple of: 1

    pageSize - number

    Maximum number of items per paginated request

    >= 1

    <= 1000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1

    incrementalCrawlingConfig - Incremental crawling settings

    cacheSize - number

    Size of the folder collaborations cache

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    Multiple of: 1

    cacheExpirationTime - number

    Time in minutes before removing a folder collaboration set

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 5

    Multiple of: 1

    maximumItemLimitConfig - Item Count Limit

    maxItems - number

    Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    sizeLimitProperties - Item Size Limits

    Options for including or excluding items based on size, in bytes.

    maxSizeBytes - number

    Used for excluding items when the item size is larger than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: -1

    Multiple of: 1

    minSizeBytes - number

    Used for excluding items when the item size is smaller than the configured value.

    >= -2147483648

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1

    Multiple of: 1
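
    The two size limits can be read as a simple range check, sketched below. Treating a negative maxSizeBytes (the default is -1) as "no upper limit" is an assumption of this sketch, and the function name within_size_limits is hypothetical.

```python
def within_size_limits(size_bytes: int, max_size_bytes: int = -1,
                       min_size_bytes: int = 1) -> bool:
    """Return True when an item of size_bytes would not be excluded by size."""
    # minSizeBytes: exclude items smaller than the configured value.
    if size_bytes < min_size_bytes:
        return False
    # maxSizeBytes: exclude items larger than the configured value.
    # Assumption: a negative value disables the upper limit.
    if max_size_bytes >= 0 and size_bytes > max_size_bytes:
        return False
    return True


print(within_size_limits(2048, max_size_bytes=1024))
```

    Under the defaults in this sketch, only empty (0-byte) items are excluded.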

    itemRetryProperties - Item retry settings

    Options to configure the retry operation for items.

    maxRetries - number

    The maximum number of attempts for a failed item

    <= 20

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 3

    Multiple of: 1

    retryDelayInSeconds - number

    The amount of time, in seconds, to wait before processing a failed item again

    >= 1

    <= 600

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 30

    Multiple of: 1
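
    The retry settings describe a bounded retry loop with a fixed delay. The sketch below shows that general technique and is not the connector's implementation. It assumes maxRetries counts retries after the first failed attempt, and the names process_with_retries, action, and sleep are hypothetical.

```python
import time


def process_with_retries(action, max_retries=3, retry_delay_seconds=30,
                         sleep=time.sleep):
    """Call action(), retrying up to max_retries times with a fixed delay."""
    retries_used = 0
    while True:
        try:
            return action()
        except Exception:
            # Give up once the retry budget is spent and surface the last error.
            if retries_used >= max_retries:
                raise
            retries_used += 1
            sleep(retry_delay_seconds)


# Usage with the delay set to 0 so the example finishes immediately.
attempts = []


def fetch_item():
    attempts.append("try")
    if len(attempts) < 2:
        raise RuntimeError("temporary failure")
    return "fetched"


print(process_with_retries(fetch_item, max_retries=3, retry_delay_seconds=0))
```

    The sleep function is passed in as a parameter so that the delay can be replaced when testing.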

    security - Security filtering configuration

    enabled - boolean

    Enable query-time security-trimming

    Default: true

    collectionId - string

    ID of the collection used for storing ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
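
    The naming rule for the ACL collection can be sketched as follows. The function name acl_collection_name and the sample datasource ID are hypothetical; only the suffix pattern comes from this reference.

```python
from typing import Optional


def acl_collection_name(datasource_id: str, collection_id: Optional[str] = None) -> str:
    """Return the configured collectionId, or the generated default name."""
    if collection_id:
        return collection_id
    return f"{datasource_id}_access_control_hierarchy"


print(acl_collection_name("box-datasource"))  # prints box-datasource_access_control_hierarchy
```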