AWS S3 V2 Connector Configuration Reference
The AWS S3 V2 connector crawls items in a single bucket. You must specify the bucket name and AWS region in which that bucket is located.
You may crawl specific items in a bucket. If no items are specified, the entire bucket will be crawled.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
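The escaping difference can be illustrated with a short Python sketch that shows how the two-character sequence \t, as typed in the UI, looks once it is serialized into a JSON request body. The `delimiter` key is a hypothetical field name used only for illustration; it is not part of this connector's schema.

```python
import json

# What a user types in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# Serializing to JSON for an API request escapes the backslash itself,
# which yields the \\t form described above.
# "delimiter" is a hypothetical field name used only for this illustration.
body = json.dumps({"delimiter": ui_value})

print(body)

# Decoding the request body recovers the original two-character value.
assert json.loads(body)["delimiter"] == ui_value
```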
The connector requires ListBucket and GetObject permissions.
The following is an IAM policy example. When you set permissions, replace bucketname with the value used in your implementation.
"Statement": [
  {
    "Action": [
      "s3:GetObject"
    ],
    "Resource": [
      "arn:aws:s3:::bucketname/*"
    ],
    "Effect": "Allow"
  },
  {
    "Action": [
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::bucketname"
    ],
    "Effect": "Allow"
  }
]
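As a convenience, the policy above can be generated for a specific bucket. The following is a minimal Python sketch, not an official tool: the helper name `build_s3_read_policy` and the bucket name `my-example-bucket` are hypothetical, and the `Version` element is an addition that does not appear in the example above.

```python
import json


def build_s3_read_policy(bucket_name):
    """Return a policy document granting the two permissions the connector needs."""
    return {
        # "Version" is not part of the example above; "2012-10-17" is the
        # current IAM policy language version and is added here as an assumption.
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetObject applies to the objects inside the bucket.
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
                "Effect": "Allow",
            },
            {
                # ListBucket applies to the bucket itself.
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
                "Effect": "Allow",
            },
        ],
    }


# "my-example-bucket" is a placeholder; substitute your own bucket name.
print(json.dumps(build_s3_read_policy("my-example-bucket"), indent=2))
```

Note that the object-level permission uses the `/*` suffix while the bucket-level permission does not, which matches the two Resource entries in the example.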
Connector to index content in AWS S3 buckets.
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string, required
The Parser to use in the associated IndexPipeline.
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will fetch content. This is useful for distributing load between different instances.
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 0
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, help with overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
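The numeric ranges listed for the fetch settings can be checked before a configuration is submitted. The following Python sketch is illustrative only: the helper `find_out_of_range` is hypothetical, and the bounds table simply restates the minimum and maximum values listed above.

```python
# Inclusive (minimum, maximum) bounds restated from the fetchSettings entries above.
# None means the reference lists no bound on that side.
FETCH_SETTING_BOUNDS = {
    "indexingInactivityTimeout": (60, 691200),
    "pluginInactivityTimeout": (60, 691200),
    "indexingThreads": (1, 10),
    "pluginInstances": (None, 500),
    "fetchResponseScheduledTimeout": (1000, 500000),
    "numFetchThreads": (1, 500),
}


def find_out_of_range(fetch_settings):
    """Return a list of readable problems; the list is empty when all values fit."""
    problems = []
    for name, value in fetch_settings.items():
        bounds = FETCH_SETTING_BOUNDS.get(name)
        if bounds is None:
            continue  # boolean settings and unknown keys are not range-checked
        low, high = bounds
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


# Example: numFetchThreads exceeds its documented maximum of 500.
print(find_out_of_range({"numFetchThreads": 600, "indexingThreads": 4}))
```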
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
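The match pattern shared by `id` and `pipeline` can be tested locally. This is a small Python sketch under the assumption that the pattern must cover the whole value; the function name `is_valid_name` and the sample values are hypothetical.

```python
import re

# Character class taken from the documented pattern ^[a-zA-Z0-9_-]+$.
# fullmatch is used instead of the ^...$ anchors because, in Python,
# "$" would also accept a trailing newline.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def is_valid_name(value):
    """True when value has at least one character and only allowed characters."""
    return NAME_PATTERN.fullmatch(value) is not None


for candidate in ["s3-products_01", "my datasource", ""]:
    print(repr(candidate), is_valid_name(candidate))
```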
properties - S3 properties
Plugin specific properties.
application - S3 Application
bucketName - string
region - string
The AWS region in which the bucket is located.
Default: us-west-2
Allowed values: ap-south-1, eu-south-1, us-gov-east-1, ca-central-1, eu-central-1, us-west-1, us-west-2, af-south-1, eu-north-1, eu-west-3, eu-west-2, eu-west-1, ap-northeast-2, ap-northeast-1, me-south-1, sa-east-1, ap-east-1, cn-north-1, us-gov-west-1, ap-southeast-1, ap-southeast-2, us-iso-east-1, us-east-1, us-east-2, cn-northwest-1, us-isob-east-1, aws-global, aws-cn-global, aws-us-gov-global, aws-iso-global, aws-iso-b-global
objectKeys - array[string]
Limit the crawl to a set of Files or Folders inside the bucket. Folders must end with '/'. Valid input examples: 'folderName/', 'folder/subFolder/', 'file.txt', 'folder/file.txt'
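The trailing-slash rule for objectKeys can be expressed in a few lines. The following Python sketch only classifies entries by that rule; the helper name `split_object_keys` is hypothetical, and how the connector itself interprets each entry is not shown here.

```python
def split_object_keys(object_keys):
    """Separate folder entries (ending with '/') from file entries."""
    folders = [key for key in object_keys if key.endswith("/")]
    files = [key for key in object_keys if not key.endswith("/")]
    return folders, files


# The valid input examples from the description above.
keys = ["folderName/", "folder/subFolder/", "file.txt", "folder/file.txt"]
folders, files = split_object_keys(keys)

print("folders:", folders)
print("files:", files)
```

An entry such as `folderName` without the trailing slash would be classified as a file by this rule, which is why the slash matters.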
authenticationConfig - S3 Authentication settings
awsBasicAuthConfig - AWS Basic Authentication settings
accessKey - string
An AWS Access Key ID that can access the content.
secretKey - string
The AWS Secret Key associated with the Access Key.
awsSessionAuthenticationConfig - AWS Session Authentication settings
accessKey - string
An AWS Access Key ID that can access the content.
secretKey - string
The AWS Secret Key associated with the Access Key.
sessionToken - string
awsInstanceCredentialsAuthConfig - AWS Instance Credentials Authentication settings
instanceCredentials - boolean
Use AWS instance credentials rather than an AWS key. Requires that Fusion 5 be hosted in an EKS cluster. You can specify another AWS region through the Region property in the S3 Application settings.
Default: false
proxyConfig - S3 Proxy settings
proxyEndpoint - string
The optional proxy protocol, host and endpoint through which the client will connect
proxyUsername - string
The optional username to use when connecting through a proxy
proxyPassword - string
The optional password to use when connecting through a proxy
maximumItemLimitConfig - Item Count Limits
maxItems - number
Limits the number of items emitted to the configured IndexPipeline. The default is no limit (-1).
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
sizeLimitProperties - Item Size Limits
Options for including or excluding items based on size, in bytes.
maxSizeBytes - number
Used for excluding items when the item size is larger than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: -1
Multiple of: 1
minSizeBytes - number
Used for excluding items when the item size is smaller than the configured value.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1
Multiple of: 1
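The two size limits can be read as a simple range check. The following Python sketch is one possible reading rather than the connector's actual logic. It assumes that a negative maxSizeBytes (the default is -1) means no upper limit, which this reference does not state explicitly; the function name `passes_size_limits` is hypothetical.

```python
def passes_size_limits(size_bytes, min_size_bytes=1, max_size_bytes=-1):
    """Return True when an item of size_bytes would be kept.

    Defaults mirror the documented defaults: minSizeBytes=1, maxSizeBytes=-1.
    """
    # Exclude items smaller than the configured minimum.
    if size_bytes < min_size_bytes:
        return False
    # ASSUMPTION: a negative maximum disables the upper limit.
    if max_size_bytes >= 0 and size_bytes > max_size_bytes:
        return False
    return True


print(passes_size_limits(0))                          # empty object, default limits
print(passes_size_limits(2048, max_size_bytes=1024))  # larger than the maximum
print(passes_size_limits(1024, max_size_bytes=1024))  # exactly at the maximum
```

Under these assumptions, the default minSizeBytes of 1 would skip zero-byte objects.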
regexConfig - Regular expression rules
inclusiveRegexes - array[string]
Regular expressions for URI patterns to include. This will limit this datasource to only URIs that match the regular expression.
Default:
exclusiveRegexes - array[string]
Regular expressions for URI patterns to exclude. This will limit this datasource to only URIs that do not match the regular expression.
Default:
regexCacheSize - number
The number of regex matches to cache when evaluating regular expressions. For example, if you exclude files by filename, each filename's regex result is cached so that if the same filename comes up again, the cached result is reused.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
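The interaction between the include rules, the exclude rules, and the regex cache can be sketched as follows. This is a hypothetical Python illustration, not the connector's implementation. It assumes that an empty include list accepts everything and that patterns may match anywhere in the URI; the URI format shown is also an assumption.

```python
import re
from functools import lru_cache


def make_uri_filter(inclusive_regexes, exclusive_regexes, cache_size=10000):
    """Build a cached predicate that decides whether a URI should be crawled."""
    includes = [re.compile(pattern) for pattern in inclusive_regexes]
    excludes = [re.compile(pattern) for pattern in exclusive_regexes]

    # The cache plays the role of regexCacheSize: repeated URIs skip re-evaluation.
    @lru_cache(maxsize=cache_size)
    def accepts(uri):
        # ASSUMPTION: with no include patterns, every URI is a candidate.
        if includes and not any(p.search(uri) for p in includes):
            return False
        # Any matching exclude pattern rejects the URI.
        return not any(p.search(uri) for p in excludes)

    return accepts


# The s3://bucket/... URI shape is a placeholder for illustration.
accepts = make_uri_filter([r"\.pdf$"], [r"/drafts/"])
print(accepts("s3://bucket/reports/q1.pdf"))
print(accepts("s3://bucket/drafts/q1.pdf"))
print(accepts("s3://bucket/reports/q1.pdf"))  # served from the cache
print(accepts.cache_info())
```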
extensionConfig - File Extension rules
includedFileExtensions - array[string]
Set of file extensions to be fetched. If specified, all non-matching files will be skipped.
Default:
excludedFileExtensions - array[string]
A set of all file extensions to be skipped from the fetch.
Default:
regexCacheSize - number
The number of regex matches to cache when evaluating regular expressions. For example, if you exclude files by filename, each filename's regex result is cached so that if the same filename comes up again, the cached result is reused.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
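The file extension rules can be pictured as a two-step check: the include list first, then the exclude list. The following Python sketch is an approximation under stated assumptions. It treats extensions case-insensitively and accepts them with or without a leading dot, neither of which is confirmed by this reference; the function name `passes_extension_rules` is hypothetical.

```python
import posixpath


def _normalize(extensions):
    # ASSUMPTION: "PDF", ".pdf", and "pdf" are all treated as the same extension.
    return {ext.lstrip(".").lower() for ext in extensions}


def passes_extension_rules(key, included=(), excluded=()):
    """Return True when the object key would be fetched under the given rules."""
    extension = posixpath.splitext(key)[1].lstrip(".").lower()
    included_set = _normalize(included)
    excluded_set = _normalize(excluded)
    # If an include list is specified, all non-matching files are skipped.
    if included_set and extension not in included_set:
        return False
    # Files with an excluded extension are skipped.
    return extension not in excluded_set


print(passes_extension_rules("folder/file.txt", included=["txt", "pdf"]))
print(passes_extension_rules("folder/image.png", included=["txt", "pdf"]))
print(passes_extension_rules("folder/archive.zip", excluded=["zip"]))
```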
requestConfig - Request Settings
Options to configure the client
pageSize - number
Maximum number of items per paginated request
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
documentsConfig - Indexing settings
Options to control how documents will be indexed
indexFolderMetadata - boolean
Enable indexing of folder metadata. Each folder will be represented by a document in the collection.
Default: false
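To show how the properties in this reference might fit together, the following Python sketch assembles a configuration as a dictionary and prints it as JSON. The nesting is inferred from the order of the sections above and is an assumption, as are all sample values (`s3-example`, `my-example-bucket`, the placeholder credentials). A real request may require additional fields that this reference does not list.

```python
import json

# ASSUMPTION: the nesting below is inferred from the section order in this
# reference; verify it against your Fusion deployment before use.
datasource_config = {
    "id": "s3-example",                   # must match ^[a-zA-Z0-9_-]+$
    "pipeline": "s3-example-pipeline",    # must match ^[a-zA-Z0-9_-]+$
    "parserId": "s3-example-parser",
    "description": "Crawls selected folders in one S3 bucket",
    "diagnosticLogging": False,
    "coreProperties": {
        "fetchSettings": {
            "numFetchThreads": 20,
            "indexingThreads": 4,
        }
    },
    "properties": {
        "application": {
            "bucketName": "my-example-bucket",
            "region": "us-west-2",
            # Folders end with '/'; files do not.
            "objectKeys": ["folder/subFolder/", "file.txt"],
        },
        "authenticationConfig": {
            "awsBasicAuthConfig": {
                # Placeholders only; never hard-code real credentials.
                "accessKey": "<ACCESS_KEY_ID>",
                "secretKey": "<SECRET_KEY>",
            }
        },
        "maximumItemLimitConfig": {"maxItems": -1},
        "sizeLimitProperties": {"minSizeBytes": 1, "maxSizeBytes": -1},
        "requestConfig": {"pageSize": 1000},
        "documentsConfig": {"indexFolderMetadata": False},
    },
}

print(json.dumps(datasource_config, indent=2))
```

Values left at their documented defaults, such as maxItems and pageSize here, could presumably be omitted; they are included only to show where they would sit.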