AEM V2 Connector Configuration Reference
This connector retrieves data from an Adobe Experience Manager (AEM) repository. The AEM V2 connector is compatible with AEM version 6.5.
The AEM V2 connector supports the following:
- Full crawling and recrawling of pages and assets in Adobe Experience Manager.
- Basic authentication.
- OAuth authentication.
- Security trimming, to filter results based on user permissions.
- Filtering of crawled documents by including and excluding paths and by configuring content properties when setting up the connector.
- A configurable wait time between fetch requests to throttle crawls, if necessary.
An Apache Sling-based connector for AEM.
description - string
Optional description
<= 125 characters
pipeline - string, required
Name of the IndexPipeline used for processing output.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
diagnosticLogging - boolean
Enable diagnostic logging; disabled by default
Default: false
parserId - string
The Parser to use in the associated IndexPipeline.
Match pattern: ^[a-zA-Z0-9_-]+$
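The match pattern listed for pipeline and parserId (and also for id) can be checked before a configuration is submitted. The following is a minimal sketch, not part of the connector; the function name is_valid_name is hypothetical.

```python
import re

# Character class taken from the documented pattern ^[a-zA-Z0-9_-]+$.
# re.fullmatch anchors at both ends, so the ^ and $ are not repeated here.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def is_valid_name(value: str) -> bool:
    """Return True if value is non-empty and uses only the allowed characters."""
    return NAME_PATTERN.fullmatch(value) is not None
```

For example, is_valid_name("aem-v2_datasource") is accepted, while a value containing a space or a slash is rejected.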
coreProperties - Core Properties
Common behavior and performance settings.
fetchSettings - Fetch Settings
System level settings for controlling fetch behavior and performance.
numFetchThreads - number
Maximum number of fetch threads; defaults to 20. This setting controls the number of threads that call the connector's fetch method. Higher values can, but do not always, improve overall fetch performance.
>= 1
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 20
Multiple of: 1
indexingThreads - number
Maximum number of indexing threads; defaults to 4. This setting controls the number of threads in the indexing service used for processing content documents emitted by this datasource. Higher values can sometimes help with overall fetch performance.
>= 1
<= 10
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4
Multiple of: 1
pluginInstances - number
Maximum number of plugin instances for distributed fetching. Only the specified number of plugin instances will perform fetching. This is useful for distributing load between different instances.
<= 500
exclusiveMinimum: false
exclusiveMaximum: false
Default: 0
Multiple of: 1
fetchResponseScheduledTimeout - number
The maximum amount of time for a response to be scheduled. The task will be canceled if this setting is exceeded.
>= 1000
<= 500000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 300000
Multiple of: 1
indexingInactivityTimeout - number
The maximum amount of time to wait for indexing results (in seconds). If exceeded, the job will fail with an indexing inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 86400
Multiple of: 1
pluginInactivityTimeout - number
The maximum amount of time to wait for plugin activity (in seconds). If exceeded, the job will fail with a plugin inactivity timeout.
>= 60
<= 691200
exclusiveMinimum: false
exclusiveMaximum: false
Default: 600
Multiple of: 1
indexMetadata - boolean
When enabled, the metadata of skipped items will be indexed to the content collection.
Default: false
indexContentFields - boolean
When enabled, content fields will be indexed to the crawl-db collection.
Default: false
asyncParsing - boolean
When enabled, content will be indexed asynchronously.
Default: false
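The numeric fetch settings above each list an inclusive range. As a rough illustration, the bounds can be collected in a table and checked before saving a configuration. This is a sketch only: the helper name out_of_range is hypothetical, and the connector performs its own validation.

```python
# (minimum, maximum) pairs copied from the fetch settings above; both ends inclusive.
# None means the reference lists no bound on that side.
FETCH_SETTING_BOUNDS = {
    "numFetchThreads": (1, 500),
    "indexingThreads": (1, 10),
    "pluginInstances": (None, 500),
    "fetchResponseScheduledTimeout": (1000, 500000),
    "indexingInactivityTimeout": (60, 691200),
    "pluginInactivityTimeout": (60, 691200),
}


def out_of_range(settings: dict) -> list:
    """Return the names of settings whose values fall outside the documented range."""
    problems = []
    for name, value in settings.items():
        if name not in FETCH_SETTING_BOUNDS:
            continue  # boolean or unknown settings are not range-checked here
        low, high = FETCH_SETTING_BOUNDS[name]
        if (low is not None and value < low) or (high is not None and value > high):
            problems.append(name)
    return problems
```

An empty list means every checked value is within its documented range.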
id - string, required
A unique identifier for this Configuration.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
properties - Properties
Plugin specific properties.
aemBaseUrl - string
Base URL to AEM, e.g. http://localhost:4502
>= 1 characters
Default: http://localhost:4502
username - string
Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.
password - string
Password to use for authentication.
authConfig - Authentication Settings
Select only one option
loginAuthentication - Login Settings
username - string
Username to use for authentication. The user should have sufficient permissions to read content paths and, if security trimming is needed, to access the Users/Group APIs.
password - string
Password to use for authentication.
oAuth - OAuth Settings
accessToken - string
Access Token
oAuthRefreshToken - string
Refresh token, used to refresh the access token.
jwtToken - string
JWT token, used to request a new access token if the refresh token is not set.
clientId - string
Client Id
clientSecret - string
Client Secret
redirectUri - string
Redirect Uri
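Because authConfig accepts only one option, a configuration can be checked for exactly one of loginAuthentication or oAuth before it is submitted. The sketch below assumes the two options appear as nested objects keyed by those names; the function name selected_auth_option is hypothetical.

```python
AUTH_OPTIONS = ("loginAuthentication", "oAuth")


def selected_auth_option(auth_config: dict) -> str:
    """Return the single configured option, or raise if zero or both are set."""
    # An option counts as configured when its key is present with a non-empty value.
    chosen = [name for name in AUTH_OPTIONS if auth_config.get(name)]
    if len(chosen) != 1:
        raise ValueError(
            "authConfig must set exactly one of %s, found %d"
            % (", ".join(AUTH_OPTIONS), len(chosen))
        )
    return chosen[0]
```

Calling it with both options populated, or with neither, raises a ValueError rather than silently picking one.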
allowAllCertificates - boolean
If false, security checks are performed on all SSL/TLS certificate signers and origins, which means self-signed certificates are not supported.
Default: false
pageSize - number
Number of documents to fetch per page request. A higher value can make crawling faster, but memory usage is also increased.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 100
Multiple of: 1
nodeDepth - number
Number of levels the query should return.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10
Multiple of: 1
threadWait - number
Time to wait, in milliseconds, between each page request.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
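The interaction between pageSize and threadWait can be pictured as a paginated loop that pauses between page requests. This is a sketch of the general throttling idea, not the connector's implementation: crawl_pages, fetch_page, and the offset-based paging are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterator


def crawl_pages(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,         # pageSize default from the reference
    thread_wait_ms: int = 1000,   # threadWait default from the reference
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[list]:
    """Yield pages of documents, waiting thread_wait_ms between page requests."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # hypothetical fetch callback
        if not page:
            break
        yield page
        if len(page) < page_size:
            break  # a short page means there is nothing left to request
        offset += len(page)
        sleep(thread_wait_ms / 1000)  # throttle before the next request
```

A larger page size means fewer requests and fewer waits, at the cost of holding more documents in memory per page.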
paths - array[string]
AEM paths that will be searched across for content.
Default: "/"
excludePathRegexes - array[string]
Java regular expressions for paths that should not be fetched.
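As an illustration of how exclusion patterns might filter paths, the sketch below tests each path against a list of expressions. Two assumptions are made here and are not confirmed by this reference: that a pattern must match the whole path, and that the patterns use syntax common to both Java and Python regular expressions. The function name is_excluded and the sample patterns are hypothetical.

```python
import re


def is_excluded(path: str, exclude_regexes: list) -> bool:
    """Return True if any pattern matches the entire path (assumed semantics)."""
    return any(re.fullmatch(pattern, path) for pattern in exclude_regexes)


# Hypothetical example patterns: skip everything under /content/dam
# and any path with an "archive" segment.
EXAMPLE_EXCLUDES = [r"/content/dam/.*", r".*/archive/.*"]
```

With these example patterns, /content/dam/images/logo.png and /content/site/archive/2019 would be skipped, while /content/site/en would be fetched.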
aemTypes - array[string]
AEM document types (jcr:primaryType) to include in the index, e.g. cq:Page, dam:Asset.
Default: "cq:Page"
attachmentTypes - array[string]
Attachment extensions to index. By default all attachments are indexed.
maxSizeBytes - number
Maximum size, in bytes, of a document to fetch. Content larger than this is trimmed to 'maxSizeBytes' bytes.
>= -9223372036854776000
<= 9223372036854776000
exclusiveMinimum: false
exclusiveMaximum: false
Default: 4194304
Multiple of: 1
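The trimming behavior described for maxSizeBytes amounts to keeping only the leading bytes of oversized content. A minimal sketch, assuming the content is available as a byte string; trim_content is a hypothetical name, and how the connector actually trims is not specified here.

```python
DEFAULT_MAX_SIZE_BYTES = 4194304  # default from the reference (4 MiB)


def trim_content(content: bytes, max_size_bytes: int = DEFAULT_MAX_SIZE_BYTES) -> bytes:
    """Return content unchanged if it fits, otherwise its first max_size_bytes bytes."""
    if max_size_bytes < 0:
        raise ValueError("max_size_bytes must not be negative")
    return content[:max_size_bytes]
```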
requestProperties - Request Options
A set of options for configuring requests to the AEM instance.
connectTimeout - number
The timeout in milliseconds until a connection is established.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
socketTimeout - number
The socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
requestTimeout - number
The timeout in milliseconds used when requesting a connection from the connection manager.
A timeout value of zero is interpreted as an infinite timeout. A negative value is interpreted as undefined.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 10000
Multiple of: 1
retryProperties - Retry Options
A set of options for configuring request retry behavior.
maxRetries - number
If a request to AEM fails, it will be retried this many times.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 3
Multiple of: 1
retryDelay - number
Time to wait, in milliseconds, between each retry.
>= -2147483648
<= 2147483647
exclusiveMinimum: false
exclusiveMaximum: false
Default: 1000
Multiple of: 1
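The retry options can be read as a simple loop: try the request, and on failure wait retryDelay milliseconds before trying again, up to maxRetries times. The sketch below assumes maxRetries counts retries after the initial attempt and that every exception is retryable; neither detail is confirmed by this reference, and call_with_retries is a hypothetical name.

```python
import time
from typing import Any, Callable


def call_with_retries(
    request: Callable[[], Any],
    max_retries: int = 3,        # maxRetries default from the reference
    retry_delay_ms: int = 1000,  # retryDelay default from the reference
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call request(), retrying failed calls up to max_retries times."""
    retries_used = 0
    while True:
        try:
            return request()
        except Exception:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay_ms / 1000)  # fixed delay between attempts
```

Under these assumptions the defaults allow at most four calls in total, with about three seconds of waiting before the final failure is reported.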
security - Security filtering configuration
enabled - boolean
Enable query-time security trimming.
Default: true
collectionId - string
ID of the collection used for storing ACL records. If not specified, the ACL collection name is generated automatically using the pattern '<datasource_id>_access_control_hierarchy'.
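The fallback naming rule for the ACL collection can be expressed in a few lines. This sketch only mirrors the documented pattern; the function name acl_collection_name is hypothetical.

```python
from typing import Optional


def acl_collection_name(datasource_id: str, collection_id: Optional[str] = None) -> str:
    """Return collection_id if set, else '<datasource_id>_access_control_hierarchy'."""
    if collection_id:
        return collection_id
    return f"{datasource_id}_access_control_hierarchy"
```

For a datasource whose id is aem-v2, leaving collectionId empty would yield aem-v2_access_control_hierarchy.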