Fusion 5.10

    Configure AEM V2 Connector

    This document explains how to configure an AEM V2 connector to crawl data in Adobe Experience Manager. Refer to the AEM reference to learn more about how this connector works. This connector is compatible with Fusion 5.5.1 and later.

    Configure AEM Datasource

    1. Under Indexing > Datasources, click Add, then select AEM.

    2. Enter a Configuration ID.

    3. Enter the AEM URL (the URL used to access the AEM Admin UI) as well as the AEM username and password used to authenticate access to the QueryBuilder JSON Servlet.

      AEM authentication

    4. In the CRXDE UI, select a path to crawl. Enter this path into Fusion. Click Add to crawl multiple paths.

      CRXDE path; AEM paths to crawl

    5. Optional: To exclude paths from the crawl, enter a Java regular expression (regex) that matches the paths to exclude.

      paths to exclude

    6. Enter the AEM type to crawl. In the CRXDE UI, this is the jcr:primaryType. In this example, the AEM connector is configured to crawl the AEM Type cq:Page, which represents web content pages.

      CRXDE type; AEM type

    7. To index assets with a particular file extension, locate a file of that type in CRXDE and enter the value of its jcr:primaryType into Fusion. In this example, the jcr:primaryType of NY_FairHealth.pdf is dam:Asset.

      AEM attachments

    8. Choose which content properties to include in or exclude from the index. These parameter values are Java regular expressions. For example, to include only properties that start with “jcr”, enter jcr:(.*).

    9. In Fusion, click Save when you’re done configuring the AEM datasource.
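The path exclusion in step 5 and the property include/exclude patterns in step 8 can be illustrated with a short sketch. It is written in Python rather than Java, so it only approximates Java regex behavior for simple patterns like the ones shown in this document. The function names and sample data are hypothetical. Two behaviors are assumptions rather than documented connector behavior: that a pattern must match the whole value (as with Java's `String.matches`), and that an exclude pattern wins over an include pattern.

```python
import re

def is_excluded(path, exclude_patterns):
    """Return True if the path fully matches any exclusion pattern."""
    return any(re.fullmatch(p, path) for p in exclude_patterns)

def filter_properties(props, include_patterns, exclude_patterns):
    """Keep properties whose names match an include pattern and no exclude pattern.

    An empty include list is treated as "include everything" (assumption).
    """
    kept = {}
    for name, value in props.items():
        included = not include_patterns or any(
            re.fullmatch(p, name) for p in include_patterns
        )
        excluded = any(re.fullmatch(p, name) for p in exclude_patterns)
        if included and not excluded:
            kept[name] = value
    return kept

# Hypothetical sample data, for illustration only.
paths = ["/content/site/en", "/content/site/en/archive/2019"]
crawled = [p for p in paths if not is_excluded(p, [r"/content/site/.*/archive/.*"])]

props = {"jcr:title": "Home", "jcr:primaryType": "cq:Page", "sling:resourceType": "x"}
indexed = filter_properties(props, [r"jcr:(.*)"], [r"sling:.*"])
```

Here `crawled` keeps only the non-archive path, and `indexed` keeps the two `jcr:` properties. Testing patterns this way before saving the datasource can catch a regex that excludes more than intended.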

    Configuration settings

    | Setting | Notes |
    | --- | --- |
    | AEM URL | Required. The URL used to access the AEM Admin UI. |
    | AEM Username | Required. The user needs sufficient permissions to read the content paths and, if security trimming is needed, to access the Users/Group APIs. |
    | AEM Password | Required. |
    | Page Batch Size | Number of documents to fetch per page request. A higher value can increase crawling speed but also increases memory usage. |
    | Thread wait (ms) | Number of milliseconds to wait between fetch requests. Use this property to throttle a crawl if necessary. |
    | Paths to search | Required. The AEM paths to crawl. |
    | Paths that should not be fetched | Java regex for paths that should not be fetched. |
    | AEM Types | Required. AEM document type (jcr:primaryType) to include in the index. Examples: cq:Page, dam:Asset. |
    | Attachment types | File extensions to index. |
    | Content Property Include Regexes | A list of regex strings of content properties to include in indexed documents. Example: jcr:.* |
    | Content Property Exclude Regexes | A list of regex strings of content properties to exclude from indexed documents. Example: sling:.* |
    | Enable Security Trimming | Enable this setting to filter query results based on the user ID passed in with the query. |
    | Group Mappings | AEM user groups mapped to the values indexed in the security trimming field. These values are used to filter content based on the user ID passed in with the query. |
    | Cache Expire Time (m) | Specifies how long a query is cached, in minutes. |
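To make the interplay of Page Batch Size and Thread wait (ms) concrete, the sketch below pages through results in fixed-size batches and pauses between requests. It is a minimal illustration, not the connector's implementation. The function names, the injected `fetch_page` callable, and the `http://localhost:4502` URL are hypothetical. The use of the QueryBuilder `p.offset` and `p.limit` parameters is an assumption about how such a paged crawl could be expressed.

```python
import time
from urllib.parse import urlencode

def build_query_url(aem_url, path, aem_type, offset, batch_size):
    """Build a QueryBuilder JSON URL for one page of results."""
    params = {
        "path": path,
        "type": aem_type,
        "p.offset": offset,
        "p.limit": batch_size,
    }
    return f"{aem_url.rstrip('/')}/bin/querybuilder.json?{urlencode(params)}"

def crawl(aem_url, path, aem_type, batch_size, wait_ms, fetch_page):
    """Yield hits page by page, sleeping wait_ms between fetch requests.

    fetch_page is a caller-supplied function (hypothetical) that takes a URL
    and returns a list of hits; an empty or short page ends the crawl.
    """
    offset = 0
    while True:
        hits = fetch_page(build_query_url(aem_url, path, aem_type, offset, batch_size))
        yield from hits
        if len(hits) < batch_size:
            break
        offset += batch_size
        time.sleep(wait_ms / 1000.0)

# Stand-in for a real HTTP call: serves 5 fake hits in pages of 2.
requested = []

def fake_fetch(url):
    requested.append(url)
    start = (len(requested) - 1) * 2
    return [{"path": f"/content/page{i}"} for i in range(start, min(start + 2, 5))]

docs = list(crawl("http://localhost:4502", "/content", "cq:Page", 2, 0, fake_fetch))
```

With a batch size of 2 and five available hits, this makes three requests. A larger batch size means fewer requests but more documents held in memory per page, and a nonzero wait slows the crawl to reduce load on the AEM server.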

    Field data population

    Indexed AEM data comes from multiple sources. Data from the /bin/querybuilder.json endpoint is mandatory: a document is indexed only if that data exists.

    Note the list of fields that can appear in an indexed document:

    | Field | Source | Comments |
    | --- | --- | --- |
    | id | <AEM_URL>/bin/querybuilder.json | Field: path |
    | content_txt | <AEM_URL>/bin/querybuilder.json | Whole data in text format. |
    | <rest fields> | <AEM_URL>/bin/querybuilder.json | All top level fields of JSON object. |
    | body_t | <AEM_URL>/crx/de/download.jsp | Used if path ends with one of Attachment types OR path does not end with: /jcr:content. |
    | body_t | <AEM_URL><id> | Used if there is no jcr data. If response status code is something other than 200, Fusion assumes there is no file to download under that path. |
    | body_t | [content_txt] | Defaults to [content_txt] if body_t is empty. |
    | parentPage | Id of document that contains attachment or link. | Populated in case of attachment/link. |
    | type | File extension of the path. | Populated in case of attachment/link. |
    | file_size | <AEM_URL>/bin/querybuilder.json | :jcr:data; used if jcr data is not empty. |
    | file_size | <AEM_URL>/bin/querybuilder.json | dam:size; used if jcr data is empty. |
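Read top to bottom, the field list above describes a fallback order for body_t and file_size. The sketch below spells out that precedence. It assumes the entries are evaluated in the order listed, which the source does not state explicitly. The function and parameter names are hypothetical, and the inputs stand for values already retrieved from the sources named above.

```python
def resolve_body(jcr_body, page_body, page_status, content_txt):
    """Pick body_t: downloaded jcr data, then the page at <AEM_URL><id>, then content_txt."""
    if jcr_body:
        return jcr_body
    # A non-200 status is taken to mean there is no file under that path.
    if page_status == 200 and page_body:
        return page_body
    # Last resort: fall back to the querybuilder text.
    return content_txt

def resolve_file_size(record):
    """Pick file_size: :jcr:data when it is present and non-empty, otherwise dam:size.

    Treating a missing or empty :jcr:data value as "jcr data is empty"
    is an assumption made for this sketch.
    """
    jcr_size = record.get(":jcr:data")
    if jcr_size:
        return jcr_size
    return record.get("dam:size")
```

For example, `resolve_body("", "", 404, "raw text")` falls through to `"raw text"`, and `resolve_file_size({"dam:size": 2048})` returns `2048`.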