Legacy Product

Fusion 5.10
    Solr V1 Connector Configuration Reference

    A Solr connector pulls documents from an external standalone Solr instance or SolrCloud cluster using Solr’s javabin response type and streaming response parser.

    V1 deprecation and removal notice

    Starting in Fusion 5.12.0, all V1 connectors are deprecated. This means they are no longer being actively developed and will be removed in Fusion 5.13.0.

    The replacement for this connector is in active development at this time and will be released at a future date.

    If you are using this connector, you must migrate to the replacement connector or a supported alternative before upgrading to Fusion 5.13.0. We recommend migrating to the replacement connector as soon as possible to avoid any disruption to your workflows.

    For Solr 4.7 and later, the connector uses cursorMark deep paging. For earlier versions of Solr, it uses standard paging (the start and rows parameters).
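
The version rule above can be sketched as a small helper. This is only an illustration of the 4.7 threshold described here, not the connector's actual code; the function names are hypothetical.

```python
def parse_version(version):
    """Turn a version string such as '4.10.2' into the tuple (4, 10, 2)."""
    return tuple(int(part) for part in version.split("."))


def paging_mode(solr_version):
    """Return 'cursorMark' for Solr 4.7 and later, otherwise 'start+rows'."""
    # Tuples compare element by element, so (4, 10, 2) sorts after (4, 7).
    if parse_version(solr_version) >= (4, 7):
        return "cursorMark"
    return "start+rows"
```

Comparing integer tuples rather than raw strings avoids treating "4.10" as older than "4.7".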

    The following Solr components and parameters can be configured:

    • collection/core (also allows default/empty core)

    • query (*:* by default)

    • filter queries

    • query parser

    • request handler (defaults to /select)

    • stored fields to retrieve

    Because cursorMark deep paging is used whenever possible, the sort specification can also be configured:

    • sort spec (default: id asc)
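
For readers unfamiliar with cursorMark, the loop below is a minimal sketch of how deep paging with a cursor generally works in Solr: the first request sends cursorMark=*, each response returns a nextCursorMark, and paging stops when the returned mark equals the one that was sent. The function names are hypothetical, and the in-memory fetch function is a stand-in for a real Solr request (real cursor marks are opaque strings, not offsets).

```python
def iterate_with_cursor(fetch_page, sort_spec="id asc", page_size=100):
    """Yield every document by following nextCursorMark until it stops changing."""
    cursor = "*"  # Solr's starting cursor mark
    while True:
        response = fetch_page({
            "q": "*:*",
            "sort": sort_spec,  # must include the uniqueKey field
            "rows": str(page_size),
            "cursorMark": cursor,
        })
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # same mark returned: no more results
            break
        cursor = next_cursor


def make_in_memory_fetch(docs):
    """Hypothetical stand-in for Solr, for demonstration only."""
    def fetch_page(params):
        # This fake encodes the cursor as a list offset.
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        page = docs[start:start + int(params["rows"])]
        next_cursor = str(start + len(page)) if page else params["cursorMark"]
        return {"response": {"docs": page}, "nextCursorMark": next_cursor}
    return fetch_page
```

Passing the fetch function in as an argument keeps the paging logic independent of any particular HTTP or SolrJ client.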

    This connector can be configured to store information about datasources and the data ingested in a ConnectorDB crawldb instance.

    Limitations

    • Cannot do incremental crawls. (This may become possible in the future using the _version_ field of the source Solr documents.)

    • Cannot do manual filtered deep paging.

    • Does not check that both the sort spec and the field list contain the uniqueKey field.

    • Cannot handle encrypted connections to Solr.
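
Because the connector does not verify the uniqueKey requirement itself, a check like the following could be run before saving a datasource. This is a sketch under assumptions: the helper name is hypothetical, the uniqueKey is assumed to be id unless stated otherwise, and a field list of * is treated as including every stored field.

```python
def missing_unique_key(sort_spec, field_list, unique_key="id"):
    """Return the settings ('sort spec', 'field list') that omit the uniqueKey field."""
    # A sort spec looks like "score desc, id asc"; keep only the field names.
    sort_fields = [clause.split()[0] for clause in sort_spec.split(",") if clause.strip()]
    fields = [name.strip() for name in field_list.split(",") if name.strip()]

    problems = []
    if unique_key not in sort_fields:
        problems.append("sort spec")
    if "*" not in fields and unique_key not in fields:
        problems.append("field list")
    return problems
```

An empty list means both settings reference the uniqueKey field.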

    Configuration

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
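
One way to read the rule above is that the API accepts JSON, where a literal backslash must itself be escaped. The sketch below illustrates that reading with Python's json module; the property name delimiter is hypothetical and used only for demonstration.

```python
import json

# The value as typed in the UI: two characters, a backslash followed by "t".
ui_value = r"\t"

# Serializing to JSON doubles the backslash, giving the form used in the API.
api_body = json.dumps({"delimiter": ui_value})
print(api_body)  # {"delimiter": "\\t"}
```

Parsing the JSON body again yields the original two-character value, so both forms describe the same setting.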

    A connector that pulls selected documents from an external Solr instance or cluster.

    id - string, required

    Unique name for this datasource.

    >= 1 characters

    Match pattern: ^[a-zA-Z0-9_-]+$
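
The match pattern above can be tested locally before submitting a datasource. This is a minimal sketch; the helper name is hypothetical.

```python
import re

# Pattern copied from the reference: letters, digits, underscore, and hyphen only.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_datasource_id(value):
    """Return True if the value has at least one character and matches the pattern."""
    return ID_PATTERN.fullmatch(value) is not None
```

For example, my-solr_source1 passes, while a name containing a space or a dot does not.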

    pipeline - string, required

    Name of an existing index pipeline for processing documents.

    >= 1 characters

    description - string

    Optional description for this datasource.

    properties - Properties

    Datasource configuration properties

    db - Connector DB

    Type and properties for a ConnectorDB implementation to use with this datasource.

    type - string

    Fully qualified class name of ConnectorDb implementation.

    >= 1 characters

    Default: com.lucidworks.connectors.db.impl.MapDbConnectorDb

    inlinks - boolean

    Keep track of incoming links. This negatively impacts performance and size of DB.

    Default: false

    aliases - boolean

    Keep track of original URIs that resolved to the current URI. This negatively impacts performance and size of DB.

    Default: false

    inv_aliases - boolean

    Keep track of target URIs that the current URI resolves to. This negatively impacts performance and size of DB.

    Default: false

    solr_base_url - string

    If using a single Solr instance, enter the base URL, e.g., http://solrhost.example.com:8983/solr/.

    zk_host_string - string

    If using a SolrCloud instance, enter the ZooKeeper connect string, e.g., zkServerA:2181,zkServerB:2181,zkServerC:2181/solr.
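
A ZooKeeper connect string is a comma-separated list of host:port pairs with an optional chroot path at the end. The sketch below splits the example above into those parts; the function name is hypothetical and this is not how the connector itself parses the value.

```python
def parse_zk_connect_string(connect_string):
    """Split 'hostA:2181,hostB:2181/solr' into (['hostA:2181', 'hostB:2181'], '/solr')."""
    # Everything after the first slash is the chroot path, if one is given.
    hosts_part, slash, chroot = connect_string.partition("/")
    hosts = [host.strip() for host in hosts_part.split(",") if host.strip()]
    return hosts, (slash + chroot) if slash else ""


hosts, chroot = parse_zk_connect_string(
    "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr"
)
```

A quick check like this can catch a missing port or a misplaced chroot before the datasource is saved.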

    source_collection - string

    Collection or core name in the source Solr instance/cluster. If not defined, the default core will be used.

    solr_query - string

    Query to select documents from the source Solr instance/cluster. If not defined, the default query *:* will be used.

    >= 1 characters

    Default: *:*

    solr_filter_queries - string

    Filter queries to select documents from the source Solr instance/cluster. Multiple filter queries should be separated with commas.

    solr_field_list - string

    Fields to fetch from the source Solr instance/cluster, which must be stored fields. Multiple field names should be separated with commas.

    >= 1 characters

    Default: *

    solr_page_size - integer

    Number of rows per request to Solr.

    Default: 100

    solr_sort_spec - string

    Sort order for the request. The uniqueKey field must be included as one of the sorted fields.

    >= 1 characters

    Default: id asc

    solr_request_handler - string

    The request handler to use for the request to the Solr instance/cluster.

    >= 1 characters

    Default: /select

    solr_query_parser - string

    The query parser to use for the request.
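
To make the relationship between these properties and a Solr request concrete, the sketch below maps them onto standard Solr query parameters (q, fq, fl, rows, sort, defType, wt). The mapping is an assumption made for illustration; the reference does not state exactly how the connector builds its requests, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_solr_request(props):
    """Return (request handler, list of query parameters) using the documented defaults."""
    params = [
        ("q", props.get("solr_query", "*:*")),
        ("fl", props.get("solr_field_list", "*")),
        ("rows", str(props.get("solr_page_size", 100))),
        ("sort", props.get("solr_sort_spec", "id asc")),
        ("wt", "javabin"),
    ]
    # Multiple filter queries are comma-separated; each becomes its own fq parameter.
    for fq in props.get("solr_filter_queries", "").split(","):
        if fq.strip():
            params.append(("fq", fq.strip()))
    if props.get("solr_query_parser"):
        params.append(("defType", props["solr_query_parser"]))
    return props.get("solr_request_handler", "/select"), params


handler, params = build_solr_request(
    {"solr_filter_queries": "type:book,inStock:true", "solr_query_parser": "edismax"}
)
query_string = urlencode(params)
```

Note that a naive comma split, as used here, would break a filter query that itself contains a comma.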

    commit_on_finish - boolean

    Set to true to send a commit request to Solr after the last batch has been fetched, so that the documents are committed to the index.

    Default: true

    verify_access - boolean

    Set to true to require a successful connection to the source before saving this datasource.

    Default: true

    initial_mapping - Initial field mapping

    Provides mapping of fields before documents are sent to an index pipeline.

    skip - boolean

    Set to true to skip this stage.

    Default: false

    label - string

    A unique label for this stage.

    <= 255 characters

    condition - string

    Define a conditional script that must evaluate to true or false. The result determines whether the stage processes a document.

    reservedFieldsMappingAllowed - boolean

    Default: false

    mappings - array[object]

    List of mapping rules

    Default: {"source":"_version_","target":"external_version_s","operation":"move"}

    object attributes:

    • source (required) - string, display name: Source Field

    • target - string, display name: Target Field

    • operation - string, display name: Operation
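
The effect of the default mapping rule can be sketched as follows. The semantics are assumptions, since this reference does not define them: move is taken to rename a field, copy to duplicate it, and delete to drop it. The function name is hypothetical, and the remaining operations (add, set, keep) are left out.

```python
DEFAULT_MAPPINGS = [
    {"source": "_version_", "target": "external_version_s", "operation": "move"},
]


def apply_mappings(doc, mappings):
    """Return a copy of doc with the move, copy, and delete rules applied in order."""
    result = dict(doc)
    for rule in mappings:
        source = rule["source"]
        if source not in result:
            continue  # nothing to map for this document
        if rule["operation"] == "move":
            result[rule["target"]] = result.pop(source)  # assumed: rename
        elif rule["operation"] == "copy":
            result[rule["target"]] = result[source]  # assumed: duplicate
        elif rule["operation"] == "delete":
            del result[source]  # assumed: drop the field
    return result


mapped = apply_mappings({"id": "1", "_version_": 123}, DEFAULT_MAPPINGS)
```

Under these assumptions, the source document's _version_ value ends up in external_version_s and no longer appears under its original name.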

    unmapped - Unmapped Fields

    If fields do not match any of the field mapping rules, these rules will apply.

    source - string

    The name of the field to be mapped.

    target - string

    The name of the field to be mapped to.

    operation - string

    The type of mapping to perform: move, copy, delete, add, set, or keep.

    Default: copy

    Allowed values: copy, move, delete, set, add, keep
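
Putting the properties together, a datasource definition might look like the sketch below. The nesting (solr_* settings and initial_mapping under properties) is inferred from the order of this reference and should be treated as an assumption; the id, pipeline name, and host are placeholders, and any fields the API requires beyond those listed here are not shown.

```python
import json

datasource = {
    "id": "external-solr-books",        # placeholder; must match ^[a-zA-Z0-9_-]+$
    "pipeline": "my-index-pipeline",    # placeholder; must name an existing index pipeline
    "description": "Pulls book documents from an external Solr instance.",
    "properties": {
        "solr_base_url": "http://solrhost.example.com:8983/solr/",
        "source_collection": "books",   # placeholder collection name
        "solr_query": "*:*",
        "solr_field_list": "*",
        "solr_page_size": 100,
        "solr_sort_spec": "id asc",     # must include the uniqueKey field
        "solr_request_handler": "/select",
        "commit_on_finish": True,
        "verify_access": True,
        "initial_mapping": {
            "mappings": [
                {"source": "_version_", "target": "external_version_s", "operation": "move"}
            ]
        },
    },
}

# Serialize to JSON, the form in which the configuration would be sent to an API.
payload = json.dumps(datasource, indent=2)
```

Only solr_base_url is set here because the example targets a single Solr instance; for SolrCloud, zk_host_string would be used instead.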