Solr V1 Connector Configuration Reference
A Solr connector pulls documents from an external standalone Solr instance or SolrCloud cluster using
Solr’s javabin response type and streaming response parser.
V1 deprecation and removal notice
Starting in Fusion 5.12.0, all V1 connectors are deprecated. This means they are no longer being actively developed and will be removed in Fusion 5.13.0.
The replacement for this connector is in active development at this time and will be released at a future date.
If you are using this connector, you must migrate to the replacement connector or a supported alternative before upgrading to Fusion 5.13.0. We recommend migrating to the replacement connector as soon as possible to avoid any disruption to your workflows.
For Solr v4.7 and greater, cursorMark deep-paging is used. For earlier versions of Solr, standard paging (start+rows) is used.
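The difference between the two paging modes can be sketched as follows. This is a minimal illustration and not the connector's own implementation: `fetch_page` is a hypothetical callable standing in for an HTTP request to Solr that returns the parsed response, while `cursorMark`, `nextCursorMark`, `start`, `rows`, and `sort` are standard Solr parameter and response names.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical type: takes Solr request parameters, returns the parsed response.
FetchPage = Callable[[Dict[str, str]], dict]


def cursor_pages(fetch_page: FetchPage, query: str = "*:*",
                 sort: str = "id asc", rows: int = 100) -> Iterator[List[dict]]:
    """Yield pages of documents using cursorMark deep paging."""
    cursor = "*"  # Solr's initial cursor value
    while True:
        response = fetch_page({
            "q": query,
            "sort": sort,  # must include the uniqueKey field
            "rows": str(rows),
            "cursorMark": cursor,
        })
        docs = response["response"]["docs"]
        if docs:
            yield docs
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor


def offset_pages(fetch_page: FetchPage, query: str = "*:*",
                 rows: int = 100) -> Iterator[List[dict]]:
    """Yield pages of documents using standard start+rows paging."""
    start = 0
    while True:
        response = fetch_page({"q": query, "start": str(start), "rows": str(rows)})
        docs = response["response"]["docs"]
        if not docs:
            break
        yield docs
        start += rows
```

With cursorMark, each request resumes from the position returned by the previous one, so Solr does not have to skip over `start` documents on every request as it does with start+rows paging.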
The following Solr components and parameters can be configured:
- collection/core (also allows default/empty core)
- query (*:* by default)
- filter queries
- query parser
- request handler (defaults to /select)
- stored fields to retrieve
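As an illustration of how the components listed above could map onto a Solr request, the sketch below builds a request URL using the standard Solr parameter names `q`, `fq`, `defType`, and `fl`. The function name and its arguments are hypothetical, and the sketch assumes the request handler is addressed by path; it is not the connector's own code.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_select_url(base_url: str,
                     collection: str = "",
                     query: str = "*:*",
                     filter_queries: Optional[List[str]] = None,
                     query_parser: Optional[str] = None,
                     request_handler: str = "/select",
                     field_list: str = "*") -> str:
    """Build a Solr request URL from the configurable components."""
    params: List[Tuple[str, str]] = [("q", query), ("fl", field_list)]
    for fq in filter_queries or []:
        params.append(("fq", fq))  # one fq parameter per filter query
    if query_parser:
        params.append(("defType", query_parser))
    # An empty collection name is skipped, which targets the default core.
    segments = [s.strip("/") for s in (base_url, collection, request_handler)]
    path = "/".join(s for s in segments if s)
    return path + "?" + urlencode(params)


# Hypothetical host, collection, and filter values.
print(build_select_url("http://solrhost.example.com:8983/solr/", "products",
                       filter_queries=["type:book"], query_parser="edismax"))
```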
Also, since cursorMark deep paging should be used when possible:
This connector can be configured to store information about datasources and the data ingested in a ConnectorDB crawldb instance.
- Cannot do incremental crawls. (May be possible to do so in the future using source Solr docs' version field.)
- Cannot do manual filtered deep paging.
- Does not check that both sort spec and field list contain uniqueKey field.
- Cannot handle encrypted connection to Solr.
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
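The escaping rule can be seen by serializing a value to JSON, as an API request body would be. In this small sketch, `example_property` is a hypothetical property name used only for illustration; the point is that the two-character value typed in the UI gains a second backslash once encoded for the API.

```python
import json

# The value as typed in the UI: two characters, a backslash followed by "t".
ui_value = r"\t"

# JSON encoding escapes the backslash, giving the form used in an API request body.
payload = json.dumps({"example_property": ui_value})
print(payload)  # {"example_property": "\\t"}
```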
A connector that pulls selected documents from an external Solr instance or cluster.
id - string, required
Unique name for this datasource.
>= 1 characters
Match pattern: ^[a-zA-Z0-9_-]+$
pipeline - string, required
Name of an existing index pipeline for processing documents.
>= 1 characters
description - string
Optional description for this datasource.
properties - Properties
Datasource configuration properties
db - Connector DB
Type and properties for a ConnectorDB implementation to use with this datasource.
type - string
Fully qualified class name of ConnectorDb implementation.
>= 1 characters
Default: com.lucidworks.connectors.db.impl.MapDbConnectorDb
inlinks - boolean
Keep track of incoming links. This negatively impacts performance and size of DB.
Default: false
aliases - boolean
Keep track of original URIs that resolved to the current URI. This negatively impacts performance and size of DB.
Default: false
inv_aliases - boolean
Keep track of target URIs that the current URI resolves to. This negatively impacts performance and size of DB.
Default: false
solr_base_url - string
If using a single Solr instance, enter the base URL, e.g., http://solrhost.example.com:8983/solr/.
zk_host_string - string
If using a SolrCloud instance, enter the ZooKeeper connect string, e.g., zkServerA:2181,zkServerB:2181,zkServerC:2181/solr.
source_collection - string
Collection or Core name in the source Solr instance/cluster. If not defined, the default core will be used.
solr_query - string
Query to select documents from the source Solr instance/cluster. If not defined, the default query *:* will be used.
>= 1 characters
Default: *:*
solr_filter_queries - string
Filter queries to select documents from the source Solr instance/cluster. Multiple filter queries should be separated with commas.
solr_field_list - string
Fields to fetch from the source Solr instance/cluster, which must be stored fields. Multiple field names should be separated with commas.
>= 1 characters
Default: *
solr_page_size - integer
Number of rows per request to Solr.
Default: 100
solr_sort_spec - string
Sort order for the request. The uniqueKey field must be included as one of the sorted fields.
>= 1 characters
Default: id asc
solr_request_handler - string
The request handler to use for the request to the Solr instance/cluster.
>= 1 characters
Default: /select
solr_query_parser - string
The query parser to use for the request.
commit_on_finish - boolean
Set to true to send a commit request to Solr after the last batch has been fetched, so that the documents are committed to the index.
Default: true
verify_access - boolean
Set to true to require a successful connection to the source Solr instance or cluster before saving this datasource.
Default: true
initial_mapping - Initial field mapping
Provides mapping of fields before documents are sent to an index pipeline.
skip - boolean
Set to true to skip this stage.
Default: false
label - string
A unique label for this stage.
<= 255 characters
condition - string
Define a conditional script that must result in true or false. This can be used to determine if the stage should process or not.
reservedFieldsMappingAllowed - boolean
Default: false
retentionMappings - array[object]
Fields that should be kept or deleted
Default:
object attributes: {
  field (required): {
    display name: Field
    type: string
  }
  operation: {
    display name: Operation
    type: string
  }
}
updateMappings - array[object]
Values that should be added to or set on a field. When a value is added, any values previously on the field will be retained. When a value is set, any values previously on the field will be overwritten.
Default:
object attributes: {
  field (required): {
    display name: Field
    type: string
  }
  value (required): {
    display name: Value
    type: string
  }
  operation: {
    display name: Operation
    type: string
  }
}
translationMappings - array[object]
Fields that should be moved or copied to another field. When a field is moved, the values from the source field are moved over to the target field and the source field is removed. When a field is copied, the values from the source field are copied over to the target field and the source field is retained.
Default: {"source":"_version_","target":"external_version_s","operation":"move"}
object attributes: {
  source (required): {
    display name: Source Field
    type: string
  }
  target (required): {
    display name: Target Field
    type: string
  }
  operation: {
    display name: Operation
    type: string
  }
}
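The move and copy semantics described for translationMappings can be sketched as below. This is an illustration under stated assumptions, not the mapping stage's implementation: the function name is hypothetical, documents are modelled as plain dictionaries of value lists, the operation name "copy" is assumed by analogy with the "move" shown in the default, and values are assumed to be appended to any the target field already holds.

```python
from typing import Dict, List

Doc = Dict[str, List[str]]


def apply_translation(doc: Doc, source: str, target: str, operation: str) -> Doc:
    """Return a new document with one translation mapping applied."""
    result = {field: list(values) for field, values in doc.items()}
    if source not in result or source == target:
        return result  # nothing to translate
    values = list(result[source])
    # Assumption: values are appended to whatever the target already holds.
    result.setdefault(target, []).extend(values)
    if operation == "move":
        del result[source]  # a move removes the source field
    return result  # a copy leaves the source field in place


doc = {"_version_": ["1234"], "title": ["Example"]}
print(apply_translation(doc, "_version_", "external_version_s", "move"))
print(apply_translation(doc, "_version_", "external_version_s", "copy"))
```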
unmappedRule - Unmapped Fields
Fields not mapped by the above rules. By default, any remaining fields will be kept on the document.
keep - boolean
Keep all unmapped fields
Default: true
delete - boolean
Delete all unmapped fields
Default: false
fieldToMoveValuesTo - string
Move all unmapped field values to this field
fieldToCopyValuesTo - string
Copy all unmapped field values to this field
valueToAddToUnmappedFields - string
Add this value to all unmapped fields
valueToSetOnUnmappedFields - string
Set this value on all unmapped fields
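Putting the properties above together, a datasource configuration might be assembled as in the following sketch. The property names and defaults come from this reference; the datasource ID, pipeline name, collection name, and filter values are hypothetical, and a real request may need additional fields that this section does not list.

```python
import json
import re
from typing import Any, Dict, Optional


def build_datasource(ds_id: str, pipeline: str, properties: Dict[str, Any],
                     description: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a datasource configuration, checking the documented constraints."""
    # The id must match ^[a-zA-Z0-9_-]+$ and be at least one character long.
    if not re.fullmatch(r"[a-zA-Z0-9_-]+", ds_id):
        raise ValueError("invalid datasource id: %r" % ds_id)
    if not pipeline:
        raise ValueError("pipeline is required")
    config: Dict[str, Any] = {"id": ds_id, "pipeline": pipeline, "properties": properties}
    if description:
        config["description"] = description
    return config


config = build_datasource(
    "solr-products",      # hypothetical datasource id
    "products-pipeline",  # hypothetical index pipeline name
    {
        "zk_host_string": "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr",
        "source_collection": "products",  # hypothetical collection name
        "solr_query": "*:*",
        "solr_filter_queries": "type:book,inStock:true",  # hypothetical filters
        "solr_field_list": "*",
        "solr_page_size": 100,
        "solr_sort_spec": "id asc",
        "solr_request_handler": "/select",
        "commit_on_finish": True,
        "verify_access": True,
    },
    description="Pulls product documents from a SolrCloud cluster.",
)
print(json.dumps(config, indent=2))
```

For a single Solr instance, `solr_base_url` would take the place of `zk_host_string`.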