Field Mapping Index Stage
A Field Mapping stage customizes how the fields in an Index Pipeline document are mapped to fields in the Solr schema.
For examples of how to use this stage in the Fusion UI, see Part 2 of the Getting Started tutorial.
Field Mapping Stage Properties
A Field Mapping stage specification consists of three things:
- a unique ID
- a set of mapping rules that specify operations applied to named fields, each expressed as a triple { source, target, operation }
- a set of rules called "unmapped" rules, which specify operations applied to fields whose names do not match any of the mapping rules, each also expressed as a triple { source, target, operation }
Mapping Rules and Unmapped Rules
Each rule has the following properties:
Property | Description |
---|---|
source | The name of the source field. This will be the name of the field in the Pipeline document that should be mapped to another field. Java regular expressions can be used in the source field by surrounding the regular expression with forward slashes ('/'). For example, |
target | The name of the target field. If the value for the |
operation | What to do with the field during mapping. Several options are available: |
Field mapping rules are applied in the order in which they are defined within each operation type. Operation types are applied in this order:
- Field Retention - keep or delete
- Field Value Updates - add or set
- Field Translations - copy or move
As a result, keep and delete rules are always applied before add, set, copy, or move rules. Likewise, add and set rules are always applied before copy or move rules.
In some cases, you may wish to delete fields after they are processed by other operations. To accomplish this, you can add another Field Mapping stage which deletes the fields.
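The ordering described above behaves like a stable sort: rules are grouped by operation type, and declaration order is preserved within each group. The Python below is only a sketch of that ordering as this section describes it, not Fusion's implementation; the `PHASE` table and the `execution_order` helper are hypothetical names introduced for illustration.

```python
# Illustrative sketch only: models the ordering described above, not Fusion's code.
# PHASE and execution_order are hypothetical names introduced for this example.
PHASE = {
    "keep": 0, "delete": 0,  # field retention
    "add": 1, "set": 1,      # field value updates
    "copy": 2, "move": 2,    # field translations
}

def execution_order(mappings):
    """Return rules grouped by operation type, keeping declared order within a type."""
    # sorted() is stable, so rules of the same type stay in their declared order.
    return sorted(mappings, key=lambda rule: PHASE[rule["operation"]])

declared = [
    {"operation": "move", "source": "plaintextcontent", "target": "body"},
    {"operation": "add", "source": "content-length", "target": "fileSize"},
    {"operation": "delete", "source": "last-printed"},
    {"operation": "copy", "source": "mimetype", "target": "content_type"},
]

ordered_ops = [rule["operation"] for rule in execution_order(declared)]
print(ordered_ops)  # ['delete', 'add', 'move', 'copy']
```

Because the delete rule runs first regardless of where it is declared, a field that must survive until a copy or move has processed it cannot be deleted in the same stage, which is why a second Field Mapping stage is suggested.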
Field Mapping Behavior
The field mapping rules are applied in a specific order.
1. A copy of the Pipeline document is prepared. All further operations are applied to this copy.
2. The rules are traversed only once, in the order of their declaration in the mapping property. This means it is possible to do multiple operations on a field. However, note that if fields are moved (renamed), further operations should reference the new field name.
3. Before each rule is evaluated, the current list of field names is prepared and sorted in alphabetic ascending order.
4. The current rule is applied to field values for each matching name from the list of names prepared in step 3. New field names resulting from the current rule do not affect the snapshot list of field names; a new field name becomes visible to rules only in a later round of the evaluation cycle, when the list is prepared again.
5. The process is repeated for each rule, and a list of matching source fields is noted.
6. If the document contains any fields that were not affected by any mapping rule, the renameUnknown option is applied if it has been set to true.
7. Finally, the resulting transformed document is returned to the next stage of the index pipeline.
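The single-pass, snapshot-per-rule behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Fusion's implementation: it handles only move rules with literal (non-regex) source names, and `apply_move_rules` is a hypothetical helper introduced for this example.

```python
# Illustrative sketch only: a simplified model of the evaluation cycle above.
# apply_move_rules is a hypothetical helper; it handles literal "move" rules only.
def apply_move_rules(doc, rules):
    """Apply move rules in one pass, using a sorted snapshot of field names per rule."""
    result = dict(doc)                 # work on a copy of the document
    for rule in rules:                 # rules are traversed once, in declared order
        snapshot = sorted(result)      # field names are listed and sorted before each rule
        for name in snapshot:          # only names present in the snapshot are considered
            if name == rule["source"] and name in result:
                result[rule["target"]] = result.pop(name)
    return result

# A later rule sees the renamed field, so it must reference the new name.
chained = apply_move_rules(
    {"a": 1},
    [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}],
)

# Declared the other way round, the first rule finds no field "b" and is not revisited.
reversed_rules = apply_move_rules(
    {"a": 1},
    [{"source": "b", "target": "c"}, {"source": "a", "target": "b"}],
)

print(chained)         # {'c': 1}
print(reversed_rules)  # {'b': 1}
```

The second call shows why declaration order matters: because the rules are traversed only once, a rule that targets a field created by a later rule never gets a chance to run against it.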
Examples
Map several fields:
{
  "id": "mapping-text",
  "type": "field-mapping",
  "mappings": [
    {
      "operation": "move",
      "source": "plaintextcontent",
      "target": "body"
    },
    {
      "operation": "add",
      "source": "content-length",
      "target": "fileSize"
    },
    {
      "operation": "move",
      "source": "/file(.*)/",
      "target": "lastModified"
    },
    {
      "operation": "delete",
      "source": "last-printed"
    },
    {
      "operation": "copy",
      "source": "mimetype",
      "target": "content_type"
    }
  ],
  "unmapped": {
    "source": "/(.*)/",
    "target": "$1_ss",
    "operation": "move"
  },
  "skip": false
}
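To make the regular-expression rules in this example concrete, the sketch below shows how a source such as /(.*)/ with a target of $1_ss could rename a field. It rests on several assumptions that are not confirmed by this page: Python's `re` module stands in for Java regular expressions, the expression is assumed to match the whole field name, and `$1` is assumed to refer to the first capture group (as the example suggests). `map_name` is a hypothetical helper, not part of Fusion.

```python
import re

# Illustrative sketch only: map_name is a hypothetical helper, not a Fusion API.
def map_name(source, target, field_name):
    """Return the mapped field name, or None if the source does not match."""
    if len(source) >= 2 and source.startswith("/") and source.endswith("/"):
        # Assumption: the expression between the slashes must match the whole name.
        match = re.fullmatch(source[1:-1], field_name)
        if match is None:
            return None
        # Convert Java-style $1 group references to Python's \g<1> form.
        py_target = re.sub(r"\$(\d+)", r"\\g<\1>", target)
        return match.expand(py_target)
    # A source without slashes is treated as a literal field name.
    return target if field_name == source else None

print(map_name("/(.*)/", "$1_ss", "author"))              # author_ss
print(map_name("/file(.*)/", "lastModified", "fileSize"))  # lastModified
print(map_name("/file(.*)/", "lastModified", "mimetype"))  # None
```

Under these assumptions, the unmapped rule in the example above would move a field named author to author_ss.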
Set the urlX field based on the value of the employee_id field:
{
  "id": "set-field",
  "type": "field-mapping",
  "mappings": [
    {
      "operation": "set",
      "source": "urlX",
      "target": "https://mydomain.com/<employee_id>"
    }
  ],
  "skip": false
}
Configuration
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
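The difference between the two forms comes from JSON string escaping. As a small sketch, the Python below serializes the two characters a user would type in the UI (a backslash followed by t) into a JSON request body; the property name `delimiter` is hypothetical and used only for illustration.

```python
import json

# What a user would type in the UI: a backslash followed by the letter t.
ui_value = "\\t"  # Python literal for the two characters \ and t

# "delimiter" is a hypothetical property name used only for illustration.
body = json.dumps({"delimiter": ui_value})
print(body)  # {"delimiter": "\\t"}

# Parsing the body recovers the same two characters that were typed in the UI.
round_trip = json.loads(body)["delimiter"]
```

Generating request bodies with a JSON library rather than by hand avoids having to remember when the extra backslash is needed.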