Exclude Documents Index Stage
The Exclude Documents stage drops every document that matches all of the specified rules (Boolean AND). Each rule matches a regular expression against the values of one field. If a field has multiple values, the rule matches when at least one value matches its pattern. Matching documents receive no further processing, so they are not indexed into the Fusion collection. All non-matching documents are passed to the next stage in the pipeline.
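The matching behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not Fusion's implementation. It assumes that a pattern must match a whole field value (re.fullmatch), that a missing field never matches, and that an empty rule list excludes nothing. The second rule on the hypothetical "department" field only illustrates the AND and multi-value behavior.

```python
import re
from typing import Mapping, Sequence, Union

# A field holds either a single string or a list of strings (multi-valued).
FieldValue = Union[str, Sequence[str]]


def rule_matches(doc: Mapping[str, FieldValue], field: str, pattern: str) -> bool:
    """True if at least one value of `field` matches `pattern`."""
    value = doc.get(field)
    if value is None:
        return False  # assumption: a missing field never matches
    values = [value] if isinstance(value, str) else list(value)
    # Assumption: the pattern must match the whole value.
    return any(re.fullmatch(pattern, v) is not None for v in values)


def is_excluded(doc: Mapping[str, FieldValue], match_rules: Sequence[Mapping[str, str]]) -> bool:
    """True only if every rule matches (Boolean AND)."""
    if not match_rules:
        return False  # assumption: no rules means nothing is excluded
    return all(rule_matches(doc, r["field"], r["pattern"]) for r in match_rules)


def exclude_stage(docs, match_rules):
    """Return the documents that would be passed to the next stage."""
    return [d for d in docs if not is_excluded(d, match_rules)]


rules = [
    {"field": "document_type", "pattern": "(xls|xlsx|xlst|doc|docx)"},
    {"field": "department", "pattern": "finance"},  # hypothetical second rule
]
docs = [
    {"document_type": "xls", "department": ["hr", "finance"]},  # both rules match: dropped
    {"document_type": "xls", "department": "hr"},               # second rule fails: kept
    {"document_type": "txt", "department": "finance"},          # first rule fails: kept
]
kept = exclude_stage(docs, rules)
print([d["document_type"] for d in kept])  # ['xls', 'txt']
```

The first document is dropped because both rules match, and the multi-valued "department" field needs only one matching value.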
Examples
Create a "simple-exclude" index pipeline with a stage that excludes certain document types:
curl -u USERNAME:PASSWORD -X POST -H "Content-type: application/json" 'http://localhost:8764/api/index-pipelines' -d '
{
  "id" : "simple-exclude",
  "stages" : [ {
    "type" : "exclude-doc",
    "matchRules" : [ {
      "field" : "document_type",
      "pattern" : "(xls|xlsx|xlst|doc|docx)"
    }]
  }]
}'
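For scripting, the same POST request can be built with Python's standard library. This sketch only constructs the request. The base URL, the USERNAME and PASSWORD placeholders, and the pipeline body are copied from the curl command, and the function name build_create_pipeline_request is hypothetical. Like curl -u, it sends HTTP Basic credentials.

```python
import base64
import json
import urllib.request

# Values mirror the curl example; replace the placeholders with real credentials.
FUSION_API = "http://localhost:8764/api"
USERNAME = "USERNAME"
PASSWORD = "PASSWORD"


def build_create_pipeline_request(pipeline: dict) -> urllib.request.Request:
    """Build (but do not send) a POST equivalent to the curl command above."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{FUSION_API}/index-pipelines",
        data=json.dumps(pipeline).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # same scheme curl -u uses by default
        },
        method="POST",
    )


pipeline = {
    "id": "simple-exclude",
    "stages": [{
        "type": "exclude-doc",
        "matchRules": [{
            "field": "document_type",
            "pattern": "(xls|xlsx|xlst|doc|docx)",
        }],
    }],
}

req = build_create_pipeline_request(pipeline)
# Sending requires a running Fusion instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Serializing the body with json.dumps keeps the pipeline definition as a Python dict, which is easier to edit than a quoted shell string.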
Send a text document through the "simple-exclude" pipeline:
curl -u USERNAME:PASSWORD 'http://localhost:8764/api/index-pipelines/simple-exclude/collections/logs/index?simulate=true&echo=true' -H 'Content-type: application/json' -d '
{
  "document_type": "txt"
}'
The response echoes the processed document, indicating that it passed the stage:
[ {
  "id" : "93da43ff-4218-4f24-a690-23b530926104",
  "fields" : [ {
    "name" : "document_type",
    "value" : "txt",
    "metadata" : { },
    "annotations" : [ ]
  } ]
} ]
Send an XLS document through the "simple-exclude" pipeline:
curl -u USERNAME:PASSWORD 'http://localhost:8764/api/index-pipelines/simple-exclude/collections/logs/index?simulate=true&echo=true' -H 'Content-type: application/json' -d '
{
  "document_type": "xls"
}'
The empty array in the response indicates that the document was dropped:
[ ]
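The two outcomes can also be told apart programmatically. This sketch assumes, based only on the sample responses above, that a simulated request returns a JSON array holding the documents that left the pipeline, each with a "fields" list of name/value entries. The helper names surviving_documents and field_values are hypothetical.

```python
import json


def surviving_documents(response_body: str) -> list:
    """Parse a simulated-index response into the list of documents it contains."""
    docs = json.loads(response_body)
    if not isinstance(docs, list):
        raise ValueError("expected a JSON array of documents")
    return docs


def field_values(doc: dict, name: str) -> list:
    """Collect the values of field `name` from a document's `fields` list."""
    return [f["value"] for f in doc.get("fields", []) if f.get("name") == name]


# The two sample responses shown above.
txt_response = """[ {
  "id" : "93da43ff-4218-4f24-a690-23b530926104",
  "fields" : [ { "name" : "document_type", "value" : "txt",
                 "metadata" : { }, "annotations" : [ ] } ]
} ]"""
xls_response = "[ ]"

for label, body in [("txt", txt_response), ("xls", xls_response)]:
    docs = surviving_documents(body)
    # In this one-stage pipeline, an empty array means the document was dropped.
    print(label, "passed" if docs else "dropped")
```

This prints "txt passed" followed by "xls dropped", matching the two curl results.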
Configuration
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
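The difference between the UI and API forms is ordinary JSON string escaping: API request bodies are JSON, and in JSON a backslash must itself be escaped. A short Python illustration, in which the second pattern is a hypothetical example:

```python
import json

# Patterns as typed in the UI: a backslash is just a backslash.
ui_patterns = [
    r"\t",          # tab, as in the note above
    r".*\.docx?",   # hypothetical pattern with an escaped dot
]

for pattern in ui_patterns:
    api_form = json.dumps(pattern)          # form used inside an API request body
    print(pattern, "->", api_form)
    assert json.loads(api_form) == pattern  # decoding restores the UI form
```

This prints `\t -> "\\t"` and `.*\.docx? -> ".*\\.docx?"`. Building request bodies with a JSON serializer instead of by hand avoids having to double backslashes manually.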