Include Documents Index Stage
This stage passes a document to the next stage in the pipeline if it matches one or more of the specified rules (Boolean OR). If a field has multiple values, at least one value must match the specified pattern. All non-matching documents are dropped. Rules are defined using regular expression field matching.
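The matching logic described above can be sketched in a few lines of Python. This is only an illustration, not the stage's actual implementation: the function name `passes_include_stage` is hypothetical, and the sketch assumes that a pattern must match an entire field value, which the text above does not specify.

```python
import re

def passes_include_stage(doc, match_rules):
    """Return True if any rule matches any value of its field (Boolean OR)."""
    for rule in match_rules:
        values = doc.get(rule["field"], [])
        # Treat a single-valued field as a one-element list.
        if not isinstance(values, list):
            values = [values]
        pattern = re.compile(rule["pattern"])
        # Assumption: the whole value must match the pattern.
        if any(pattern.fullmatch(str(value)) for value in values):
            return True
    # No rule matched, so the document would be dropped.
    return False

rules = [{"field": "document_type", "pattern": "(xls|xlsx|xlst|doc|docx)"}]
print(passes_include_stage({"document_type": "txt"}, rules))           # dropped
print(passes_include_stage({"document_type": ["txt", "xls"]}, rules))  # passed
```

The second call passes because one of the two values of the multi-valued field matches the rule.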
Examples
Create the "simple-include" pipeline with a stage that includes only certain document types:
curl -u USERNAME:PASSWORD -X POST -H "Content-type: application/json" 'http://localhost:8764/api/index-pipelines' -d '
{
"id" : "simple-include",
"stages" : [ {
"type" : "include-doc",
"matchRules" : [ {
"field" : "document_type",
"pattern" : "(xls|xlsx|xlst|doc|docx)"
}]
}]
}'
Response:
{
"id" : "simple-include",
"stages" : [ {
"type" : "include-doc",
"id" : "f701f96b-780e-4355-9dd3-6e53a89afe3e",
"matchRules" : [ {
"field" : "document_type",
"pattern" : "(xls|xlsx|xlst|doc|docx)"
} ],
"skip" : false,
"label" : "include-doc"
} ],
"properties" : { }
}
Send a text document through the "simple-include" pipeline:
curl -u USERNAME:PASSWORD 'http://localhost:8764/api/index-pipelines/simple-include/collections/logs/index?simulate=true&echo=true' -H 'Content-type: application/json' -d '
{
"document_type": "txt"
}'
The empty response indicates the document was dropped:
[ ]
Send an XLS document through the pipeline:
curl -u USERNAME:PASSWORD 'http://localhost:8764/api/index-pipelines/simple-include/collections/logs/index?simulate=true&echo=true' -H 'Content-type: application/json' -d '
{
"document_type": "xls"
}'
The response is document metadata, indicating the document passed the stage:
[ {
"id" : "9e7d1c2e-343a-49de-bc6a-1d1fc25fa93f",
"fields" : [ {
"name" : "document_type",
"value" : "xls",
"metadata" : { },
"annotations" : [ ]
} ]
} ]
Configuration
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
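As a rough illustration of the escaping rule for API requests, the Python sketch below serializes a rule with the standard `json` module, which doubles the backslash automatically. The field name `body` is a hypothetical placeholder, not a name taken from this page.

```python
import json

# The pattern as it would be typed in the UI: a backslash followed by "t".
pattern = r"\t"

# Serializing to JSON escapes the backslash, giving the form the API expects.
body = json.dumps({"field": "body", "pattern": pattern})
print(body)  # {"field": "body", "pattern": "\\t"}
```

Building the request body with a JSON library rather than by hand avoids having to track the extra level of escaping manually.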