Machine Learning Index Stage

The Fusion machine learning indexing stage uses a trained machine learning model to analyze a field or fields of a PipelineDocument and stores the results of analysis in a new field of either the PipelineDocument or Context object.

To use the machine learning index stage, you must first train a machine learning model.

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
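
The difference can be illustrated by serializing a value for a JSON request body. This is only a sketch: the `delimiter` property name is hypothetical (it is not a property of this stage), and it assumes the API payload is JSON, where a backslash must itself be escaped.

```javascript
// Hypothetical property name. The value is the two characters backslash + t,
// exactly as it would be typed into a UI field.
const uiValue = String.raw`\t`;

// Serializing to JSON doubles the backslash, which is the escaped form
// written in an API payload.
const apiBody = JSON.stringify({ delimiter: uiValue });

console.log(uiValue); // \t
console.log(apiBody); // {"delimiter":"\\t"}
```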

Invokes a machine learning model to make a prediction on a document during indexing.

skip - boolean

Set to true to skip this stage.

Default: false

label - string

A unique label for this stage.

<= 255 characters

condition - string

A conditional script that must evaluate to true or false. Use it to determine whether the stage processes a given document.
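
As an illustration, a condition might run the stage only for documents that contain text to classify. This is a sketch under assumptions: the field name `body_t` is hypothetical, and it assumes that `doc` with `hasField` and `getFirstFieldValue` is available to condition scripts. The `makeDoc` stub exists only so the expression can be tried outside Fusion; in the stage itself, only the expression inside `condition` would be entered.

```javascript
// Stand-in for the pipeline document, for local experimentation only.
function makeDoc(fields) {
  return {
    hasField: (name) => Object.prototype.hasOwnProperty.call(fields, name),
    getFirstFieldValue: (name) => {
      const value = fields[name];
      return Array.isArray(value) ? value[0] : value;
    },
  };
}

// The condition: true only when the (hypothetical) body_t field is non-blank.
function condition(doc) {
  return (
    doc.hasField("body_t") &&
    String(doc.getFirstFieldValue("body_t")).trim().length > 0
  );
}

console.log(condition(makeDoc({ body_t: "Battery lasts two days" }))); // true
console.log(condition(makeDoc({}))); // false
```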

modelId - string, required

The ID of the trained machine learning model to invoke.

storeInContext - boolean, required

Flag indicating whether the result should be stored in the Context rather than in the PipelineDocument. If this is set to true, the Context Key field (contextKey) should be populated.

Default: false

contextKey - string

Name of the context key under which to store the prediction.

failOnError - boolean

Flag indicating whether this stage should throw an exception when an error occurs while generating a prediction for a document.

Default: false

inputScript - string, required

JavaScript code that returns a HashMap containing the fields and values to send to the ML model service. Refer to the examples in the default value.

Default:
/*
This script must construct a HashMap containing fields and values to be sent to the ML model service.
The field names and values will depend on the input schema of the model.
Generally, you'll be reading fields and values from the request/context/response and placing them into a HashMap.

Value types supported are:
- String
- Double
- String[]
- double[]
- List<String>
- List<Number>

This script receives these objects and can be referenced in your script:
- request
- response
- context
- log (Logger useful for debugging)

The last line of the script must be a reference to the HashMap object you created.

Example 1: Single pipeline doc's field value to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
modelInput

Example 2: List of strings from pipeline doc's field to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFieldValues("my_field")) // doc.getValues returns a Collection
modelInput

Example 3: List of numeric values from the pipeline doc's fields to modelInput HashMap
var modelInput = new java.util.HashMap()
var list = new java.util.ArrayList()
list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))
modelInput.put("input_1", list)
modelInput

Example 4: If you have created the model using Fusion ML Spark jobs, then use the following code
var modelInput = new java.util.HashMap()
modelInput.put("concatField", doc.getFieldValues("my_field"))
modelInput
*/
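
The input-script logic can be tried outside Fusion with small stand-ins, which helps when a field may be missing. The following is a sketch, not the stage's implementation: `FakeHashMap`, the stub `doc`, and the field names `title_t` and `body_t` are hypothetical, while the `input_1` key follows the examples in the default value. Inside Fusion, the body of `buildModelInput` would be the script itself, using `new java.util.HashMap()` and ending with a bare `modelInput` line.

```javascript
// Outside Fusion there is no java.util.HashMap, so a minimal stand-in
// exposing put/get is used for local testing.
class FakeHashMap {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
}

// Stand-in for the pipeline document (hypothetical field names).
const doc = {
  fields: { title_t: ["Great phone"], body_t: ["Battery lasts two days"] },
  getFirstFieldValue(name) {
    const values = this.fields[name];
    return values ? values[0] : null;
  },
};

// Concatenate two text fields into one model input, tolerating missing fields.
function buildModelInput(doc, newMap) {
  const title = doc.getFirstFieldValue("title_t") || "";
  const body = doc.getFirstFieldValue("body_t") || "";
  const modelInput = newMap();
  modelInput.put("input_1", (title + " " + body).trim());
  return modelInput;
}

const modelInput = buildModelInput(doc, () => new FakeHashMap());
console.log(modelInput.get("input_1")); // Great phone Battery lasts two days
```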

outputScript - string

JavaScript code that receives the output from the ML model service as a HashMap called "modelOutput". It is typically used to place prediction results in the pipeline document or the context. Refer to the example in the default value.

Default:
/*
This output script receives the output prediction from the ML model service as a HashMap called "modelOutput".
Most of the time this is used to place prediction results in the request or context for downstream pipeline stages to consume.

This script receives these objects and can be referenced in your script:
- modelOutput (a HashMap containing fields/values returned from ML model service)
- doc
- context
- log (Logger useful for debugging)

Example: Add predictedLabel (string) into pipeline doc as a field
doc.addField("sentiment", modelOutput.get("predictedLabel"))
*/
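
An output script may also need to handle a missing prediction. The sketch below guards against that case; it is an illustration under assumptions rather than the stage's behavior. `makeDoc` and `makeLog` are hypothetical stubs for local testing, a JavaScript `Map` stands in for the `modelOutput` HashMap, and `log.warn` is assumed to exist on the logger. The `predictedLabel` key and `sentiment` field come from the example in the default value.

```javascript
// Stand-ins for the objects Fusion passes to the output script (local testing only).
function makeDoc() {
  const fields = {};
  return {
    fields,
    addField(name, value) {
      (fields[name] = fields[name] || []).push(value);
    },
  };
}

function makeLog() {
  const warnings = [];
  return {
    warnings,
    warn(message) {
      warnings.push(message);
    },
  };
}

// Write the field only when the model actually returned a label.
function applyModelOutput(modelOutput, doc, log) {
  const label = modelOutput.get("predictedLabel");
  if (label === null || label === undefined) {
    log.warn("No predictedLabel in model output; leaving document unchanged");
    return;
  }
  doc.addField("sentiment", String(label));
}

const doc = makeDoc();
const log = makeLog();
applyModelOutput(new Map([["predictedLabel", "positive"]]), doc, log);
console.log(doc.fields.sentiment); // [ 'positive' ]
```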

storePredictedFields - boolean

Store any predictions as predicted_[predicted_field] in the response.

Default: true
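
The properties above can be gathered into a single configuration object and checked against the documented constraints before it is submitted. This is a minimal sketch: the values (`sentiment-prediction`, `my-sentiment-model`) are hypothetical, `validateStageConfig` is an illustrative helper and not part of Fusion, and a real stage definition may need additional properties (such as a stage type identifier) that are not covered here.

```javascript
// Example values are hypothetical; property names are the ones documented above.
const stageConfig = {
  label: "sentiment-prediction",
  skip: false,
  modelId: "my-sentiment-model",
  storeInContext: false,
  failOnError: false,
  inputScript:
    'var modelInput = new java.util.HashMap()\n' +
    'modelInput.put("input_1", doc.getFirstFieldValue("my_field"))\n' +
    "modelInput",
  outputScript: 'doc.addField("sentiment", modelOutput.get("predictedLabel"))',
  storePredictedFields: true,
};

// Check the constraints stated in this reference and return a list of problems.
function validateStageConfig(cfg) {
  const errors = [];
  for (const key of ["modelId", "storeInContext", "inputScript"]) {
    if (cfg[key] === undefined || cfg[key] === null || cfg[key] === "") {
      errors.push(key + " is required");
    }
  }
  if (typeof cfg.label === "string" && cfg.label.length > 255) {
    errors.push("label must be at most 255 characters");
  }
  if (cfg.storeInContext === true && !cfg.contextKey) {
    errors.push("contextKey should be set when storeInContext is true");
  }
  return errors;
}

console.log(validateStageConfig(stageConfig)); // []
```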