Machine Learning Index Stage

The Fusion Machine Learning Index Stage uses a trained machine learning model to analyze one or more fields of a PipelineDocument and stores the results of the analysis in a new field of either the PipelineDocument or the Context object.

To use the Machine Learning Index Stage, you must first train a machine learning model. There are two ways to train a model:

Use a Fusion AI job that trains a model, like Logistic Regression or Random Forest.
Train a model using Spark’s MLlib API outside of Fusion, and upload this model into Fusion’s blob store.

Complete details are available in Machine Learning Models in Fusion.

TIP: When specifying field names, multiple field names are supported, in this format: field1:weight,field2:weight,field3:weight
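The weighted field format from the tip above can be illustrated with a short parsing sketch. This is not Fusion's own parser. The function name parse_weighted_fields and the sample field names are hypothetical, and the sketch assumes every entry carries a numeric weight.

```python
from typing import Dict


def parse_weighted_fields(spec: str) -> Dict[str, float]:
    """Split 'field1:weight,field2:weight' into a {field: weight} mapping.

    Assumption: every entry has a numeric weight after the last colon.
    How Fusion itself treats entries without a weight is not covered here.
    """
    weights: Dict[str, float] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, sep, weight = item.rpartition(":")
        if not sep or not name:
            raise ValueError("expected 'field:weight', got %r" % item)
        weights[name] = float(weight)
    return weights


# Hypothetical field names, used only for illustration.
print(parse_weighted_fields("title_t:2.0,body_t:1.0"))
# {'title_t': 2.0, 'body_t': 1.0}
```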

Although this stage is available without a Fusion AI license, it is only effective after running the Fusion AI jobs mentioned above.

Configuration

When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
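As a rough illustration of the escaping rule, the sketch below serializes a value to JSON the way an API request body would carry it. It assumes the UI value is the two characters backslash and t. The property name exampleProperty is hypothetical and is not a parameter of this stage.

```python
import json

# What you would type in the UI: a backslash followed by the letter t.
ui_value = r"\t"

# Serializing to JSON for an API request escapes the backslash.
payload = json.dumps({"exampleProperty": ui_value})
print(payload)
# {"exampleProperty": "\\t"}

# Decoding the payload gives back the original two-character value.
assert json.loads(payload)["exampleProperty"] == ui_value
```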

skip - boolean

label - string

condition - string

modelId - string, required

docFeatureFieldName - string

predictionFieldName - string, required

defaultValue - string

failOnError - boolean

storeInContext - boolean

includeRawPredictions - boolean
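
The parameters above could be assembled into a stage configuration along the following lines. This is a minimal sketch, not a complete pipeline definition: all values are hypothetical, the helper build_stage_config is illustrative only, and the stage type identifier and enclosing pipeline JSON are left out because this page does not specify them. Using the weighted format in docFeatureFieldName is also an assumption.

```python
import json

# The two parameters marked as required in the list above.
REQUIRED = ("modelId", "predictionFieldName")


def build_stage_config(**params):
    """Return the parameters as a dict after checking the required ones."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return params


# All values below are placeholders chosen for illustration.
config = build_stage_config(
    skip=False,
    label="classify documents",
    modelId="my-classifier",
    docFeatureFieldName="title_t:2.0,body_t:1.0",
    predictionFieldName="predicted_label_s",
    defaultValue="unknown",
    failOnError=False,
    storeInContext=False,
    includeRawPredictions=False,
)

print(json.dumps(config, indent=2))
```

Checking the required parameters before sending the configuration catches a missing modelId or predictionFieldName locally rather than at indexing time.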