Legacy Product

Fusion 5.10
    Levenshtein Spell Checking Jobs

    Compute the edit distance between all values in a field.

    The Levenshtein Spell Checking job is deprecated as of Fusion AI 4.1.0. Use the Token and Phrase Spell Correction job instead.

    id - string, required

    The ID for this Spark job, used in the API to reference the job.

    <= 128 characters

    Match pattern: ^[A-Za-z0-9_\-]+$
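
The two constraints above (length limit and match pattern) can be checked before a job is submitted. The following is a minimal sketch, not part of Fusion itself; the function name `is_valid_job_id` is hypothetical, and only the pattern and the 128-character limit come from this page.

```python
import re

# Pattern and length limit copied from the constraints listed above.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
MAX_ID_LENGTH = 128


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented id constraints."""
    # fullmatch ensures the whole string (including any trailing newline)
    # must consist of the allowed characters.
    return len(job_id) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(job_id) is not None
```

For example, `is_valid_job_id("levenshtein_job-1")` passes, while an ID containing a space does not.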

    trainingCollection - string, required

    Solr Collection containing labeled training data

    >= 1 characters

    fieldToVectorize - string, required

    Solr field containing the text training data for prediction/clustering instances. To analyze multiple fields with different weights, use the format field1:weight1,field2:weight2

    >= 1 characters
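
To illustrate the field1:weight1,field2:weight2 format, here is a small sketch that splits such a value into field names and weights. The helper `parse_field_weights` is hypothetical, and treating a field without an explicit weight as weight 1.0 is an assumption made for this example, not documented Fusion behavior.

```python
from typing import Dict


def parse_field_weights(spec: str, default_weight: float = 1.0) -> Dict[str, float]:
    """Parse 'field1:weight1,field2:weight2' into {field: weight}."""
    weights: Dict[str, float] = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries such as a trailing comma
        name, sep, weight = part.partition(":")
        # Assumption: a bare field name (no ':weight') gets default_weight.
        weights[name.strip()] = float(weight) if sep else default_weight
    return weights
```

With this sketch, `parse_field_weights("title:2.0,body:0.5")` yields a mapping of `title` to 2.0 and `body` to 0.5.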

    dataFormat - string

    Spark-compatible format of the training data (for example 'solr', 'hdfs', 'file', or 'parquet')

    Default: solr

    Allowed values: solr, hdfs, file, parquet

    trainingDataFrameConfigOptions - object

    Additional Spark DataFrame loading configuration options

    trainingDataFilterQuery - string

    Solr query to use when loading training data

    >= 3 characters

    Default: *:*

    trainingDataSamplingFraction - number

    Fraction of the training data to use

    <= 1

    exclusiveMaximum: false

    Default: 1

    randomSeed - integer

    Seed used for any deterministic pseudorandom number generation

    Default: 1234

    outputCollection - string

    Solr Collection in which to store the model-labeled data

    sourceFields - string

    Solr fields to load (comma-delimited). Leave empty to allow the job to select the required fields to load at runtime.

    maxDistance - integer

    The maximum edit distance between related queries that you are interested in. To return all results, set this to a very large number.

    >= 1

    exclusiveMinimum: false

    Default: 2
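
As a reminder of what the distance measures, the sketch below computes the standard Levenshtein edit distance (insertions, deletions, substitutions) and applies a maxDistance cutoff. It is a textbook implementation for illustration only; the job's internal implementation is not described on this page, and treating the cutoff as inclusive is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def within_max_distance(a: str, b: str, max_distance: int = 2) -> bool:
    """Assumption: maxDistance is an inclusive upper bound."""
    return levenshtein(a, b) <= max_distance
```

Under this sketch, "ipad" and "ipda" are two edits apart and pass the default cutoff of 2, while "kitten" and "sitting" are three edits apart and do not.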

    headSize - integer

    The number of queries to include in the 'head', against which all other queries are compared. This number should not be too large (probably no larger than 10000), because large values cause performance issues. The head is simply the top queries, that is, the ones that appear most frequently.

    Default: 100
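
The head selection described above (the most frequent queries) might be sketched as follows. The function `split_head_tail` and the in-memory list of queries are hypothetical; the actual job operates on Solr data through Spark, and its tie-breaking behavior is not documented here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_head_tail(queries: Iterable[str], head_size: int = 100) -> Tuple[List[str], List[str]]:
    """Return (head, tail): the head_size most frequent queries and the rest."""
    counts = Counter(queries)
    # most_common orders by descending frequency; ties keep first-seen order.
    head = [query for query, _ in counts.most_common(head_size)]
    head_set = set(head)
    tail = [query for query in counts if query not in head_set]
    return head, tail
```

Each tail query would then be compared against every head query, which is why a very large headSize becomes expensive.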

    lenScale - integer

    How to scale the returned distances with the size of the head and tail queries. A pair is kept when edit_dist <= query_length/length_scale. To allow all possible queries, set this term to 1. To keep only very small distances, set this term to a large value.

    >= 1

    <= 10000

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1
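
The scaling rule edit_dist <= query_length/length_scale can be illustrated with a short sketch. The function name is hypothetical, and which query's length the job uses (head, tail, or a combination) is not specified on this page, so the length is passed in explicitly here.

```python
def passes_length_scale(edit_dist: int, query_length: int, len_scale: int = 1) -> bool:
    """Apply the documented rule: edit_dist <= query_length / len_scale."""
    # Range taken from the lenScale constraints listed above.
    if not 1 <= len_scale <= 10000:
        raise ValueError("lenScale must be between 1 and 10000")
    return edit_dist <= query_length / len_scale
```

For an 8-character query, lenScale 1 allows distances up to 8, while lenScale 4 allows distances only up to 2.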

    type - string, required

    Default: levenshtein

    Allowed values: levenshtein
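
Putting the parameters together, a job definition could look roughly like the sketch below. The property names and defaults come from this page; the ID, collection names, and field name are hypothetical placeholders, and how the resulting JSON is submitted to Fusion is outside the scope of this example.

```python
import json

# Hypothetical values: "levenshtein_spellcheck", "query_signals",
# "query_signals_spellcheck", and "query_s" are placeholders.
job_config = {
    "id": "levenshtein_spellcheck",          # required
    "type": "levenshtein",                   # required, only allowed value
    "trainingCollection": "query_signals",   # required
    "fieldToVectorize": "query_s",           # required
    "dataFormat": "solr",                    # documented default
    "trainingDataFilterQuery": "*:*",        # documented default
    "trainingDataSamplingFraction": 1,       # documented default
    "randomSeed": 1234,                      # documented default
    "outputCollection": "query_signals_spellcheck",
    "maxDistance": 2,                        # documented default
    "headSize": 100,                         # documented default
    "lenScale": 1,                           # documented default
}

# Serialize to JSON, the usual wire format for job definitions.
payload = json.dumps(job_config, indent=2)
print(payload)
```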