Legacy Product

Fusion 5.10

    Use this job to batch compute query-query similarities using a co-occurrence-based approach.

    countField - string

    Solr field containing number of events (e.g., number of clicks).

    Default: count_i

    dataFormat - string

    Spark-compatible format of the training data (for example, 'solr', 'hdfs', 'file', or 'parquet').

    Default: solr

    Allowed values: solr, hdfs, file, parquet

    docIdField - string, required

    Solr field containing the ID of the document that the user clicked.

    Default: doc_id_s

    fieldToVectorize - string, required

    Field containing queries.

    >= 1 characters

    Default: query_s

    id - string, required

    The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-), and underscore (_). The ID must start with a letter. Maximum length: 63 characters.

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
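
    The length limit and match pattern above can be checked before a job is submitted. The following is a minimal sketch, not Fusion's own validation code: the function name is hypothetical, and it assumes the pattern must match the entire ID rather than a substring.

```python
import re

# Pattern and length limit copied from the parameter description above.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_JOB_ID_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id fits the length limit and matches the whole pattern.

    Assumption: the pattern is applied as a full match, not a search.
    """
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ("similar-queries_1", "1job", "a" * 64):
        print(candidate[:20], is_valid_job_id(candidate))
```

    Under these assumptions, an ID that starts with a digit or exceeds 63 characters is rejected.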

    maxQueryLength - integer

    Queries above this length will not be considered for generating recommendations.

    >= 1

    exclusiveMinimum: false

    Default: 50

    minPairOccCount - integer

    Minimum number of times a query pair must be generated to be considered valid.

    >= 1

    exclusiveMinimum: false

    Default: 2
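
    To illustrate how a pair-occurrence threshold such as minPairOccCount behaves, the sketch below counts how often two queries lead to a click on the same document and drops pairs seen fewer than the minimum number of times. This is an illustration of a generic co-occurrence count, not the job's actual Spark implementation; the function name and the (query, doc_id) input shape are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_query_pairs(clicks, min_pair_occ_count=2):
    """Count query pairs that share a clicked document.

    clicks: iterable of (query, doc_id) tuples (hypothetical input shape).
    Returns a dict {(query_a, query_b): count} with count >= min_pair_occ_count.
    """
    queries_by_doc = defaultdict(set)
    for query, doc_id in clicks:
        queries_by_doc[doc_id].add(query)

    pair_counts = Counter()
    for queries in queries_by_doc.values():
        # Each document contributes one occurrence per unordered query pair.
        for pair in combinations(sorted(queries), 2):
            pair_counts[pair] += 1

    return {pair: n for pair, n in pair_counts.items() if n >= min_pair_occ_count}


if __name__ == "__main__":
    sample = [
        ("tv", "d1"), ("television", "d1"),
        ("tv", "d2"), ("television", "d2"),
        ("tv", "d3"), ("radio", "d3"),
    ]
    print(count_query_pairs(sample))
```

    With the default threshold of 2, the pair that co-occurs on two documents is kept and the pair that co-occurs only once is discarded.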

    minQueryCount - integer

    The minimum number of clicked documents needed for comparing queries.

    >= 1

    exclusiveMinimum: false

    Default: 1

    minQueryLength - integer

    Queries below this length (in number of characters) will not be considered for generating recommendations.

    >= 1

    exclusiveMinimum: false

    Default: 3
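
    The effect of minQueryLength and maxQueryLength together can be sketched as a simple length filter. The function name is hypothetical, and the sketch assumes that maxQueryLength is also measured in characters (the description above states this only for minQueryLength) and that both bounds are inclusive.

```python
def within_length_bounds(query: str, min_query_length: int = 3, max_query_length: int = 50) -> bool:
    """Return True if the query is neither below the minimum nor above the maximum length.

    Assumption: both limits are character counts and a query exactly at a
    limit is kept.
    """
    return min_query_length <= len(query) <= max_query_length


if __name__ == "__main__":
    queries = ["tv", "led tv", "x" * 51]
    print([q for q in queries if within_length_bounds(q)])
```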

    outputCollection - string

    Collection to store synonym and similar query pairs.

    overlapEnabled - boolean

    Maximize score for query pairs with overlapping tokens by setting score to 1.

    Default: true

    overlapThreshold - number

    The threshold above which query pairs are considered similar. Decreasing the value can yield more pairs at the expense of quality.

    <= 1

    exclusiveMaximum: false

    Default: 0.3

    randomSeed - integer

    Seed for any deterministic pseudorandom number generation.

    Default: 1234

    sessionIdField - string

    If a session ID is not available, specify the user ID field instead. If this field is left blank, session-based recommendations are disabled.

    Default: session_id_s

    sparkConfig - array[object]

    Spark configuration settings.

    object attributes:
        key - string, required (display name: Parameter Name)
        value - string (display name: Parameter Value)

    specialCharsFilterString - string

    String of special characters to be filtered from queries.

    Default: ~!@#$^%&*\(\)_+={}\[\]|;:"'<,>.?`/\\-
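
    The default value reads like the body of a regular-expression character class (note the escaped parentheses and brackets). The sketch below shows one way such a string could be applied to a query. It is an assumption that the job uses the string this way and that matched characters are removed rather than replaced with a space; the function name is hypothetical.

```python
import re

# Default value copied verbatim from the parameter description above.
DEFAULT_SPECIAL_CHARS = r"""~!@#$^%&*\(\)_+={}\[\]|;:"'<,>.?`/\\-"""


def strip_special_chars(query: str, special_chars: str = DEFAULT_SPECIAL_CHARS) -> str:
    """Remove every character listed in special_chars from the query.

    Assumption: special_chars is valid inside a regex character class.
    """
    return re.sub("[" + special_chars + "]", "", query)


if __name__ == "__main__":
    print(strip_special_chars("50% off (today)"))
```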

    stopwordsBlobName - string

    Name of the stopwords blob resource. This is a .txt file with one stopword per line. By default, the file is called stopwords/stopwords_nltk_en.txt; however, a custom file can also be used. See the documentation for details on the format and on uploading to the blob store.

    Default: stopwords/stopwords_nltk_en.txt

    tokenOverlapValue - number

    Minimum amount of overlap to consider for boosting. To specify overlap as a ratio, specify a value in (0, 1). To specify overlap as an exact count, specify a value >= 1. If the value is 0, the boost is applied if one query is a substring of its pair. Stopwords are ignored when counting overlaps.

    Default: 1
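
    The three modes of tokenOverlapValue (substring, ratio, exact count) can be sketched as follows. This is not the job's implementation: the function name is hypothetical, queries are split on whitespace, and the ratio is computed against the shorter query's token count, which the description above does not specify.

```python
def has_enough_overlap(query_a, query_b, token_overlap_value=1, stopwords=frozenset()):
    """Decide whether two queries overlap enough to qualify for boosting."""
    if token_overlap_value == 0:
        # Substring mode: one query must be contained in the other.
        return query_a in query_b or query_b in query_a

    # Stopwords are ignored when counting overlapping tokens.
    tokens_a = {t for t in query_a.split() if t not in stopwords}
    tokens_b = {t for t in query_b.split() if t not in stopwords}
    shared = len(tokens_a & tokens_b)

    if token_overlap_value >= 1:
        # Count mode: at least this many shared tokens.
        return shared >= token_overlap_value

    # Ratio mode (value in (0, 1)). Assumption: ratio is relative to the
    # query with fewer tokens.
    smaller = min(len(tokens_a), len(tokens_b))
    return smaller > 0 and shared / smaller >= token_overlap_value


if __name__ == "__main__":
    print(has_enough_overlap("red shoes", "red running shoes", 1))
    print(has_enough_overlap("the shoes", "the hat", 1, stopwords={"the"}))
```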

    trainingCollection - string, required

    Collection containing queries, document IDs, and event counts. Can be either a signal aggregation collection or a raw signals collection.

    trainingDataFilterQuery - string

    Solr query to additionally filter the input collection.

    >= 3 characters

    Default: *:*

    trainingDataFrameConfigOptions - object

    Additional Spark DataFrame loading configuration options.

    trainingDataSamplingFraction - number

    Fraction of the training data to use.

    <= 1

    exclusiveMaximum: false

    Default: 1

    type - string, required

    Default: similar_queries

    Allowed values: similar_queries

    writeOptions - array[object]

    Options used when writing output to Solr.

    object attributes:
        key - string, required (display name: Parameter Name)
        value - string (display name: Parameter Value)
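
    Putting the parameters together, a job definition might look like the sketch below, built as a Python dictionary and serialized to JSON. Parameter names and defaults are taken from the list above; the id, trainingCollection, and outputCollection values are hypothetical placeholders, and the sparkConfig entry is only an example of the key/value shape.

```python
import json

job_config = {
    "type": "similar_queries",
    "id": "similar-queries-example",               # hypothetical job ID
    "trainingCollection": "example_signals_aggr",  # hypothetical collection name
    "outputCollection": "example_query_rewrite",   # hypothetical collection name
    "dataFormat": "solr",
    "fieldToVectorize": "query_s",
    "docIdField": "doc_id_s",
    "countField": "count_i",
    "sessionIdField": "session_id_s",
    "minQueryLength": 3,
    "maxQueryLength": 50,
    "minPairOccCount": 2,
    "minQueryCount": 1,
    "overlapEnabled": True,
    "overlapThreshold": 0.3,
    "tokenOverlapValue": 1,
    "trainingDataFilterQuery": "*:*",
    "trainingDataSamplingFraction": 1,
    "randomSeed": 1234,
    "stopwordsBlobName": "stopwords/stopwords_nltk_en.txt",
    # Each entry is an object with a required "key" and a "value".
    "sparkConfig": [{"key": "spark.executor.memory", "value": "4g"}],
    "writeOptions": [],
}

# Parameters marked as required in the list above.
REQUIRED = ("docIdField", "fieldToVectorize", "id", "trainingCollection", "type")
missing = [name for name in REQUIRED if name not in job_config]

payload = json.dumps(job_config, indent=2)

if __name__ == "__main__":
    print("missing required parameters:", missing)
    print(payload)
```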