Legacy Product

Fusion 5.10

    Query-to-Query Collaborative Similarity Job


    Train a collaborative filtering matrix decomposition recommender using SparkML’s Alternating Least Squares (ALS) to batch-compute query-query similarities. This can be used for items-for-query recommendations as well as queries-for-query recommendations.

    Required signals fields:

    query
    count_i
    type
    timestamp_tdt
    user_id
    doc_id
    session_id
    fusion_query_id

    Five of these eight fields are marked required.

    This job is deprecated in Fusion 5.3.x. Use the Query-to-Query Session-Based Similarity jobs for better performance and query coverage.

    Use this job to batch-compute query-query similarities using ALS.

    alwaysTrain - boolean

    If true, retrain even when a model with this modelId already exists.

    Default: true

    gridSearchWidth - integer

    Width of the parameter grid search, which is centered on the initial parameter guesses and uses an exponential step size. The value is the number of steps; if <= 0, no grid search is performed.

    Default: 1
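The grid layout is not spelled out in this reference, so the following is only a sketch of one plausible reading: each initial guess is multiplied by successive powers of a step base, with `gridSearchWidth` steps on either side of the center. The helper name `exponential_grid` and the base of 2 are assumptions made for illustration, not Fusion's documented behavior.

```python
def exponential_grid(center, width, base=2.0):
    """Return candidate values centered on `center`.

    Hypothetical helper: `base` is an assumed step factor, not a documented
    Fusion setting. A width <= 0 disables the search and returns only the
    center value.
    """
    if width <= 0:
        return [center]
    return [center * base ** k for k in range(-width, width + 1)]


# With the default gridSearchWidth of 1, each parameter gets three candidates.
lambda_grid = exponential_grid(0.01, 1)
# Rank must be a positive integer, so round the candidates.
rank_grid = [max(1, round(v)) for v in exponential_grid(100, 1)]
```

Under these assumptions the number of model fits grows quickly with the width, because every combination of rank, lambda, and alpha candidates has to be trained.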

    id - string, required

    The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_). The ID must start with a letter. Maximum length: 63 characters.

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
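As a quick way to check a candidate ID before creating the job, the pattern and length limit above can be applied locally. This is a minimal sketch: the function name `is_valid_job_id` is hypothetical, and matching the whole string against the pattern is an assumption about how the constraint is enforced.

```python
import re

# Pattern and length limit copied from the parameter description above.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_JOB_ID_LENGTH = 63


def is_valid_job_id(job_id):
    """Check a candidate job ID against the documented pattern and length."""
    return (
        len(job_id) <= MAX_JOB_ID_LENGTH
        and JOB_ID_PATTERN.fullmatch(job_id) is not None
    )
```

For example, `is_valid_job_id("query-sim_job1")` passes, while an ID that starts with a digit or exceeds 63 characters does not.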

    implicitRatings - boolean

    Treat training preferences as implicit signals of interest (e.g. clicks or other actions) rather than explicit query ratings.

    Default: true

    initialAlpha - number

    Confidence weight (between 0 and 1) to give the implicit preferences (or starting guess, if doing parameter grid search)

    Default: 0.5
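To illustrate what the confidence weight does, here is a sketch of the implicit-feedback formulation used by Spark ML's ALS, where an observed weight becomes a binary preference plus a confidence of 1 + alpha * |weight|. The function name `implicit_preference` is hypothetical, and it is an assumption that this job passes `initialAlpha` to Spark unchanged.

```python
def implicit_preference(weight, alpha=0.5):
    """Return (preference, confidence) for one observed query/item weight.

    Follows the implicit-feedback ALS formulation: any positive weight counts
    as a preference of 1, and larger weights raise the confidence in it.
    """
    preference = 1.0 if weight > 0 else 0.0
    confidence = 1.0 + alpha * abs(weight)
    return preference, confidence
```

With the default alpha of 0.5, a weight of 4.0 gives a preference of 1.0 held with confidence 3.0, while an unobserved pair (weight 0) keeps the baseline confidence of 1.0.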

    initialLambda - number

    Smoothing parameter to avoid overfitting (or starting guess, if doing parameter grid search). Small data sets need a slightly larger value.

    Default: 0.01

    initialRank - integer

    Number of user/item factors in the recommender decomposition (or starting guess for it, if doing parameter grid search)

    Default: 100

    itemIdField - string

    Solr field name containing stored item ids

    Default: item_id_s

    maxTrainingIterations - integer

    Maximum number of iterations to use when learning the matrix decomposition

    Default: 10

    modelCollection - string

    Collection to load and store the computed model (if absent, it won't be loaded or saved)

    modelId - string

    Identifier for the recommender model. Will be used as the unique key when storing the model in Solr.

    numItemsPerQuery - integer

    Batch compute and store this many item recommendations per query

    Default: 10

    numSims - integer

    Batch compute and store this many query similarities per query

    Default: 10
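The reference does not state which similarity measure the job applies to the learned factors. One common choice is cosine similarity between latent factor vectors, sketched below in plain Python. The names `cosine` and `top_similar` and the sample factors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar(factors, query, num_sims=10):
    """Rank the other queries by cosine similarity to `query`."""
    scored = [
        (other, cosine(factors[query], vec))
        for other, vec in factors.items()
        if other != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_sims]


# Made-up two-dimensional factors; a real model has initialRank dimensions.
factors = {
    "laptop": [1.0, 0.0],
    "notebook computer": [0.9, 0.1],
    "garden hose": [0.0, 1.0],
}
nearest = top_similar(factors, "laptop", num_sims=1)
```

Here `num_sims` plays the role of `numSims`: only that many neighbors per query would be kept and stored.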

    outputItemsForQueriesCollection - string

    Collection to store batch-computed items-for-queries recommendations (if absent, none computed)

    outputQuerySimCollection - string, required

    Collection to store batch-computed query/query similarities (if absent, none computed)

    popularQueryMin - integer

    Items must have at least this number of unique users interacting with them to go into the sample.

    Default: 2
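A minimal sketch of this kind of popularity filter follows. The description speaks of items while the parameter name refers to queries; this example counts distinct users per query, which is an assumption, and the function name `filter_by_unique_users` and the record layout are hypothetical.

```python
from collections import defaultdict


def filter_by_unique_users(signals, popular_query_min=2):
    """Keep signals whose query was issued by enough distinct users."""
    users_per_query = defaultdict(set)
    for signal in signals:
        users_per_query[signal["query"]].add(signal["user_id"])
    return [
        s for s in signals
        if len(users_per_query[s["query"]]) >= popular_query_min
    ]


# Made-up signals: "laptop" has two distinct users, "garden hose" only one.
signals = [
    {"query": "laptop", "user_id": "u1"},
    {"query": "laptop", "user_id": "u2"},
    {"query": "garden hose", "user_id": "u1"},
    {"query": "garden hose", "user_id": "u1"},
]
kept = filter_by_unique_users(signals)
```

Repeated interactions from the same user do not help a query clear the threshold, because only distinct users are counted.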

    queryField - string

    Solr field name containing stored queries

    Default: query

    randomSeed - integer

    Random seed. Keeping this value constant makes the pseudorandom steps of the job reproducible.

    Default: 13
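The effect of a fixed seed can be illustrated with a seeded downsampling step. This sketch uses Python's standard `random` module as a stand-in for Spark's own random number generation, and the function name `sample_rows` is hypothetical.

```python
import random


def sample_rows(rows, fraction, seed=13):
    """Keep each row with probability `fraction`, using a seeded generator."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = list(range(100))
first_run = sample_rows(rows, 0.5)
second_run = sample_rows(rows, 0.5)  # same seed, so the same rows are kept
```

Changing the seed changes which rows are sampled, which in turn can change the trained model and the stored similarities.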

    saveModel - boolean

    Whether we should save the computed ALS model in Solr

    Default: false

    sparkConfig - array[object]

    Spark configuration settings.

    object attributes:
      key (required) - string; display name: Parameter Name
      value - string; display name: Parameter Value
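As an illustration of the key/value shape, the sketch below builds a `sparkConfig` array and prints it as JSON. The property names are standard Spark settings, but the values are placeholders chosen for the example, not recommendations.

```python
import json

# Each entry is an object with a required "key" and a string "value".
spark_config = [
    {"key": "spark.executor.memory", "value": "4g"},
    {"key": "spark.sql.shuffle.partitions", "value": "200"},
]

print(json.dumps({"sparkConfig": spark_config}, indent=2))
```

Note that both attributes are typed as strings, so numeric settings are written as quoted values.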

    trainingCollection - string, required

    Item/Query preference collection (often a signals collection or signals aggregation collection)

    trainingDataFilterQuery - string

    Solr query to filter training data (e.g. downsampling or selecting based on minimum preference values)

    Default: *:*

    trainingDataFrameConfigOptions - object

    Additional Spark DataFrame loading configuration options

    trainingSampleFraction - number

    Downsample preferences for items (bounded to at least 2) by this fraction

    <= 1

    exclusiveMaximum: false

    Default: 1

    type - string, required

    Default: query_similarity

    Allowed values: query_similarity

    weightField - string

    Solr field name containing stored weights (i.e. time-decayed / position-weighted counts) the item has for that query

    Default: weight_d

    writeOptions - array[object]

    Options used when writing output to Solr.

    object attributes:
      key (required) - string; display name: Parameter Name
      value - string; display name: Parameter Value
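Pulling the parameters above together, a job definition might look like the following sketch. The job ID and collection names are hypothetical placeholders; the remaining values are the defaults listed in this reference. The check at the end only confirms that the four fields marked required are present.

```python
import json

job_config = {
    "id": "query-similarity-example",                   # hypothetical job ID
    "type": "query_similarity",
    "trainingCollection": "products_signals_aggr",      # hypothetical collection
    "outputQuerySimCollection": "products_query_sims",  # hypothetical collection
    "queryField": "query",
    "itemIdField": "item_id_s",
    "weightField": "weight_d",
    "implicitRatings": True,
    "initialRank": 100,
    "initialLambda": 0.01,
    "initialAlpha": 0.5,
    "maxTrainingIterations": 10,
    "gridSearchWidth": 1,
    "numSims": 10,
    "popularQueryMin": 2,
    "trainingDataFilterQuery": "*:*",
    "trainingSampleFraction": 1,
    "randomSeed": 13,
    "saveModel": False,
}

# Fields marked "required" in the parameter list above.
REQUIRED_FIELDS = ("id", "type", "trainingCollection", "outputQuerySimCollection")
missing = [name for name in REQUIRED_FIELDS if name not in job_config]

job_json = json.dumps(job_config, indent=2)
```

Parameters left out of the definition fall back to their listed defaults, so a minimal definition needs only the four required fields.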