Use this job to batch-compute query-query similarities using ALS (alternating least squares). This job is deprecated as of Fusion 5.2.0 and will be removed in a future release; use the Query-to-Query Session Based Similarity job instead.
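For orientation, here is a minimal sketch of a job configuration assembled from the parameters documented below, expressed as a Python dictionary. The job id and collection names are placeholders; only the field names and defaults come from this reference.

import json

job_config = {
    "type": "query_similarity",
    "id": "query-query-similarity",               # placeholder job id
    "trainingCollection": "my_signals_aggr",      # placeholder signals collection
    "outputQuerySimCollection": "my_query_sims",  # placeholder output collection
    "numSims": 10,
    "saveModel": False,
}

# Render the configuration as JSON, the form in which Fusion job configs are
# typically expressed.
print(json.dumps(job_config, indent=2))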
alwaysTrain - boolean
If true, retrain the model even if a model with this modelId already exists
Default: true
gridSearchWidth - integer
Width of the parameter grid search: the search is centered around the initial parameter guesses and takes this number of exponential-size steps (if <= 0, no grid search is performed). See the sketch below this entry.
Default: 1
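To illustrate the exponential step size, here is a minimal Python sketch of how candidate values could be generated around an initial guess; the base of 2 and the exact candidate set are assumptions for illustration, not the job's actual search grid.

def candidate_values(initial, width, base=2.0):
    # One candidate per exponential step on either side of the initial guess.
    return [initial * base ** k for k in range(-width, width + 1)]

print(candidate_values(0.5, 1))  # [0.25, 0.5, 1.0]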
id - string (required)
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_). Maximum length: 63 characters.
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
implicitRatings - boolean
Treat training preferences as implicit signals of interest (e.g. clicks or other actions), as opposed to explicit query ratings
Default: true
initialAlpha - number
Confidence weight (between 0 and 1) to give the implicit preferences (or starting guess, if doing parameter grid search)
Default: 0.5
initialLambda - number
Smoothing (regularization) parameter to avoid overfitting (or the starting guess, if doing a parameter grid search). Small data sets need a slightly larger value.
Default: 0.01
initialRank - integer
Number of user/item factors (latent dimensions) in the recommender decomposition, or the starting guess for it if doing a parameter grid search. See the sketch below this entry.
Default: 100
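For background, initialAlpha, initialLambda, and initialRank correspond to the alpha, lambda, and rank parameters of the standard implicit-feedback ALS objective. This is a sketch of the usual formulation, not necessarily the exact objective the job optimizes internally:

\min_{X,Y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right), \qquad c_{ui} = 1 + \alpha\, r_{ui}

Here u indexes queries and i indexes items, r_{ui} is the observed interaction weight, p_{ui} is the 0/1 preference derived from it, and x_u, y_i are factor vectors whose dimension is the rank.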
itemIdField - string
Solr field name containing stored item ids
Default: item_id_s
maxTrainingIterations - integer
Maximum number of iterations to use when learning the matrix decomposition
Default: 10
modelCollection - string
Collection to load and store the computed model (if absent, it won't be loaded or saved)
modelId - string
Identifier for the recommender model. Will be used as the unique key when storing the model in Solr.
numItemsPerQuery - integer
Batch compute and store this many item recommendations per query
Default: 10
numSims - integer
Batch compute and store this many query similarities per query
Default: 10
outputItemsForQueriesCollection - string
Collection to store batch-computed items-for-queries recommendations (if absent, none computed)
outputQuerySimCollection - string (required)
Collection to store batch-computed query/query similarities (if absent, none computed)
popularQueryMin - integer
An item must have at least this number of unique users interacting with it to be included in the sample
Default: 2
queryField - string
Solr field name containing stored queries
Default: query
randomSeed - integer
Random seed; keeping it constant makes the pseudorandom parts of the job deterministic across runs
Default: 13
saveModel - boolean
Whether to save the computed ALS model in Solr
Default: false
sparkConfig - array[object]
Spark configuration settings.
Each object has two string attributes: key (required; display name "Parameter Name") and value (display name "Parameter Value"). An example is sketched below.
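For example, sparkConfig entries could look like the following; the property names are standard Spark settings, and the values are illustrative only.

spark_config = [
    {"key": "spark.executor.memory", "value": "4g"},
    {"key": "spark.sql.shuffle.partitions", "value": "200"},
]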
trainingCollection - string (required)
Item/Query preference collection (often a signals collection or signals aggregation collection)
trainingDataFilterQuery - string
Solr query used to filter the training data, for example to downsample or to select records with a minimum preference value. See the example below this entry.
Default: *:*
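As an illustration, a filter query that restricts training to click signals with a minimum count might look like the line below; the field names type and count_i are assumptions about your signals schema, not values defined by this job.

training_data_filter_query = "type:click AND count_i:[2 TO *]"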
trainingDataFrameConfigOptions - object
Additional spark dataframe loading configuration options
trainingSampleFraction - number
Downsample the preferences for each item by this fraction (bounded to at least 2)
<= 1 (inclusive)
Default: 1
type - string (required)
Default: query_similarity
Allowed values: query_similarity
weightField - string
Solr field name containing the stored weight (e.g. a time-decayed or position-weighted count) that the item has for the query
Default: weight_d
writeOptions - array[object]
Options used when writing output to Solr.
Each object has two string attributes: key (required; display name "Parameter Name") and value (display name "Parameter Value"). An example is sketched below.
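Similarly, writeOptions entries could look like the following; the option names commit_within and batch_size are assumptions about the Solr writer in use, and the values are illustrative only.

write_options = [
    {"key": "commit_within", "value": "10000"},
    {"key": "batch_size", "value": "500"},
]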