Query-to-Query Similarity Job
Use this job to batch-compute query-to-query similarities using ALS (alternating least squares).
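A minimal job definition might look like the following sketch. The id and the collection names are hypothetical placeholders; any parameter not listed falls back to the defaults documented below.

```json
{
  "type": "query_similarity",
  "id": "query-sim-example",
  "trainingCollection": "my_signals_aggr",
  "outputQuerySimCollection": "my_query_similarities",
  "queryField": "query",
  "itemIdField": "item_id_s",
  "weightField": "weight_d",
  "numSims": 10
}
```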
id - string (required)
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_)
<= 128 characters
Match pattern: ^[A-Za-z0-9_\-]+$
modelId - string
Identifier for the recommender model. Will be used as the unique key when storing the model in Solr.
modelCollection - string
Collection to load and store the computed model (if absent, it won't be loaded or saved)
saveModel - boolean
Whether we should save the computed ALS model in Solr
Default: false
trainingCollection - string (required)
Item/Query preference collection (often a signals collection or signals aggregation collection)
trainingDataFilterQuery - string
Solr query used to filter the training data (e.g. for downsampling or selecting on minimum preference values)
Default: *:*
popularQueryMin - integer
Items must have at least this number of unique users interacting with them to be included in the sample
Default: 2
trainingSampleFraction - number
Downsample preferences for items (bounded to at least 2) by this fraction
<= 1
exclusiveMaximum: false
Default: 1
outputQuerySimCollection - string (required)
Collection to store batch-computed query/query similarities (if absent, none computed)
outputItemsForQueriesCollection - string
Collection to store batch-computed items-for-queries recommendations (if absent, none computed)
queryField - string
Solr field name containing stored queries
Default: query
itemIdField - string
Solr field name containing stored item ids
Default: item_id_s
weightField - string
Solr field name containing the stored weight (e.g. time-decayed or position-weighted counts) that the item has for the query
Default: weight_d
numSims - integer
Batch compute and store this many query similarities per query
Default: 10
numItemsPerQuery - integer
Batch compute and store this many item recommendations per query
Default: 10
initialRank - integer
Number of user/item factors in the recommender decomposition (or starting guess for it, if doing parameter grid search)
Default: 100
maxTrainingIterations - integer
Maximum number of iterations to use when learning the matrix decomposition
Default: 10
initialAlpha - number
Confidence weight (between 0 and 1) to give the implicit preferences (or starting guess, if doing parameter grid search)
Default: 0.5
initialLambda - number
Smoothing (regularization) parameter to avoid overfitting (or starting guess, if doing parameter grid search). Small data sets need a slightly larger value
Default: 0.01
gridSearchWidth - integer
Number of exponentially spaced grid-search steps to take around the initial parameter guesses (if <= 0, no grid search is performed)
Default: 1
randomSeed - integer
Random seed; keeping it constant makes the pseudorandom parts of the job deterministic
Default: 13
implicitRatings - boolean
Treat training preferences as implicit signals of interest (e.g. clicks or other actions) rather than explicit query ratings
Default: true
alwaysTrain - boolean
If true, re-train the model even if a model with this modelId already exists
Default: true
trainingDataFrameConfigOptions - object
Additional Spark DataFrame loading configuration options
type - string (required)
Default: query_similarity
Allowed values: query_similarity
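For orientation only, the tuning parameters above roughly correspond to the parameters of Spark MLlib's ALS estimator. The sketch below illustrates that mapping under stated assumptions; it is not the job's actual implementation, and the column names (query_ix, item_ix, weight_d) and toy data are made up for the example.

```python
# Illustrative sketch: how initialRank, maxTrainingIterations, initialLambda,
# initialAlpha, implicitRatings, and randomSeed roughly map onto Spark MLlib ALS.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("query-similarity-sketch").getOrCreate()

# Toy query/item preference data. ALS requires numeric ids, so stored query
# strings and item ids would first need to be indexed (e.g. with StringIndexer);
# the integer ids here stand in for that step.
prefs = spark.createDataFrame(
    [(0, 0, 3.0), (0, 1, 1.0), (1, 0, 2.0), (1, 2, 5.0), (2, 2, 4.0)],
    ["query_ix", "item_ix", "weight_d"],
)

als = ALS(
    rank=100,            # initialRank
    maxIter=10,          # maxTrainingIterations
    regParam=0.01,       # initialLambda
    alpha=0.5,           # initialAlpha
    implicitPrefs=True,  # implicitRatings
    seed=13,             # randomSeed
    userCol="query_ix",  # queries play the "user" role in the factorization
    itemCol="item_ix",
    ratingCol="weight_d",
)
model = als.fit(prefs)

# model.userFactors holds one latent vector per query; pairwise similarity over
# these vectors is what yields the top-N (numSims) similar queries per query.
model.userFactors.show(truncate=False)
```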