Query-to-Query Similarity Job
Use this job to batch-compute query-to-query similarities using ALS (alternating least squares).
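A minimal job definition might look like the following sketch. The id and the collection names are hypothetical placeholders; any parameter not listed falls back to the defaults documented below.

```json
{
  "type": "query_similarity",
  "id": "query-sim-example",
  "trainingCollection": "my_signals_aggr",
  "outputQuerySimCollection": "my_query_similarities",
  "queryField": "query",
  "itemIdField": "item_id_s",
  "weightField": "weight_d",
  "numSims": 10
}
```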
id - string (required)
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_)
<= 128 characters
Match pattern: ^[A-Za-z0-9_\-]+$
modelId - string
Identifier for the recommender model. Will be used as the unique key when storing the model in Solr.
modelCollection - string
Collection to load and store the computed model (if absent, it won't be loaded or saved)
saveModel - boolean
Whether we should save the computed ALS model in Solr
Default: false
trainingCollection - string (required)
Item/Query preference collection (often a signals collection or signals aggregation collection)
trainingDataFilterQuery - string
Solr query used to filter the training data (e.g. for downsampling or selecting on minimum preference values)
Default: *:*
popularQueryMin - integer
Items must have at least this number of unique users interacting with them to be included in the sample
Default: 2
trainingSampleFraction - number
Downsample preferences for items (bounded to at least 2) by this fraction
<= 1
exclusiveMaximum: false
Default: 1
outputQuerySimCollection - string (required)
Collection to store batch-computed query/query similarities (if absent, none computed)
outputItemsForQueriesCollection - string
Collection to store batch-computed items-for-queries recommendations (if absent, none computed)
queryField - string
Solr field name containing stored queries
Default: query
itemIdField - string
Solr field name containing stored item ids
Default: item_id_s
weightField - string
Solr field name containing the stored weight (e.g. time-decayed or position-weighted counts) that the item has for the query
Default: weight_d
numSims - integer
Batch compute and store this many query similarities per query
Default: 10
numItemsPerQuery - integer
Batch compute and store this many item recommendations per query
Default: 10
initialRank - integer
Number of user/item factors in the recommender decomposition (or starting guess for it, if doing parameter grid search)
Default: 100
maxTrainingIterations - integer
Maximum number of iterations to use when learning the matrix decomposition
Default: 10
initialAlpha - number
Confidence weight (between 0 and 1) to give the implicit preferences (or starting guess, if doing parameter grid search)
Default: 0.5
initialLambda - number
Smoothing (regularization) parameter to avoid overfitting (or starting guess, if doing parameter grid search). Small data sets need a slightly larger value
Default: 0.01
gridSearchWidth - integer
Number of exponentially spaced grid-search steps to take around the initial parameter guesses (if <= 0, no grid search is performed)
Default: 1
randomSeed - integer
Random seed; keeping it constant makes the pseudorandom parts of the job deterministic
Default: 13
implicitRatings - boolean
Treat training preferences as implicit signals of interest (e.g. clicks or other actions) rather than explicit query ratings
Default: true
alwaysTrain - boolean
If true, re-train the model even if a model with this modelId already exists
Default: true
trainingDataFrameConfigOptions - object
Additional Spark DataFrame loading configuration options
type - string (required)
Default: query_similarity
Allowed values: query_similarity
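For orientation only, the tuning parameters above roughly correspond to the parameters of Spark MLlib's ALS estimator. The sketch below illustrates that mapping under stated assumptions; it is not the job's actual implementation, and the column names (query_ix, item_ix, weight_d) and toy data are made up for the example.

```python
# Illustrative sketch: how initialRank, maxTrainingIterations, initialLambda,
# initialAlpha, implicitRatings, and randomSeed roughly map onto Spark MLlib ALS.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("query-similarity-sketch").getOrCreate()

# Toy query/item preference data. ALS requires numeric ids, so stored query
# strings and item ids would first need to be indexed (e.g. with StringIndexer);
# the integer ids here stand in for that step.
prefs = spark.createDataFrame(
    [(0, 0, 3.0), (0, 1, 1.0), (1, 0, 2.0), (1, 2, 5.0), (2, 2, 4.0)],
    ["query_ix", "item_ix", "weight_d"],
)

als = ALS(
    rank=100,            # initialRank
    maxIter=10,          # maxTrainingIterations
    regParam=0.01,       # initialLambda
    alpha=0.5,           # initialAlpha
    implicitPrefs=True,  # implicitRatings
    seed=13,             # randomSeed
    userCol="query_ix",  # queries play the "user" role in the factorization
    itemCol="item_ix",
    ratingCol="weight_d",
)
model = als.fit(prefs)

# model.userFactors holds one latent vector per query; pairwise similarity over
# these vectors is what yields the top-N (numSims) similar queries per query.
model.userFactors.show(truncate=False)
```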