Use this job to batch-compute query-query similarities using a co-occurrence-based approach.
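For intuition, here is a minimal sketch of the co-occurrence idea, not this job's actual implementation: two queries that lead to clicks on the same documents are treated as similar, scored here with cosine similarity over per-document click counts (the data, field names, and scoring function are illustrative assumptions).

```python
# Minimal co-occurrence sketch (illustrative only, not the job's real code):
# queries clicking the same documents score as similar via cosine similarity.
import math
from collections import defaultdict

# (query, doc_id, click_count) rows, as read from the training collection.
signals = [
    ("laptop", "doc1", 10), ("laptop", "doc2", 3),
    ("notebook computer", "doc1", 7), ("notebook computer", "doc3", 2),
]

# Build a per-query vector of click counts over documents.
vectors = defaultdict(dict)
for query, doc_id, count in signals:
    vectors[query][doc_id] = vectors[query].get(doc_id, 0) + count

def cosine(a, b):
    dot = sum(a[d] * b[d] for d in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Pairs scoring above overlapThreshold (default 0.3) would be kept.
print(round(cosine(vectors["laptop"], vectors["notebook computer"]), 3))  # 0.921
```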
countField - string
Solr field containing the number of events (e.g., number of clicks).
Default: count_i
dataFormat - string
Spark-compatible format that the training data comes in (e.g., 'solr', 'hdfs', 'file', 'parquet').
Default: solr
Allowed values: solr, hdfs, file, parquet
docIdField - string, required
Solr field containing the ID of the document the user clicked.
Default: doc_id_s
fieldToVectorize - string, required
Field containing queries.
>= 1 character
Default: query_s
id - string, required
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_). Maximum length: 63 characters.
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
maxQueryLength - integer
Queries above this length (in number of characters) will not be considered for generating recommendations.
>= 1
Default: 50
minPairOccCount - integer
Minimum number of times a query pair must be generated to be considered valid.
>= 1
Default: 2
minQueryCount - integer
The minimum number of clicked documents needed for comparing queries.
>= 1
Default: 1
minQueryLength - integer
Queries below this length (in number of characters) will not be considered for generating recommendations.
>= 1
Default: 3
outputCollection - string
Collection to store synonym and similar query pairs.
overlapEnabled - boolean
Maximize the score for query pairs with overlapping tokens by setting the score to 1.
Default: true
overlapThreshold - number
The threshold above which query pairs are considered similar. Decreasing the value can fetch more pairs, at the expense of quality.
<= 1
Default: 0.3
randomSeed - integer
Seed used for deterministic pseudorandom number generation.
Default: 1234
sessionIdField - string
If a session ID is not available, specify a user ID field instead. If this field is left blank, session-based recommendations will be disabled.
Default: session_id_s
sparkConfig - array[object]
Spark configuration settings.
object attributes:
key - string, required (display name: Parameter Name)
value - string (display name: Parameter Value)
specialCharsFilterString - string
String of special characters to be filtered from queries.
Default: ~!@#$^%&*\(\)_+={}\[\]|;:"'<,>.?`/\\-
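Note that the default value is already regex-escaped, so it can be dropped directly into a regular-expression character class. A hypothetical preprocessing step might look like this (the helper function is illustrative, not the job's actual code):

```python
# Hypothetical illustration of applying specialCharsFilterString to queries.
import re

# Default specialCharsFilterString; already escaped for use in a regex class.
special_chars = r"""~!@#$^%&*\(\)_+={}\[\]|;:"'<,>.?`/\\-"""
pattern = re.compile("[" + special_chars + "]")

def clean_query(q: str) -> str:
    """Strip filtered special characters from a raw query string."""
    return pattern.sub("", q)

print(clean_query("what's a 'laptop'?"))  # -> whats a laptop
```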
stopwordsBlobName - string
Name of the stopwords blob resource. This is a .txt file with one stopword per line. By default the file is called stopwords/stopwords_nltk_en.txt, but a custom file can also be used. See the documentation for details on the format and on uploading to the blob store.
Default: stopwords/stopwords_nltk_en.txt
tokenOverlapValue - number
Minimum amount of overlap required to apply the boost. To specify overlap as a ratio, use a value in (0, 1). To specify overlap as an exact count, use a value >= 1. If the value is 0, the boost is applied when one query is a substring of the other. Stopwords are ignored while counting overlaps (see the sketch after this entry).
Default: 1
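The interaction between overlapEnabled, tokenOverlapValue, and stopwords can be summarized with a rough sketch. The helper below is hypothetical, and the denominator used in ratio mode is an assumption; the job's actual logic may differ.

```python
# Hypothetical sketch of the token-overlap boost rule (not the job's code).
STOPWORDS = {"a", "the", "for"}  # assumed toy stopword list

def overlap_boost(query_a: str, query_b: str, token_overlap_value: float = 1) -> bool:
    tokens_a = set(query_a.split()) - STOPWORDS  # stopwords are ignored
    tokens_b = set(query_b.split()) - STOPWORDS
    shared = tokens_a & tokens_b
    if token_overlap_value == 0:
        # Substring mode: boost when one query contains the other.
        return query_a in query_b or query_b in query_a
    if 0 < token_overlap_value < 1:
        # Ratio mode: shared tokens relative to the smaller query (assumed).
        smaller = min(len(tokens_a), len(tokens_b)) or 1
        return len(shared) / smaller >= token_overlap_value
    # Count mode: require at least token_overlap_value shared tokens.
    return len(shared) >= token_overlap_value

# With overlapEnabled=true, boosted pairs have their score set to 1.
print(overlap_boost("red shoes", "red shoes for men"))  # True (2 shared tokens)
```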
trainingCollection - string, required
Collection containing queries, document IDs, and event counts. This can be either a signals aggregation collection or a raw signals collection.
trainingDataFilterQuery - string
Solr query to additionally filter the input collection.
>= 3 characters
Default: *:*
trainingDataFrameConfigOptions - object
Additional Spark DataFrame loading configuration options.
trainingDataSamplingFraction - number
Fraction of the training data to use.
<= 1
Default: 1
type - string, required
Default: similar_queries
Allowed values: similar_queries
writeOptions - array[object]
Options used when writing output to Solr.
object attributes:
key - string, required (display name: Parameter Name)
value - string (display name: Parameter Value)
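Putting the fields together, an illustrative job definition might look like the sketch below. The collection names, endpoint URL, and credentials are placeholders; consult the product documentation for the actual Spark jobs API.

```python
# Illustrative job definition; names, endpoint, and credentials are placeholders.
import json
import requests

job_config = {
    "type": "similar_queries",
    "id": "similar-queries-demo",
    "trainingCollection": "ecommerce_signals_aggr",   # placeholder collection
    "fieldToVectorize": "query_s",
    "docIdField": "doc_id_s",
    "countField": "count_i",
    "dataFormat": "solr",
    "outputCollection": "ecommerce_similar_queries",  # placeholder collection
    "overlapThreshold": 0.3,
    "minQueryLength": 3,
    "maxQueryLength": 50,
    "sparkConfig": [
        {"key": "spark.executor.memory", "value": "4g"},
    ],
}

# POST to the Spark jobs API (placeholder URL and credentials).
resp = requests.post(
    "http://localhost:8764/api/spark/configurations",
    json=job_config,
    auth=("admin", "password123"),
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```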