id - string (required)
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_). Maximum length: 63 characters.
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
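For example, an ID such as the following satisfies the pattern (it starts with a letter and contains only letters, digits, dashes, and underscores, within 63 characters); the value is purely illustrative:

    "id": "trending-recs-daily"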
sparkConfig - array[object]
Spark configuration settings.
object attributes:
  key (required): { display name: Parameter Name, type: string }
  value: { display name: Parameter Value, type: string }
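As an illustrative sketch, each sparkConfig entry is a key/value pair; the property names below are standard Spark settings, but the values shown are assumptions rather than recommendations:

    "sparkConfig": [
      { "key": "spark.executor.memory", "value": "4g" },
      { "key": "spark.sql.shuffle.partitions", "value": "200" }
    ]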
trainingCollection - string (required)
Solr Collection containing labeled training data
>= 1 characters
fieldToVectorize - string
Fields to extract from Solr (not used for other formats)
>= 1 characters
dataFormat - string (required)
Spark-compatible format of the training data (e.g. 'solr', 'parquet', 'orc')
>= 1 characters
Default: solr
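A minimal sketch of the training data source settings, assuming a Solr input; the collection name is hypothetical:

    "trainingCollection": "signals_aggr",
    "dataFormat": "solr"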
trainingDataFrameConfigOptions - object
Additional Spark dataframe loading configuration options
trainingDataFilterQuery - string
Solr query to use when loading training data if using Solr
Default: *:*
sparkSQL - string
Use this field to create a Spark SQL query for filtering your input data. The input data will be registered as spark_input
Default: SELECT * from spark_input
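For example, the input can be narrowed with a Spark SQL query against the registered spark_input table described above; the WHERE clause and field name are illustrative assumptions:

    "trainingDataFilterQuery": "*:*",
    "sparkSQL": "SELECT * FROM spark_input WHERE aggr_count_i > 0"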
trainingDataSamplingFraction - number
Fraction of the training data to use
<= 1
exclusiveMaximum: false
Default: 1
randomSeed - integer
For any deterministic pseudorandom number generation
Default: 1234
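For instance, to train on roughly half of the input rows while keeping runs reproducible (values are illustrative):

    "trainingDataSamplingFraction": 0.5,
    "randomSeed": 1234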
outputCollection - string
Solr Collection in which to store the model-labeled data
dataOutputFormat - string
Spark-compatible output format (e.g. 'solr', 'parquet')
>= 1 characters
Default: solr
sourceFields - string
Solr fields to load (comma-delimited). Leave empty to allow the job to select the required fields to load at runtime.
partitionCols - string
If writing to a non-Solr sink, a comma-delimited list of column names used to partition the dataframe before writing to the external output
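A sketch of the output settings, assuming results are written back to Solr; the collection name is hypothetical, and partitionCols would only be set for a non-Solr sink:

    "outputCollection": "trending_recs",
    "dataOutputFormat": "solr"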
writeOptions - array[object]
Options used when writing output to Solr or other sources
object attributes:
  key (required): { display name: Parameter Name, type: string }
  value: { display name: Parameter Value, type: string }
readOptions - array[object]
Options used when reading input from Solr or other sources.
object attributes:
  key (required): { display name: Parameter Name, type: string }
  value: { display name: Parameter Value, type: string }
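As an illustration of the key/value structure, the option names below are typical spark-solr connector options but should be treated as assumptions; consult the connector documentation for the options supported in your environment:

    "readOptions": [
      { "key": "splits", "value": "true" }
    ],
    "writeOptions": [
      { "key": "commit_within", "value": "10000" }
    ]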
refTimeRange - integer (required)
Number of reference days: the number of days, counted back from today, to use as the baseline for finding trends
targetTimeRange - integer (required)
Number of target days: the number of days, counted back from today, to use as the target window for finding trends
numWeeksRef - number
If using filter queries for the reference and target time ranges, enter the ratio (reference days / target days) here; if not using filter queries, it is calculated automatically
sparkPartitions - integer
Spark will re-partition the input to this number of partitions. Increase this value for greater parallelism
Default: 200
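For example, with a 30-day reference window and a 7-day target window, numWeeksRef works out to 30 / 7 ≈ 4.29 and is calculated automatically unless filter queries are used; the values below are illustrative:

    "refTimeRange": 30,
    "targetTimeRange": 7,
    "sparkPartitions": 200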
countField - string (required)
Field containing the number of times an event (e.g. click) occurs for a particular query; count_i in the raw signal collection or aggr_count_i in the aggregated signal collection.
>= 1 characters
Default: aggr_count_i
referenceTimeFilterQuery - string
Spark SQL filter query for finer control over the reference time range
targetFilterTimeQuery - string
Spark SQL filter query for finer control over the target time range
typeField - string (required)
Field containing the signal type
Default: aggr_type_s
timeField - string (required)
Field containing the event timestamp
Default: timestamp_tdt
docIdField - string (required)
Field containing the document ID
Default: doc_id_s
types - string (required)
Comma-separated list of event types to filter on
Default: click,add
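A sketch of the signal field mapping, assuming the default aggregated-signals schema; the filter queries and date literals are hypothetical:

    "countField": "aggr_count_i",
    "typeField": "aggr_type_s",
    "timeField": "timestamp_tdt",
    "docIdField": "doc_id_s",
    "types": "click,add",
    "referenceTimeFilterQuery": "timestamp_tdt >= '2024-01-01'",
    "targetFilterTimeQuery": "timestamp_tdt >= '2024-01-24'"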
recsCount - integer (required)
Maximum number of recommendations to generate (or -1 for no limit)
Default: 500
type - string (required)
Default: trending-recommender
Allowed values: trending-recommender
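Putting the required fields together, a minimal job definition might look like the following; the ID and collection name are hypothetical, and optional parameters are omitted:

    {
      "id": "trending-recs-daily",
      "type": "trending-recommender",
      "trainingCollection": "signals_aggr",
      "dataFormat": "solr",
      "refTimeRange": 30,
      "targetTimeRange": 7,
      "countField": "aggr_count_i",
      "typeField": "aggr_type_s",
      "timeField": "timestamp_tdt",
      "docIdField": "doc_id_s",
      "types": "click,add",
      "recsCount": 500
    }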