Legacy Product

Fusion 5.10

    Use this job to build training data for query classification by joining signals with the catalog.

    id - string, required

    The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_). The ID must start with a letter. Maximum length: 63 characters.

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
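
The constraints on id can be checked before a job is submitted. The following is a minimal sketch that reuses the documented pattern and length limit; the helper name is_valid_job_id is hypothetical, and requiring the whole string to match the pattern is an assumption.

```python
import re

# Pattern and length limit copied from the "id" field description.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_JOB_ID_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id fits the documented pattern and length limit."""
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    # Assumption: the entire string must match, so fullmatch is used.
    return JOB_ID_PATTERN.fullmatch(job_id) is not None
```

For example, is_valid_job_id("build-training_1") passes, while an ID that starts with a digit or contains a space does not.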

    sparkConfig - array[object]

    Spark configuration settings.

    object attributes:
        key (required)
            display name: Parameter Name
            type: string
        value
            display name: Parameter Value
            type: string
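
Each sparkConfig entry is an object with a string key and a string value. As a rough sketch, a plain mapping of settings could be converted into that shape as follows; the helper name to_key_value_list and the example values are hypothetical, and the property names shown are standard Spark settings.

```python
import json


def to_key_value_list(settings):
    """Convert a mapping into the documented [{"key": ..., "value": ...}] form."""
    # Both attributes are typed as strings, so values are converted with str().
    return [{"key": str(k), "value": str(v)} for k, v in settings.items()]


# Hypothetical example values.
spark_config = to_key_value_list({
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": 200,
})

print(json.dumps({"sparkConfig": spark_config}, indent=2))
```

The same key/value shape is used by writeOptions and readOptions, so the helper would apply there as well.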

    fieldToVectorize - string, required

    Field containing query strings.

    >= 1 characters

    Default: query_s

    dataFormat - string

    Spark-compatible format of the training data (such as 'solr', 'parquet', or 'orc').

    >= 1 characters

    Default: solr

    trainingDataFrameConfigOptions - object

    Additional Spark DataFrame loading configuration options.

    trainingDataFilterQuery - string

    Solr query used to further filter the signals. For non-Solr data sources, use the "Spark SQL filter query" field under Advanced to filter results.

    Default: *:*

    sparkSQL - string

    Use this field to create a Spark SQL query for filtering your input data. The input data is registered as spark_input.

    Default: SELECT * from spark_input
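
To illustrate the kind of filter that sparkSQL can express, the sketch below runs a hypothetical query against a small table named spark_input. Python's sqlite3 module is used purely as a stand-in so the example runs without Spark; in the job itself the query would be evaluated by Spark SQL. The columns query_s, doc_id_s, and aggr_count_i are the documented defaults for fieldToVectorize, itemIdField, and countField; the rows and the filter condition are made up.

```python
import sqlite3

# Hypothetical filter: keep rows that have a query and a count of at least 2.
SPARK_SQL = (
    "SELECT * FROM spark_input "
    "WHERE query_s IS NOT NULL AND aggr_count_i >= 2"
)

# sqlite3 stands in for Spark here, only to show which rows the query keeps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spark_input (query_s TEXT, doc_id_s TEXT, aggr_count_i INTEGER)"
)
conn.executemany(
    "INSERT INTO spark_input VALUES (?, ?, ?)",
    [("red shoes", "sku-1", 5), ("red shoes", "sku-2", 1), (None, "sku-3", 9)],
)
kept_rows = conn.execute(SPARK_SQL).fetchall()
conn.close()
```

Only the first row survives: the second falls below the count cutoff and the third has no query string.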

    trainingDataSamplingFraction - number

    Fraction of the training data to use

    <= 1

    exclusiveMaximum: false

    Default: 1

    randomSeed - integer

    Seed for any deterministic pseudorandom number generation.

    Default: 1234
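
The interaction between trainingDataSamplingFraction and randomSeed can be pictured with a small sketch. This is not how the job samples internally (that is not described here); it only shows, under the assumption of independent per-row sampling, why a fixed seed makes the sample reproducible. The function name sample_rows is hypothetical.

```python
import random


def sample_rows(rows, fraction=1.0, seed=1234):
    """Keep each row with probability `fraction`, reproducibly for a given seed."""
    if fraction > 1:
        raise ValueError("fraction must be <= 1")
    rng = random.Random(seed)  # seeded generator, so repeated runs agree
    # rng.random() is in [0, 1), so fraction=1.0 keeps every row.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(100))
first_run = sample_rows(rows, fraction=0.3)
second_run = sample_rows(rows, fraction=0.3)
```

With the same seed, first_run and second_run contain the same rows; changing the seed would generally select a different subset.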

    dataOutputFormat - string

    Spark-compatible output format (such as 'solr' or 'parquet').

    >= 1 characters

    Default: solr

    partitionCols - string

    When writing to non-Solr sources, a comma-delimited list of column names used to partition the DataFrame before writing to the external output.
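
A comma-delimited partitionCols value can be split into individual column names as sketched below. The helper name and the column names are hypothetical, and trimming whitespace around each name is an assumption rather than documented behavior.

```python
def parse_partition_cols(partition_cols):
    """Split a comma-delimited string into a list of column names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [name.strip() for name in partition_cols.split(",") if name.strip()]


columns = parse_partition_cols("year, month,day")
```

Here columns holds the three names in order, which is the form a Spark writer's partitioning step would typically take.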

    writeOptions - array[object]

    Options used when writing output to Solr or other sources

    object attributes:
        key (required)
            display name: Parameter Name
            type: string
        value
            display name: Parameter Value
            type: string

    readOptions - array[object]

    Options used when reading input from Solr or other sources.

    object attributes:
        key (required)
            display name: Parameter Name
            type: string
        value
            display name: Parameter Value
            type: string

    catalogPath - string, required

    Catalog collection or cloud storage path which contains item categories.

    catalogFormat - string, required

    Spark-compatible format of the catalog data (such as 'solr', 'parquet', or 'orc').

    signalsPath - string, required

    Signals collection or cloud storage path which contains the signals data.

    outputPath - string, required

    Output collection or cloud storage path where the generated training data is written.

    categoryField - string, required

    Item category field in catalog.

    catalogIdField - string, required

    Item ID field in the catalog, which is used to join with signals.

    itemIdField - string, required

    Item ID field in signals, which is used to join with the catalog.

    Default: doc_id_s

    countField - string, required

    Count field in raw or aggregated signals.

    Default: aggr_count_i
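
The join that these fields describe can be sketched in plain Python. This is only an illustration of the idea, not the job's implementation: it assumes an inner join of signals to catalog on itemIdField = catalogIdField, followed by summing countField per (query, category) pair. The field names "id" and "category_s" are hypothetical stand-ins for catalogIdField and categoryField; the other names are the documented defaults.

```python
from collections import defaultdict


def join_signals_with_catalog(
    signals,
    catalog,
    query_field="query_s",          # default for fieldToVectorize
    item_id_field="doc_id_s",       # default for itemIdField
    count_field="aggr_count_i",     # default for countField
    catalog_id_field="id",          # hypothetical catalogIdField
    category_field="category_s",    # hypothetical categoryField
):
    """Sum signal counts per (query, category) for items found in the catalog."""
    category_by_item = {
        item[catalog_id_field]: item[category_field] for item in catalog
    }
    counts = defaultdict(int)
    for signal in signals:
        category = category_by_item.get(signal[item_id_field])
        if category is None:
            continue  # assumption: signals without a catalog match are dropped
        counts[(signal[query_field], category)] += signal[count_field]
    return dict(counts)


signals = [
    {"query_s": "red shoes", "doc_id_s": "sku-1", "aggr_count_i": 5},
    {"query_s": "red shoes", "doc_id_s": "sku-2", "aggr_count_i": 3},
    {"query_s": "laptop", "doc_id_s": "sku-9", "aggr_count_i": 2},
]
catalog = [
    {"id": "sku-1", "category_s": "footwear"},
    {"id": "sku-2", "category_s": "footwear"},
]
pair_counts = join_signals_with_catalog(signals, catalog)
```

In this made-up data, both "red shoes" signals resolve to the footwear category, while the "laptop" signal has no catalog match and contributes nothing.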

    topCategoryProportion - number

    Minimum proportion that the top category must have among all categories.

    Default: 0.5

    topCategoryThreshold - integer

    Minimum count required for a query-category pair.

    >= 1

    exclusiveMinimum: false

    Default: 1
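
One plausible reading of topCategoryProportion and topCategoryThreshold is sketched below. The exact rule the job applies is not spelled out here, so treat this as an assumption: for each query, the category with the highest count is kept as the label only if its count reaches the threshold and its share of the query's total count reaches the proportion. The function name and the sample counts are hypothetical.

```python
def select_top_categories(pair_counts, top_category_proportion=0.5,
                          top_category_threshold=1):
    """Map each query to its top category when it passes both cutoffs."""
    totals = {}
    best = {}
    for (query, category), count in pair_counts.items():
        totals[query] = totals.get(query, 0) + count
        if query not in best or count > best[query][1]:
            best[query] = (category, count)

    labeled = {}
    for query, (category, count) in best.items():
        # Assumption: both comparisons are inclusive (>=).
        if (count >= top_category_threshold
                and count / totals[query] >= top_category_proportion):
            labeled[query] = category
    return labeled


pair_counts = {
    ("red shoes", "footwear"): 8,
    ("red shoes", "apparel"): 2,
    ("apple", "electronics"): 4,
    ("apple", "grocery"): 3,
    ("apple", "books"): 3,
}
labels = select_top_categories(pair_counts)
```

With the defaults, "red shoes" is labeled footwear (8 of 10 counts), while "apple" is dropped because its top category holds only 4 of 10 counts, which is below 0.5.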

    analyzerConfig - string, required

    The style of text analyzer you would like to use.

    Default: { "analyzers": [{ "name": "StdTokLowerStop","charFilters": [ { "type": "htmlstrip" } ],"tokenizer": { "type": "standard" },"filters": [{ "type": "lowercase" }] }],"fields": [{ "regex": ".+", "analyzer": "StdTokLowerStop" } ]}
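
The default analyzerConfig is a JSON document stored as a string. The sketch below parses it and then approximates its pipeline (HTML stripping, tokenizing, lowercasing) with regular expressions. The approximation is an assumption for illustration only; the actual analysis is performed by the configured analyzer and will differ in its details.

```python
import json
import re

# Default value copied from the analyzerConfig description.
DEFAULT_ANALYZER_CONFIG = (
    '{ "analyzers": [{ "name": "StdTokLowerStop",'
    '"charFilters": [ { "type": "htmlstrip" } ],'
    '"tokenizer": { "type": "standard" },'
    '"filters": [{ "type": "lowercase" }] }],'
    '"fields": [{ "regex": ".+", "analyzer": "StdTokLowerStop" } ]}'
)

config = json.loads(DEFAULT_ANALYZER_CONFIG)
analyzer = config["analyzers"][0]


def approximate_analyze(text):
    """Rough stand-in for the default pipeline; not the real analyzer."""
    without_tags = re.sub(r"<[^>]+>", " ", text)   # htmlstrip char filter
    tokens = re.findall(r"\w+", without_tags)      # crude word tokenizer
    return [token.lower() for token in tokens]     # lowercase filter


tokens = approximate_analyze("<b>Red</b> Shoes")
```

Note that the "fields" entry maps every field name (regex ".+") to the single analyzer defined in "analyzers".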

    type - string, required

    Default: build-training

    Allowed values: build-training
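
Putting the required fields together, a job configuration might look like the sketch below. Values marked as hypothetical (the ID, the collection names, and the catalog field names) are placeholders, not values from this reference; the remaining values are the documented defaults. Because analyzerConfig is typed as a string, the analyzer JSON is embedded as a serialized string.

```python
import json

# Fields marked "required" in this reference.
REQUIRED_FIELDS = [
    "id", "type", "fieldToVectorize", "catalogPath", "catalogFormat",
    "signalsPath", "outputPath", "categoryField", "catalogIdField",
    "itemIdField", "countField", "analyzerConfig",
]

job_config = {
    "id": "build-query-classifier-training",    # hypothetical
    "type": "build-training",
    "fieldToVectorize": "query_s",
    "signalsPath": "products_signals_aggr",     # hypothetical
    "catalogPath": "products",                  # hypothetical
    "catalogFormat": "solr",
    "outputPath": "query_classifier_training",  # hypothetical
    "categoryField": "category_s",              # hypothetical
    "catalogIdField": "id",                     # hypothetical
    "itemIdField": "doc_id_s",
    "countField": "aggr_count_i",
    "topCategoryProportion": 0.5,
    "topCategoryThreshold": 1,
    "analyzerConfig": json.dumps({
        "analyzers": [{
            "name": "StdTokLowerStop",
            "charFilters": [{"type": "htmlstrip"}],
            "tokenizer": {"type": "standard"},
            "filters": [{"type": "lowercase"}],
        }],
        "fields": [{"regex": ".+", "analyzer": "StdTokLowerStop"}],
    }),
}


def missing_required(config):
    """List the required field names that are absent from config."""
    return [name for name in REQUIRED_FIELDS if name not in config]


print(json.dumps(job_config, indent=2))
```

A quick call to missing_required(job_config) before submitting the configuration helps catch an omitted required field early.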