Legacy Product

Fusion 5.10
    Collection Analysis Jobs

    Use this job to compute basic metrics about your collection, such as average word length, phrase percentages, and outlier documents (those with very many or very few words).

    Compute basic metrics about a collection and write back to an output collection

    id - string, required

    The ID for this Spark job. Used in the API to reference this job.

    <= 128 characters

    Match pattern: ^[A-Za-z0-9_\-]+$
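
The two constraints on id (maximum length and match pattern) can be checked before a job definition is submitted. The following is a minimal sketch, not part of Fusion; the helper name is_valid_job_id is hypothetical, and only the limit of 128 characters and the pattern itself come from the description above.

```python
import re

# Constraints copied from the field description above.
JOB_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
MAX_JOB_ID_LENGTH = 128


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented length and pattern limits.

    Hypothetical helper for illustration; Fusion performs its own validation.
    """
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    # fullmatch rejects strings with a trailing newline, which "$" alone would allow.
    return JOB_ID_PATTERN.fullmatch(job_id) is not None
```

An ID such as collection_analysis-01 passes this check, while an ID containing a space or an empty string does not.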

    trainingCollection - string, required

    Solr Collection containing labeled training data

    >= 1 characters

    fieldToVectorize - string, required

    Solr field containing text training data for prediction/clustering instances. To analyze multiple fields with different weights, use the format field1:weight1,field2:weight2

    >= 1 characters
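
To make the field1:weight1,field2:weight2 format concrete, here is a small sketch that splits such a value into field names and numeric weights. The function parse_field_weights is hypothetical and is not how the job itself reads the setting; treating a field without an explicit weight as weight 1.0 is an assumption made for this example.

```python
from typing import Dict


def parse_field_weights(spec: str, default_weight: float = 1.0) -> Dict[str, float]:
    """Split 'field1:weight1,field2:weight2' into a {field: weight} mapping.

    Assumption: a field given without ':weight' gets default_weight.
    """
    fields: Dict[str, float] = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty segments such as a trailing comma
        name, sep, weight = part.partition(":")
        name = name.strip()
        if not name:
            raise ValueError(f"missing field name in {part!r}")
        # float() raises ValueError if the weight is empty or not numeric.
        fields[name] = float(weight) if sep else default_weight
    if not fields:
        raise ValueError("at least one field is required")
    return fields
```

For example, the value title_t:2,body_t:1 (hypothetical field names) would map title_t to 2.0 and body_t to 1.0.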

    dataFormat - string

    Spark-compatible format in which the training data is stored (such as 'solr', 'hdfs', 'file', or 'parquet')

    Default: solr

    Allowed values: solr, hdfs, file, parquet

    trainingDataFrameConfigOptions - object

    Additional Spark DataFrame loading configuration options

    trainingDataFilterQuery - string

    Solr query to use when loading training data

    >= 3 characters

    Default: *:*

    trainingDataSamplingFraction - number

    Fraction of the training data to use

    <= 1

    exclusiveMaximum: false

    Default: 1

    randomSeed - integer

    Seed for any deterministic pseudorandom number generation

    Default: 1234
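
The interaction between trainingDataSamplingFraction and randomSeed can be illustrated in plain Python: with a fixed seed, the same fraction selects the same documents on every run. This is only a sketch of the general idea and does not reproduce the sampling that Spark performs inside the job; the function sample_fraction and the lower bound of 0 are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_fraction(items: Iterable[T], fraction: float, seed: int = 1234) -> List[T]:
    """Keep each item independently with probability `fraction`.

    Illustration only: the lower bound of 0 is assumed, the upper bound of 1
    (inclusive) comes from the field description.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1 inclusive")
    rng = random.Random(seed)  # a private generator, so the result depends only on the seed
    return [item for item in items if rng.random() < fraction]
```

Calling the function twice with the same inputs and seed returns identical samples, and a fraction of 1 keeps every item.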

    outputCollection - string

    Solr Collection in which to store the model-labeled data

    sourceFields - string

    Solr fields to load (comma-delimited). Leave empty to allow the job to select the required fields to load at runtime.

    numDeviations - integer, required

    Number of standard deviations away from the mean we deem acceptable for this collection. If you want all the documents, set this to a high value.

    exclusiveMinimum: false
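
The effect of numDeviations can be sketched with a simple standard-deviation test. The example below assumes the measured quantity is a per-document word count and uses the population standard deviation; neither detail is stated above, and the function find_outliers is hypothetical rather than the job's actual implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_outliers(lengths: Sequence[float], num_deviations: float) -> List[int]:
    """Return indexes of values more than num_deviations standard deviations from the mean.

    Assumption: `lengths` holds one word count per document.
    """
    mu = mean(lengths)
    sigma = pstdev(lengths)  # population standard deviation (assumed)
    return [i for i, n in enumerate(lengths) if abs(n - mu) > num_deviations * sigma]


# Nine short documents and one very long one (made-up word counts).
word_counts = [10, 11, 9, 10, 12, 10, 11, 9, 10, 200]
print(find_outliers(word_counts, 2))    # the long document is flagged
print(find_outliers(word_counts, 100))  # a high threshold flags nothing
```

With a threshold of 2, only the 200-word document falls outside the acceptable range; with a very high threshold, every document is treated as acceptable, which matches the advice to set the value high when all documents are wanted.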

    dateField - string

    The field that corresponds to the date field you will be using

    type - string, required

    Default: collection_analysis

    Allowed values: collection_analysis
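
Putting the fields together, a job definition might look like the following sketch. Every value here (the job ID, collection names, and field names) is a made-up placeholder; only the property names, defaults, and the list of required properties come from the descriptions above. The helper missing_required is hypothetical and simply checks that the required properties are present before the definition is serialized to JSON.

```python
import json
from typing import Any, Dict, List

# Properties marked "required" in the descriptions above.
REQUIRED_FIELDS = ("id", "trainingCollection", "fieldToVectorize", "numDeviations", "type")

# Placeholder values for illustration only.
job_config: Dict[str, Any] = {
    "id": "products_collection_analysis",
    "type": "collection_analysis",
    "trainingCollection": "products",
    "fieldToVectorize": "title_t:2,description_t:1",
    "dataFormat": "solr",                  # documented default
    "trainingDataFilterQuery": "*:*",      # documented default
    "trainingDataSamplingFraction": 1,     # documented default
    "randomSeed": 1234,                    # documented default
    "outputCollection": "products_analysis",
    "numDeviations": 3,
}


def missing_required(config: Dict[str, Any]) -> List[str]:
    """Return the required property names that are absent from config."""
    return [name for name in REQUIRED_FIELDS if name not in config]


if not missing_required(job_config):
    print(json.dumps(job_config, indent=2))
```

Optional properties that are left out, such as sourceFields and dateField here, fall back to their defaults or remain unset.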