Collection Analysis Jobs
Use this job when you want to compute basic metrics about your collection, such as average word length, phrase percentages, and outlier documents (documents with unusually many or unusually few words).
Compute basic metrics about a collection and write them back to an output collection
The ID for this Spark job. Used in the API to reference this job
<= 128 characters
Match pattern: ^[A-Za-z0-9_\-]+$
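As a quick illustration, the ID constraints above (the match pattern and the 128-character limit) can be checked with a small script; the helper function below is illustrative and not part of the job itself.

```python
import re

# The pattern and length limit are the ones documented above; everything else
# in this snippet is an illustrative sketch, not product code.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")

def is_valid_job_id(job_id: str) -> bool:
    return len(job_id) <= 128 and bool(ID_PATTERN.match(job_id))

print(is_valid_job_id("collection_analysis_2024"))  # True
print(is_valid_job_id("bad id!"))                   # False (space and '!' not allowed)
```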
Solr Collection containing labeled training data
>= 1 characters
Solr field containing text training data for prediction/clustering instances. To analyze multiple fields with different weights, use the format field1:weight1,field2:weight2
>= 1 characters
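For example, a value such as title_t:2.0,body_t:1.0 weights title_t twice as heavily as body_t. The sketch below parses that format; the field names are made up, and treating a bare field name (no :weight suffix) as weight 1.0 is an assumption rather than documented behavior.

```python
# Parse a weighted-field spec like "title_t:2.0,body_t:1.0" into a dict.
# Illustrative only; field names are placeholders.
def parse_field_weights(spec: str) -> dict[str, float]:
    weights = {}
    for part in spec.split(","):
        field, _, weight = part.strip().partition(":")
        weights[field] = float(weight) if weight else 1.0  # assumed default weight
    return weights

print(parse_field_weights("title_t:2.0,body_t:1.0"))
# {'title_t': 2.0, 'body_t': 1.0}
```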
Spark-compatible format that the training data comes in (such as 'solr', 'hdfs', 'file', or 'parquet')
Default: solr
Allowed values: solr, hdfs, file, parquet
Additional Spark DataFrame loading configuration options
Solr query to use when loading training data
>= 3 characters
Default: *:*
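The sketch below shows how the data format, the additional DataFrame loading options, and the filter query typically fit together when loading from Solr with the spark-solr connector. The zkhost, collection name, and query values are placeholders, and this is not the job's internal implementation.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("collection-analysis-sketch").getOrCreate()

# Requires the spark-solr connector on the classpath when the format is "solr".
df = (spark.read
      .format("solr")                         # or "parquet", etc.
      .option("zkhost", "localhost:9983")     # placeholder ZooKeeper connect string
      .option("collection", "my_collection")  # placeholder collection name
      .option("query", "*:*")                 # Solr query used to filter training data
      .load())
```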
Fraction of the training data to use
<= 1
exclusiveMaximum: false
Default: 1
Seed for deterministic pseudorandom number generation
Default: 1234
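A minimal sketch of how a sampling fraction and a fixed seed are typically applied to a Spark DataFrame so that repeated runs draw the same subset; the DataFrame here is a stand-in for the loaded training data, not the job's actual implementation.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sampling-sketch").getOrCreate()
df = spark.range(1000)  # stand-in for the loaded training data

# fraction=1 (the default) keeps everything; a fixed seed makes the
# pseudorandom sample reproducible across runs.
sampled = df.sample(withReplacement=False, fraction=0.5, seed=1234)
print(sampled.count())  # roughly 500
```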
Solr Collection to store model-labeled data to
Solr fields to load (comma-delimited). Leave empty to allow the job to select the required fields to load at runtime.
Number of standard deviations away from the mean considered acceptable for this collection. If you want to keep all documents, set this to a high value.
exclusiveMinimum: false
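A sketch of the outlier rule this setting implies: a document counts as an outlier when its value (for example, its length) is more than the configured number of standard deviations from the mean. The document lengths below are made up for illustration.

```python
import statistics

doc_lengths = [120, 130, 125, 118, 2400, 3]  # made-up document lengths
mean = statistics.mean(doc_lengths)
stdev = statistics.stdev(doc_lengths)
deviations = 2  # setting this high keeps every document

outliers = [n for n in doc_lengths if abs(n - mean) > deviations * stdev]
print(outliers)  # [2400]
```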
The field in your collection to use as the date field
Default: collection_analysis
Allowed values: collection_analysis