Collection Analysis Jobs
Use this job when you want to compute basic metrics about your collection, such as average word length, phrase percentages, and outlier documents (those with very many or very few terms).
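As a rough illustration of what the reported metrics mean, the sketch below computes an average word length over a toy list of field values. The whitespace tokenization and the helper name are assumptions for illustration, not the job's actual implementation.

```python
# Minimal sketch of the "average word length" metric the job reports.
# Whitespace tokenization and the function name are illustrative assumptions.

def average_word_length(field_values):
    """Mean length (in characters) of all whitespace-separated words."""
    words = [word for value in field_values for word in value.split()]
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)

if __name__ == "__main__":
    sample_descriptions = [
        "lightweight waterproof hiking jacket",
        "usb-c charging cable",
    ]
    print(average_word_length(sample_descriptions))  # ≈ 7.29 characters per word
```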
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_)
<= 128 characters
Match pattern: ^[A-Za-z0-9_\-]+$
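The ID constraints above (length cap and character pattern) are easy to check on the client side before submitting a job. A minimal sketch, assuming a hypothetical helper name:

```python
import re

# Constraints taken from the spec above: <= 128 characters,
# matching ^[A-Za-z0-9_\-]+$.
JOB_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")

def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented length and pattern rules."""
    return len(job_id) <= 128 and bool(JOB_ID_PATTERN.match(job_id))

print(is_valid_job_id("collection-analysis_1"))  # True
print(is_valid_job_id("bad id!"))                # False (space and '!' are not allowed)
```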
Collection you want to analyze.
>= 1 characters
The field whose lengths you want to analyze, for example a description field.
>= 1 characters
Spark-compatible format in which the training data is stored (such as 'solr', 'hdfs', 'file', or 'parquet')
Default: solr
Allowed values: solr, hdfs, file, parquet
Additional Spark DataFrame loading configuration options
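One way to resolve the training-data format against the default and the allowed values, and to collect any additional loading options, is sketched below; the helper name and the sample option keys are assumptions.

```python
# Allowed values and default taken from the spec above.
ALLOWED_FORMATS = {"solr", "hdfs", "file", "parquet"}
DEFAULT_FORMAT = "solr"

def resolve_format(fmt=None):
    """Fall back to the default format and reject values outside the allowed set."""
    fmt = fmt or DEFAULT_FORMAT
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    return fmt

# Additional Spark DataFrame loading options are free-form key/value pairs;
# this particular key is only an illustration.
extra_read_options = {"zkHost": "localhost:9983"}

print(resolve_format())           # 'solr'
print(resolve_format("parquet"))  # 'parquet'
```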
Solr query to use when loading training data
>= 3 characters
Default: *:*
Fraction of the training data to use
<= 1
exclusiveMaximum: false
Default: 1
Random seed for any deterministic pseudorandom number generation
Default: 1234
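The sampling fraction and random seed work together: the seed makes the pseudorandom sample reproducible across runs. The sketch below illustrates the idea with Python's random module rather than the job's actual Spark sampling.

```python
import random

def sample_documents(documents, fraction=1.0, seed=1234):
    """Keep roughly `fraction` of the documents, reproducibly for a fixed seed."""
    rng = random.Random(seed)  # deterministic pseudorandom generator
    return [doc for doc in documents if rng.random() < fraction]

docs = [f"doc-{i}" for i in range(10)]
print(sample_documents(docs, fraction=0.5, seed=1234))
print(sample_documents(docs, fraction=0.5, seed=1234))  # same subset both times
```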
Solr collection to store the model-labeled data in
Solr fields to load (comma-delimited). Leave empty to allow the job to select the required fields to load at runtime.
Number of standard deviations away from the mean considered acceptable when identifying outliers in this collection. To include all documents, set this to a larger number.
exclusiveMinimum: false
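To make the standard-deviation threshold concrete, the sketch below flags documents whose term count falls more than the given number of standard deviations from the mean. It illustrates the parameter's meaning only; the function name and toy data are assumptions.

```python
import statistics

def find_outliers(term_counts, num_std_devs=3.0):
    """Return indices of documents whose term count is more than
    num_std_devs standard deviations away from the mean."""
    mean = statistics.mean(term_counts)
    std = statistics.pstdev(term_counts)
    if std == 0:
        return []
    return [i for i, count in enumerate(term_counts)
            if abs(count - mean) > num_std_devs * std]

counts = [10, 12, 11, 9, 10, 11, 50]   # term counts per document (toy data)
print(find_outliers(counts, num_std_devs=2.0))  # [6]: only the unusually long document
```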
The field that will be used as the date field
Default: collection_analysis
Allowed values: collection_analysis
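Putting the parameters together, a job definition might look roughly like the sketch below. The key names and the submission endpoint are assumptions made for illustration, not the exact schema.

```python
import json

# Hypothetical Collection Analysis job definition assembled from the
# parameters documented above. Key names are assumptions, not the exact schema.
job_config = {
    "id": "collection-analysis-products",      # <= 128 chars, ^[A-Za-z0-9_\-]+$
    "type": "collection_analysis",             # only allowed value
    "trainingCollection": "products",          # collection to analyze
    "fieldToVectorize": "description_t",       # field whose lengths are analyzed
    "dataFormat": "solr",                      # one of: solr, hdfs, file, parquet
    "trainingDataFilterQuery": "*:*",          # Solr query used to load data
    "trainingDataSamplingFraction": 1.0,       # fraction of the data to use
    "randomSeed": 1234,                        # for reproducible sampling
    "outputCollection": "products_analysis",   # where model-labeled data is stored
    "numStdDev": 3,                            # outlier threshold in std devs
    "dateField": "timestamp_tdt",              # date field to use
}

print(json.dumps(job_config, indent=2))
# A REST submission might then POST this JSON to the Spark jobs endpoint, e.g.:
#   requests.post(f"{API_BASE}/spark/configurations", json=job_config)
# (the endpoint path shown here is an assumption).
```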