Smart Answers Evaluate Pipeline
Evaluate the performance of a Smart Answers pipeline.
See Evaluate a Smart Answers Query Pipeline for configuration instructions.
Legacy Product
Evaluates performance of a configured pipeline
The ID for this job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_)
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
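The ID constraints above can be checked before a job is submitted. The following is a minimal sketch, not part of the product: the helper name `is_valid_job_id` is hypothetical, and it assumes the pattern must match the whole ID.

```python
import re

# Pattern and length limit copied from the field description above.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_JOB_ID_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Hypothetical helper: True if job_id fits the documented constraints.

    Assumption: the pattern is applied to the full string (fullmatch).
    """
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ["qna-evaluate_1", "1job", "a" * 64]:
        print(candidate[:20], is_valid_job_id(candidate))
```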
Provide additional key/value pairs to be injected into the training JSON map at runtime. Values are inserted as-is, so surround string values with double quotes (").
object attributes: {
  key (required): {
    display name: Parameter Name
    type: string
  }
  value: {
    display name: Parameter Value
    type: string
  }
}
Options used when writing output to Solr or other sources
object attributes: {
  key (required): {
    display name: Parameter Name
    type: string
  }
  value: {
    display name: Parameter Value
    type: string
  }
}
Options used when reading input from Solr or other sources.
object attributes: {
  key (required): {
    display name: Parameter Name
    type: string
  }
  value: {
    display name: Parameter Value
    type: string
  }
}
Cloud storage path or Solr collection to pull labeled data for use in evaluation
>= 1 characters
The format of the input data, for example solr or parquet.
>= 1 characters
Default: solr
Cloud storage path or Solr collection to store evaluation results (recommended collection is job_reports)
>= 1 characters
If writing to non-Solr sources, this field accepts a comma-delimited list of column names for partitioning the dataframe before writing to the external output.
If writing to Solr, this field defines the batch size for documents pushed to Solr.
The format of the output data, for example solr or parquet.
>= 1 characters
Default: solr
Name of the secret used to access cloud storage as defined in the K8s namespace
>= 1 characters
Solr or SQL query to filter training data. Use a Solr query when a Solr collection is specified in Training Path. Use a SQL query when a cloud storage location is specified. The table name for SQL is `spark_input`.
The proportion of data to be sampled from the full dataset. Use a value between 0 and 1 for a proportion (e.g. 0.5 for 50%), or for a specific number of examples, use an integer larger than 1. Leave blank for no sampling
Random seed for sampling
Default: 12345
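The sampling and random seed fields above can be read as follows. This is a plain-Python sketch for illustration only; the job's own sampling is not shown in this document and may behave differently. The helper name `sample_rows` is hypothetical, and treating a value of exactly 1 as 100% is an assumption.

```python
import random


def sample_rows(rows, sampling=None, seed=12345):
    """Hypothetical helper that mimics the documented sampling semantics.

    sampling=None -> no sampling (field left blank)
    0 < sampling <= 1 -> proportion of the dataset (assumption: 1 means 100%)
    sampling > 1 -> absolute number of examples
    """
    rows = list(rows)
    if sampling is None:
        return rows
    if sampling <= 0:
        raise ValueError("sampling must be positive")
    if sampling <= 1:
        count = round(len(rows) * sampling)
    else:
        count = min(int(sampling), len(rows))
    # A seeded generator makes the sample reproducible across runs.
    return random.Random(seed).sample(rows, count)


if __name__ == "__main__":
    data = list(range(10))
    print(len(sample_rows(data, 0.5)))  # half of the rows
    print(len(sample_rows(data, 3)))    # exactly three rows
```

Reusing the same seed returns the same sample, which keeps repeated evaluation runs comparable.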
Defines the field in the collection containing the test question
Default: question
Field which contains id or text of the ground truth answer in the evaluation collection
Default: answer_id
Field name in Fusion which contains answer id or text for matching ground truth answer id or text in the evaluation collection
Default: doc_id
Fusion app where indexed documents or QA pairs live.
Configured query pipeline name that should be used for evaluation
Fusion collection where indexed documents or QA pairs live
Additional query parameters to pass when returning results from Fusion. Specify them in dictionary format, e.g. { "rowsFromSolrToRerank": 20, "fq": "type:answer" }
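A quick way to confirm that a value for this field is a well-formed dictionary is to parse it as JSON before saving the job. The sketch below assumes the field accepts JSON syntax; the parameter names are the ones from the example above.

```python
import json
from urllib.parse import urlencode

# Example value taken from the field description above.
raw_params = '{ "rowsFromSolrToRerank": 20, "fq": "type:answer" }'

# json.loads raises json.JSONDecodeError if the text is malformed,
# for example when a stray quote follows the closing brace.
params = json.loads(raw_params)

# Illustration only: how such a dictionary maps onto a URL query string.
query_string = urlencode(params)

if __name__ == "__main__":
    print(params)
    print(query_string)
```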
Fields (comma-separated) that should be returned from the main collection (e.g. question, answer). The job will add them to the output evaluation
Score to be used for ranking and evaluation
Default: ensemble_score
List of metrics that should be computed during evaluation, e.g. ["recall","precision","map","mrr"]
Default: ["recall","map","mrr"]
The retrieval positions (k) at which each metric is computed
Default: [1,3,5]
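To make the relationship between metrics and k positions concrete, here is a sketch of two of the listed metrics for a single query, using common textbook definitions. The job's exact formulas are not given in this document, so treat these as assumptions; the function names and document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant answers that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1/rank of the first relevant answer in the top k, else 0.

    Averaging this value over all test questions gives MRR@k.
    """
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2", "d9"]  # hypothetical pipeline output
    relevant = {"d1"}                        # hypothetical ground truth
    for k in [1, 3, 5]:                      # default k positions
        print(k, recall_at_k(ranked, relevant, k),
              reciprocal_rank_at_k(ranked, relevant, k))
```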
Whether to perform a grid search to find the best combination of weights for the ranking scores used in the query pipeline's Compute Mathematical Expression stage
Default: false
Function used in the pipeline to scale Solr scores. E.g., scale by the max Solr score retrieved (max), scale by log with base 10 (log10), or take the square root of the score (pow0.5)
Default: max
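The three scaling options named above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the helper name `scale_scores` is hypothetical, and scores are assumed to be positive.

```python
import math


def scale_scores(scores, method="max"):
    """Hypothetical helper: scale a list of positive Solr scores."""
    if method == "max":
        top = max(scores)
        # Dividing by the top score maps the best result to 1.0.
        return [s / top for s in scores] if top else list(scores)
    if method == "log10":
        return [math.log10(s) for s in scores]
    if method == "pow0.5":
        return [math.sqrt(s) for s in scores]
    raise ValueError(f"unknown scaling method: {method}")


if __name__ == "__main__":
    solr_scores = [16.0, 4.0, 1.0]
    for method in ("max", "log10", "pow0.5"):
        print(method, scale_scores(solr_scores, method))
```

Scaling matters because raw Solr scores are unbounded, while the other scores in the ensemble may live on a much smaller range.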
Ranking scores (comma-separated) used for ensemble in the query pipeline's Compute Mathematical Expression stage. The job will perform weights selection for the listed scores
Default: score,vectors_distance
Target ranking metric to optimize during weights selection
Default: mrr@3
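The weights selection described by the grid search, ranking scores, and target metric fields can be pictured with the sketch below. It is a simplified illustration, not the job's implementation: the function names, the weight grid, and the input layout are hypothetical, and it assumes that a higher value is better for every listed score and that the ensemble is a weighted sum.

```python
from itertools import product


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1/rank of the first relevant answer in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def grid_search_weights(queries, score_names, k=3,
                        grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every weight combination and keep the one with the best MRR@k."""
    best_weights, best_metric = None, -1.0
    for weights in product(grid, repeat=len(score_names)):
        total = 0.0
        for query in queries:
            # Rank candidates by the weighted sum of their scores.
            ranked = sorted(
                query["candidates"],
                key=lambda c: sum(w * c[n] for w, n in zip(weights, score_names)),
                reverse=True,
            )
            total += reciprocal_rank_at_k(
                [c["id"] for c in ranked], query["relevant"], k)
        metric = total / len(queries)
        if metric > best_metric:
            best_weights, best_metric = dict(zip(score_names, weights)), metric
    return best_weights, best_metric


if __name__ == "__main__":
    queries = [{  # hypothetical, already-scaled scores for one test question
        "candidates": [
            {"id": "a", "score": 1.0, "vectors_distance": 0.1},
            {"id": "b", "score": 0.2, "vectors_distance": 0.9},
        ],
        "relevant": {"b"},
    }]
    print(grid_search_weights(queries, ["score", "vectors_distance"]))
```

An exhaustive grid grows exponentially with the number of scores, which is one reason to keep the list of ranking scores short.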
Check this to determine similar questions and similar answers via labeling resolution and graph connected components. Does not work well with signals data.
Default: false
Check this option if you want to make concurrent queries to Fusion. It will greatly speed up the job at the cost of increased load on Fusion. Use with caution.
Default: false
Default: argo-qna-evaluate
Allowed values: argo-qna-evaluate