Legacy Product

Fusion 5.10

    Evaluates performance of a configured pipeline

    id - string, required

    The ID for this job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-), and underscore (_)

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
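
    The ID constraints above can be checked before a job is submitted. The following is a minimal sketch, assuming the pattern must match the whole string; the helper name is_valid_job_id is hypothetical and not part of Fusion.

```python
import re

# Pattern and length limit copied from the field description above.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
MAX_JOB_ID_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id fits the documented pattern and length limit."""
    # Assumption: the whole string has to match, not just a prefix of it.
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ["qna-eval_1", "1-qna", ""]:
        print(repr(candidate), is_valid_job_id(candidate))
```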

    sparkConfig - array[object]

    Provide additional key/value pairs to be injected into the training JSON map at runtime. Values are inserted as-is, so surround string values with double quotes (")

    object attributes:
     key (required):
      display name: Parameter Name
      type: string
     value:
      display name: Parameter Value
      type: string

    writeOptions - array[object]

    Options used when writing output to Solr or other sources

    object attributes:
     key (required):
      display name: Parameter Name
      type: string
     value:
      display name: Parameter Value
      type: string

    readOptions - array[object]

    Options used when reading input from Solr or other sources.

    object attributes:
     key (required):
      display name: Parameter Name
      type: string
     value:
      display name: Parameter Value
      type: string
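
    sparkConfig, writeOptions, and readOptions share the same shape: an array of objects with a required string key and a string value. As a rough illustration, a plain dictionary could be converted into that shape as sketched below. The helper to_key_value_pairs and the option names are hypothetical; consult the relevant reader or writer for the option names it actually accepts.

```python
import json


def to_key_value_pairs(options: dict) -> list:
    """Convert a plain dict into a list of {"key": ..., "value": ...} objects."""
    # Both attributes are typed as strings in the schema, so values are stringified.
    return [{"key": str(k), "value": str(v)} for k, v in options.items()]


# Hypothetical option names, for illustration only.
read_options = to_key_value_pairs({"example_option": "example_value", "max_rows": 1000})

if __name__ == "__main__":
    print(json.dumps(read_options, indent=2))
```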

    inputEvaluationCollection - string, required

    Cloud storage path or Solr collection to pull labeled data for use in evaluation

    >= 1 characters

    trainingFormat - string, required

    The format of the input data: solr, parquet, etc.

    >= 1 characters

    Default: solr

    outputEvaluationCollection - string, required

    Cloud storage path or Solr collection to store evaluation results (recommended collection is job_reports)

    >= 1 characters

    partitionFields - string

    If writing to non-Solr sources, this field will accept a comma-delimited list of column names for partitioning the dataframe before writing to the external output

    batchSize - string

    If writing to Solr, this field defines the batch size for documents to be pushed to Solr.

    outputFormat - string, required

    The format of the output data: solr, parquet, etc.

    >= 1 characters

    Default: solr

    secretName - string

    Name of the secret used to access cloud storage as defined in the K8s namespace

    >= 1 characters

    trainingDataFilterQuery - string

    Solr or SQL query to filter training data. Use a Solr query when a Solr collection is specified in Training Path. Use a SQL query when a cloud storage location is specified. The table name for SQL is `spark_input`

    trainingSampleFraction - number

    The proportion of data to be sampled from the full dataset. Use a value between 0 and 1 for a proportion (e.g. 0.5 for 50%), or for a specific number of examples, use an integer larger than 1. Leave blank for no sampling
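
    The two interpretations of trainingSampleFraction can be sketched as follows. This is an illustration of the rule described above, not the job's actual sampling code: the function name resolve_sample_size is hypothetical, and treating a value of exactly 1 as "the full dataset" is an assumption, since the description only covers values between 0 and 1 and integers larger than 1.

```python
from typing import Optional


def resolve_sample_size(total_rows: int, sample: Optional[float]) -> int:
    """Return the approximate number of rows the setting asks for."""
    if sample is None:
        # Blank setting: no sampling, use everything.
        return total_rows
    if sample <= 0:
        raise ValueError("sample must be positive")
    if sample <= 1:
        # Proportion of the full dataset, e.g. 0.5 for 50%.
        # Assumption: exactly 1 is treated as 100%.
        return int(total_rows * sample)
    # Integer larger than 1: a specific number of examples, capped at the dataset size.
    return min(total_rows, int(sample))


if __name__ == "__main__":
    for setting in (None, 0.5, 200):
        print(setting, resolve_sample_size(1000, setting))
```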

    seed - integer

    Random seed for sampling

    Default: 12345

    testQuestionFieldInFile - string

    Defines the field in the collection containing the test question

    Default: question

    matchFieldInFile - string

    Field which contains id or text of the ground truth answer in the evaluation collection

    Default: answer_id

    matchFieldInFusion - string

    Field name in Fusion which contains answer id or text for matching ground truth answer id or text in the evaluation collection

    Default: doc_id

    appName - string, required

    Fusion app where indexed documents or QA pairs live.

    queryPipelineName - string, required

    Configured query pipeline name that should be used for evaluation

    collectionName - string, required

    Fusion collection where indexed documents or QA pairs live

    additionalParams - string

    Additional query parameters to pass to return results from Fusion. Please specify in dictionary format, e.g. { "rowsFromSolrToRerank": 20, "fq": "type:answer" }
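
    Because additionalParams is a string holding a dictionary, it can be worth checking that the value parses before saving the job. The sketch below assumes the dictionary format is valid JSON, as in the example above; parse_additional_params is a hypothetical helper, not a Fusion function.

```python
import json


def parse_additional_params(raw: str) -> dict:
    """Parse the additionalParams string and make sure it is a dictionary."""
    params = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed input
    if not isinstance(params, dict):
        raise ValueError("additionalParams must be a dictionary of query parameters")
    return params


# The example value from the field description.
raw_value = '{ "rowsFromSolrToRerank": 20,"fq": "type:answer" }'
params = parse_additional_params(raw_value)

if __name__ == "__main__":
    for name, value in params.items():
        print(name, "=", value)
```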

    returnFields - string, required

    Fields (comma-separated) that should be returned from the main collection (e.g. question, answer). The job will add them to the output evaluation

    rankingScoreField - string

    Score to be used for ranking and evaluation

    Default: ensemble_score

    metricsList - string

    List of metrics that should be computed during evaluation, e.g. ["recall","precision","map","mrr"]

    Default: ["recall","map","mrr"]

    kList - string

    The k retrieval positions at which each metric is computed

    Default: [1,3,5]
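
    To make the interaction between metricsList and kList concrete, the sketch below computes recall, precision, and reciprocal rank at each k for a single query. It uses the common textbook definitions, which are assumed here and may differ in detail from what the job computes; the document IDs are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / k


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant result in the top k, or 0.0 if there is none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking returned for one test question, best result first.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}

if __name__ == "__main__":
    for k in [1, 3, 5]:  # the default kList
        print(k, recall_at_k(ranked, relevant, k),
              precision_at_k(ranked, relevant, k),
              reciprocal_rank_at_k(ranked, relevant, k))
```

    Averaging the reciprocal rank over all test questions gives MRR at k.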

    doWeightsSelection - boolean

    Whether to perform a grid search to find the best combination of weights for the ranking scores in the query pipeline's Compute Mathematical Expression stage

    Default: false

    solrScaleFunc - string

    Function used in the pipeline to scale Solr scores, e.g. scale by the maximum Solr score retrieved (max), scale by log base 10 (log10), or take the square root of the score (pow0.5)

    Default: max
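
    The three scaling options named above could look roughly like this. It is a sketch of one plausible reading of each option, assuming strictly positive scores; the exact formulas used in the pipeline are not given here, and scale_scores is a hypothetical helper.

```python
import math


def scale_scores(scores, func="max"):
    """Scale a list of positive Solr scores with one of the documented options."""
    if func == "max":
        # Divide by the largest score retrieved, so the top score becomes 1.0.
        top = max(scores)
        return [s / top for s in scores]
    if func == "log10":
        # Assumption: a plain base-10 logarithm of each score.
        return [math.log10(s) for s in scores]
    if func == "pow0.5":
        # Square root of each score.
        return [math.sqrt(s) for s in scores]
    raise ValueError(f"unknown scale function: {func}")


if __name__ == "__main__":
    sample_scores = [9.0, 4.0, 1.0]  # made-up scores
    for name in ("max", "log10", "pow0.5"):
        print(name, scale_scores(sample_scores, name))
```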

    scoreListForWeights - string

    Ranking scores (comma-separated) used for ensemble in the query pipeline's Compute Mathematical Expression stage. The job will perform weights selection for the listed scores

    Default: score,vectors_distance

    targetRankingMetric - string

    Target ranking metric to optimize during weights selection

    Default: mrr@3
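
    Taken together, doWeightsSelection, scoreListForWeights, and targetRankingMetric describe a grid search over score weights. The sketch below shows the general idea under several assumptions that are not confirmed by this reference: the ensemble is a weighted sum, weights sum to 1, higher scores rank first, and only mrr is implemented. All function names and data are hypothetical.

```python
from itertools import product


def parse_target_metric(target):
    """Split a value such as "mrr@3" into ("mrr", 3)."""
    name, _, k = target.partition("@")
    return name, int(k)


def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1 / rank of the relevant document in the top k, or 0.0 if it is absent."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def select_weights(queries, score_names, target="mrr@3",
                   steps=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every weight combination on the grid and keep the best-scoring one."""
    name, k = parse_target_metric(target)
    if name != "mrr":
        raise ValueError("this sketch only implements mrr")
    best_weights, best_value = None, -1.0
    for weights in product(steps, repeat=len(score_names)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue  # assumption: weights are constrained to sum to 1
        total = 0.0
        for candidates, relevant_id in queries:
            # Rank candidates by the weighted sum of their scores, highest first.
            ranked = sorted(
                candidates,
                key=lambda c: sum(w * c[n] for w, n in zip(weights, score_names)),
                reverse=True,
            )
            total += reciprocal_rank_at_k([c["id"] for c in ranked], relevant_id, k)
        value = total / len(queries)
        if value > best_value:  # ties keep the first combination found
            best_weights, best_value = dict(zip(score_names, weights)), value
    return best_weights, best_value


# Made-up evaluation data: (candidate documents, ID of the correct answer).
queries = [
    ([{"id": "a", "score": 1.0, "vectors_distance": 0.0},
      {"id": "b", "score": 0.0, "vectors_distance": 0.6}], "a"),
    ([{"id": "c", "score": 0.6, "vectors_distance": 0.0},
      {"id": "d", "score": 0.0, "vectors_distance": 1.0}], "d"),
]

if __name__ == "__main__":
    print(select_weights(queries, ["score", "vectors_distance"]))
```

    The grid here is deliberately coarse; a finer grid raises the number of combinations quickly as more scores are listed.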

    useLabelingResolution - boolean

    Check this to determine similar questions and similar answers via labeling resolution and graph connected components. Does not work well with signals data.

    Default: false

    useConcurrentQuerying - boolean

    Check this option if you want to make concurrent queries to Fusion. It will greatly speed up the job at the cost of increased load on Fusion. Use with caution.

    Default: false

    type - string, required

    Default: argo-qna-evaluate

    Allowed values: argo-qna-evaluate
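
    As a closing illustration, the required fields listed in this reference can be assembled into a job configuration and checked for completeness. This is a sketch only: the app, pipeline, and collection names are hypothetical placeholders, and missing_required is not a Fusion function.

```python
import json

# Fields marked "required" in this reference.
REQUIRED_FIELDS = [
    "id", "inputEvaluationCollection", "trainingFormat",
    "outputEvaluationCollection", "outputFormat", "appName",
    "queryPipelineName", "collectionName", "returnFields", "type",
]

job = {
    "id": "qna-evaluate-example",
    "type": "argo-qna-evaluate",
    "inputEvaluationCollection": "qna_eval_input",  # hypothetical collection
    "trainingFormat": "solr",
    "outputEvaluationCollection": "job_reports",    # recommended in the description
    "outputFormat": "solr",
    "appName": "my_app",                            # hypothetical
    "queryPipelineName": "my_qna_pipeline",         # hypothetical
    "collectionName": "my_collection",              # hypothetical
    "returnFields": "question,answer",
    "metricsList": '["recall","map","mrr"]',        # typed as a string in the schema
    "kList": "[1,3,5]",
}


def missing_required(config):
    """Return the required field names that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not config.get(field)]


if __name__ == "__main__":
    print("missing:", missing_required(job))
    print(json.dumps(job, indent=2))
```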