Legacy Product

Fusion 5.10

    Trains a Smart Answers model on a cold-start (unsupervised) basis with pre-trained or trained embeddings, and deploys the trained model to the ML Model Service.

    id - string, required

    The ID for this job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-) and underscore (_). The ID must start with a letter.

    <= 63 characters

    Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
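
A job ID can be checked against these constraints before a job is submitted. The sketch below copies the length limit and match pattern from the entry above; the function name is hypothetical, and it assumes the pattern must match the whole string.

```python
import re

# Pattern and length limit copied from the reference entry above.
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")
JOB_ID_MAX_LENGTH = 63


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the length limit and the match pattern."""
    if len(job_id) > JOB_ID_MAX_LENGTH:
        return False
    # Assumption: the pattern is applied to the entire string, not a substring.
    return JOB_ID_PATTERN.fullmatch(job_id) is not None
```

For example, `is_valid_job_id("qna-coldstart_1")` passes, while an ID that starts with a digit is rejected.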

    sparkConfig - array[object]

    Provide additional key/value pairs to be injected into the training JSON map at runtime. Values are inserted as-is, so surround string values with double quotes (").

    object attributes: {
     key (required): {
      display name: Parameter Name
      type: string
     }
     value: {
      display name: Parameter Value
      type: string
     }
    }
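
Because values are inserted as-is, string values need their own double quotes inside the value string. The sketch below shows one way to build such key/value pairs; the helper name and the parameter names in the usage note are hypothetical, and it assumes that JSON literal syntax is what the training map expects.

```python
import json


def to_key_value_pairs(params: dict) -> list:
    """Convert a dict into a list of {"key": ..., "value": ...} objects."""
    pairs = []
    for key, value in params.items():
        # json.dumps wraps strings in double quotes and renders numbers
        # and booleans as bare JSON literals; the result is always a string.
        pairs.append({"key": key, "value": json.dumps(value)})
    return pairs
```

Calling `to_key_value_pairs({"hypothetical_name": "abc"})` yields a value of `"abc"` including the surrounding double quotes.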

    writeOptions - array[object]

    Options used when writing output to Solr or other sources

    object attributes: {
     key (required): {
      display name: Parameter Name
      type: string
     }
     value: {
      display name: Parameter Value
      type: string
     }
    }

    readOptions - array[object]

    Options used when reading input from Solr or other sources.

    object attributes: {
     key (required): {
      display name: Parameter Name
      type: string
     }
     value: {
      display name: Parameter Value
      type: string
     }
    }

    trainingCollection - string, required

    Solr collection or cloud storage path where training data is present.

    >= 1 characters

    trainingFormat - string, required

    The format of the training data, e.g. solr or parquet.

    >= 1 characters

    Default: solr

    secretName - string

    Name of the secret used to access cloud storage as defined in the K8s namespace

    >= 1 characters

    trainingDataFilterQuery - string

    Solr or SQL query to filter training data. Use a Solr query when a Solr collection is specified in Training Path. Use a SQL query when a cloud storage location is specified. The table name for SQL is `spark_input`
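
The two query styles can be illustrated with a small helper. This is only a sketch: the function name is hypothetical, the field and value in the usage note are made up, it assumes that a training format of solr means a Solr collection and anything else means cloud storage, and it does no escaping of special characters.

```python
def build_filter_query(training_format: str, field: str, value: str) -> str:
    """Build an equality filter in Solr or SQL syntax (no escaping applied)."""
    if training_format == "solr":
        # Solr field query syntax: field:"value"
        return f'{field}:"{value}"'
    # For cloud storage input, the SQL table name is spark_input.
    return f"SELECT * FROM spark_input WHERE {field} = '{value}'"
```

With a hypothetical field `type`, the Solr form is `type:"faq"` and the SQL form is `SELECT * FROM spark_input WHERE type = 'faq'`.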

    textColName - string, required

    Field that contains the documents used to learn the vocabulary. To use multiple fields, separate them with commas, e.g. question,answer.

    deployModelName - string, required

    Name of the model to be used for deployment (must be a valid lowercased DNS subdomain with no underscores).

    <= 30 characters

    Match pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
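
The DNS subdomain rule is easy to get wrong, since uppercase letters and underscores are both rejected. A minimal check, using the pattern and length limit from the entry above (the function name is hypothetical):

```python
import re

# Pattern and length limit copied from the reference entry above.
MODEL_NAME_PATTERN = re.compile(
    r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
MODEL_NAME_MAX_LENGTH = 30


def is_valid_model_name(name: str) -> bool:
    """Return True if name is a lowercase DNS subdomain of at most 30 characters."""
    if len(name) > MODEL_NAME_MAX_LENGTH:
        return False
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return MODEL_NAME_PATTERN.fullmatch(name) is not None
```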

    modelBase - string, required

    Specify one of these custom embeddings: ['word_custom', 'bpe_custom'] or choose one of the included pre-trained embeddings / models.

    Default: word_en_300d_2M

    Allowed values: word_custom, bpe_custom, word_en_300d_2M, bpe_en_300d_10K, bpe_en_300d_200K, bpe_ja_300d_100K, bpe_ko_300d_100K, bpe_zh_300d_50K, bpe_multi_300d_320K, distilbert_en, distilbert_multi, biobert_v1.1

    modelReplicas - integer

    How many replicas of the model should be deployed by Seldon Core

    Default: 1

    w2vEpochs - integer

    Number of epochs to train custom Word2Vec embeddings

    Default: 15

    w2vVectorSize - integer

    Word-vector dimensionality used to represent text (suggested range: 100 to 300)

    Default: 150

    w2vWindowSize - integer

    The window size (context words from [-window, window]) for Word2Vec

    Default: 8
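
To make the window definition concrete, the sketch below collects the context words within [-window, window] of a target position. It illustrates the general idea only; the function name is hypothetical, and actual Word2Vec implementations may vary the effective window during training.

```python
def context_words(tokens: list, index: int, window: int) -> list:
    """Return the tokens within `window` positions of tokens[index], excluding it."""
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    # Words to the left of the target, then words to the right.
    return tokens[start:index] + tokens[index + 1:end]
```

For the tokens `["how", "do", "i", "reset", "my", "password"]`, a window of 2 around "reset" gives `["do", "i", "my", "password"]`.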

    trainingSampleFraction - number

    The proportion of data to be sampled from the full dataset. Use a value between 0 and 1 for a proportion (e.g. 0.5 for 50%), or for a specific number of examples, use an integer larger than 1. Leave blank for no sampling
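
The dual meaning of this value (proportion versus absolute count) can be sketched as follows. This is an interpretation of the description above, not the job's actual code: the function name is hypothetical, a value of exactly 1 is assumed to mean the full dataset, and counts are assumed to be capped at the dataset size.

```python
from typing import Optional


def expected_sample_size(total_rows: int, sample: Optional[float]) -> int:
    """Estimate how many rows are used for a given sampling value."""
    if sample is None:
        # Blank value: no sampling, use everything.
        return total_rows
    if 0 < sample <= 1:
        # Proportion of the full dataset (assumption: 1 means 100%).
        return int(total_rows * sample)
    if sample > 1:
        # Specific number of examples (assumption: capped at dataset size).
        return min(total_rows, int(sample))
    raise ValueError("sample must be positive")
```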

    minTokensNum - integer

    Drop a document if its total word count is lower than this value

    >= 1

    exclusiveMinimum: false

    Default: 1

    maxTokensNum - integer

    Drop a document if its total word count is greater than this value

    >= 1

    exclusiveMinimum: false

    Default: 5000
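
Taken together, minTokensNum and maxTokensNum define a length filter on documents. A minimal sketch, assuming whitespace tokenization (the job's real tokenizer is not described here) and a hypothetical function name:

```python
def keep_document(text: str, min_tokens: int = 1, max_tokens: int = 5000) -> bool:
    """Return True if the document's word count lies within [min_tokens, max_tokens]."""
    # Assumption: words are separated by whitespace.
    word_count = len(text.split())
    return min_tokens <= word_count <= max_tokens
```

With the defaults, an empty document is dropped because its word count of 0 is below the minimum of 1.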

    lowerCases - boolean

    Whether to lower case all words in training, i.e. whether to treat upper case and lower case words equally. Only utilized for custom embeddings or for the default model base: word_en_300d_2M.

    Default: true

    maxVocabSize - integer

    Maximum number of words in the vocabulary. Words are trimmed if their frequency is too low

    >= 1

    exclusiveMinimum: false

    Default: 100000
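
Frequency-based trimming can be sketched with a word counter that keeps only the most frequent entries. The function name is hypothetical, and whitespace tokenization and the optional lowercasing step are assumptions for illustration, not a description of the job's internals.

```python
from collections import Counter


def build_vocabulary(documents, max_vocab_size=100000, lower_case=True):
    """Return up to max_vocab_size words, most frequent first."""
    counts = Counter()
    for doc in documents:
        if lower_case:
            # Treat upper case and lower case words equally.
            doc = doc.lower()
        counts.update(doc.split())
    # most_common keeps the highest-frequency words and drops the rest.
    return [word for word, _ in counts.most_common(max_vocab_size)]
```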

    maxLen - integer

    Max length of question/answer by number of tokens

    infBatch - integer

    The batch size used for encoding during the training

    numClusters - integer

    DEPRECATED: consider using Milvus for fast dense vector similarity search instead. Number of clusters to be used for fast dense vector retrieval. No clustering is applied if this is set to 0. If left blank, the cluster count is inferred by the job depending on the data

    Default: 0

    topKClusters - integer

    How many of the closest clusters the model finds for each query. At retrieval time, all answers in the top k nearest clusters are returned and reranked

    Default: 10
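
The retrieval step described above can be sketched as a nearest-centroid lookup followed by collecting every answer in the selected clusters. Function names are hypothetical, and Euclidean distance is an assumption; the metric the model actually uses is not stated here. Reranking of the candidates is left out.

```python
import math


def top_k_clusters(query, centroids, k=10):
    """Return the indices of the k centroids closest to the query vector."""
    # Assumption: closeness is measured by Euclidean distance.
    distances = sorted((math.dist(query, c), idx) for idx, c in enumerate(centroids))
    return [idx for _, idx in distances[:k]]


def candidate_answers(query, centroids, cluster_members, k=10):
    """Collect all answers belonging to the k nearest clusters."""
    candidates = []
    for idx in top_k_clusters(query, centroids, k):
        candidates.extend(cluster_members[idx])
    return candidates
```

A larger k returns more candidates for reranking at the cost of more work per query.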

    unidecode - boolean

    Use the Unidecode library to transform Unicode input into ASCII transliterations. Only utilized for custom embeddings or for the default model base: word_en_300d_2M

    Default: true

    globalPoolType - string

    Determines how token vectors are aggregated to obtain the final content vector. Must be one of: [avg, max].

    Default: avg

    Allowed values: avg, max
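
The difference between the two pooling types is easiest to see on a small example. The sketch below aggregates per-token vectors element-wise; the function name is hypothetical and plain Python lists stand in for whatever tensor type the model uses.

```python
def pool_token_vectors(token_vectors, pool_type="avg"):
    """Aggregate token vectors into a single content vector, element-wise."""
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    # Group the i-th component of every token vector together.
    columns = list(zip(*token_vectors))
    if pool_type == "avg":
        return [sum(col) / len(col) for col in columns]
    if pool_type == "max":
        return [max(col) for col in columns]
    raise ValueError("pool_type must be 'avg' or 'max'")
```

For the token vectors `[1.0, 4.0]` and `[3.0, 2.0]`, avg gives `[2.0, 3.0]` and max gives `[3.0, 4.0]`.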

    type - string, required

    Default: argo-qna-coldstart

    Allowed values: argo-qna-coldstart
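
Putting the required fields together, a minimal job configuration might look like the sketch below. The ID, collection name, and model name are hypothetical placeholders; the field names, the type value, and the modelBase default come from the entries above. The helper for finding missing fields is illustrative and not part of any API.

```python
import json

# Fields marked as required in the entries above.
REQUIRED_FIELDS = [
    "id",
    "type",
    "trainingCollection",
    "trainingFormat",
    "textColName",
    "deployModelName",
    "modelBase",
]

job_config = {
    "id": "qna-coldstart-example",                   # hypothetical job ID
    "type": "argo-qna-coldstart",
    "trainingCollection": "hypothetical_collection",  # hypothetical collection
    "trainingFormat": "solr",
    "textColName": "question,answer",
    "deployModelName": "qna-coldstart-example",      # hypothetical model name
    "modelBase": "word_en_300d_2M",
}


def missing_required_fields(config: dict) -> list:
    """Return the required field names that are absent from config."""
    return [name for name in REQUIRED_FIELDS if name not in config]


# Serialized form of the configuration.
job_config_json = json.dumps(job_config, indent=2)
```

Optional fields such as w2vEpochs or globalPoolType can be added to the same dictionary when their defaults are not suitable.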