QnA Coldstart Training
Train a Smart Answers model on a cold start (unsupervised) basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.
aMaxLen - integer
Average length of question by number of tokens
deployModelName - string, required
Name of the model to be used for deployment (must be a valid DNS subdomain with no underscores)
<= 30 characters
Match pattern: [a-zA-Z][\-a-zA-Z0-9]*[a-zA-Z0-9]?
extraTrainingArgs - string
Add any additional arguments for the Python training scripts in this field
id - string, required
The ID for this job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_)
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
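The length limits and match patterns for `deployModelName` and `id` can be checked before a job is submitted. The following is a minimal sketch, assuming each pattern must match the whole string; the helper names are hypothetical and not part of any product API.

```python
import re

# Patterns copied from the parameter descriptions above.
DEPLOY_MODEL_NAME_PATTERN = re.compile(r"[a-zA-Z][\-a-zA-Z0-9]*[a-zA-Z0-9]?")
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")


def is_valid_deploy_model_name(name: str) -> bool:
    """Check the 30-character limit and the documented pattern (no underscores)."""
    # Assumption: the pattern is applied to the entire string.
    return len(name) <= 30 and DEPLOY_MODEL_NAME_PATTERN.fullmatch(name) is not None


def is_valid_job_id(job_id: str) -> bool:
    """Check the 63-character limit and the documented pattern (underscores allowed)."""
    return len(job_id) <= 63 and JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    print(is_valid_deploy_model_name("qna-coldstart-v1"))  # dashes are allowed
    print(is_valid_deploy_model_name("qna_coldstart_v1"))  # underscores are not
    print(is_valid_job_id("qna_coldstart-job"))            # underscores are allowed in id
```

Keeping the two checks separate reflects the one difference between the documented patterns: `id` accepts underscores, `deployModelName` does not.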
infBatch - integer
The batch size used in the validation loop
Default: 512
lowerCases - boolean
Whether to lowercase all words during training, i.e. whether to treat uppercase and lowercase forms of a word as the same word.
Default: true
maxLen - integer
Max length of question/answer by number of tokens
maxTokensNum - integer
Drop a document if its total word count is greater than this value
>= 1
exclusiveMinimum: false
Default: 5000
maxVocabSize - integer
Maximum number of words in the vocabulary. Words whose frequency is too low are trimmed
>= 1
exclusiveMinimum: false
Default: 100000
minTokensNum - integer
Drop a document if its total word count is lower than this value
>= 1
exclusiveMinimum: false
Default: 1
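The interaction of `minTokensNum`, `maxTokensNum`, and `maxVocabSize` can be illustrated with a small sketch. It assumes whitespace tokenization and assumes the vocabulary keeps the most frequent words; the job's actual tokenizer and trimming rule are not described here and may differ. The function name is hypothetical.

```python
from collections import Counter


def filter_and_build_vocab(docs, min_tokens_num=1, max_tokens_num=5000,
                           max_vocab_size=100000):
    """Drop documents outside the token-count bounds, then build a capped vocabulary."""
    # Assumption: tokens are whitespace-separated words.
    tokenized = [doc.split() for doc in docs]

    # A document is dropped if its word count is lower than min_tokens_num
    # or greater than max_tokens_num.
    kept = [toks for toks in tokenized
            if min_tokens_num <= len(toks) <= max_tokens_num]

    # Assumption: the vocabulary keeps the max_vocab_size most frequent words.
    counts = Counter(tok for toks in kept for tok in toks)
    vocab = [word for word, _ in counts.most_common(max_vocab_size)]
    return kept, vocab


if __name__ == "__main__":
    docs = ["how do I reset my password", "reset password", "", "password"]
    kept, vocab = filter_and_build_vocab(docs, min_tokens_num=1,
                                         max_tokens_num=5, max_vocab_size=2)
    print(kept)
    print(vocab)
```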
modelReplicas - integer
How many replicas of the model should be deployed by Seldon Core
Default: 1
numClusters - integer
Number of clusters to use for fast dense vector retrieval. If set to 0, no clustering is applied. If left blank, the job infers the cluster count from the data
qMaxLen - integer
Average length of question by number of tokens
samplingProportion - number
The proportion of data to be sampled from the full dataset. Use a value between 0 and 1 for a proportion (e.g. 0.5 for 50%), or for a specific number of examples, use an integer larger than 1. Leave blank for no sampling
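The two ways of reading `samplingProportion` can be sketched as follows. This is an illustration under stated assumptions, not the job's implementation: a value of exactly 1 is treated as 100%, fractional sizes are truncated, and a requested count larger than the dataset is capped at the dataset size. The function name is hypothetical.

```python
def resolve_sample_size(sampling_proportion, total_examples):
    """Return how many examples would be sampled from a dataset of total_examples."""
    if sampling_proportion is None:
        # Left blank: no sampling, use the full dataset.
        return total_examples
    if sampling_proportion <= 0:
        raise ValueError("samplingProportion must be positive")
    if sampling_proportion <= 1:
        # Value between 0 and 1: interpreted as a proportion (0.5 means 50%).
        # Assumption: fractional results are truncated.
        return int(total_examples * sampling_proportion)
    # Integer larger than 1: interpreted as a specific number of examples.
    # Assumption: the count cannot exceed the dataset size.
    return min(int(sampling_proportion), total_examples)


if __name__ == "__main__":
    print(resolve_sample_size(None, 1000))
    print(resolve_sample_size(0.5, 1000))
    print(resolve_sample_size(200, 1000))
```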
sparkConfig - array[object]
Provide additional key/value pairs to be injected into the training JSON map at runtime. Values are inserted as-is, so surround string values with double quotes (")
object attributes:
key - string, required
display name: Parameter Name
value - string
display name: Parameter Value
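Because `sparkConfig` values are inserted as-is, a string value must carry its own double quotes while a number must not. The sketch below models "as-is" insertion by splicing each value into JSON text unchanged; the parameter names `exampleStringParam` and `exampleNumericParam` are hypothetical, and the real injection mechanism may differ.

```python
import json

# Each entry follows the documented object shape: a "key" and a "value", both strings.
spark_config = [
    {"key": "exampleStringParam", "value": '"4g"'},  # string value: quotes are part of the value
    {"key": "exampleNumericParam", "value": "2"},    # numeric value: no quotes
]


def render_training_json(entries):
    """Splice each value into a JSON object unchanged (a model of 'inserted as-is')."""
    members = [f"{json.dumps(e['key'])}: {e['value']}" for e in entries]
    return "{" + ", ".join(members) + "}"


if __name__ == "__main__":
    rendered = render_training_json(spark_config)
    print(rendered)
    print(json.loads(rendered))

    # A string value without surrounding quotes produces invalid JSON.
    try:
        json.loads(render_training_json([{"key": "exampleStringParam", "value": "4g"}]))
    except json.JSONDecodeError as err:
        print("invalid JSON:", err)
```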
textColName - string, required
Field that contains the documents used to learn the vocabulary. To use multiple fields, separate them with commas, e.g. question,answer.
topKClusters - integer
Number of nearest clusters the model looks up for each query. At retrieval time, all answers in the top k nearest clusters are returned and reranked
Default: 10
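The retrieval behavior described for `topKClusters` can be sketched in two stages: pick the nearest clusters, then rerank every answer they contain. This is an illustration only. Cosine similarity, the data layout, and all names below are assumptions; the deployed model's actual scoring is not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, centroids, cluster_members, answer_vecs, top_k_clusters=10):
    """Collect answers from the top-k nearest clusters, then rerank them by similarity."""
    # Stage 1: rank cluster centroids by similarity to the query.
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(query_vec, centroids[c]),
                    reverse=True)
    # Stage 2: gather every answer in the selected clusters and rerank.
    candidates = [a for c in ranked[:top_k_clusters] for a in cluster_members[c]]
    return sorted(candidates,
                  key=lambda a: cosine(query_vec, answer_vecs[a]),
                  reverse=True)


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    cluster_members = [["a1", "a2"], ["a3"], ["a4"]]
    answer_vecs = {"a1": [1.0, 0.1], "a2": [0.9, 0.5],
                   "a3": [0.1, 1.0], "a4": [-1.0, 0.1]}
    print(retrieve([1.0, 0.0], centroids, cluster_members, answer_vecs, top_k_clusters=2))
```

A larger `top_k_clusters` widens the candidate set at the cost of more reranking work per query.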
trainingCollection - string, required
Solr Collection containing content documents
>= 1 characters
type - string, required
Default: argo-qna-coldstart
Allowed values: argo-qna-coldstart
useCustomEmbeddings - boolean
Choose this option when the data contains many uncommon words or much jargon. NOTE: check the log for the warning about the percentage of vocabulary words covered. If that percentage is less than 80%, set this parameter to true and do not use the pre-trained embeddings shipped with the package
Default: false
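The 80% coverage rule for `useCustomEmbeddings` can be expressed as a small check. The sketch assumes coverage means the share of distinct corpus words that have a pre-trained vector; the figure reported in the job log may be computed differently. Function names are hypothetical.

```python
def vocabulary_coverage(corpus_vocab, pretrained_vocab):
    """Fraction of distinct corpus words that also appear in the pre-trained vocabulary."""
    corpus = set(corpus_vocab)
    if not corpus:
        return 0.0
    return len(corpus & set(pretrained_vocab)) / len(corpus)


def should_use_custom_embeddings(coverage, threshold=0.8):
    """Apply the documented rule: below 80% coverage, train custom embeddings."""
    return coverage < threshold


if __name__ == "__main__":
    corpus_vocab = ["reset", "password", "kubectl", "seldon", "login"]
    pretrained_vocab = ["reset", "password", "login", "the"]
    coverage = vocabulary_coverage(corpus_vocab, pretrained_vocab)
    print(f"coverage: {coverage:.0%}")
    print("useCustomEmbeddings:", should_use_custom_embeddings(coverage))
```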
w2vEpochs - integer
Number of epochs to train custom Word2Vec embeddings
Default: 15
w2vVectorSize - integer
Word-vector dimensionality to represent text (suggested dimension range: 100~150)
Default: 150
w2vWindowSize - integer
The window size (context words from [-window, window]) for Word2Vec
Default: 8
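Putting the parameters together, a job configuration might look like the sketch below. Only field names documented above are used; the values for `id`, `deployModelName`, `trainingCollection`, and `textColName` are hypothetical placeholders, and how the configuration is submitted is not covered here.

```python
import json

# Fields marked as required in the parameter list above.
REQUIRED_FIELDS = ("deployModelName", "id", "textColName", "trainingCollection", "type")

job_config = {
    "type": "argo-qna-coldstart",
    "id": "qna-coldstart-example",               # hypothetical job ID
    "deployModelName": "qna-coldstart-example",  # hypothetical model name, no underscores
    "trainingCollection": "example_docs",        # hypothetical Solr collection
    "textColName": "question,answer",            # comma-separated field names
    "useCustomEmbeddings": True,
    "w2vEpochs": 15,
    "w2vVectorSize": 150,
    "w2vWindowSize": 8,
    "topKClusters": 10,
    "modelReplicas": 1,
}

# Report any required field that was left out of the configuration.
missing = [field for field in REQUIRED_FIELDS if field not in job_config]

if __name__ == "__main__":
    print("missing required fields:", missing)
    print(json.dumps(job_config, indent=2))
```

Optional parameters that are omitted fall back to the defaults listed for each field.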