QnA Supervised Training
Train a Smart Answers model on a supervised basis, with pre-trained or trained embeddings, and deploy the trained model to the ML Model Service.
See Train A Smart Answers FAQ Model for configuration instructions.
Legacy Product
Trains a QnA model on a supervised basis with pre-trained or trained embeddings and deploys the trained model to the ML Model Service.
Average length of question by number of tokens
Name of the field containing answers
>= 1 characters
Base learning rate used in cyclical training
Use a GPU for training if available (an NVIDIA GPU with 8 GB or more of memory is recommended)
Default: false
Name of the model to be used for deployment (must be a valid DNS subdomain with no underscores)
<= 30 characters
Match pattern: [a-zA-Z][\-a-zA-Z0-9]*[a-zA-Z0-9]?
Fraction of input to drop with Dropout layer (from 0-1)
Default: 0.15
Add any additional arguments for the Python training scripts in this field
The ID for this job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_)
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
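The length limits and match patterns for the model deployment name and the job ID can be checked locally before a job is submitted. The following is a minimal sketch: the two patterns and limits are copied from the constraints above, while the function names and the choice to match the whole string (rather than a substring) are assumptions.

```python
import re

# Patterns copied from the schema constraints above.
# Matching against the full string is an assumption of this sketch.
MODEL_NAME_PATTERN = re.compile(r"[a-zA-Z][\-a-zA-Z0-9]*[a-zA-Z0-9]?")
JOB_ID_PATTERN = re.compile(r"[a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?")


def is_valid_model_name(name: str) -> bool:
    """Model deployment name: at most 30 characters, no underscores."""
    return len(name) <= 30 and MODEL_NAME_PATTERN.fullmatch(name) is not None


def is_valid_job_id(job_id: str) -> bool:
    """Job ID: at most 63 characters; dashes and underscores are allowed."""
    return len(job_id) <= 63 and JOB_ID_PATTERN.fullmatch(job_id) is not None
```

For example, `faq-model` passes the model name check, while `faq_model` fails it because underscores are not allowed in the deployment name.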
Batch size during validation. If left blank, this will be set automatically based on the input data
The retrieval positions k at which each metric is computed
Default: [1,3,5]
Whether to lower case all words in training, i.e. whether to treat upper case and lower case words equally.
Default: false
Maximum learning rate used in cyclical training
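The base and maximum learning rates bound the range that a cyclical schedule moves through. This page does not say which cyclical policy the job uses, so the sketch below assumes a simple triangular policy; the function name and the `step_size` parameter are hypothetical.

```python
def triangular_lr(step: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Learning rate at `step` for a triangular cycle of 2 * step_size steps.

    The rate rises linearly from base_lr to max_lr over step_size steps,
    then falls back to base_lr over the next step_size steps.
    """
    cycle_position = step % (2 * step_size)
    # Distance from the peak, scaled to [0, 1]: 1 at the cycle ends, 0 at the peak.
    distance_from_peak = abs(cycle_position - step_size) / step_size
    return base_lr + (max_lr - base_lr) * (1.0 - distance_from_peak)
```

With this policy, the rate equals the base learning rate at the start of every cycle and reaches the maximum learning rate at the midpoint.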
Drop a document if its total word count is greater than this value
>= 1
exclusiveMinimum: false
Default: 5000
Maximum number of words in the vocabulary. Words whose frequency is too low are trimmed
>= 1
exclusiveMinimum: false
Default: 100000
Drop a document if its total word count is lower than this value
>= 1
exclusiveMinimum: false
Default: 1
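The minimum and maximum word-count limits act together as a length filter on the input documents. A minimal sketch of that filter follows, using the defaults listed above; the function name and the use of whitespace splitting to count words are assumptions, since the job's tokenizer is not described here.

```python
def keep_document(text: str, min_words: int = 1, max_words: int = 5000) -> bool:
    """Return True if the document's word count lies within [min_words, max_words].

    Words are counted by splitting on whitespace (an assumption of this sketch).
    """
    word_count = len(text.split())
    return min_words <= word_count <= max_words
```

An empty document has zero words and is dropped under the default minimum of 1.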
How many replicas of the model should be deployed by Seldon Core
Default: 1
The metric, evaluated at one of the configured k positions, that is monitored to decide when to stop training
Default: mrr@3
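A value such as `mrr@3` names a metric and the retrieval position k at which it is evaluated. As an illustration, mean reciprocal rank at k can be sketched as below. This is a standard definition of the metric, not necessarily the job's exact implementation, and the function names are hypothetical.

```python
from typing import Sequence


def reciprocal_rank_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """1 / rank of the first relevant answer within the top k, else 0."""
    for rank, is_relevant in enumerate(ranked_relevance[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(all_rankings: Sequence[Sequence[bool]], k: int) -> float:
    """Average the reciprocal rank at k over all questions."""
    total = sum(reciprocal_rank_at_k(ranking, k) for ranking in all_rankings)
    return total / len(all_rankings)
```

Each inner sequence holds one question's ranked answers, marked `True` where the answer is a correct match. A correct answer that first appears below position k contributes 0 for that question.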
List of evaluation metrics on validation data that will be printed in the log at the end of each epoch
Default: ["map", "mrr", "precision", "recall", "roc_auc"]
Stop training if the monitored metric has not improved for this number of epochs
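This patience rule can be sketched as follows. The sketch assumes that a higher metric value is better and that only a strict increase counts as an improvement; the function name is hypothetical and the job's own stopping logic may differ in these details.

```python
from typing import Optional, Sequence


def stopping_epoch(metric_history: Sequence[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs pass without the
    monitored metric exceeding its best value so far.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_history, start=1):
        if value > best:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```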
Number of clusters to be used for fast dense vector retrieval. Note no clustering will be applied if this is set to 0. If left blank, cluster count will be inferred by the job depending on the data
Number of non-matching answers randomly sampled for each question to be used as negative examples when constructing
Default: 15
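Negative sampling of this kind can be sketched as below: for one question, draw a fixed number of answers that are not the correct one. The function name, the `seed` parameter, and the behavior when fewer candidates exist than requested are assumptions of this sketch.

```python
import random
from typing import List, Optional, Sequence


def sample_negatives(
    correct_answer: str,
    all_answers: Sequence[str],
    num_negatives: int = 15,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly pick non-matching answers to serve as negative examples."""
    rng = random.Random(seed)
    candidates = [answer for answer in all_answers if answer != correct_answer]
    # If the pool is smaller than requested, return every candidate (assumption).
    return rng.sample(candidates, min(num_negatives, len(candidates)))
```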
Number of answers to be used for each question when constructing validation data
Default: 5
Average length of question by number of tokens
Name of the field containing questions
>= 1 characters
List of RNN layer types to use. Possible values are lstm and gru, e.g. ["lstm", "lstm"]. If left blank, this value is chosen automatically based on the data
List of unit counts for each RNN layer, in the same order as the RNN function list, e.g. 150, 150. If left blank, this value is chosen automatically based on the data
The proportion of data to be sampled from the full dataset. Use a value between 0 and 1 for a proportion (e.g. 0.5 for 50%), or for a specific number of examples, use an integer larger than 1. Leave blank for no sampling
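The two ways of reading the sampling value (a proportion, or an absolute number of examples) can be sketched as follows. The function name is hypothetical, and two details are assumptions because the description does not cover them: a value of exactly 1 is treated as 100%, and a requested count larger than the dataset is capped at the dataset size.

```python
from typing import Optional


def resolve_sample_size(total_rows: int, sample_value: Optional[float]) -> int:
    """Translate the sampling setting into a number of rows to keep."""
    if sample_value is None:
        # Blank setting: no sampling, keep everything.
        return total_rows
    if sample_value <= 0:
        raise ValueError("sample value must be positive")
    if sample_value <= 1:
        # Proportion of the full dataset, e.g. 0.5 for 50%.
        return round(total_rows * sample_value)
    # Integer larger than 1: a specific number of examples.
    return min(int(sample_value), total_rows)
```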
Provide additional key/value pairs to be injected into the training JSON map at runtime. Values will be inserted as-is, so use " to surround string values
object attributes: {
  key (required): {
    display name: Parameter Name
    type: string
  }
  value: {
    display name: Parameter Value
    type: string
  }
}
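Because values are inserted as-is, a string value has to carry its own double quotes, while a number does not. The sketch below illustrates the idea by treating each value as raw JSON text. That treatment, the function name, and the parameter names in the usage example are all assumptions, not the job's documented behavior.

```python
import json
from typing import Any, Dict


def inject_parameters(
    training_config: Dict[str, Any], extra_params: Dict[str, str]
) -> Dict[str, Any]:
    """Merge raw key/value pairs into a copy of the training configuration.

    Each value is parsed as a JSON fragment, so '"adam"' becomes the string
    adam and '1.0' becomes a number. An unquoted word is not valid JSON.
    """
    merged = dict(training_config)
    for key, raw_value in extra_params.items():
        merged[key] = json.loads(raw_value)
    return merged


# Hypothetical parameter names, for illustration only.
config = inject_parameters(
    {"dropout": 0.15},
    {"optimizer": '"adam"', "clip_norm": "1.0"},
)
```

Passing `adam` without the surrounding quotes raises `json.JSONDecodeError` in this sketch, which is the kind of failure the quoting advice guards against.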
How many closest clusters the model can find for each query. At retrieval time, all answers in top k nearest clusters will be returned and reranked
Default: 10
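The two-stage retrieval described here (pick the nearest clusters, then rerank the answers they contain) can be sketched in plain Python. Dot-product similarity, the data layout, and all names below are assumptions; the job's actual clustering and similarity measure are not specified on this page.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    query: Sequence[float],
    centroids: List[Sequence[float]],
    cluster_members: List[List[str]],
    answer_vectors: Dict[str, Sequence[float]],
    top_k_clusters: int = 10,
) -> List[str]:
    """Return answer IDs from the nearest clusters, best match first."""
    # Stage 1: rank clusters by similarity between the query and each centroid.
    ranked_clusters = sorted(
        range(len(centroids)),
        key=lambda c: dot(query, centroids[c]),
        reverse=True,
    )
    # Stage 2: gather all answers in the top clusters and rerank them.
    candidates = [
        answer_id
        for c in ranked_clusters[:top_k_clusters]
        for answer_id in cluster_members[c]
    ]
    return sorted(
        candidates,
        key=lambda answer_id: dot(query, answer_vectors[answer_id]),
        reverse=True,
    )
```

A larger value searches more clusters, which returns more candidates at the cost of more comparisons per query.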
Batch size during training. If left blank, this will be set automatically based on the input data
Solr Collection containing question and answer pairs
>= 1 characters
Default: argo-qna-supervised
Allowed values: argo-qna-supervised
Automatically tune hyperparameters (will take longer to train)
Default: false
Choose this option when the data contains many uncommon words or jargon. NOTE: check the log for the warning about the percentage of covered vocabulary words. If that proportion is less than 80%, set this parameter to true and do not use the pre-trained embeddings shipped with the package
Default: false
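The 80% rule of thumb can be reproduced offline to get a rough idea of whether custom embeddings are needed. In the sketch below, coverage is measured over unique words; how the job itself computes the logged percentage is not stated here, so this definition and the function names are assumptions.

```python
from typing import Iterable, Set


def vocabulary_coverage(corpus_words: Iterable[str], embedding_vocab: Set[str]) -> float:
    """Fraction of unique corpus words that have a pre-trained embedding."""
    unique_words = set(corpus_words)
    if not unique_words:
        return 0.0
    covered = unique_words & embedding_vocab
    return len(covered) / len(unique_words)


def should_train_custom_embeddings(coverage: float, threshold: float = 0.8) -> bool:
    """Apply the rule above: train custom embeddings when coverage is below 80%."""
    return coverage < threshold
```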
Proportion of the original data to be used as validation sample
>= 0.001
exclusiveMinimum: false
Default: 0.1
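A hold-out split driven by this proportion can be sketched as follows. The shuffle, the fixed seed, and the function name are assumptions; the job may split the data differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    pairs: Sequence[T], validation_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the question/answer pairs and hold out a validation sample."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example when any data is present.
    n_validation = max(1, round(len(shuffled) * validation_fraction)) if shuffled else 0
    return shuffled[n_validation:], shuffled[:n_validation]
```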
Number of epochs to train custom word2vec embeddings
Default: 15
Which fields in the Word2Vec training collection to use for Word2Vec vocabulary embedding training. Separate multiple fields with commas, e.g. description_t,title_t.
Name of the collection which contains the documents that will be used to train Word2Vec if pre-trained word2vec embeddings won't be used.
Word-vector dimensionality used to represent text (suggested range: 100 to 150)
Default: 150
The window size (context words from [-window, window]) for Word2Vec
Default: 8
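The window defines which neighboring words count as context for each word during Word2Vec training. The sketch below enumerates (center, context) pairs for a fixed symmetric window. It illustrates the definition only, is not the trainer's implementation, and uses a hypothetical function name.

```python
from typing import List, Sequence, Tuple


def context_pairs(tokens: Sequence[str], window: int = 8) -> List[Tuple[str, str]]:
    """List (center, context) pairs using positions in [-window, window]."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

With `window=1`, each word is paired only with its immediate neighbors; the default of 8 reaches up to eight words on each side.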
L2 penalty used in Adam optimization. Bigger values will provide stronger regularization
Default: 0.0001