Extract Short Answers from Longer Documents

Table of Contents

Hardware recommendations
1. Deploy the model in Fusion
2. Configure the Machine Learning query stage
Model output

This topic explains how to deploy and configure the transformer-based deep learning model for short answer extraction with Smart Answers. This model is useful for analyzing long documents and extracting just a paragraph, a sentence, or a few words that answer the question.

This model is trained on the SQuAD2.0 dataset, which consists of questions about Wikipedia articles and answers drawn from those articles. Therefore, this model is most effective with Wikipedia-like content and may produce uneven results when applied to more informal content such as message boards.

The out-of-the-box (OOTB) model only supports English content.

Hardware recommendations

When creating a nodePool to perform short answer extraction, use a configuration that meets these guidelines to achieve the best performance:

Use the latest Intel CPU architecture available; Intel Cascade Lake or later is strongly recommended.
Use a large core count: 12-16 cores with 32 GB of RAM.

1. Deploy the model in Fusion

Navigate to Collections > Jobs.
Select New > Create Seldon Core Model Deployment.
Configure the job as follows:
- Job ID. The ID for this job, such as deploy-answer-extractor.
- Model Name. The model name of the Seldon Core deployment, such as answer-extractor. The Machine Learning pipeline stage configuration references this name.
- Docker Repository. lucidworks
- Image Name. answer-extractor:v1.1
- Kubernetes Secret Name for Model Repo. (empty)
- Output Column Names for Model. [answer,score,start,end]
Click Save.
Click Run > Start.

2. Configure the Machine Learning query stage

This model provides the best results when used with one of the question-answering query pipelines.

In Fusion 5.3 and later, the default query pipeline is called APP_NAME-smart-answers.
In Fusion 5.1 and 5.2, there are two: APP_NAME-question-answering and APP_NAME-question-answering-dual-fields.

Starting with one of those pipelines, add a new Machine Learning stage to the end of the pipeline and configure it as described below.

How to configure short answer extraction in the query pipeline

Make sure you have performed the basic configuration of your query pipeline.
In the query pipeline, click Add a Stage > Machine Learning.
In the Model ID field, enter the model name you configured above, such as answer-extractor.

In the Model input transformation script field, enter the following:

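// Field that holds the text to extract answers from
var textFieldToExtract = "answer_t";
// Maximum number of top documents to send to the model
var numDocsToExtract = 3;
var responses = new java.util.ArrayList();

var docs = response.get().getInnerResponse().getDocuments();
// Do not read past the end of the results when fewer documents are returned
// (assumes getDocuments() returns a java.util.List)
var limit = Math.min(numDocsToExtract, docs.size());
for (var i = 0; i < limit; i++) {
  responses.add(docs[i].getField(textFieldToExtract));
}

var modelInput = new java.util.HashMap();
modelInput.put("question", request.getFirstParam("q"));
modelInput.put("context", responses);
modelInput.put("topk", 3);
modelInput.put("handle_impossible_answer", 'false');
// The last expression is the value the script returns
modelInput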

Configure the parameters in the script as follows:

question (Required). The question text. In the script above, it is read from the q query parameter.

Make sure that the question is provided as the user originally entered it. If earlier stages modify the question (for example, by removing stopwords or expanding synonyms), copy the original question and use that unmodified copy for answer extraction.

context (Required). A string or list of contexts; by default this is the first numDocsToExtract documents in the output of the previous stage in the pipeline.

If only one question is passed with multiple contexts, that question is applied to every context; likewise, a single context is applied to every question. If a list of questions and a list of contexts are passed, they are mapped 1:1 in the order in which they are passed.

topk. The number of answers to return, chosen in order of likelihood. Default: 1

handle_impossible_answer. Whether to handle questions that have no answer in the context.

If true, an empty string is returned. If false, the most probable (topk) answer(s) are returned regardless of how low the probability score is. Default: True

Experiment with this parameter to see what value returns the most acceptable answers.
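
The pairing rules for questions and contexts described above can be sketched in plain JavaScript. This is only an illustration of the documented behavior, not the model's code: the helper name pairQuestionsWithContexts is hypothetical, and treating mismatched list lengths as an error is an assumption.

```javascript
// Hypothetical helper that illustrates the question/context pairing rules.
// It is not the model's actual implementation.
function pairQuestionsWithContexts(questions, contexts) {
  // Accept either a single string or a list for each argument.
  var qs = Array.isArray(questions) ? questions : [questions];
  var cs = Array.isArray(contexts) ? contexts : [contexts];

  if (qs.length === 1) {
    // One question is applied to every context.
    return cs.map(function (c) { return { question: qs[0], context: c }; });
  }
  if (cs.length === 1) {
    // One context is paired with every question.
    return qs.map(function (q) { return { question: q, context: cs[0] }; });
  }
  if (qs.length !== cs.length) {
    // Assumption: mismatched list lengths are rejected in this sketch.
    throw new Error("questions and contexts must have the same length");
  }
  // Equal-length lists are paired 1:1 in the order given.
  return qs.map(function (q, i) { return { question: q, context: cs[i] }; });
}

// One question with three contexts yields three question/context pairs.
var pairs = pairQuestionsWithContexts(
  "Who wrote Hamlet?",
  ["doc A text", "doc B text", "doc C text"]
);
```

With the input transformation script shown earlier, the single q parameter and the list of document fields correspond to the first case: one question applied to each context.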

For advanced use cases, you can add the following parameters to the script to override their defaults:

batch_size. How many samples to process at a time. Reducing this number will reduce memory usage but increase execution time, while increasing it will increase memory usage and decrease execution time to a certain extent. Default: 8
max_context_len. If set to greater than 0, truncate contexts to this length in characters. Default: 5000
max_answer_len. The maximum length of predicted answers (that is, only answers with a shorter length are considered). Default: 15
max_question_len. The maximum length of the question after tokenization. It will be truncated if needed. Default: 64
doc_stride. If the context is too long to fit with the question for the model, it will be split in several chunks with some overlap. This argument controls the size of that overlap. Default: 128
max_seq_len. The maximum length of the total sentence (context + question) after tokenization. The context will be split in several chunks (using doc_stride) if needed. Default: 384
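
To see how doc_stride and max_seq_len interact, the following sketch computes overlapping chunk boundaries over token positions. It is a simplified illustration, not the model's tokenizer logic: the function chunkSpans is hypothetical, and the 20-token question and the 3 reserved special tokens are assumptions chosen for the example.

```javascript
// Hypothetical sketch of overlapping-window chunking over token positions.
// windowLen is the number of context tokens that fit in one chunk, and
// overlap plays the role of doc_stride (tokens shared by adjacent chunks).
function chunkSpans(contextLen, windowLen, overlap) {
  if (overlap >= windowLen) {
    throw new Error("overlap must be smaller than windowLen");
  }
  var spans = [];
  var step = windowLen - overlap;
  for (var start = 0; start < contextLen; start += step) {
    var end = Math.min(start + windowLen, contextLen);
    spans.push([start, end]);
    if (end === contextLen) break; // the last chunk reached the end of the context
  }
  return spans;
}

// Documented defaults: max_seq_len 384 and doc_stride 128.
// Assumption: a 20-token question plus 3 special tokens leaves 361 tokens
// of context per chunk.
var spans = chunkSpans(1000, 384 - 20 - 3, 128);
// spans is [[0, 361], [233, 594], [466, 827], [699, 1000]]
```

In this sketch a 1000-token context produces four chunks, and each chunk shares 128 tokens with the one before it, so an answer that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.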

In the Model output transformation script field, enter the following:

// Parse raw output from model
var jsonOutput = JSON.parse(modelOutput.get("_rawJsonResponse"))

var parsedOutput = {};
for (var i=0; i<jsonOutput["names"].length;i++){
  parsedOutput[jsonOutput["names"][i]] = jsonOutput["ndarray"][i]
}

// Get response documents
var docs = response.get().getInnerResponse().getDocuments();
var ndocs = new java.util.ArrayList();

// Add extracted answers to the response docs
for (var i=0; i < parsedOutput["answer"].length;i++){
  var doc = docs[i];
  doc.putField("extracted_answer", new java.util.ArrayList(parsedOutput["answer"][i]))
  doc.putField("extracted_score", new java.util.ArrayList(parsedOutput["score"][i]))
  doc.putField("extracted_start", new java.util.ArrayList(parsedOutput["start"][i]))
  doc.putField("extracted_end", new java.util.ArrayList(parsedOutput["end"][i]))
  ndocs.add(doc);
}
response.get().getInnerResponse().updateDocuments(ndocs);

Save the pipeline.
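
The parsing step at the top of the output transformation script can be tried outside Fusion with a stand-in payload. The sketch below is an illustration under stated assumptions: the payload values are made up, and the nesting (one list of answers per document) is inferred from how the script indexes parsedOutput, not taken from a real model response. The function name parseModelOutput is hypothetical.

```javascript
// Made-up payload for illustration only. It mirrors the "names"/"ndarray"
// layout that the output transformation script reads.
var rawJsonResponse = JSON.stringify({
  names: ["answer", "score", "start", "end"],
  ndarray: [
    [["Paris"], ["1889"]], // answers: one list per document
    [[0.93], [0.71]],      // scores
    [[24], [40]],          // start indexes
    [[29], [44]]           // end indexes
  ]
});

// Same parsing logic as the script: one entry per output column name.
function parseModelOutput(rawJson) {
  var jsonOutput = JSON.parse(rawJson);
  var parsed = {};
  for (var i = 0; i < jsonOutput.names.length; i++) {
    parsed[jsonOutput.names[i]] = jsonOutput.ndarray[i];
  }
  return parsed;
}

var parsedOutput = parseModelOutput(rawJsonResponse);
// parsedOutput.answer[0] holds the answers extracted for the first document.
```

Keying the columns by name keeps the rest of the script independent of the column order configured in Output Column Names for Model.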

Model output

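The model returns the following fields. The output transformation script above stores them on each response document with an extracted_ prefix (for example, extracted_answer):

answer. The short answer extracted from the context. This may be blank if handle_impossible_answer=True and topk=1.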
score. The score for the extracted answers.
start. The start index of the extracted answer in the provided context.
end. The end index of the extracted answer in the provided context.
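
The start and end fields can be used to show an answer in its surrounding text. The sketch below assumes that start and end are character offsets into the context and that end is exclusive; this page does not state either, so verify against real model output before relying on it. The helper highlightAnswer and the sample sentence are hypothetical.

```javascript
// Hypothetical helper: mark the answer span inside its context and keep a
// few surrounding characters for display.
// Assumes character offsets, with end exclusive.
function highlightAnswer(context, start, end, padding) {
  var from = Math.max(0, start - padding);
  var to = Math.min(context.length, end + padding);
  return context.slice(from, start) +
    "[" + context.slice(start, end) + "]" +
    context.slice(end, to);
}

// Sample values for illustration; in a pipeline these would come from the
// extracted_start and extracted_end fields of a response document.
var context = "The Eiffel Tower was completed in 1889 in Paris.";
var start = context.indexOf("1889");
var end = start + "1889".length;
var snippet = highlightAnswer(context, start, end, 10);
// snippet is "pleted in [1889] in Paris."
```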