Develop and Deploy a Machine Learning Model
This article describes the high-level process to deploy models to Fusion 5.x.x releases using Seldon Core, which replaces the Source-to-Image (s2i) model deployment method. Seldon Core deploys your model as a Docker image in Kubernetes, which you can scale up or down like other Fusion services.
The procedure detailed in this topic deploys a Python-based example model that calls OpenAI's pre-trained embeddings API.
For information about how to wrap models in R, Java, JavaScript, or Go, see the Seldon Core documentation.
Install Seldon Core
Install the Seldon Core Python package using pip or another Python package manager, such as conda:
pip install seldon-core
There are no restrictions on other libraries or frameworks, as your environment is wrapped inside a Docker container for deployment.
Create an example model: semantic vector search with OpenAI
As an example of using Seldon Core with Fusion, we will create a simple embedding model using a REST API call to OpenAI’s API. However, there are no restrictions on what you use for your models; Keras, TensorFlow, JAX, scikit-learn, or any other Python libraries are supported.
Create inference class
Use Seldon Core to create an inference class wrapper around models for deployment into Fusion. This requires a class with at least two methods, __init__() and predict(), which are used by Seldon Core when deploying the model and serving predictions.
The method __init__() is called by Seldon Core when the model's Docker container starts. This is where you should initialize your model and any other associated details you may need for inference. At this time, we recommend that you include your model's trained parameters directly in the Docker container rather than reaching out to external storage inside __init__().
The method predict() is executed whenever the model is called to give a prediction. It receives three parameters:
Parameter | Description
---|---
X | A numpy array of input values.
names | An iterable set of column names.
meta | An optional dictionary of metadata.
In Fusion, only the first two parameters are used. Due to the way that Fusion sends input to the Seldon Core model, you should zip the X and names parameters together and then assign your inputs from the resulting dict by referencing the keys you placed in the modelInput HashMap in the Machine Learning stage. We also recommend raising a ValueError if a required key is not found in the input, as this will help with debugging.
Here is the complete code for our OpenAI embedding model's wrapper class. Note that this inference class can be easily unit-tested with any Python testing framework and requires no Fusion-specific libraries.
import logging
import os
from typing import Any, List, Iterable

import numpy as np
import openai

INPUT_COLUMN = "text"

log = logging.getLogger()

# NOTE! Please add
#   export DOCKER_DEFAULT_PLATFORM=linux/amd64
# to your ~/.zshrc.
# Otherwise, the image may be built for the architecture you're currently on.


class OpenAIModel():
    def __init__(self):
        log.info("env: %s", str(os.environ))
        openai.api_key = os.getenv("OPENAI_API_KEY", "api key not set")

    def get_embedding(self, text, engine="text-similarity-ada-001"):
        # Replace newlines, which can negatively affect performance.
        text = text.replace("\n", " ")
        return openai.Embedding.create(input=[text], engine=engine)[
            "data"][0]["embedding"]

    def predict(self, X: np.ndarray, names: Iterable[str]) -> List[Any]:
        log.info("in predict")
        model_input = dict(zip(names, X))
        log.info("input fields: %s", model_input)
        if INPUT_COLUMN not in model_input:
            # Fail loudly if the required input key is missing.
            raise ValueError("Missing required input field: " + INPUT_COLUMN)
        engine = model_input.get("engine", "text-similarity-ada-001")
        # Initialize embedding to a sentinel value so a failed API call is detectable.
        embedding = [-1]
        text = model_input[INPUT_COLUMN]
        if len(text) > 2000:
            log.warning("Input text too long, truncating to 2000 characters")
            text = text[0:2000]
        try:
            embedding = self.get_embedding(text, engine=engine)
        except Exception as e:
            log.info("Failed calling API: %s", str(e))
        return [embedding]
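Because the wrapper requires no Fusion-specific libraries, you can unit test it offline. Here is a minimal pytest-style sketch; the module name openai_model and the stubbed get_embedding are assumptions for illustration:
import numpy as np
import pytest

# Hypothetical module name for the file that contains OpenAIModel.
from openai_model import OpenAIModel


def test_predict_returns_embedding(monkeypatch):
    model = OpenAIModel()
    # Stub the OpenAI API call so the test runs offline.
    monkeypatch.setattr(model, "get_embedding",
                        lambda text, engine=None: [0.1, 0.2, 0.3])
    result = model.predict(np.array(["hello world"]), ["text"])
    assert result == [[0.1, 0.2, 0.3]]


def test_predict_rejects_missing_text():
    model = OpenAIModel()
    # The required "text" key is absent, so predict() raises ValueError.
    with pytest.raises(ValueError):
        model.predict(np.array(["text-similarity-ada-001"]), ["engine"])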
Create model image
Now that we have a class for our model's inference, the next step is to create a Docker image so the model is ready for deployment. We recommend packaging the Python model manually into an image, using the Dockerfile and requirements.txt shown in the Examples section below.
Build image
DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build . -t [DOCKERHUB USERNAME]/fusion-seldon-openai:latest
Alternatively, DOCKER_DEFAULT_PLATFORM may be exported from your ~/.zshrc.
Push image
You can deploy your model from either a private registry or Docker Hub. Here is how we push to Docker Hub:
docker push [DOCKERHUB USERNAME]/fusion-seldon-openai:latest
Replace the Docker Hub repo, version, and other relevant fields as needed. If you use a private Docker Hub repo, you must obtain the registry secret and supply it to the Seldon Core model deployment job.
Deploy to Fusion
Now that your model is tested and Dockerized, you are ready to deploy it within Fusion.
- In the Fusion UI, navigate to Collections > Jobs.
- Select Add > Create Seldon Core Model Deployment.
- Configure the following parameters in the job configuration panel:
Parameter | Description
---|---
Job ID | A string used by the Fusion API to reference the job after its creation.
Model name | A name for the deployed model. This is used to generate the deployment name in Seldon Core. It is also the name that you reference as a model-id when making predictions with the ML Service.
Model replicas | The number of load-balanced replicas of the model to deploy.
Docker Repository | The public or private repository where the Docker image is located. If you're using Docker Hub, fill in the Docker Hub username here.
Image name | The name of the image with an optional tag. If no tag is given, latest is used.
Kubernetes secret | If you're using a private repository, supply the name of the Kubernetes secret used for access.
Output columns | A list of column names that the model's predict method returns.
- Click Run > Start to run the model deployment job.
Once the job reports success, you can reference your model name in the Machine Learning index pipeline stage.
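You can also sanity-check the deployed model outside of Fusion by posting to the Seldon Core REST endpoint directly. A minimal sketch, assuming the in-cluster service URL shown in the SeldonDeployment status in the Examples section below (substitute your own namespace, or port-forward the service):
import requests

# Assumed in-cluster URL; see the status.address.url field in openai_sdep.yaml.
URL = ("http://openai-openai.YOUR_NAMESPACE.svc.cluster.local:8000"
       "/api/v1.0/predictions")

# Seldon Core's REST payload: column names plus an ndarray of values,
# which predict() zips together into its model_input dict.
payload = {"data": {"names": ["text"], "ndarray": ["What is Fusion?"]}}

resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["data"]["ndarray"])  # the returned embedding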
After deployment
- After deploying your model, create and modify an openai_sdep.yaml file. The first command, kubectl get sdep, gets the details of the currently running Seldon Deployment and saves them to a YAML file. After you add the OpenAI key to that file, kubectl apply -f openai_sdep.yaml applies the key to the Seldon Deployment the next time it launches.
kubectl get sdep openai -o yaml > openai_sdep.yaml

# Modify openai_sdep.yaml to add:
#   - env:
#     - name: OPENAI_API_KEY
#       value: "your-openai-api-key-here"

kubectl apply -f openai_sdep.yaml
- Delete sdep before redeploying the model. The currently running Seldon Deployment does not have the OpenAI key applied to it. Delete it before redeploying, and the new deployment will have the key.
kubectl delete sdep openai
- Lastly, you can use the deployed model to encode vectors into Milvus.
Examples
requirements.txt
Copy and paste the following into a file called requirements.txt:
openai
seldon-core
Dockerfile
FROM python:3.7-slim
WORKDIR /app
# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# Copy source code
COPY . .
# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000
# Define environment variables
ENV MODEL_NAME OpenAIModel
ENV SERVICE_TYPE MODEL
# Change ownership of the app folder to the default (non-root) user
RUN chown -R 8888 /app
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
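To smoke-test the image locally before pushing it, run the container with the REST port mapped (for example, docker run -p 9000:9000 -e OPENAI_API_KEY=your-key [DOCKERHUB USERNAME]/fusion-seldon-openai:latest) and send a test request. A sketch, assuming the REST port exposed in the Dockerfile above:
import requests

# The Seldon microservice serves REST on port 9000 (see EXPOSE 9000 above).
URL = "http://localhost:9000/api/v1.0/predictions"

payload = {"data": {"names": ["text"], "ndarray": ["hello world"]}}
resp = requests.post(URL, json=payload, timeout=30)
print(resp.status_code, resp.json())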
jsGlue
This JavaScript query stage reads a vector from the pipeline context and rewrites the query to use Solr's knn query parser:
/* globals Java, logger */
(function () {
  "use strict";
  var isDebug = false; // turn debug statements for this file's code on or off
  //function logIfDebug(m){if(isDebug && m)logger.info(m, Array.prototype.slice.call(arguments).slice(1));}
  return function main(request, response, ctx, collection, solrServer, solrServerFactory) {
    var vector = ctx.getOrDefault("Test_Vector", []);
    //&q={!knn f=vector_v topK=10}[1.0, 2.0, 3.0, 4.0...]
    var q = "{!knn f=vector_v topK=10}" + JSON.stringify(vector);
    request.putSingleParam("q", q);
  };
})();
openai_sdep.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
creationTimestamp: "2022-10-18T18:32:29Z"
generation: 2
name: openai
namespace: yournamespace
resourceVersion: "1485955230"
uid: 8d79389d-be76-4a4d-89db-3233d2f12b72
spec:
annotations:
seldon.io/headless-svc: "true"
name: openai
predictors:
- componentSpecs:
- spec:
containers:
- env:
- name: OPENAI_API_KEY
value: "your-openai-api-key-here"
image: yourimage/fusion-seldon-openai:0.0.9
imagePullPolicy: IfNotPresent
name: openai
resources: {}
volumeMounts:
- mountPath: /etc/secrets
name: my-secret
readOnly: true
imagePullSecrets:
- name: '{{MODEL_DOCKER_SECRET}}'
nodeSelector: {}
tolerations: []
volumes:
- name: my-secret
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3600
path: service-account-key
- secret:
items:
- key: sa
path: service-account-key
name: service-account-key
graph:
endpoint:
type: GRPC
name: openai
type: MODEL
labels:
app.kubernetes.io/component: ml-service-workflow
app.kubernetes.io/instance: yourinstancehere-argo-common-workflows
app.kubernetes.io/name: seldon
app.kubernetes.io/part-of: fusion
version: v1666118120
name: openai
replicas: 1
status:
address:
url: http://openai-openai.yourinstancehere.svc.cluster.local:8000/api/v1.0/predictions
conditions:
- lastTransitionTime: "2022-10-18T18:32:55Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: DeploymentsReady
- lastTransitionTime: "2022-10-18T18:32:30Z"
reason: No HPAs defined
status: "True"
type: HpasReady
- lastTransitionTime: "2022-10-18T18:32:30Z"
reason: No KEDA resources defined
status: "True"
type: KedaReady
- lastTransitionTime: "2022-10-18T18:32:30Z"
reason: No PDBs defined
status: "True"
type: PdbsReady
- lastTransitionTime: "2022-10-18T18:36:45Z"
status: "True"
type: Ready
- lastTransitionTime: "2022-10-18T18:36:45Z"
reason: All services created
status: "True"
type: ServicesReady
- lastTransitionTime: "2022-10-18T18:32:55Z"
reason: No VirtualServices defined
status: "True"
type: istioVirtualServicesReady
deploymentStatus:
openai-openai-0-openai:
availableReplicas: 1
replicas: 1
replicas: 1
state: Available