Develop and Deploy a Machine Learning Model
This article describes the high-level process for deploying models to Fusion 5.x.x releases using Seldon Core, which replaces the Source-to-Image (s2i) model deployment method. Seldon Core deploys your model as a Docker image in Kubernetes, which you can scale up or down like other Fusion services.
The procedure detailed in this topic deploys a pre-trained, Python-based example model that calls OpenAI's API to generate embeddings. For background on embeddings, see OpenAI's documentation.
Note: For information about how to wrap models in R, Java, JavaScript, or Go, see the Seldon Core documentation.
Install Seldon Core
Install the Seldon Core Python package using pip or another Python package manager, such as conda:
```bash
pip install seldon-core
```

There are no restrictions on other libraries or frameworks, as your environment is wrapped inside a Docker container for deployment.
Create an example model: semantic vector search with OpenAI
As an example of using Seldon Core with Fusion, we will create a simple embedding model using a REST API call to OpenAI’s API. However, there are no restrictions on what you use for your models; Keras, TensorFlow, JAX, scikit-learn, or any other Python libraries are supported.
Create inference class
Use Seldon Core to create an inference class wrapper around models for deployment into Fusion. This requires a class with at least two methods, __init__() and predict(), which are used by Seldon Core when deploying the model and serving predictions.
The __init__() method is called by Seldon Core when the model's Docker container starts. This is where you should initialize your model and any other associated details you need for inference. We recommend bundling your model's trained parameters directly into the Docker container rather than fetching them from external storage inside __init__().
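For instance, a hypothetical __init__() that loads parameters bundled into the image might look like this (the model.pkl path and MyModel class are illustrative, not part of this article's example):

```python
import pickle

class MyModel:
    def __init__(self):
        # Load trained parameters that the Dockerfile COPY'd into the image,
        # instead of fetching them from external storage at startup.
        with open("/app/model.pkl", "rb") as f:
            self.model = pickle.load(f)
```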
The method predict() is executed whenever the model is called to give a prediction. It receives three parameters:
| Parameter | Description |
|---|---|
| `X` | A NumPy array of input values. |
| `names` | An iterable set of column names. |
| `meta` | An optional dictionary of metadata. |
In Fusion, only the first two parameters are used. Because of the way Fusion sends input to the Seldon Core model, you should zip the X and names parameters together and then read your inputs from the resulting dict by referencing the keys you placed in the modelInput HashMap in the Machine Learning stage. We also recommend raising a ValueError if a required key is not found in the input, as this helps with debugging.
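A minimal sketch of that input-handling pattern (the required key `text` matches this article's example; the ValueError guard is the recommended practice, which the full listing below omits for brevity):

```python
def predict(self, X, names, meta=None):
    # Pair Fusion's column names with their values.
    model_input = dict(zip(names, X))
    if "text" not in model_input:
        # Fail fast so a misconfigured Machine Learning stage is easy to spot.
        raise ValueError("required key 'text' not found in model input")
    # ... run inference on model_input["text"] here ...
```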
Here is the complete code for our embedding model's wrapper class. Note that this inference class can be easily unit-tested with any Python testing framework and requires no Fusion-specific libraries.
```python
import logging
import os
from typing import Any, List, Iterable

import numpy as np
import openai

INPUT_COLUMN = "text"

log = logging.getLogger()

# NOTE: Add
#   export DOCKER_DEFAULT_PLATFORM=linux/amd64
# to your ~/.zshrc before building.
# Otherwise, the image may be built for the architecture you're currently on.


class OpenAIModel:
    def __init__(self):
        log.info("env: %s", str(os.environ))
        openai.api_key = os.getenv("OPENAI_API_KEY", "api key not set")

    def get_embedding(self, text, engine="text-similarity-ada-001"):
        # Replace newlines, which can negatively affect performance.
        text = text.replace("\n", " ")
        return openai.Embedding.create(input=[text], engine=engine)[
            "data"][0]["embedding"]

    def predict(self, X: np.ndarray, names: Iterable[str]) -> List[Any]:
        log.info("in predict")
        model_input = dict(zip(names, X))
        log.info("input fields: %s", model_input)
        engine = model_input.get("engine", "text-similarity-ada-001")
        # Sentinel value returned if the API call fails.
        embedding = [-1]
        text = model_input[INPUT_COLUMN]
        if len(text) > 2000:
            log.warning("Input text too long, truncating to 2000 characters")
            text = text[0:2000]
        try:
            embedding = self.get_embedding(text, engine=engine)
        except Exception as e:
            log.info("Failed calling API: %s", str(e))
        return [embedding]
```

Create model image
Now that we have a class for the model's inference, the next step is to create a Docker image so the model is ready for deployment. We recommend Seldon Core's approach of packaging the Python model manually with a Dockerfile; the requirements.txt and Dockerfile in the Examples section below show one way to do this.
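Before building the image, you can sanity-check the inference class locally with any Python testing framework, as noted above. Here is a minimal pytest sketch that stubs out the OpenAI call; it assumes the wrapper class lives in a module named OpenAIModel.py (matching the Dockerfile's MODEL_NAME), so adjust the import if yours differs:

```python
import numpy as np
from OpenAIModel import OpenAIModel

def test_predict_returns_one_embedding(monkeypatch):
    model = OpenAIModel()
    # Stub the OpenAI API call so the test runs offline.
    monkeypatch.setattr(model, "get_embedding",
                        lambda text, engine=None: [0.1, 0.2, 0.3])
    result = model.predict(np.array(["hello world"]), ["text"])
    assert result == [[0.1, 0.2, 0.3]]
```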
Build image
```bash
DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build . -t [DOCKERHUB USERNAME]/fusion-seldon-openai:latest
```

Alternatively, DOCKER_DEFAULT_PLATFORM may be exported from your .zshrc instead of being set inline.
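Optionally, smoke-test the image locally before pushing. The request below follows Seldon Core's REST protocol on the REST port exposed by the example Dockerfile; the API key value is a placeholder:

```bash
docker run --rm -p 9000:9000 -e OPENAI_API_KEY=your-openai-api-key-here \
  [DOCKERHUB USERNAME]/fusion-seldon-openai:latest

# In another terminal:
curl -s http://localhost:9000/api/v1.0/predictions \
  -H 'Content-Type: application/json' \
  -d '{"data": {"names": ["text"], "ndarray": ["hello world"]}}'
```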
Push image
You can deploy your model from either a private registry or Docker Hub. Here is how we push to Docker Hub:
```bash
docker push [DOCKERHUB USERNAME]/fusion-seldon-openai:latest
```

Note: Replace the Docker Hub repository, version, and other relevant fields as needed. If you're using a private Docker Hub repository, you must obtain the secret and provide it to the Seldon Core model deployment job.
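For a private repository, the Kubernetes secret referenced by the deployment job is typically created with kubectl's standard docker-registry secret type; the secret name and namespace below are illustrative:

```bash
kubectl create secret docker-registry my-docker-secret \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=YOUR_USERNAME \
  --docker-password=YOUR_PASSWORD \
  -n YOUR_NAMESPACE
```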
Deploy to Fusion
Now that your model is tested and Dockerized, you are ready to deploy it within Fusion.
- In the Fusion UI, navigate to Collections > Jobs.
- Select Add > Create Seldon Core Model Deployment.
- Configure the following parameters in the job configuration panel:

| Parameter | Description |
|---|---|
| Job ID | A string used by the Fusion API to reference the job after its creation. |
| Model name | A name for the deployed model. This is used to generate the deployment name in Seldon Core. It is also the name that you reference as a `model-id` when making predictions with the ML Service. |
| Model replicas | The number of load-balanced replicas of the model to deploy. |
| Docker Repository | The public or private repository where the Docker image is located. If you're using Docker Hub, fill in the Docker Hub username here. |
| Image name | The name of the image, with an optional tag. If no tag is given, `latest` is used. |
| Kubernetes secret | If you're using a private repository, supply the name of the Kubernetes secret used for access. |
| Output columns | A list of column names that the model's `predict` method returns. |

- Click Run > Start to run the model deployment job.
Once the job reports success, you can reference your model name in the Machine Learning index pipeline stage.
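For reference, here is a hypothetical model input script for the Machine Learning stage; the document field body_t and the key names are illustrative, following the modelInput pattern described above, so adjust them to your schema:

```javascript
// Hypothetical Machine Learning stage "model input" script (names are illustrative).
var modelInput = new java.util.HashMap();
modelInput.put("text", doc.getFirstFieldValue("body_t"));    // key read by predict()
modelInput.put("engine", "text-similarity-ada-001");         // optional engine override
modelInput
```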
After deployment
- After deploying your model, create and modify an openai_sdep.yaml file. In the commands below, kubectl get sdep saves the details of the currently running Seldon Deployment to a YAML file, and kubectl apply -f openai_sdep.yaml adds the OpenAI key to the Seldon Deployment the next time it launches:

  ```bash
  kubectl get sdep openai -o yaml > openai_sdep.yaml
  # Modify openai_sdep.yaml to add:
  #   - env:
  #     - name: OPENAI_API_KEY
  #       value: "your-openai-api-key-here"
  kubectl apply -f openai_sdep.yaml
  ```

- Delete the sdep before redeploying the model. The currently running Seldon Deployment does not have the OpenAI key applied to it; delete it before redeploying, and the new deployment will have the key:

  ```bash
  kubectl delete sdep openai
  ```

- Lastly, you can encode the resulting embeddings into Milvus.
Examples
requirements.txt
Copy and paste the following into a file called requirements.txt:
```
openai
seldon-core
```

Dockerfile
```dockerfile
FROM python:3.7-slim
WORKDIR /app

# Install Python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Port for gRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Define environment variables used by the Seldon Core microservice
ENV MODEL_NAME OpenAIModel
ENV SERVICE_TYPE MODEL

# Change ownership to the default non-root user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
```

jsGlue
```javascript
/* globals Java, logger */
(function () {
    "use strict";
    return function main(request, response, ctx, collection, solrServer, solrServerFactory) {
        // Build a Solr KNN query from the vector stored in the pipeline context, e.g.:
        // &q={!knn f=vector_v topK=10}[1.0, 2.0, 3.0, 4.0...]
        var vector = ctx.getOrDefault("Test_Vector", []);
        var q = "{!knn f=vector_v topK=10}" + JSON.stringify(vector);
        request.putSingleParam("q", q);
    };
})();
```

openai_sdep.yaml
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  creationTimestamp: "2022-10-18T18:32:29Z"
  generation: 2
  name: openai
  namespace: yournamespace
  resourceVersion: "1485955230"
  uid: 8d79389d-be76-4a4d-89db-3233d2f12b72
spec:
  annotations:
    seldon.io/headless-svc: "true"
  name: openai
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - env:
          - name: OPENAI_API_KEY
            value: "your-openai-api-key-here"
          image: yourimage/fusion-seldon-openai:0.0.9
          imagePullPolicy: IfNotPresent
          name: openai
          resources: {}
          volumeMounts:
          - mountPath: /etc/secrets
            name: my-secret
            readOnly: true
        imagePullSecrets:
        - name: '{{MODEL_DOCKER_SECRET}}'
        nodeSelector: {}
        tolerations: []
        volumes:
        - name: my-secret
          projected:
            sources:
            - serviceAccountToken:
                expirationSeconds: 3600
                path: service-account-key
            - secret:
                items:
                - key: sa
                  path: service-account-key
                name: service-account-key
    graph:
      endpoint:
        type: GRPC
      name: openai
      type: MODEL
    labels:
      app.kubernetes.io/component: ml-service-workflow
      app.kubernetes.io/instance: yourinstancehere-argo-common-workflows
      app.kubernetes.io/name: seldon
      app.kubernetes.io/part-of: fusion
      version: v1666118120
    name: openai
    replicas: 1
status:
  address:
    url: http://openai-openai.yourinstancehere.svc.cluster.local:8000/api/v1.0/predictions
  conditions:
  - lastTransitionTime: "2022-10-18T18:32:55Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: DeploymentsReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No HPAs defined
    status: "True"
    type: HpasReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No KEDA resources defined
    status: "True"
    type: KedaReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No PDBs defined
    status: "True"
    type: PdbsReady
  - lastTransitionTime: "2022-10-18T18:36:45Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-18T18:36:45Z"
    reason: All services created
    status: "True"
    type: ServicesReady
  - lastTransitionTime: "2022-10-18T18:32:55Z"
    reason: No VirtualServices defined
    status: "True"
    type: istioVirtualServicesReady
  deploymentStatus:
    openai-openai-0-openai:
      availableReplicas: 1
      replicas: 1
  replicas: 1
  state: Available
```