Legacy Product

Fusion 5.4

Develop and Deploy a Machine Learning Model

This article describes the high-level process for deploying models to Fusion 5.x.x releases using Seldon Core, which replaces the Source-to-Image (s2i) model deployment method. Seldon Core deploys your model as a Docker image in Kubernetes, which you can scale up or down like other Fusion services.

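The procedure detailed in this topic deploys a Python-based example model that calls OpenAI's pre-trained embedding models. An embedding is a numeric vector representation of text, which allows semantically similar texts to be compared by vector distance.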

For information about how to wrap models in R, Java, JavaScript, or Go, see the Seldon Core documentation.

Install Seldon Core

Install the Seldon Core Python package using pip or another Python package manager, such as conda:

pip install seldon-core

There are no restrictions on other libraries or frameworks, as your environment is wrapped inside a Docker container for deployment.

Create an example model: semantic vector search with OpenAI

As an example of using Seldon Core with Fusion, we will create a simple embedding model using a REST API call to OpenAI’s API. However, there are no restrictions on what you use for your models; Keras, TensorFlow, JAX, scikit-learn, or any other Python libraries are supported.

Create inference class

Use Seldon Core to create an inference class wrapper around models for deployment into Fusion. This requires a class with at least two methods, __init__() and predict(), which are used by Seldon Core when deploying the model and serving predictions.

The method __init__() is called by Seldon Core when the model's Docker container starts. This is where you should initialize your model and any other associated details you may need for inference. At this time, it is recommended that you include your model's trained parameters directly in the Docker container rather than reaching out to external storage inside __init__().

The method predict() is executed whenever the model is called to give a prediction. It receives three parameters:

Parameter Description

X

A numpy array containing the input to the model.

names

An iterable set of column names.

meta

An optional dictionary of metadata.

In Fusion, only the first two parameters are used. Because of the way Fusion sends input to the Seldon Core model, you should zip the X and names parameters together and then read your inputs from the resulting dictionary by referencing the keys you placed in the modelInput HashMap in the Machine Learning stage. We also recommend raising a ValueError if a required key is not found in the input, as this helps with debugging.
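The input handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from Fusion or Seldon Core: the class name EchoModel and the required key "text" are hypothetical, and a plain Python list stands in for the numpy array that Seldon Core passes as X.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence


class EchoModel:
    """Hypothetical wrapper illustrating the __init__()/predict() contract."""

    # Keys expected from the modelInput HashMap (assumed for this sketch).
    REQUIRED_KEYS = ("text",)

    def __init__(self) -> None:
        # A real model would load its trained parameters here.
        self.ready = True

    def predict(
        self,
        X: Sequence[Any],
        names: Iterable[str],
        meta: Optional[Dict] = None,
    ) -> List[Any]:
        # Pair each column name with its value.
        model_input = dict(zip(names, X))

        # Fail loudly when a required key is missing, to ease debugging.
        for key in self.REQUIRED_KEYS:
            if key not in model_input:
                raise ValueError(
                    f"Required input key '{key}' not found; got {sorted(model_input)}"
                )

        # Placeholder "prediction": the length of the input text.
        return [len(model_input["text"])]
```

Calling EchoModel().predict(["hello"], ["text"]) returns a one-element list, while passing names that lack "text" raises a ValueError naming the missing key.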

Here is the complete code for our embedding model's wrapper class. Note that this inference class can be easily unit-tested with any Python testing framework and requires no Fusion-specific libraries.

import logging
import os
import sys

from typing import Any, List, Iterable
import numpy as np
import openai


log = logging.getLogger()

# NOTE! Please add
# export DOCKER_DEFAULT_PLATFORM=linux/amd64
# to your ~/.zshrc
# Otherwise, the image may be built for the architecture you're currently on

class OpenAIModel():

    def __init__(self):
        log.info("env: %s", str(os.environ))
        openai.api_key = os.getenv("OPENAI_API_KEY", "api key not set")

    def get_embedding(self, text, engine="text-similarity-ada-001"):

        # replace newlines, which can negatively affect performance.
        text = text.replace("\n", " ")

        return openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"]

    def predict(self, X: np.ndarray, names: Iterable[str]) -> List[Any]:
        log.info("in predict")

        model_input = dict(zip(names, X))
        log.info("input fields: %s", model_input)

        engine = model_input.get("engine", "text-similarity-ada-001")

        # Initialize embedding with a sentinel so a failed API call is detectable
        embedding = [-1]
        text = model_input["text"]
        if len(text) > 2000:
            log.warning("Input text too long, truncating to 2000 characters")
            text = text[0:2000]

        try:
            embedding = self.get_embedding(text, engine=engine)
        except Exception as e:
            log.error("Failed calling API: %s", str(e))

        return [embedding]
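Because the wrapper needs no Fusion-specific libraries, its predict() logic can be unit-tested without network access. The sketch below is an assumption-laden illustration: StubbedEmbeddingModel is a hypothetical stand-in that mirrors the shape of the wrapper class (it is not the class itself), and unittest.mock replaces the embedding call with a canned vector.

```python
import unittest
from unittest import mock


class StubbedEmbeddingModel:
    """Hypothetical stand-in with the same method shape as the wrapper."""

    def get_embedding(self, text, engine="text-similarity-ada-001"):
        # The real wrapper calls the OpenAI API here; tests must not.
        raise RuntimeError("network access is not available in tests")

    def predict(self, X, names):
        model_input = dict(zip(names, X))
        text = model_input["text"]
        embedding = [-1]  # sentinel returned when the call fails
        try:
            embedding = self.get_embedding(text)
        except Exception:
            pass
        return [embedding]


class PredictTest(unittest.TestCase):
    def test_returns_embedding(self):
        model = StubbedEmbeddingModel()
        # Replace the API call with a canned vector for this test only.
        with mock.patch.object(model, "get_embedding", return_value=[0.1, 0.2]):
            self.assertEqual(model.predict(["hello"], ["text"]), [[0.1, 0.2]])

    def test_failure_returns_sentinel(self):
        model = StubbedEmbeddingModel()
        self.assertEqual(model.predict(["hello"], ["text"]), [[-1]])

    def test_missing_key_raises(self):
        model = StubbedEmbeddingModel()
        with self.assertRaises(KeyError):
            model.predict(["hello"], ["body"])
```

The same pattern should carry over to the real class by patching its get_embedding method, so the tests exercise input handling and the failure sentinel without an API key.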

Create model image

Now that we have a class for our model's inference, the next step is to create a Docker image to make it ready for deployment. We recommend packaging the Python model by manually creating an image for it.

Build image

DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build . -t [DOCKERHUB USERNAME]/fusion-seldon-openai:latest

Alternatively, DOCKER_DEFAULT_PLATFORM may be exported from .zshrc.

Push image

You can deploy your model from either a private registry or Docker Hub. Here is how we push to Docker Hub:

docker push [DOCKERHUB USERNAME]/fusion-seldon-openai:latest

Replace the Docker Hub repository, version, and other relevant fields as needed. If you use a private Docker Hub repository, you must obtain the secret for it and supply it in the Seldon Core model deployment job.

Deploy to Fusion

Now that your model is tested and Dockerized, you are ready to deploy it within Fusion.

  1. In the Fusion UI, navigate to Collections > Jobs.

  2. Select Add > Create Seldon Core Model Deployment.

    The job configuration panel opens.

  3. Configure the following parameters:

    Parameter Description

    Job ID

    A string used by the Fusion API to reference the job after its creation.

    Model name

    A name for the deployed model. This is used to generate the deployment name in Seldon Core. It is also the name that you reference as a model-id when making predictions with the ML Service.

    Model replicas

    The number of load-balanced replicas of the model to deploy.

    Docker Repository

    The public or private repository where the Docker image is located. If you’re using Docker Hub, fill in the Docker Hub username here.

    Image name

    The name of the image with an optional tag. If no tag is given, latest is used.

    Kubernetes secret

    If you’re using a private repository, supply the name of the Kubernetes secret used for access.

    Output columns

    A list of column names that the model’s predict method returns.

  4. Click Run > Start to run the model deployment job.

Once the job reports success, you can reference your model name in the Machine Learning index pipeline stage.

After deployment

  1. After deploying your model, export the Seldon deployment to openai_sdep.yaml, add the OPENAI_API_KEY environment variable to the model container, and apply the modified file:

    kubectl get sdep openai -o yaml > openai_sdep.yaml
    # Modify openai_sdep.yaml to add
            - env:
              - name: OPENAI_API_KEY
                value: "your-openai-api-key-here"
    kubectl apply -f openai_sdep.yaml

  2. Delete the modified sdep before redeploying the model:

    kubectl delete sdep openai

  3. Lastly, you can use the deployed model to encode text into vectors for storage in Milvus.
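The manual edit in step 1 could also be scripted. The following is a sketch only, assuming the exported manifest has already been loaded into a Python dictionary (for example with a YAML parser) and that containers sit under spec.predictors[].componentSpecs[].spec.containers[], as in a standard SeldonDeployment. The helper name add_env_var is hypothetical.

```python
from typing import Any, Dict


def add_env_var(sdep: Dict[str, Any], name: str, value: str) -> Dict[str, Any]:
    """Add or update an environment variable on every model container.

    `sdep` is a SeldonDeployment manifest as a nested dict; it is
    modified in place and also returned for convenience.
    """
    for predictor in sdep.get("spec", {}).get("predictors", []):
        for component in predictor.get("componentSpecs", []):
            for container in component.get("spec", {}).get("containers", []):
                env = container.setdefault("env", [])
                for entry in env:
                    if entry.get("name") == name:
                        # Variable already present: overwrite its value.
                        entry["value"] = value
                        break
                else:
                    # Variable not present: append a new entry.
                    env.append({"name": name, "value": value})
    return sdep
```

The updated dictionary would then be written back to openai_sdep.yaml and applied with kubectl apply, as in step 1.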



Copy and paste the following into a file called requirements.txt:



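FROM python:3.7-slim

# Assumption: /app is the working directory (the chown below targets it)
WORKDIR /app

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Assumption: ports below are the Seldon Core microservice defaults
# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Define environment variables
# Assumption: the inference class is saved as OpenAIModel.py
ENV MODEL_NAME OpenAIModel
ENV SERVICE_TYPE MODEL

# Changing folder to default user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE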


/* globals Java, logger*/
(function () {
    "use strict";
    var isDebug = false; // turn debug statements for this file's code on or off
    //function logIfDebug(m){if(isDebug && m)logger.info(m, Array.prototype.slice.call(arguments).slice(1));}
    return function main(request, response, ctx, collection, solrServer, solrServerFactory) {
        // Read the vector stored in the pipeline context; default to an empty list.
        var vector = ctx.getOrDefault("Test_Vector", []);
        // Resulting parameter: &q={!knn f=vector_v topK=10}[1.0, 2.0, 3.0, 4.0...]
        var q = "{!knn f=vector_v topK=10}" + JSON.stringify(vector);
        request.putSingleParam("q", q);
    };
})();
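The query string that the stage above assembles can be illustrated outside Fusion. This is a small sketch, not part of the pipeline: the function name build_knn_query is hypothetical, and the defaults for the field (vector_v) and topK (10) are taken from the JavaScript above.

```python
import json
from typing import Sequence


def build_knn_query(
    vector: Sequence[float], field: str = "vector_v", top_k: int = 10
) -> str:
    """Return a Solr {!knn} query string for the given vector."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Local params select the vector field and result count; the
    # vector itself follows as a JSON array.
    return "{!knn f=%s topK=%d}%s" % (field, top_k, json.dumps(list(vector)))


query = build_knn_query([1.0, 2.0, 3.0])
print(query)  # {!knn f=vector_v topK=10}[1.0, 2.0, 3.0]
```

Building the string in a helper like this makes it easy to check the exact query value before wiring it into a pipeline stage.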


apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
  creationTimestamp: "2022-10-18T18:32:29Z"
  generation: 2
  name: openai
  namespace: yournamespace
  resourceVersion: "1485955230"
  uid: 8d79389d-be76-4a4d-89db-3233d2f12b72
spec:
  annotations:
    seldon.io/headless-svc: "true"
  name: openai
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - env:
          - name: OPENAI_API_KEY
            value: "your-openai-api-key-here"
          image: yourimage/fusion-seldon-openai:0.0.9
          imagePullPolicy: IfNotPresent
          name: openai
          resources: {}
          volumeMounts:
          - mountPath: /etc/secrets
            name: my-secret
            readOnly: true
        imagePullSecrets:
        - name: '{{MODEL_DOCKER_SECRET}}'
        nodeSelector: {}
        tolerations: []
        volumes:
        - name: my-secret
          projected:
            sources:
            - serviceAccountToken:
                expirationSeconds: 3600
                path: service-account-key
            - secret:
                items:
                - key: sa
                  path: service-account-key
                name: service-account-key
    graph:
      endpoint:
        type: GRPC
      name: openai
      type: MODEL
    labels:
      app.kubernetes.io/component: ml-service-workflow
      app.kubernetes.io/instance: yourinstancehere-argo-common-workflows
      app.kubernetes.io/name: seldon
      app.kubernetes.io/part-of: fusion
      version: v1666118120
    name: openai
    replicas: 1
status:
  address:
    url: http://openai-openai.yourinstancehere.svc.cluster.local:8000/api/v1.0/predictions
  conditions:
  - lastTransitionTime: "2022-10-18T18:32:55Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: DeploymentsReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No HPAs defined
    status: "True"
    type: HpasReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No KEDA resources defined
    status: "True"
    type: KedaReady
  - lastTransitionTime: "2022-10-18T18:32:30Z"
    reason: No PDBs defined
    status: "True"
    type: PdbsReady
  - lastTransitionTime: "2022-10-18T18:36:45Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-18T18:36:45Z"
    reason: All services created
    status: "True"
    type: ServicesReady
  - lastTransitionTime: "2022-10-18T18:32:55Z"
    reason: No VirtualServices defined
    status: "True"
    type: istioVirtualServicesReady
  deploymentStatus:
    # The deployment name key was not preserved in the source
    <deployment-name>:
      availableReplicas: 1
      replicas: 1
  replicas: 1
  state: Available