Customize scikit-learn model prediction logic

Learn how to customize prediction logic in your models

If you’re publishing scikit-learn models with the dotscience-python library, by default the published model will read and write JSON in a specific format. If you want to customize that logic, you can include additional files that change how the model runs.

Note: This feature is only supported on the latest Dotscience runners, so please make sure you have a relatively recent install of Dotscience. For installs managed with the Dotscience Terraform modules, you need v0.10.0 or above.

Custom prediction endpoints

The default prediction endpoint takes in a JSON object with a key "instances" that should map to an array of numbers. It returns a JSON object with a key "predictions" and values that are the result of calling predict_proba() on the Scikit-Learn model you published.
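To make the default format concrete, here is a sketch of the request and response shapes (the probability values are illustrative, not real model output):

```python
import json

# Request body for the default prediction endpoint: an "instances"
# key mapping to an array of feature vectors (numbers).
request = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

# Response body: a "predictions" key wrapping the per-class
# probabilities returned by predict_proba().
response = {"predictions": [[0.97, 0.02, 0.01]]}

print(json.dumps(request))
print(json.dumps(response))
```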

You can customize this logic in two steps.

First, create a new file called custom_predict.py with a function predict(model, query). It will be called with two arguments, the model and the JSON query sent to the prediction endpoint, and it must return a Python object that can be serialized to JSON.

For example, if you just want to take a list and return a list, you could write a custom_predict.py that looks like this:

import numpy as np

def predict(model, query):
    inputs = np.array(query)
    return model.predict_proba(inputs).tolist()

Put that file in the same directory as the pickled model you’re storing.
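If you would rather keep the default request and response shapes while still customizing the logic, a custom_predict.py along these lines should work (a sketch; it assumes the default `{"instances": ...}` request format described above):

```python
import numpy as np

def predict(model, query):
    # Expect the default request shape: {"instances": [[...], ...]}
    inputs = np.array(query["instances"])
    # Wrap the class probabilities in the default response shape.
    return {"predictions": model.predict_proba(inputs).tolist()}
```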

Second, update the code that uses the dotscience-python library to generate and publish the model. This feature requires the following:

- The scikit-learn model is placed in a directory named model.joblib.
- The file custom_predict.py and, if needed, runtime-requirements.txt are copied into that model directory.

An example notebook and files are available to download here.

import dotscience as ds
import os
import shutil
import sklearn
from sklearn import svm
from sklearn import datasets
from joblib import dump
import json

clf = svm.SVC(gamma='scale', probability=True)
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

ds.start()

MODEL_DIR = "./model.joblib"

print('export_path = {}\n'.format(MODEL_DIR))

if os.path.isdir(MODEL_DIR):
    print('\nAlready saved a model, cleaning up\n')
    shutil.rmtree(MODEL_DIR)

os.mkdir(MODEL_DIR)

dump(clf, os.path.join(MODEL_DIR, 'model.joblib'))


with open(os.path.join(MODEL_DIR, 'classes.json'), "w") as f:
    f.write(json.dumps({
        "0": "Iris Setosa",
        "1": "Iris Versicolour",
        "2": "Iris Virginica",
    }))

shutil.copyfile("custom_predict.py", os.path.join(MODEL_DIR, "custom_predict.py"))
shutil.copyfile("runtime-requirements.txt", os.path.join(MODEL_DIR, "runtime-requirements.txt"))

ds.model(sklearn, "irisCustom", ds.output(MODEL_DIR), classes="./model.joblib/classes.json")

ds.publish("trained iris model", deploy=True)

Custom Python libraries

If you want to install additional libraries that can be used by your custom prediction logic, you can do so in two steps.

First, add a runtime-requirements.txt listing your additional Python requirements. Ideally they should be pinned, e.g. sklearn-pandas==1.1.0. Put this file in the same directory as your pickled model and custom_predict.py.
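For example, a runtime-requirements.txt pinning one extra library would contain just:

```
sklearn-pandas==1.1.0
```

Pinning exact versions keeps the deployed model's environment reproducible between publishes.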