Deploying scikit-learn models

In this section we'll demonstrate how you can build and deploy scikit-learn models with Dotscience.

You can deploy models to production either by using ds.connect() and ds.deploy() in a Python script (Dotscience Anywhere) or by using our hosted JupyterLab notebook on https://cloud.dotscience.com/. This tutorial runs through both options.

Deploying a scikit-learn model using ds.connect() and ds.deploy() in Dotscience Anywhere mode

Open your preferred Python code editor and create a file called train-sklearn.py with the following contents:

import dotscience as ds
import os
import shutil
import sklearn
from sklearn import svm
from sklearn import datasets
from pickle import dump
import json

ds.connect(
    os.getenv("DOTSCIENCE_USERNAME"),
    os.getenv("DOTSCIENCE_APIKEY"),
    os.getenv("DOTSCIENCE_PROJECT_NAME"),
    os.getenv("DOTSCIENCE_URL")
)

clf = svm.SVC(gamma='scale', probability=True)
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

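# Serialise the trained classifier; the with-block closes the file handle
with open("model.joblib", "wb") as model_file:
    dump(clf, model_file)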
with open("classes.json", "w") as f:
    f.write(json.dumps({"0": "Setosa",
                        "1": "Versicolour",
                        "2": "Virginica"}))

ds.model(sklearn, "iris", ds.output("model.joblib"), classes=ds.output("classes.json"))

ds.publish("trained iris model", deploy=True)
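
The keys in classes.json are the integer class indices from iris.target, written as strings because JSON object keys must be strings. As a rough illustration of how a client could turn a predicted index back into a species name, here is a minimal sketch. label_for is a hypothetical helper introduced for this example and is not part of the Dotscience library.

```python
import json

# The same mapping that train-sklearn.py writes to classes.json.
CLASSES_JSON = '{"0": "Setosa", "1": "Versicolour", "2": "Virginica"}'


def label_for(class_index, classes):
    """Return the label for an integer class index (JSON keys are strings)."""
    return classes[str(class_index)]


classes = json.loads(CLASSES_JSON)
labels = [label_for(i, classes) for i in (0, 2)]
print(labels)
```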

Set up your environment by exporting the following variables. You will need your Dotscience account credentials: create a free Dotscience account if you don’t have one, or sign in to your existing account. You can find your API key under Account > Keys. The script also reads DOTSCIENCE_URL, which the example below leaves unset; export it too if you need to name your Dotscience instance explicitly.

export DOTSCIENCE_USERNAME="bob"
export DOTSCIENCE_APIKEY="XXX"
export DOTSCIENCE_PROJECT_NAME="scikitlearn"
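
Because the script reads its credentials with os.getenv(), a variable that was never exported silently becomes None. One way to fail early with a clear message is to check the variables before calling ds.connect(). The sketch below is only a suggestion; missing_vars is a hypothetical helper, not something the dotscience library provides.

```python
import os

# The three variables exported above.
REQUIRED_VARS = (
    "DOTSCIENCE_USERNAME",
    "DOTSCIENCE_APIKEY",
    "DOTSCIENCE_PROJECT_NAME",
)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Demonstration with a sample mapping rather than the real environment.
sample_env = {"DOTSCIENCE_USERNAME": "bob", "DOTSCIENCE_APIKEY": "XXX"}
print(missing_vars(sample_env))
```

In train-sklearn.py you could call missing_vars(os.environ) before ds.connect() and stop with an error if the returned list is not empty.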

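Install the dependencies: the scikit-learn and dotscience Python libraries, along with joblib. Note that the PyPI package is named scikit-learn, even though it is imported as sklearn.

pip3 install dotscience scikit-learn joblib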

Run the script

python3 train-sklearn.py
Checking connection... connected!
You have not called ds.start() yet, so I'm doing it for you!
=== Dotscience remote publish ===
Created new project sklearn-070101 as it did not exist.
*  Uploading output/model files.
 done
   -> Dotscience run: https://cloud.dotscience.com/project/55e978ed-ac1b-460a-842c-c9a5645edc60/runs/metric/90bec15c-999c-4670-966b-828

*  Building docker image................ done
   -> Docker image: quay.io/dotscience-playground/models:795b2da1-859f-46fb-8502-a45e23ff3038-59c2103e-be92-4387-a012-bc7de04503e2

*  Deploying to Kubernetes................ done
   -> Endpoint: https://007a6c42.app.cloud.dotscience.net/v1/models/model:predict

*  Creating Grafana dashboard... done
   -> Dashboard: https://playground-grafana.dotscience.com/d/f69c0730-69fa-4f04-be91-c21f4c711292/monitoring-model-sklearn070101

Waiting for model endpoint to become active............................................................
Seems to be taking a long time, waiting one more minute
....................... done
=== Dotscience publish complete ===

This creates a simple scikit-learn model as a joblib file, tracks the file in Dotscience and deploys the model. Under the hood, Dotscience creates a dockerised service and deploys this model into a Kubernetes cluster. You can make requests to this model by using the following curl command. Replace the model URL with your specific model’s URL from your output.

    curl --request POST \
    --url https://your-model-id.app.cloud.dotscience.net/v1/models/model:predict  \
    --header 'content-type: application/json' \
    --data '{
    "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
    ]
    }'
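
If you would rather call the endpoint from Python than from curl, the same request can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: MODEL_URL is the placeholder from the curl command and must be replaced with your own endpoint, the helper names are made up for this example, and predict() assumes the service answers with a JSON body without assuming anything about its shape.

```python
import json
import urllib.request

# Placeholder URL: replace with the endpoint printed by your own publish run.
MODEL_URL = "https://your-model-id.app.cloud.dotscience.net/v1/models/model:predict"


def build_predict_request(url, instances):
    """Build a POST request carrying the same JSON body as the curl command."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, instances, timeout=30):
    """Send the request and return the parsed response (assumed to be JSON)."""
    request = build_predict_request(url, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; predict() does.
request = build_predict_request(
    MODEL_URL,
    [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(request.get_method(), request.full_url)
```

Once MODEL_URL points at a live deployment, predict(MODEL_URL, instances) sends the two iris measurements and returns whatever the service responds with.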

Dotscience models can be monitored with a Grafana dashboard that is automatically created with each deployment. Once you’ve sent the query, visit the prototype Grafana URL from the Dashboard line of the script output above to see an example of monitoring. The credentials for the prototype Grafana dashboard are:

Username: playground
Password: password

If you’re interested in monitoring sklearn models, please get in touch with the Dotscience team on Slack.

Deploying a scikit-learn model using notebook-based development

Create a Dotscience account

Create a free Dotscience account, or sign in if you already have one.

Set up the project

  1. Follow the tutorial on notebook-based development with Dotscience to create a project and launch JupyterLab.

  2. Add files to the project

    In this demo we’re going to work through a JupyterLab notebook for a data science project that builds a scikit-learn model. All the source files for this demo are in the dotmesh-io/demos repository on GitHub. Clone the repo with the following command on your local workstation:

    git clone https://github.com/dotmesh-io/demos.git

    You will find a notebook named sklearn-models.ipynb in the folder demos > sklearn-models. Navigate back to Dotscience in a browser and upload the notebook into JupyterLab using the ‘upload’ button within the JupyterLab file explorer.

    Run all cells in the notebook.

    Navigate to the models tab and you will see the newly registered sklearn model called ‘iris’. Clicking through build and deploy creates a service layer automatically and deploys the model into a production environment. More information on build and deploy can be found in the tutorial on building models with notebook-based development.

    You will find the model URL on the deployments page. Click to copy the model URL.

    By default, each sklearn model has two endpoints:

    GET https://your-model-id.app.cloud.dotscience.net/v1/healthcheck

    POST https://your-model-id.app.cloud.dotscience.net/v1/models/model:predict which takes a JSON body.

    You can make requests to this model by using the following curl command. Replace the model URL with your specific model’s URL.

    curl --request POST \
    --url https://your-model-id.app.cloud.dotscience.net/v1/models/model:predict  \
    --header 'content-type: application/json' \
    --data '{
    "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
    ]
    }'
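
Both endpoints share the model’s base URL, so a client can derive them from the URL copied from the deployments page and check the healthcheck endpoint before sending predictions. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and treating an HTTP 200 response as "healthy" is an assumption made for this example rather than documented behaviour.

```python
import urllib.error
import urllib.request


def endpoint_urls(base_url):
    """Derive the two default endpoint URLs from the model's base URL."""
    base = base_url.rstrip("/")
    return {
        "healthcheck": base + "/v1/healthcheck",
        "predict": base + "/v1/models/model:predict",
    }


def is_healthy(base_url, timeout=10):
    """Return True if GET /v1/healthcheck answers with HTTP 200 (assumed meaning)."""
    url = endpoint_urls(base_url)["healthcheck"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures and HTTP error statuses both count as unhealthy.
        return False


# Placeholder host: replace with the model URL from your deployments page.
urls = endpoint_urls("https://your-model-id.app.cloud.dotscience.net/")
print(urls["healthcheck"])
print(urls["predict"])
```

Calling is_healthy() with your real base URL performs the GET request; the lines above only build the URLs and do not touch the network.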