In this section we'll cover notebook-based development with Dotscience, which can be done entirely within the JupyterLab environment hosted in the platform.
This tutorial demonstrates how you can use our hosted JupyterLab environment to interact with Dotscience. For more information on the different modes in which Dotscience can be used, see the reference section on Dotscience modes.
Using this approach you can:
- Track data engineering and data provenance by publishing data runs, with built-in data versioning
- Track model development and model provenance by publishing model runs
- Track metrics and share them with your colleagues
- Publish a model to the model library
- Deploy a model from the model library to the Kubernetes cluster
- Set up a monitoring dashboard for a model that’s running in the cluster
Create a Dotscience account
Set up the project
Create a new project
All Jupyter notebooks in Dotscience are associated with a project. The first step is to create a new project for the tutorial.
Navigate to the Projects page -> Add New
Projects are assigned a randomly generated unique name and ID. The project name can be edited by clicking the ✎ next to the project name.
A hosted JupyterLab session can be launched by clicking on the JupyterLab icon. Dotscience automatically provides a managed runner for JupyterLab.
Notice the Dotscience tab in the JupyterLab panel. As we add new runs and work through the demo notebooks below, this tab shows important information about runs, commits and the metadata that went into them.
Add files to the project
In this demo we’re going to work through JupyterLab notebooks for a data science project that classifies road signs. All the source files for this demo can be found at https://github.com/dotmesh-io/dotscience-roadsigns. Clone the repo locally with the following command:
git clone https://github.com/dotmesh-io/dotscience-roadsigns
Upload the files from the cloned folder, including get-data.ipynb and roadsigns.ipynb, into JupyterLab using the ‘upload’ button in the JupyterLab file explorer.
After the upload, you will notice from the Dotscience tab that the three new files are now tracked.
Get input data
Open the file get-data.ipynb and notice that it uses the dotscience Python library in interactive mode, downloads data, and publishes the outputs of the job. More information about the dotscience Python library can be found in its documentation.
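Conceptually, publishing a run records which files the code read and which files it wrote, together with any parameters and metrics. The sketch below is a hypothetical, in-memory stand-in written only to illustrate that idea: `ToyRun` and its methods are not part of the dotscience library, whose real API is described in its documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Hypothetical stand-in for a tracked run. This is NOT the dotscience API."""
    description: str = ""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)
    summaries: dict = field(default_factory=dict)

    def input(self, path):
        # Record a file the run reads; return the path so the call can wrap open().
        self.inputs.append(path)
        return path

    def output(self, path):
        # Record a file the run writes; return the path for the same reason.
        self.outputs.append(path)
        return path

    def publish(self, description):
        # Freeze the run's metadata into a plain, sorted record.
        self.description = description
        return {
            "description": description,
            "inputs": sorted(self.inputs),
            "outputs": sorted(self.outputs),
            "parameters": dict(self.parameters),
            "summaries": dict(self.summaries),
        }


# Usage: only the file names are recorded here; no files are actually written.
run = ToyRun()
for name in ("train.p", "valid.p", "test.p"):
    run.output(name)
record = run.publish("Downloaded road sign data")
print(record["outputs"])  # ['test.p', 'train.p', 'valid.p']
```

Keeping inputs and outputs as explicit lists is what makes the provenance view possible later: each file can be traced back to the run that declared it as an output.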
Run the notebook get-data.ipynb either cell by cell or with Run -> Run All Cells.
Notice that the run downloads the input data, which is now tracked as part of the project. The Dotscience tab in JupyterLab should show the new run.
Clicking the ‘Explore’ button takes you to a provenance tracker, where you can click through the different files in the project and see the run associated with each one. This gives you a view of the provenance of each file.
The provenance graph shows that test.p, train.p and valid.p are outputs of a run of the notebook get-data.ipynb.
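A provenance graph can be thought of as a directed graph in which each file points to the run that produced it, and each run points to the files it read. The following is a minimal sketch of walking that graph back from a file, assuming hypothetical run records stored as plain dicts; the run IDs and the `model` output name are illustrative placeholders, not values Dotscience produces.

```python
from collections import deque

# Hypothetical run records: each run lists the files it read and wrote.
RUNS = {
    "run-get-data": {"notebook": "get-data.ipynb", "inputs": [],
                     "outputs": ["train.p", "valid.p", "test.p"]},
    "run-train": {"notebook": "roadsigns.ipynb",
                  "inputs": ["train.p", "valid.p", "test.p"],
                  "outputs": ["model"]},
}


def producer_index(runs):
    """Map each output file to the ID of the run that wrote it."""
    return {path: run_id for run_id, run in runs.items() for path in run["outputs"]}


def provenance(path, runs):
    """Return IDs of all runs that `path` transitively depends on, nearest first (BFS)."""
    produced_by = producer_index(runs)
    seen, order = set(), []
    queue = deque([path])
    while queue:
        current = queue.popleft()
        run_id = produced_by.get(current)
        if run_id is None or run_id in seen:
            continue  # a raw input with no producing run, or an already-visited run
        seen.add(run_id)
        order.append(run_id)
        queue.extend(runs[run_id]["inputs"])
    return order


print(provenance("model", RUNS))  # ['run-train', 'run-get-data']
```

Because the walk is breadth-first, the run that directly produced a file is listed before the runs further upstream, which mirrors how the provenance view lets you click back from a file to its inputs.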
Train a machine learning model
Next, run the notebook roadsigns.ipynb. You will notice the following:
- The provenance of the entire run is tracked. Clicking on Explore and going to the Run tab shows the latest run; clicking on the run details opens a provenance graph of the run, which displays its inputs, outputs and metadata.
- The metrics for the run are tracked, which is useful when tuning the parameters of the run. More about using the dotscience-python library to record run metrics is available in the library's documentation.
- TensorFlow models can be annotated with ds.model() so that they can be tracked and deployed from the Dotscience Hub. Navigate to the Models tab to see a list of the generated models and deploy them.
Notice the final step in roadsigns.ipynb, which uses ds.model() to annotate the model directory.
- Navigating to the Models tab with Explore -> Models, you will see the model you generated, with options to build and deploy it.
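Recorded metrics make it possible to compare runs that used different parameters. As a hypothetical sketch in plain Python (not the Dotscience API), assuming each run's parameters and metrics have been collected into dicts, picking the best run by one metric might look like this. The parameter name `epochs`, the metric name `val_accuracy` and all values are made up for illustration and are not results from this tutorial.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`; runs without it are skipped."""
    scored = [run for run in runs if metric in run["summaries"]]
    if not scored:
        raise ValueError(f"no run recorded metric {metric!r}")

    def score(run):
        return run["summaries"][metric]

    return max(scored, key=score) if higher_is_better else min(scored, key=score)


# Made-up runs for illustration only.
runs = [
    {"id": "a", "parameters": {"epochs": 5}, "summaries": {"val_accuracy": 0.91}},
    {"id": "b", "parameters": {"epochs": 10}, "summaries": {"val_accuracy": 0.94}},
    {"id": "c", "parameters": {"epochs": 20}, "summaries": {}},
]
print(best_run(runs, "val_accuracy")["parameters"])  # {'epochs': 10}
```

Skipping runs that never recorded the metric, rather than treating them as zero, avoids silently ranking an unfinished run as the worst one.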
Build the model
With an available runner, you can build a model from the UI. Clicking Build on an entry in the Models tab starts the model build process. Under the hood, the model directory is copied into a runner, and a Docker image of the TensorFlow model is built and uploaded to a registry. The logs for this task will be available to view when the task completes.
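Since the build packages the model directory into a Docker image, it can help to check locally that the directory looks like a TensorFlow SavedModel before clicking Build. A SavedModel directory typically contains a `saved_model.pb` (or `saved_model.pbtxt`) file and a `variables/` subdirectory, and TensorFlow Serving expects each model version in a numbered subdirectory such as `1/`. Whether the Dotscience build requires that numbered layout is an assumption here; the helper names below are hypothetical.

```python
from pathlib import Path


def is_saved_model_dir(path):
    """True if `path` holds a SavedModel: saved_model.pb(txt) plus a variables/ dir."""
    p = Path(path)
    has_graph = (p / "saved_model.pb").is_file() or (p / "saved_model.pbtxt").is_file()
    return has_graph and (p / "variables").is_dir()


def find_saved_model_versions(model_dir):
    """Return numbered version subdirectories (TensorFlow Serving layout) holding a SavedModel."""
    root = Path(model_dir)
    versions = [d for d in root.iterdir() if d.is_dir() and d.name.isdigit()]
    return sorted((d for d in versions if is_saved_model_dir(d)), key=lambda d: int(d.name))
```

For example, `find_saved_model_versions("model")` returning an empty list would suggest the export step did not write a versioned SavedModel where this layout expects one.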
Clicking on the model name in the Models tab takes you to the provenance view of the model.
Deploy the model into production
When a model has been successfully built, it can be deployed to an internet-facing endpoint within Dotscience and used for testing and production use cases. Dotscience uses Kubernetes for its model deployments.
All deployed models appear in the Deployments tab, from which the models can be monitored on a Grafana dashboard.
Monitor the behaviour of the model in production
On the Deployments tab, find the model you just deployed and open its monitoring dashboard by clicking ‘Monitor’. The monitoring dashboard tracks requests to your model and its behaviour on real-world data.
This is a prototype to demonstrate monitoring. For enterprise and other use cases, please contact us so we can enable monitoring at a user/project level. The credentials for the prototype Grafana dashboard are:
Username: playground
Password: password
Initially the monitoring dashboard will be empty, as there are no requests being sent to the model.
For convenience, this tutorial provides a demo app that sends requests to the model, at https://deploy-demo.dotscience.com/. You will need the model deployment URL, found under the heading ‘Host’ for your model at https://cloud.dotscience.com/models/deployments. Copy the URL by clicking ‘Copy to clipboard’ (note that the entire URL is not displayed).
Navigate to https://deploy-demo.dotscience.com/, select the demo app (in this case, Road Sign Predictor), paste the model deployment URL into the app, and send requests to the model by clicking the road signs. You can then observe the requests and the model's behaviour on the monitoring dashboard you opened earlier.
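Requests can also be sent from code instead of the demo app, which is handy for generating steady traffic for the dashboard. The sketch below assumes the deployment exposes the TensorFlow Serving REST predict API, that is, a POST to `<host>/v1/models/<name>:predict` with a JSON body of the form `{"instances": [...]}`. That endpoint shape, the model name and the input format are all assumptions; check them against your actual deployment before relying on this.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances):
    """Build a POST request in TensorFlow Serving's REST "row" format (assumed endpoint)."""
    url = f"{host.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(host, model_name, instances, timeout=10):
    """Send the request and return the "predictions" list from the JSON response."""
    request = build_predict_request(host, model_name, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]
```

For example, calling `predict(host, "model", [image])` in a loop, with `host` set to the copied deployment URL and `image` a nested list shaped like the model's input (both hypothetical here), should make the requests appear on the monitoring dashboard if the endpoint assumption holds.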
More information about collaboration, deployment and monitoring can be found in further sections of the tutorial.