Install Dotscience on GCP

Dotscience installation on Google Cloud Platform

Prerequisites

Clone repo

Start by cloning our open-source terraform repo:

git clone https://github.com/dotmesh-io/dotscience-tf
cd dotscience-tf

Choose the Google Cloud Platform version:

cd gcp

Set up Terraform

Initialize Terraform; this ensures the Google Cloud provider plugin is installed:

terraform init

Ensure you are authenticated to GCP:

gcloud auth application-default login

(or use another supported authentication mechanism, like a GCP service account key file)
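For non-interactive use (CI, remote machines), a common alternative is pointing Application Default Credentials at a service account key file. A minimal sketch — the key file path below is a placeholder, not a real key:

```shell
# Assumption: you have downloaded a service account key JSON from the GCP
# console. The path below is a placeholder for illustration only.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/dotscience-sa.json"
echo "using key file: $GOOGLE_APPLICATION_CREDENTIALS"
```

Terraform's Google provider picks up this environment variable automatically.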

Set variables

Open inputs.tfvars in your favorite text editor, and put the following in it:

project                = "<your google project id>"
admin_password         = "<your choice of password>"
grafana_admin_password = "<your choice of password>"
license_key            = "<your license key from licensing.dotscience.com>"
hub_ingress_cidr       = "0.0.0.0/0"
ssh_access_cidr        = "0.0.0.0/0"
letsencrypt_mode       = "production"
hub_volume_size        = 100

Get your license key from our Licensing service.
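Note that `hub_ingress_cidr` and `ssh_access_cidr` set to 0.0.0.0/0 allow access from anywhere on the internet. For a production deployment you may want to restrict them to your own address range — a sketch, with a placeholder address:

```
# Restrict hub and SSH access to a single address (placeholder shown;
# substitute your own public IP or office CIDR range).
hub_ingress_cidr = "203.0.113.4/32"
ssh_access_cidr  = "203.0.113.4/32"
```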

Terraforming!

Now deploy the Dotscience stack:

terraform apply -var-file inputs.tfvars

It should print out the hostname you can access your stack on!
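If the apply instead fails with a variable validation error, a quick local check (a convenience sketch, not part of the Dotscience tooling) is to make sure no "<your ...>" placeholders remain in the file:

```shell
# Convenience check (not part of the Dotscience tooling): warn if any
# "<your ...>" placeholder is still present in inputs.tfvars.
if grep -q '<your' inputs.tfvars 2>/dev/null; then
  echo "inputs.tfvars still contains placeholder values"
else
  echo "inputs.tfvars has no leftover placeholders"
fi
```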

Wait a few minutes for the hub to set itself up. (To observe progress, run tail -f /var/log/syslog on the VM and look for INFO startup-script log lines; the startup script runs via cloud-init.)
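The lines you are waiting for look roughly like the sample below (illustrative only, not an actual log excerpt); the same grep filter works against /var/log/syslog on the VM:

```shell
# Illustrative only: a fabricated syslog-style line showing the filter pattern.
# On the hub VM you would run: tail -f /var/log/syslog | grep 'startup-script'
echo 'Jan  1 00:00:01 hub startup-script: INFO waiting for Dotscience hub' \
  | grep 'startup-script.*INFO'
```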

Now log in and do some data science!

Runners - where training happens

Runners are where model training happens, such as within Jupyter notebooks or via ds run.

Managed runners are VMs that are auto-provisioned when users create them through Dotscience. By default they are n1-standard-1 instances. They are destroyed automatically when idle, to save money.

You can also attach non-managed runners, such as on-prem physical hardware, which can include GPUs. Simply go to the menu (top-right) in the app, click Runners, then Add New Runner, and you'll be given a docker run command to execute on your runner (e.g. a DGX server).

Deployers - where inference and monitoring happens

Deployers are where models run.

You can also attach non-managed deployers, such as an on-prem Kubernetes cluster. Simply go to the menu (top-right) in the app, click Deployers, then Add New Deployer, and you'll be given a kubectl apply command to execute on your Kubernetes cluster.

Roadmap

We plan to improve the GCP Terraform stack in the following ways:

  • Pre-configure the stack with standard GCP instance types as runner profiles.
  • Support auto-provisioned GPU runners.
  • Automatically deploy a GKE cluster as a managed deployer and install the monitoring stack on it.
  • Document how to set up your own domain and hostname in DNS and have Let's Encrypt work for it.

Need help?

Jump on our Slack or contact a sales rep.