Dotscience pipelines

Pipelines let you execute arbitrary shell commands after a Dotscience run finishes.

Introducing .dotscience.yml for configuration

Pipelines help you automate steps in your ML/data science software delivery process, such as initiating model builds, uploading artifacts to S3, using various Dotscience plugins, and deploying to a staging or production environment. They are configured by placing a .dotscience.yml file in the root of either your workspace or your git repository. For example, if a git repository my-ml-repo is cloned to the my-ml-repo/ directory, the runner checks for the file at both ./.dotscience.yml and my-ml-repo/.dotscience.yml.

The YAML syntax is designed to be easy to read and expressive, so that anyone viewing the repository can understand the workflow. Pipeline execution is triggered by a ds run command.

The steps are configured in YAML and follow the generic format below.

kind: pipeline

after:
- name: update-file           # Step name
  image: ubuntu:latest        # Image to use
  pull: always                # Pull policy, images get cached on the runners  
  runPolicy: always           # Indicates when to run this step, can be set to 'on_success', 'on_failure' or 'always'
  commands:               
  - ./prepare-artifacts.sh    # commands to run
  - echo "test"              
  environment:                # one or more environment variables
    TOKEN: my-token
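
As a minimal sketch of how runPolicy might be used to branch on the outcome, the configuration below defines one step for each case. It uses only the fields shown in the generic format above. The step names and scripts (publish-report.sh, collect-logs.sh) are hypothetical, and the behaviour described in the comments is an assumption based on the names of the policy values.

```yaml
kind: pipeline

after:
- name: publish-report        # hypothetical step name
  image: ubuntu:latest
  runPolicy: on_success       # assumed: runs only when everything before it succeeded
  commands:
  - ./publish-report.sh       # hypothetical script in the workspace
- name: collect-logs          # hypothetical step name
  image: ubuntu:latest
  runPolicy: on_failure       # assumed: runs only when something before it failed
  commands:
  - ./collect-logs.sh         # hypothetical script in the workspace
```

Under these assumptions, only one of the two steps would execute for any given run.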

Example workflows:

Example pipeline configuration that:

  1. Uses the ubuntu:latest image in a step and runs a script
  2. Triggers a CircleCI job in another project (downstream build)
  3. Sends a notification to Slack whether the pipeline finishes successfully or encounters an error

kind: pipeline

after:
- name: update-file           # Step name
  image: ubuntu:latest        # Image to use
  pull: always                # Pull policy, images get cached on the runners  
  commands:               
  - ./prepare-artifacts.sh    # commands to run
  - ./upload-artifacts.sh 
  environment:                # one or more environment variables
    TOKEN: my-token
- name: circleci              # a plugin to start CircleCI job builds 
  image: dotscience/dotscience-circleci-plugin:latest # Dotscience-specific plugin
  runPolicy: on_success
  settings:                   # plugin settings
    token: your-circle-ci-token  
    username: rusenask
    project: dotscience-pipeline-demo
- name: notify                # notification plugin
  image: dotscience/dotscience-slack-plugin:latest
  pull: always
  runPolicy: always
  settings:
    slackUrl: https://hooks.slack.com/services/xx/xx/xx

Pipeline steps can use any Docker image that the runner can pull. Individual Dotscience plugin configurations are documented in their respective repositories.

At the moment, pipelines only support the “after” configuration, which runs after the main ds run command finishes.
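
To illustrate that a step is not limited to ubuntu or the Dotscience plugin images, here is a hedged sketch of an “after” step that uses a public Python image. The step name and the script summarize-metrics.py are hypothetical, and the sketch assumes the workspace files are available in the step's working directory, as the ./prepare-artifacts.sh example suggests.

```yaml
kind: pipeline

after:
- name: summarize-metrics           # hypothetical step name
  image: python:3.11-slim           # public Docker Hub image; any image the runner can pull should work
  pull: always
  commands:
  - python --version                # confirm which interpreter the image provides
  - python ./summarize-metrics.py   # hypothetical script in the workspace
```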

Feature improvements on our roadmap:

  • Events triggered by source code repositories, where webhooks from GitHub, GitLab and other popular services will be able to trigger pipelines.
  • Support for storing generic secrets in Dotscience projects, which can then be referenced in pipelines and in ds run scripts.