Using pipelines

Pipelines help you automate steps in your ML/Data Science software delivery process, such as initiating model builds, uploading artifacts to S3, using various Dotscience plugins, and deploying to a staging or production environment.

Pipeline execution is triggered by the ds run command. Triggers from source code repositories are on the roadmap: webhooks from GitHub, GitLab and other popular services will be able to start pipelines.

Pipelines are configured by placing a .dotscience.yml file in the root of either your workspace or your git repository. For example, a git repository named my-ml-repo is cloned into the my-ml-repo/ directory, so the runner checks for the file at ./.dotscience.yml and at my-ml-repo/.dotscience.yml. The YAML syntax is designed to be readable and expressive, so that anyone viewing the repository can understand the workflow.

Example pipeline configuration that:

  1. Uses the ubuntu:latest image in a step to run a script
  2. Triggers a CircleCI job in another project (a downstream build)
  3. Sends a notification to Slack whether the pipeline finishes successfully or encounters an error

kind: pipeline

after:                        # steps that run after the main ds run command finishes
- name: update-file           # Step name
  image: ubuntu:latest        # Image to use
  pull: always                # Pull policy, images get cached on the runners
  commands:                   # commands to run
  - ./
  - ./
  environment:                # one or more environment variables
    TOKEN: my-token
- name: circleci              # a plugin to start CircleCI job builds 
  image: dotscience/dotscience-circleci-plugin:latest # Dotscience-specific plugin
  runPolicy: on-success
  settings:                   # plugin settings
    token: your-circle-ci-token  
    username: rusenask
    project: dotscience-pipeline-demo
- name: notify                # notification plugin
  image: dotscience/dotscience-slack-plugin:latest
  pull: always
  runPolicy: always

Pipeline steps can use any Docker image, as long as the runner can pull it. Individual Dotscience plugin configurations are documented in the plugins' repositories.

At the moment, pipelines support only an “after” configuration, which runs once the main ds run command finishes.
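
As a rough illustration of an “after” step that uses a general-purpose image rather than a Dotscience plugin, the sketch below runs a single command in a public Python image. It is not taken from the Dotscience reference: the step name and the command are hypothetical placeholders, and the after and commands key names are assumptions inferred from the wording and comments in the example configuration, so check them against the plugin and runner documentation before relying on them.

```yaml
kind: pipeline

after:                          # assumed top-level key, inferred from the "after" wording
- name: report-versions         # hypothetical step name
  image: python:3.11-slim       # any image the runner can pull
  pull: always                  # pull policy, as in the example configuration
  commands:                     # assumed key name, inferred from "# commands to run"
  - python --version            # placeholder command
  environment:                  # one or more environment variables
    TOKEN: my-token
```

Because the step only needs an image and something to run, the same shape should carry over to other images, provided the runner has access to the registry that hosts them.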