Tutorial: Create an MLCube

Interested in getting started with MLCube? Follow the instructions in this tutorial.

Step 1: Setup

Get MLCube, the MLCube examples and the MLCube templates, and create a Python environment.

# You can clone the mlcube examples and templates from GitHub
git clone https://github.com/mlcommons/mlcube_examples

# Create a python environment
virtualenv -p python3 ./env && source ./env/bin/activate

# Install mlcube, mlcube-docker and cookiecutter 
pip install mlcube mlcube-docker cookiecutter 
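
Before moving on, it can help to confirm that the tools installed above are visible on your PATH. The following is a minimal sketch and not part of MLCube: the helper name `missing_tools` is hypothetical, and it assumes that the `mlcube_docker` and `cookiecutter` commands used later in this tutorial are provided by the packages installed above, and that Docker itself (the `docker` command) is installed separately.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Command names taken from this tutorial; `docker` is an assumption.
    missing = missing_tools(["mlcube_docker", "cookiecutter", "docker"])
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment so that the commands installed by pip are on the PATH.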

Step 2: Configure MLCube using the mlcube_cookiecutter

Let's use the 'matmul' example that we downloaded in the previous step to illustrate how to make an MLCube. Matmul is a simple matrix multiplication example written in Python with TensorFlow. When you create an MLCube for your own model, you will use your own code, data and Dockerfile.

cd mlcube_examples
# Rename the matmul reference implementation from matmul to matmul_reference
mv ./matmul ./matmul_reference

# Create an MLCube directory using the MLCube template (note: do not use quotes in your input to cookiecutter): name = matmul, author = MLPerf Best Practices Working Group
cookiecutter https://github.com/mlcommons/mlcube_cookiecutter.git

# Copy matmul.py, Dockerfile and requirements.txt to your matmul/build directory
cp -R  matmul_reference/build  matmul

# copy input file for matmul to workspace directory
cp -R  matmul_reference/workspace  matmul

Edit the template files

Start by looking at the mlcube.yaml file that has been generated by cookiecutter.

cd ./matmul

Cookiecutter has filled in the values you supplied (such as name and author) in the mlcube.yaml file shown here:

 
# This YAML file marks a directory to be an MLCube directory. When running MLCubes with runners, MLCube path is
# specified using `--mlcube` runner command line argument.
# The most important parameters that are defined here are (1) name, (2) author and (3) list of MLCube tasks.
schema_version: 1.0.0
schema_type: mlcube_root

# MLCube name (string). Replace it with your MLCube name (e.g. "matmul" as shown here).
name: matmul
# MLCube author (string). Replace it with your author name (e.g. "MLPerf Best Practices Working Group").
author: MLPerf Best Practices Working Group

version: 0.1.0
mlcube_spec_version: 0.1.0

# List of MLCube tasks supported by this MLBox (list of strings). Every task:
#    - Has a unique name (e.g. "download").
#    - Is defined in a YAML file in the `tasks` sub-folder (e.g. "tasks/download.yaml").
#    - Task name is passed to an MLBox implementation file as the first argument (e.g. "python mnist.py download ...").
# Every task is described by lists of input and output parameters. Every parameter is a file system path (directory or
# file) characterized by two fields - name and value.
# By default, if a file system path is a relative path (i.e. does not start with `/`), it is considered to be relative
# to the `workspace` sub-folder.
# Once all tasks are listed below, create a YAML file for each task in the 'tasks' sub-folder and change them
# appropriately.
# NEXT: study `tasks/task_name.yaml`, note: in the case of matmul we only need one task.
tasks:
  - tasks/matmul.yaml

Now we will look at file ./matmul/tasks/matmul.yaml.

cd ./tasks

Cookiecutter has generated the matmul.yaml file shown here:

 
# This YAML file defines the task that this MLCube supports. A task is a piece of functionality that MLCube can run. Task
# examples are `download data`, `pre-process data`, `train a model`, `test a model` etc. MLCube runtime invokes MLCube
# entry point and provides (1) task name as the first argument, (2) task input/output parameters (--name=value) in no
# particular order. Inputs, outputs or both can be empty lists. For instance, when MLCube runtime runs an MLCube task:
#            python my_mlcube_entry_script.py download --data_dir=DATA_DIR_PATH --log_dir=LOG_DIR_PATH
#    - `download` is the task name.
#    - `data_dir` is the output parameter with value equal to DATA_DIR_PATH.
#    - `log_dir` is the output parameter with value equal to LOG_DIR_PATH.
# This file only defines parameters, and does not provide parameter values. This is internal MLCube file and is not
# exposed to users via command line interface.
schema_version: 1.0.0
schema_type: mlcube_task

# List of input parameters (list of dictionaries).
inputs:
    - name: parameters_file
      type: file 

# List of output parameters (list of dictionaries). Every parameter is a dictionary with two mandatory fields - `name`
# and `type`. The `name` must have value that can be used as a command line parameter name (--data_dir, --log_dir). The
# `type` is a categorical parameter that can be either `directory` or `file`. Every input/output parameter is always
# a file system path.
# Only parameters with their types are defined in this file. Run configurations defined in the `run` sub-folder
# associate parameter names and their values. There can be multiple run configurations for one task. One example is
# 1-GPU and 8-GPU training configuration for some `train` task.
# NEXT: study `run/task_name.yaml`.
outputs:
    - name: output_file 
      type: file 
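
To make the task definition concrete, here is a hedged sketch of how an entry script could receive the task name and the two parameters declared above. It is not the matmul.py shipped with mlcube_examples. The function name `parse_args` and the demo paths are hypothetical; only the calling convention (task name first, then `--name=value` pairs in any order) comes from the comments in the task file.

```python
import argparse


def parse_args(argv):
    """Parse `<task> --parameters_file=PATH --output_file=PATH` style arguments."""
    parser = argparse.ArgumentParser(description="Illustrative MLCube entry point")
    # The task name is passed as the first positional argument.
    parser.add_argument("task", choices=["matmul"])
    # Parameter names mirror the `inputs` and `outputs` sections of tasks/matmul.yaml.
    parser.add_argument("--parameters_file", required=True)
    parser.add_argument("--output_file", required=True)
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Hypothetical command line, written out as a list for demonstration.
    demo = [
        "matmul",
        "--parameters_file=/workspace/shapes.yaml",
        "--output_file=/workspace/matmul_output.txt",
    ]
    args = parse_args(demo)
    print(args.task, args.parameters_file, args.output_file)
```

Because the parameters are declared as named options, their order on the command line does not matter, which matches the "in no particular order" note in the task file.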

The input file shapes.yaml, which we copied into the MLCube workspace earlier, contains the input parameters that set the matrix dimensions. Because it takes the place of the automatically generated parameters file, remove the generated file:

rm ../workspace/parameters_file.yaml

Now we will edit file ./matmul/run/matmul.yaml.

cd ../run

The sections you need to edit are task_name, input_binding and output_binding, as shown in the matmul.yaml file here:


# A run configuration assigns values to task parameters. Since there can be multiple run configurations for one
# task (i.e., 1-GPU and 8-GPU training), run configuration files do not necessarily have to have the same name as their
# tasks. Three sections need to be updated in this file - `task_name`, `input_binding` and `output_binding`.
# Users use task configuration files to ask MLCube runtime run specific task using `--task` command line argument.
schema_type: mlcube_invoke
schema_version: 1.0.0

# Name of a task.
# task_name: task_name
task_name: matmul

# Dictionary of input bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
# file (`inputs` section). If no parameters are provided, the binding section must be an empty dictionary.
input_binding:
        parameters_file: $WORKSPACE/shapes.yaml 

# Dictionary of output bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
# file (`outputs` section). Every parameter is a file system path (directory or a file name). Paths can be absolute
# (starting with `/`) or relative. Relative paths are assumed to be relative to MLCube `workspace` directory.
# Alternatively, a special variable `$WORKSPACE` can be used to explicitly refer to the MLCube `workspace` directory.
# MLCube root directory (`--mlcube`) and run configuration file (`--task`) define MLCube task to run. One step left is
# to specify where MLCube runs - on a local machine, remote machine in the cloud etc. This is done by providing platform
# configuration files located in the MLCube `platforms` sub-folder.
# NEXT: study `platforms/docker.yaml`.
output_binding:
        output_file: $WORKSPACE/matmul_output.txt 
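
The path rules described in the comments above can be sketched as a small function. This is an illustration of the stated rules only, not MLCube's actual implementation: the name `resolve_binding` and the example workspace path are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def resolve_binding(value, workspace):
    """Turn a binding value into a full path using the rules from the run file."""
    # `$WORKSPACE` refers explicitly to the MLCube workspace directory.
    value = value.strip().replace("$WORKSPACE", workspace)
    # Absolute paths (starting with `/`) are used as they are.
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    # Relative paths are taken to be relative to the workspace directory.
    return posixpath.normpath(posixpath.join(workspace, value))


if __name__ == "__main__":
    workspace = "/home/user/mlcube_examples/matmul/workspace"  # hypothetical location
    print(resolve_binding("$WORKSPACE/shapes.yaml", workspace))
    print(resolve_binding("matmul_output.txt", workspace))
```

Under these rules, `$WORKSPACE/matmul_output.txt` and the bare relative path `matmul_output.txt` name the same file.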

Now we will edit file ./matmul/platforms/docker.yaml.

cd ../platforms

Edit the Docker image name in docker.yaml. Change image: "mlcube/matmul:0.0.1" to image: "mlcommons/matmul:v1.0".

 
# Platform configuration files define where and how runners run MLCubes. This configuration file defines a Docker
# runtime for MLCubes. One field needs to be updated here - `container.image`. This platform file defines local docker
# execution environment.
# MLCube Docker runner uses image name to either `pull` or `build` a docker image. The rule is the following:
#   - If the following file exists (`build/Dockerfile`), Docker image will be built.
#   - Else, docker runner will pull a docker image with the specified name.
# Users provide platform files using `--platform` command line argument.
schema_type: mlcube_platform
schema_version: 0.1.0

platform:
  name: "docker"
  version: ">=18.01"
container:   
   image: "mlcommons/matmul:v1.0" 
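
The build-or-pull rule stated in the comments of docker.yaml can be written down as a short check. This is a sketch of the rule as described there, not the Docker runner's real code; the function name `docker_action` is hypothetical.

```python
from pathlib import Path


def docker_action(mlcube_root):
    """Return "build" if build/Dockerfile exists under the MLCube root, else "pull"."""
    dockerfile = Path(mlcube_root) / "build" / "Dockerfile"
    return "build" if dockerfile.is_file() else "pull"


if __name__ == "__main__":
    # Run from the MLCube root directory (./matmul in this tutorial).
    print(docker_action("."))
```

In this tutorial the build directory was copied from the reference implementation in Step 2, so build/Dockerfile is present and the rule selects a build.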

Step 3: Create a Dockerfile for your model container image

You will need a Docker image to create an MLCube. We will use the Dockerfile for 'matmul' to create a Docker container image.

Note: the last line of the Dockerfile must be
ENTRYPOINT ["python3", "/workspace/your_mlcube_name.py"]
as shown below.

Now we will edit the file ./matmul/build/Dockerfile.

cd ../build 

 
# Sample Dockerfile for matmul (Matrix Multiply)
FROM ubuntu:18.04
MAINTAINER MLPerf MLBox Working Group

WORKDIR /workspace

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
            software-properties-common \
            python3-dev \
            curl && \
    rm -rf /var/lib/apt/lists/*

RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py

COPY requirements.txt /requirements.txt
RUN pip3 install --no-cache-dir -r /requirements.txt

COPY matmul.py /workspace/matmul.py

ENTRYPOINT ["python3", "/workspace/matmul.py"]
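
Before building the image, you may want to check that every file mentioned so far is in place. The sketch below is a convenience check and not part of MLCube: the helper name `missing_files` is hypothetical, and the file list is assembled from the paths used in this tutorial.

```python
from pathlib import Path

# Files this tutorial creates or copies, relative to the MLCube root directory.
EXPECTED_FILES = [
    "mlcube.yaml",
    "tasks/matmul.yaml",
    "run/matmul.yaml",
    "platforms/docker.yaml",
    "build/Dockerfile",
    "build/matmul.py",
    "build/requirements.txt",
    "workspace/shapes.yaml",
]


def missing_files(mlcube_root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under `mlcube_root`."""
    root = Path(mlcube_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the MLCube root directory (./matmul in this tutorial).
    missing = missing_files(".")
    print("Missing files:", ", ".join(missing) if missing else "none")
```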

Step 4: Build the Docker container image

cd ..
mlcube_docker configure --mlcube=. --platform=platforms/docker.yaml

Step 5: Test your MLCube

mlcube_docker run --mlcube=. --platform=platforms/docker.yaml --task=run/matmul.yaml
ls ./workspace
cat ./workspace/matmul_output.txt