Keepsake: Version control for machine learning

This guide will help you learn how Keepsake works by building a simple model.

If you prefer working in notebooks, follow our notebook tutorial on Colab.

We're going to make a model that classifies Iris plants, trained on the Iris dataset. It's an intentionally simple model that trains really fast, just so we can show you how Keepsake works.

Install dependencies

First, let's make a directory to work in:

mkdir iris-classifier
cd iris-classifier

Keepsake is a Python package, and we need a few other Python packages to make the model run.

Create requirements.txt to define the Python packages to install:

keepsake==0.4.2
scikit-learn==0.23.1
torch==1.4.0

Then, install them:

pip install -r requirements.txt

You might want to use a virtualenv or Conda environment so these packages don't collide with others on your computer.

Write a model

Copy and paste this code into train.py:

import argparse

import keepsake
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import torch
from torch import nn


def train(learning_rate, num_epochs):
    # Create an "experiment". This represents a run of your training script.
    # It saves the training code at the given path and any hyperparameters.
    experiment = keepsake.init(
        path=".",
        params={"learning_rate": learning_rate, "num_epochs": num_epochs},
    )

    # The Iris data set ships with scikit-learn, so nothing is downloaded.
    print("Loading data set...")
    iris = load_iris()
    train_features, val_features, train_labels, val_labels = train_test_split(
        iris.data,
        iris.target,
        train_size=0.8,
        test_size=0.2,
        random_state=0,
        stratify=iris.target,
    )
    train_features = torch.FloatTensor(train_features)
    val_features = torch.FloatTensor(val_features)
    train_labels = torch.LongTensor(train_labels)
    val_labels = torch.LongTensor(val_labels)

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 15), nn.ReLU(), nn.Linear(15, 3),)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(train_features)
        loss = criterion(outputs, train_labels)
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            model.eval()
            output = model(val_features)
            # Fraction of validation samples classified correctly,
            # converted to a plain Python float for printing and logging.
            acc = (output.argmax(1) == val_labels).float().mean().item()

        print(
            "Epoch {}, train loss: {:.3f}, validation accuracy: {:.3f}".format(
                epoch, loss.item(), acc
            )
        )

        torch.save(model, "model.pth")
        # Create a checkpoint within the experiment.
        # This saves the metrics at that point, and makes a copy of the file
        # or directory given, which could include weights and any other artifacts.
        experiment.checkpoint(
            path="model.pth",
            step=epoch,
            metrics={"loss": loss.item(), "accuracy": acc},
            primary_metric=("loss", "minimize"),
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--num_epochs", type=int, default=100)
    args = parser.parse_args()
    train(args.learning_rate, args.num_epochs)

Notice the two calls to Keepsake: keepsake.init() and experiment.checkpoint(). They don't affect how the model trains; they only save data to Keepsake so it can keep track of what is going on.

The first is keepsake.init(). This creates an experiment, which represents a run of your training script. The experiment records the hyperparameters you pass to it and makes a copy of the given path to save your training code.

The second is experiment.checkpoint(). This creates a checkpoint within the experiment. The checkpoint saves the metrics at that point, and makes a copy of the file or directory you pass to it, which could include weights and any other artifacts.

Each experiment contains multiple checkpoints. You typically save your model periodically during training, because the best result isn't necessarily the most recent one. A checkpoint is created just after you save your model, so Keepsake can keep track of versions of your saved model.
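To make "best isn't necessarily latest" concrete, here is a minimal sketch of picking the best checkpoint from a primary metric like the ("loss", "minimize") tuple in train.py. The Checkpoint class, the best_checkpoint function, and the loss values are all hypothetical and for illustration only; this is not how Keepsake stores or ranks checkpoints internally.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Checkpoint:
    # Hypothetical stand-in for a checkpoint: its step and the metrics saved with it.
    step: int
    metrics: Dict[str, float]


def best_checkpoint(checkpoints: List[Checkpoint], metric: str, goal: str) -> Checkpoint:
    """Return the checkpoint with the best value of `metric`.

    `goal` is "minimize" or "maximize", mirroring the second element of the
    primary_metric tuple passed to experiment.checkpoint() in train.py.
    """
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    candidates = [c for c in checkpoints if metric in c.metrics]
    if not candidates:
        raise ValueError(f"no checkpoint records {metric!r}")

    def value(c: Checkpoint) -> float:
        return c.metrics[metric]

    return min(candidates, key=value) if goal == "minimize" else max(candidates, key=value)


# Made-up history where loss bottoms out at step 1, so the latest checkpoint is not the best.
history = [
    Checkpoint(step=0, metrics={"loss": 0.50}),
    Checkpoint(step=1, metrics={"loss": 0.30}),
    Checkpoint(step=2, metrics={"loss": 0.35}),
]
print(best_checkpoint(history, "loss", "minimize").step)  # 1
```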

Define a repository

We need to tell Keepsake where to store your experiments. Create keepsake.yaml with this content:

repository: "file://.keepsake"

This will store your experiments in the .keepsake directory relative to this file. You can also store them in the cloud on Amazon S3 or Google Cloud Storage. Learn more in our guide to using cloud storage.

Train the model

We're now going to train this model a couple of times with different parameters to see what we can do with Keepsake.

First, train it with default parameters:

$ python train.py
Epoch 0, train loss: 1.184, validation accuracy: 0.333
Epoch 1, train loss: 1.117, validation accuracy: 0.333
Epoch 2, train loss: 1.061, validation accuracy: 0.467
...
Epoch 97, train loss: 0.121, validation accuracy: 1.000
Epoch 98, train loss: 0.119, validation accuracy: 1.000
Epoch 99, train loss: 0.118, validation accuracy: 1.000

Next, run the training with a different learning rate:

$ python train.py --learning_rate=0.2
Epoch 0, train loss: 1.184, validation accuracy: 0.333
Epoch 1, train loss: 1.161, validation accuracy: 0.633
Epoch 2, train loss: 1.124, validation accuracy: 0.667
...
Epoch 97, train loss: 0.057, validation accuracy: 0.967
Epoch 98, train loss: 0.057, validation accuracy: 0.967
Epoch 99, train loss: 0.056, validation accuracy: 0.967
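If you want to try more than a couple of learning rates, you can launch the runs from a small script instead of typing each command. This is a sketch of a hypothetical helper (call it sweep.py, kept next to train.py); it only uses the --learning_rate flag that train.py already defines. Because each run calls keepsake.init(), every command becomes its own experiment.

```python
import subprocess
import sys
from typing import List, Sequence


def sweep_commands(learning_rates: Sequence[float]) -> List[List[str]]:
    """Build one `python train.py --learning_rate=...` command per value."""
    return [
        [sys.executable, "train.py", f"--learning_rate={lr}"]
        for lr in learning_rates
    ]


def run_sweep(learning_rates: Sequence[float]) -> None:
    # Run the commands one after another; check=True stops the sweep if a run fails.
    for cmd in sweep_commands(learning_rates):
        subprocess.run(cmd, check=True)


# Usage (launches three training runs, i.e. three experiments):
# run_sweep([0.01, 0.05, 0.2])
```

Running the runs sequentially keeps the output readable; since each run is recorded separately, you can compare them afterwards regardless of how they were launched.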

Experiments and checkpoints

The calls to the keepsake Python library have saved your experiments locally. You can use keepsake ls to list them:

$ keepsake ls
EXPERIMENT  STARTED         STATUS   PARAMS              BEST CHECKPOINT    LATEST CHECKPOINT
b90ad56     12 seconds ago  stopped  learning_rate=0.01  4941495 (step 99)  4941495 (step 99)
                                                         loss=0.1176        loss=0.1176
9cce006     3 seconds ago   stopped  learning_rate=0.2   a122e85 (step 99)  a122e85 (step 99)
                                                         loss=0.056486      loss=0.056486

The --filter flag allows you to narrow in on a subset of experiments:

$ keepsake ls --filter "learning_rate = 0.2"
EXPERIMENT  STARTED         STATUS   PARAMS              BEST CHECKPOINT    LATEST CHECKPOINT
9cce006     3 seconds ago   stopped  learning_rate=0.2   a122e85 (step 99)  a122e85 (step 99)
                                                         loss=0.056486      loss=0.056486
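Conceptually, a filter like "learning_rate = 0.2" is a predicate evaluated against each experiment's params. The sketch below illustrates that idea for a single `name op value` clause. The supported operators, the matches function, and the parsing rules are assumptions for illustration, not the grammar keepsake ls actually implements.

```python
import operator
import re
from typing import Any, Dict

# Comparison operators this sketch understands (an assumption, not Keepsake's grammar).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

# "name op value"; two-character operators are tried before one-character ones.
FILTER_RE = re.compile(r"^\s*(\w+)\s*(!=|<=|>=|=|<|>)\s*(.+?)\s*$")


def parse_value(text: str) -> Any:
    """Interpret the right-hand side as a number when possible, else as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text.strip("\"'")


def matches(filter_expr: str, params: Dict[str, Any]) -> bool:
    m = FILTER_RE.match(filter_expr)
    if m is None:
        raise ValueError(f"cannot parse filter: {filter_expr!r}")
    name, op, raw = m.groups()
    if name not in params:
        return False  # experiments without this param never match
    return OPS[op](params[name], parse_value(raw))


# Params from the two experiments listed above.
experiments = {
    "b90ad56": {"learning_rate": 0.01, "num_epochs": 100},
    "9cce006": {"learning_rate": 0.2, "num_epochs": 100},
}
selected = [eid for eid, p in experiments.items() if matches("learning_rate = 0.2", p)]
print(selected)  # ['9cce006']
```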

As a reminder, each experiment in this list represents a run of the train.py script, and stores a copy of the code as it was when the script started.

Within experiments are checkpoints, which are created every time you call experiment.checkpoint() in your training script. A checkpoint contains your weights, TensorFlow logs, and any other artifacts you want to save.

To list the checkpoints within an experiment, you can use keepsake show. Run this, replacing b90ad56 with an experiment ID from your output of keepsake ls:

$ keepsake show b90ad56
Experiment: b90ad56a755371548ae2ab98c9d40a85911fd6198254880e600cdf00f55a18ca
Created:        Wed, 02 Sep 2020 20:44:51 PDT
Status:         stopped
Host:           107.133.144.125
User:           ben
Command:        train.py
Params
learning_rate:  0.01
num_epochs:     100
Checkpoints
ID       STEP  CREATED        ACCURACY  LOSS
9ed04a2  0     5 minutes ago  0.33333   1.1836
b37e01d  1     5 minutes ago  0.33333   1.1173
c74e9c6  2     5 minutes ago  0.46667   1.0611
7ba5b47  3     5 minutes ago  0.63333   1.0138
886f612  4     5 minutes ago  0.7       0.97689
667fdba  5     5 minutes ago  0.9       0.9496
...
cd1223c  95    5 minutes ago  1         0.12417
510eb98  96    5 minutes ago  1         0.12244
59129de  97    5 minutes ago  1         0.12076
e301a55  98    5 minutes ago  1         0.11915
4941495  99    5 minutes ago  1         0.1176 (best)

You can also use keepsake show on a checkpoint to get all the information about it. Run this, replacing 494 with a checkpoint ID from the experiment:

$ keepsake show 494
Checkpoint: 49414952394edfdf7923edd6bfb4aabe5558a6276a02a71a5965e1622ee7b9fd
Created:            Wed, 02 Sep 2020 20:44:52 PDT
Path:               model.pth
Step:               99
Experiment
ID:                 b90ad56a755371548ae2ab98c9d40a85911fd6198254880e600cdf00f55a18ca
Created:            Wed, 02 Sep 2020 20:44:51 PDT
Status:             stopped
Host:               107.133.144.125
User:               ben
Command:            train.py
Params
learning_rate:      0.01
num_epochs:         100
Metrics
accuracy:           1
loss:               0.11759971082210541 (primary, minimize)

Notice that you can pass an ID prefix to keepsake show, and it will find the experiment or checkpoint whose ID starts with those characters. It saves a few keystrokes.
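Prefix lookup works like abbreviated commit hashes in Git: a prefix is usable as long as exactly one known ID starts with it. Here is a minimal sketch of that resolution rule, using the two full IDs shown above. The resolve_prefix function is hypothetical, and raising an error on an ambiguous prefix is an assumption about sensible behavior, not a description of what Keepsake does.

```python
from typing import Iterable


def resolve_prefix(prefix: str, ids: Iterable[str]) -> str:
    """Return the single full ID starting with `prefix`.

    Raises LookupError if no ID matches, or if more than one does.
    """
    candidates = [full_id for full_id in ids if full_id.startswith(prefix)]
    if not candidates:
        raise LookupError(f"no ID starts with {prefix!r}")
    if len(candidates) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous: {sorted(candidates)}")
    return candidates[0]


# Full IDs of the experiment and checkpoint shown above.
known_ids = [
    "b90ad56a755371548ae2ab98c9d40a85911fd6198254880e600cdf00f55a18ca",
    "49414952394edfdf7923edd6bfb4aabe5558a6276a02a71a5965e1622ee7b9fd",
]
print(resolve_prefix("494", known_ids)[:7])  # 4941495
```

The shorter the prefix, the more likely it is to become ambiguous as you run more experiments, which is why the listings show seven characters.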

Compare checkpoints

Let's compare the last checkpoints from the two experiments we ran. Run this, replacing 4941495 and a122e85 with the two checkpoint IDs from the LATEST CHECKPOINT column in keepsake ls:

$ keepsake diff 4941495 a122e85
Experiment
ID:                       b90ad56                        9cce006
Command:                  train.py                       train.py --learning_rate=0.2
Created:                  Wed, 02 Sep 2020 20:44:51 PDT  Wed, 02 Sep 2020 20:45:01 PDT
Params
learning_rate:            0.01                           0.2
Checkpoint
ID:                       4941495                        a122e85
Created:                  Wed, 02 Sep 2020 20:44:55 PDT  Wed, 02 Sep 2020 20:45:04 PDT
Metrics
accuracy:                 1                              0.9666666388511658
loss:                     0.11759971082210541            0.056485891342163086

keepsake diff works a bit like git diff, except that in addition to the code, it compares all of the metadata Keepsake is aware of: params, metrics, dependencies, and so on.

keepsake diff compares checkpoints, because checkpoints are what hold the results.

You can also pass an experiment ID, and it will pick the best or latest checkpoint from that experiment.
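In the diff output above, the Params section shows learning_rate but not num_epochs, which was 100 in both runs. That suggests keepsake diff lists only values that differ. Here is a minimal sketch of that comparison for one section of metadata; the diff_section function and the "(not set)" placeholder for a key present on only one side are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple

MISSING = "(not set)"  # hypothetical placeholder for a key present on only one side


def diff_section(left: Dict[str, Any], right: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (key, left_value, right_value) for every key whose values differ."""
    rows = []
    for key in sorted(left.keys() | right.keys()):
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if a != b:
            rows.append((key, a, b))
    return rows


# Params of the two experiments compared above.
params_a = {"learning_rate": 0.01, "num_epochs": 100}
params_b = {"learning_rate": 0.2, "num_epochs": 100}
for key, a, b in diff_section(params_a, params_b):
    print(f"{key}: {a} -> {b}")  # learning_rate: 0.01 -> 0.2
```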

Check out a checkpoint

At some point you might want to go back to an earlier state. Maybe you've run a bunch of experiments in parallel and want to choose the one that works best. Or perhaps you've gone down a line of exploration that isn't working, and you want to get back to where you were a week ago.

The keepsake checkout command will copy the code and weights from a checkpoint into your working directory. Run this, replacing 4941495 with a checkpoint ID you passed to keepsake diff:

$ keepsake checkout 4941495
═══╡ The directory "/Users/ben/p/tmp/iris-classifier" is not empty.
═══╡ This checkout may overwrite existing files. Make sure you've committed everything to Git so it's safe!
Do you want to continue? (y/N) y
═══╡ Checked out 4941495 to "/Users/ben/p/tmp/iris-classifier"

The model file in your working directory is now the model saved in that checkpoint:

$ ls -lh model.pth
-rw-r--r--  1 ben  staff   8.3K Aug  7 16:42 model.pth

This is useful for getting the trained model out of a checkpoint, but it also copies all of the code from the experiment that the checkpoint belongs to. If you changed the code and didn't commit to Git, keepsake checkout lets you get back the exact code from an experiment.

This means you don't have to remember to commit to Git when you're running experiments. Just try a bunch of things, then when you've found something that works, use Keepsake to get back to the exact code that produced those results and formally commit it to Git.

Neat, huh? Keepsake is keeping track of everything in the background so you don't have to.
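To picture what the copy step of a checkout does to your working directory, here is a rough sketch: files saved with a checkpoint are copied over the working directory, overwriting files with the same name, after a confirmation when the directory isn't empty. The checkout function and the temporary directories are hypothetical stand-ins; this is not Keepsake's implementation. It requires Python 3.8+ for the dirs_exist_ok argument of shutil.copytree.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def checkout(saved_dir: Path, work_dir: Path, confirm: Callable[[], bool] = lambda: False) -> None:
    """Copy a saved snapshot into work_dir, asking first if work_dir isn't empty."""
    work_dir.mkdir(parents=True, exist_ok=True)
    if any(work_dir.iterdir()) and not confirm():
        raise RuntimeError(f"{work_dir} is not empty; refusing to overwrite")
    # Existing files with the same name are overwritten; other files are left alone.
    shutil.copytree(saved_dir, work_dir, dirs_exist_ok=True)


# Demo: throwaway directories stand in for a checkpoint and your project.
saved = Path(tempfile.mkdtemp())
(saved / "model.pth").write_bytes(b"weights")
(saved / "train.py").write_text("print('v1')\n")

project = Path(tempfile.mkdtemp())
(project / "train.py").write_text("print('v2, uncommitted')\n")

checkout(saved, project, confirm=lambda: True)
print((project / "train.py").read_text().strip())  # print('v1')
```

Note the uncommitted edit to train.py in the demo is lost, which is why keepsake checkout asks you to commit to Git before it overwrites anything.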

The workflow so far

With these tools, let's recap what the workflow looks like:

Add experiment = keepsake.init() and experiment.checkpoint() to your training code.
Run several experiments by running the training script as usual, with changes to the hyperparameters or code.
See the results of your experiments with keepsake ls and keepsake show.
Compare the differences between experiments with keepsake diff.
Get the code from the best experiment with keepsake checkout.
Commit that code cleanly to Git.

You don't have to keep track of what you changed in your experiments, because Keepsake does that automatically for you. You can also safely change things without committing to Git, because keepsake checkout will always be able to get you back to the exact code and weights the experiment produced.

What's next

Next, you might want to:

Read our guide about storing experiment data in the cloud so you can train on multiple machines and collaborate with others.
Learn how to version training data.
Learn how to do analysis and visualization in notebooks.

If something doesn't make sense, doesn't work, or you just have some questions, please email us: team@replicate.ai. We love hearing from you!