Keepsake: Version control for machine learning

Keepsake versions all of the models you train and stores them on Amazon S3 or Google Cloud Storage, so you can later pull those models into your inference systems.

Load models within Python

Using the Keepsake Python API, you can load a model directly from within your inference script. For example, if you did this in your training script:

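```python
import torch
import keepsake


def train():
    # Create an experiment; params records the hyperparameters for this run.
    experiment = keepsake.init(path=".", params={...})
    for epoch in range(num_epochs):
        # ... train `model` for one epoch and compute `loss` ...
        torch.save(model, "model.pth")
        # Record the saved file and its metrics as a new checkpoint.
        # primary_metric tells Keepsake that a lower loss is better.
        experiment.checkpoint(
            path="model.pth",
            metrics={"loss": loss},
            primary_metric=("loss", "minimize"),
        )
```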

Then you can use this in your inference script to get the model back:

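```python
import torch
import keepsake

experiment = keepsake.experiments.get("e510303")
checkpoint = experiment.best()  # best checkpoint by the primary metric
# On PyTorch 2.6+, add weights_only=False to load a fully pickled model.
model = torch.load(checkpoint.open("model.pth"))
```

This also works inside notebooks, for example if you want to visualize your model's output.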
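The primary_metric=("loss", "minimize") argument in the training script is what gives experiment.best() a way to rank checkpoints. As a rough illustration only (plain Python, not Keepsake's actual implementation; the Checkpoint class and best_checkpoint function are hypothetical), picking the best checkpoint amounts to taking the minimum or maximum of the primary metric across an experiment's checkpoints:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Checkpoint:
    # Hypothetical stand-in for a saved checkpoint: an ID plus recorded metrics.
    id: str
    metrics: Dict[str, float] = field(default_factory=dict)


def best_checkpoint(
    checkpoints: List[Checkpoint], primary_metric: Tuple[str, str]
) -> Optional[Checkpoint]:
    """Return the checkpoint with the best value of the primary metric.

    primary_metric is a (name, goal) pair, where goal is "minimize" or
    "maximize", mirroring the tuple passed to experiment.checkpoint().
    Checkpoints that did not record the metric are ignored.
    """
    name, goal = primary_metric
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    candidates = [c for c in checkpoints if name in c.metrics]
    if not candidates:
        return None
    pick = min if goal == "minimize" else max
    return pick(candidates, key=lambda c: c.metrics[name])


if __name__ == "__main__":
    history = [
        Checkpoint("a1", {"loss": 0.90}),
        Checkpoint("b2", {"loss": 0.42}),
        Checkpoint("c3", {"loss": 0.57}),  # later epoch, but a worse loss
    ]
    best = best_checkpoint(history, ("loss", "minimize"))
    print(best.id)  # b2
```

In this sample history the latest checkpoint, c3, is not the best one, so a "best" lookup and a "latest" lookup can return different weights.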

Load models from the CLI

You can also get files using the command-line interface. This might be useful if you want the model weights on disk, or if you're building a Docker image with the weights inside.

For example, if you run this command for the experiment created by the training script above:

```shell
keepsake checkout e510303 -o weights/
```

Then the model weights will be written to weights/model.pth.

Note: Either an experiment ID or a checkpoint ID can be passed to keepsake checkout. A checkpoint ID makes a better versioning identifier because it refers to one exact version of your model weights.

Currently, the Python API only accepts experiment IDs. Support for checkpoint IDs is being worked on; see this GitHub issue for more details.
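To see why a checkpoint ID is the more stable identifier, it helps to compare what each kind of ID resolves to. The plain-Python sketch below is hypothetical: the Experiment class and resolve function are not part of Keepsake, and it assumes an experiment ID resolves to the experiment's current best checkpoint (as experiment.best() does in the Python example above), while a checkpoint ID always names one fixed set of weights.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Experiment:
    # Hypothetical model of an experiment: its ID plus the checkpoints
    # recorded so far, each stored as a (checkpoint_id, loss) pair.
    id: str
    checkpoints: List[Tuple[str, float]] = field(default_factory=list)

    def best_checkpoint_id(self) -> str:
        # Assumption: "best" means lowest loss, matching
        # primary_metric=("loss", "minimize") in the training script.
        return min(self.checkpoints, key=lambda c: c[1])[0]


def resolve(ref: str, experiments: Dict[str, Experiment]) -> str:
    """Turn an experiment ID or a checkpoint ID into a checkpoint ID."""
    if ref in experiments:
        # Experiment IDs are resolved at lookup time, so the result can
        # change as training records new checkpoints.
        return experiments[ref].best_checkpoint_id()
    for exp in experiments.values():
        if any(cid == ref for cid, _ in exp.checkpoints):
            # A checkpoint ID always refers to the same saved weights.
            return ref
    raise KeyError(f"no experiment or checkpoint with ID {ref!r}")


if __name__ == "__main__":
    exp = Experiment("e510303", [("c1", 0.80), ("c2", 0.50)])
    experiments = {exp.id: exp}
    print(resolve("e510303", experiments))  # c2
    print(resolve("c2", experiments))  # c2

    # Training continues and records a better checkpoint.
    exp.checkpoints.append(("c3", 0.30))
    print(resolve("e510303", experiments))  # c3 (experiment ID now resolves differently)
    print(resolve("c2", experiments))  # c2 (checkpoint ID is unchanged)
```

Under these assumptions, pinning a deployment or a Docker build to a checkpoint ID means rerunning the same command later still yields the same weights, even if the experiment keeps training.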

Let’s build this together

Everyone uses version control for software, but it’s much less common in machine learning.

This causes all sorts of problems: people are manually keeping track of things in spreadsheets, model weights are scattered on S3, and nothing is reproducible. It's hard enough getting your own model from a month ago running, let alone somebody else's.

So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script. There are some solutions for these things, but they feel like band-aids.

We want to make a small, lightweight, native version control system for ML. Something that does one thing well and combines with other tools to produce the system you need.

We need your help to make this a reality. If you’ve built this for yourself, or are just interested in this problem, join us to help build a better system for everyone.

Join our Discord chat or get involved on GitHub.



A project from Replicate.
