Keepsake: version control for machine learning

This is an experimental feature, so the API may change in the future.

Keepsake works with any machine learning framework, but it includes a callback that makes it easier to use with PyTorch Lightning.

KeepsakeCallback behaves like PyTorch Lightning's ModelCheckpoint callback, but in addition to exporting a model at the end of each epoch, it also:

  1. Calls keepsake.init() at the start of training to create an experiment, and
  2. Calls experiment.checkpoint() after saving the model in on_validation_end. If no validation step is defined, the checkpoint is created in on_epoch_end instead. All metrics that have been logged during training with self.log() are saved to the Keepsake checkpoint. (A rough sketch of the pattern this automates is shown below.)
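
Conceptually, the callback automates the pattern below using Keepsake's core Python API. This is a minimal sketch, not the callback's actual implementation; the loop, metric values, and parameter names are placeholders.

import keepsake

# Create an experiment and record hyperparameters at the start of training
experiment = keepsake.init(params={"learning_rate": 1e-3})

for epoch in range(3):
    train_loss = 1.0 / (epoch + 1)  # placeholder for a real training metric
    # Record a checkpoint with this epoch's metrics (and optionally a saved model file)
    experiment.checkpoint(
        path=None,  # set to a file path to save a model alongside the metrics
        step=epoch,
        metrics={"train_loss": train_loss},
        primary_metric=("train_loss", "minimize"),
    )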

Here is a simple example of using the callback with PyTorch Lightning:

import torch
from torch.nn import functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split, Subset
from torchvision.datasets import MNIST
from torchvision import transforms
from keepsake.pl_callback import KeepsakeCallback


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(28 * 28, 128)
        self.layer_2 = torch.nn.Linear(128, 10)
        self.batch_size = 8

    def forward(self, x):
        batch_size = x.size()[0]
        x = x.view(batch_size, -1)
        x = F.relu(self.layer_1(x))
        x = self.layer_2(x)
        return F.log_softmax(x, dim=1)

    def prepare_data(self):
        # download only
        MNIST(
            "/tmp/keepsake-test-mnist",
            train=True,
            download=True,
            transform=transforms.ToTensor(),
        )

    def setup(self, stage):
        # transform
        transform = transforms.Compose([transforms.ToTensor()])
        mnist_train = MNIST(
            "/tmp/keepsake-test-mnist", train=True, download=False, transform=transform
        )
        mnist_train = Subset(mnist_train, range(100))
        # train/val split
        mnist_train, mnist_val = random_split(mnist_train, [80, 20])
        # assign to use in dataloaders
        self.train_dataset = mnist_train
        self.val_dataset = mnist_val

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        self.log("train_loss", loss, on_step=True, on_epoch=True, logger=False)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


dense_size = 784
learning_rate = 0.1

model = MyModel()
trainer = pl.Trainer(
    checkpoint_callback=False,
    callbacks=[
        KeepsakeCallback(
            params={"dense_size": dense_size, "learning_rate": learning_rate},
            primary_metric=("train_loss", "minimize"),
            period=5,
        )
    ],
    max_epochs=100,
)
trainer.fit(model)
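
In this example, the callback calls keepsake.init() when trainer.fit() starts, recording dense_size and learning_rate as hyperparameters, and then saves the model and a Keepsake checkpoint every 5 epochs, using the logged train_loss as the metric to minimize.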

The KeepsakeCallback class takes the following arguments, all of them optional (see the sketch after this list):

  • filepath: The path where the exported model is saved. This path is also saved by experiment.checkpoint() at the end of each epoch. If it is None, the model is not saved, and the callback just gathers code and metrics. Default: model.hdf5
  • params: A dictionary of hyperparameters that will be recorded to the experiment at the start of training.
  • primary_metric: A tuple in the format (metric_name, goal), where goal is either minimize or maximize. For example, ("mean_absolute_error", "minimize").
  • period: The model is saved at the end of every period epochs. Default: 1
  • save_weights_only: If True, only the model’s weights are saved; otherwise the full model is saved. Default: False
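
For reference, here is a sketch of a callback with every argument spelled out. The values are illustrative, and the metric name "val_loss" is an assumption; it should match a metric your model actually logs with self.log().

from keepsake.pl_callback import KeepsakeCallback

callback = KeepsakeCallback(
    filepath="model.hdf5",                    # where the exported model is written (the default)
    params={"learning_rate": 1e-3},           # hyperparameters recorded when training starts
    primary_metric=("val_loss", "minimize"),  # assumes a metric named "val_loss" is logged
    period=1,                                 # save at the end of every epoch
    save_weights_only=False,                  # save the full model rather than only its weights
)

Pass the configured callback to pl.Trainer(callbacks=[callback]), as in the example above.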