Everyone uses version control for software, but it’s much less common in machine learning.
This causes all sorts of problems: people are manually keeping track of things in spreadsheets, model weights are scattered on S3, and nothing is reproducible. It's hard enough getting your own model from a month ago running, let alone somebody else's.
So why isn’t everyone using Git? Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t record information automatically from inside a training script. There are some solutions for these things, but they feel like band-aids.
We want to make a small, lightweight, native version control system for ML. Something that does one thing well and combines with other tools to produce the system you need.
We need your help to make this a reality. If you’ve built this for yourself, or are just interested in this problem, join us to help build a better system for everyone.
Join our Discord chat or Get involved on GitHub