🤯 How?

We know how painful the process of debugging, improving, and maintaining models can be. Previously, we worked as ML engineers at Siri, where we had the unique experience of tackling a wide range of interesting NLP (natural language processing) use cases. Based on what we learnt, we created a rigorous process for building models.
Our process looks like this:
1. 🚀 Upload models & datasets — In ML development, the artifacts you need usually live on one engineer’s computer. This is the first pain point we thought to tackle, by introducing a model and dataset registry visible to all engineers, managers, and other members of your team. Track versions across time, so your experiment is reproducible even months later.
2. 🏃‍♀️ Run datasets or one-off requests through models to start identifying failures — Error analysis sits at the core of Unbox’s process. We make it easy to explore all the errors made by a model, sorted by the most common clusters. Additionally, we make it easy to group and tag similar rows so a team can prioritize the next iteration of data collection or model development.
3. 👩‍🔬 Get row-level annotations that explain why these failures are happening — We use explainability techniques to show you which tokens support or challenge the model’s prediction.
4. 🧪 Generate new data from templates or by augmenting existing data — We provide augmentation tools so you can improve your model by adding the data it needs to get it to the next level. This is also a great way to probe for further biases in your model, which you may otherwise have missed.
5. ✅ Create unit tests on failing cases to “raise the bar” for the next version — We provide testing mechanisms so the same mistakes won’t happen again.
6. 🏋️ Finally, use these insights to inform data collection and re-training — The insights you get from Unbox can drastically reduce labeling costs by helping you focus on the most effective data to guide the next cycle of model training. We took the sentiment model you see in the screenshots (from Kaggle) and improved its accuracy by 10 percentage points (from 86% to 96%) in under an hour of playing around in Unbox.
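To make steps 2, 4, and 5 concrete, here is a minimal sketch in plain Python. It is not Unbox's API: `predict`, `predict_v2`, `generate_from_template`, `find_failures`, and `run_regression_suite` are hypothetical names, and the keyword-based "model" is a toy stand-in for a real sentiment classifier. It only illustrates the general idea of generating rows from a template, collecting and grouping the failures, and turning them into tests for the next model version.

```python
from collections import Counter
from itertools import product

# Toy stand-in for a trained sentiment model (hypothetical, not Unbox's API).
NEGATIVE_WORDS = {"bad", "terrible", "awful"}


def predict(text):
    """Label a text 'negative' if it contains a known negative word."""
    tokens = text.lower().split()
    return "negative" if any(t in NEGATIVE_WORDS for t in tokens) else "positive"


def generate_from_template(template, slots, label):
    """Step 4: fill every combination of slot values into a template."""
    keys = list(slots)
    rows = []
    for values in product(*(slots[k] for k in keys)):
        rows.append((template.format(**dict(zip(keys, values))), label))
    return rows


def find_failures(model, rows):
    """Step 2: return (text, expected, predicted) for each misclassified row."""
    failures = []
    for text, expected in rows:
        predicted = model(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    return failures


def run_regression_suite(model, cases):
    """Step 5: return the (text, expected) cases the model still gets wrong."""
    return [(text, expected) for text, expected in cases if model(text) != expected]


# Generate negative-sentiment rows, including simple negation.
rows = generate_from_template(
    "The {item} was {adjective}",
    {"item": ["food", "service"], "adjective": ["terrible", "not good"]},
    "negative",
)

# Collect failures and group them by (expected, predicted) error type.
failures = find_failures(predict, rows)
error_types = Counter((expected, predicted) for _, expected, predicted in failures)

# Freeze the failing rows as regression cases for the next version.
regression_cases = [(text, expected) for text, expected, _ in failures]


def predict_v2(text):
    """Hypothetical next version that also handles the 'not good' pattern."""
    if "not good" in text.lower():
        return "negative"
    return predict(text)


still_failing_v1 = run_regression_suite(predict, regression_cases)
still_failing_v2 = run_regression_suite(predict_v2, regression_cases)
print(f"v1 fails {len(still_failing_v1)} cases, v2 fails {len(still_failing_v2)}")
```

The point of keeping `regression_cases` separate from the training data is that a new model version only "raises the bar" if it passes every case that a previous version failed.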
We want to make the process of model development collaborative and transparent. Think Git for ML. It should be easy for all team members to track and drill into any model or dataset in development. Additionally, we want you to be able to show off your work to stakeholders (managers, PMs, etc.) through beautifully designed reports and an intuitive inference playground that lets anyone try out your model themselves.