Filtering by data distribution

Improving models one slice of data at a time

We want models that perform well not only on whole datasets but also on every edge case they might encounter in the wild. The problem is that, when we strive for that goal, it is easy to be overwhelmed by the number of possibilities and to suffer from analysis paralysis.

A better and more realistic approach is to improve the model’s performance one slice of data at a time.

The Data distribution tab on the Error analysis panel helps us identify the most common mistakes our models are making. A good idea, then, is to focus on improving model performance on these error classes iteratively over the next rounds of ML development.

Data distribution

When you click on the Data distribution tab, the Error analysis panel is divided into two parts.

On the left-hand side, you see the labels for your task. For our banking chatbot, we see the 62 classes there, such as declined_card_payment, pending_transfer, and many more. Right beneath each label, we see the model’s performance on that class, measured by aggregate metrics computed per class. Per-class metrics are particularly important when working with imbalanced datasets, where the model’s performance on the majority class can dominate dataset-wide metrics and hide poor performance on smaller classes.
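To see why per-class metrics matter on imbalanced data, here is a minimal sketch in plain Python. It is not the platform's implementation: the function name per_class_metrics and the toy labels and predictions are assumptions made for illustration, and the choice of precision, recall, and F1 is ours, since the page does not say which aggregate metrics the panel shows.

```python
from collections import Counter


def per_class_metrics(labels, predictions):
    """Return {class: (precision, recall, f1)}, computed one class at a time."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for y, p in zip(labels, predictions):
        if y == p:
            tp[y] += 1          # correct prediction for class y
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[y] += 1          # y was the true class but was missed
    metrics = {}
    for c in sorted(set(labels) | set(predictions)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


# Toy, made-up data: nine majority-class rows and one minority-class row.
labels = ["pending_transfer"] * 9 + ["declined_card_payment"]
# A model that always predicts the majority class.
predictions = ["pending_transfer"] * 10

accuracy = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
metrics = per_class_metrics(labels, predictions)

print(f"overall accuracy: {accuracy:.2f}")
for name, (precision, recall, f1) in metrics.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the overall accuracy looks healthy, while the per-class numbers show that the minority class is never predicted correctly.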

On the right-hand side, we see the different error classes. This is a flattened confusion matrix: each error class is one pair of label and predicted class that the model confused.
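The idea of a flattened confusion matrix can be sketched in a few lines of Python. This is an illustration under our own assumptions, not how the panel is built: the function name error_classes and the toy data are hypothetical. It counts every (label, prediction) pair where the two differ and lists the pairs from most to least frequent.

```python
from collections import Counter


def error_classes(labels, predictions):
    """List (label, prediction, count) for each misclassified pair,
    most frequent first."""
    counts = Counter(
        (y, p) for y, p in zip(labels, predictions) if y != p
    )
    return [(y, p, n) for (y, p), n in counts.most_common()]


# Toy, made-up data for illustration only.
labels = [
    "pending_cash_withdrawal",
    "pending_cash_withdrawal",
    "request_refund",
    "pending_transfer",
]
predictions = [
    "declined_cash_withdrawal",
    "declined_cash_withdrawal",
    "Refund_not_showing_up",
    "pending_transfer",
]

ranked = error_classes(labels, predictions)
for label, prediction, count in ranked:
    print(f"label={label} predicted={prediction} count={count}")
```

Correct predictions (the diagonal of the confusion matrix) are left out, so the list contains only mistakes.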


Error classes

Looking at the Error analysis panel, can you spot the most common mistake our model makes? Can you filter the data to have a closer look?

The most common error our model makes is predicting that a message is about a declined cash withdrawal (class declined_cash_withdrawal) when in fact the user is talking about a cash withdrawal that is still pending (class pending_cash_withdrawal). It might be challenging to distinguish between these two categories, so, for the next quarter, we might want the team to focus on improving model performance on this error class.


Documenting error classes

Can you filter the dataset to show only the samples our model predicted as Refund_not_showing_up but whose label was request_refund, and tag them with the name Q4 (so that the whole team knows that this is the priority for the next quarter)?
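If you keep a copy of the data outside the panel, the same filter-and-tag step could look roughly like the sketch below. Everything here is an assumption for illustration: the row fields "text", "label", "prediction", and "tags", the helper tag_error_class, and the sample messages are hypothetical and are not part of the platform.

```python
def tag_error_class(rows, label, prediction, tag):
    """Add `tag` to every row with the given label and prediction,
    and return the matching rows."""
    matches = [
        row for row in rows
        if row["label"] == label and row["prediction"] == prediction
    ]
    for row in matches:
        # Create the tag list on first use, then append the new tag.
        row.setdefault("tags", []).append(tag)
    return matches


# Toy, made-up rows for illustration only.
rows = [
    {"text": "I want my money back",
     "label": "request_refund", "prediction": "Refund_not_showing_up"},
    {"text": "Where is my refund?",
     "label": "Refund_not_showing_up", "prediction": "Refund_not_showing_up"},
    {"text": "Please refund this purchase",
     "label": "request_refund", "prediction": "request_refund"},
]

q4_rows = tag_error_class(
    rows, label="request_refund", prediction="Refund_not_showing_up", tag="Q4"
)
print(len(q4_rows), "row(s) tagged with Q4")
```

Only the rows in the chosen error class receive the tag; correctly classified rows are left untouched.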


Actionable insight

  • Focus on one digestible chunk of the data at a time, and systematically improve the model’s performance step by step.
