When you run a dataset through a model, you get a run report. The run report gives high-level quantitative metrics (accuracy, F1, precision, and recall) as well as row-level insights that explain why the model made a given prediction.
Get right to the errors by filtering through mispredicted prediction ↔ label clusters. This is essentially a flattened confusion matrix with the main diagonal removed. For each error group, browse and filter by the top mispredictive tokens.
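The idea of a flattened confusion matrix with the diagonal removed can be sketched in a few lines. This is an illustrative example, not the tool's own implementation; the toy labels and predictions are made up for demonstration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions
labels = ["cat", "dog", "bird"]
y_true = ["cat", "dog", "dog", "bird", "cat", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "bird"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
np.fill_diagonal(cm, 0)  # drop correct predictions; keep only errors

# Flatten the off-diagonal cells into (label, prediction, count)
# error groups, largest group first
errors = [(labels[i], labels[j], int(cm[i, j]))
          for i in range(len(labels))
          for j in range(len(labels))
          if cm[i, j] > 0]
errors.sort(key=lambda e: -e[2])

for true_label, pred_label, n in errors:
    print(f"label {true_label} -> predicted {pred_label}: {n} rows")
```

Each surviving cell is one prediction ↔ label error cluster; sorting by count surfaces the biggest failure modes first.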
It's not enough to see that a row is failing or succeeding. Get more insight into why: which tokens were most influential in the prediction?
Red tokens (negative scores) indicate how strongly the token pushed away from the prediction, whereas blue tokens (positive scores) indicate how strongly it pulled toward the prediction.