In the previous parts of the tutorial, we’ve been exploring the Report, which contains powerful tools for model validation and debugging. Now, we will step back and return to the project page to explore another aspect of error analysis: testing.
Returning to the Project page
Please return to the project page to continue the tutorial.
Let’s briefly talk about testing.
Test-driven development is common practice in software engineering. In ML, a field not that far away, tests are not as common as they should be.
Testing in ML (if done at all) usually consists of a single engineer writing a script to test a few cases that came up during a sloppy error analysis procedure. However, thorough testing goes a long way in ensuring model quality, helping practitioners catch mistakes proactively rather than retroactively.
Unbox offers a test suite to stress-test your models in multiple ways.
In this part of the tutorial, we will create an adversarial test. To check out the other testing possibilities, refer to the testing page.
In the project page, notice that there is a block with all the tests created for that particular project. If you have been following the tutorial, yours will not display any tests yet.
To create your first test, click on the Create tests button in the upper right corner of the Tests block.
You will be redirected to the test creation page. The first thing you’ll see, at the top of the page, is the list of test categories to choose from. For now, for tabular data, we offer Augmentation, Adversarial, and Confidence tests.
Reach out
If you would like to use other testing frameworks, feel free to reach out so that we can accommodate your needs!
Your first test will be an adversarial test for our churn classification model.
Adversarial tests are extremely powerful. The idea is that by manipulating certain feature values, we try to flip the model’s predictions.
First, select Adversarial on the Category panel in the Test page. After selecting it, the parameters control panel should appear below it.
Let’s understand each adversarial test parameter for our first test:
- Target label: the label we will try to make our model output. For example, if we select Exited as the target label, we will take samples that were originally predicted as Retained and try to make the model flip its prediction to Exited;
- Feature: the feature that will be perturbed as we try to flip the model’s predictions. For example, if we select Age, we will vary the age while keeping the other features constant and see if the model changes its prediction to the target label;
- Sample size: controls the number of tests that will be generated. It has two parts:
  - Number of tests: there are two options for this parameter. One is to define an integer, which corresponds to the number of samples drawn at random from our dataset to apply the perturbations to. Alternatively, we can specify a tag at the bottom of the page, directly specifying which samples will be used;
  - Number of tests per row: the number of perturbations for each row selected.
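To make these parameters concrete, here is a minimal Python sketch of how such test cases could be generated outside the platform. It is not Unbox’s implementation: the `toy_model` classifier, the `IsActiveMember` column, the `generate_adversarial_rows` helper, and the perturbation range (18 to 90) are all assumptions made for illustration.

```python
import random

def toy_model(row):
    """Hypothetical stand-in classifier (not the tutorial's churn model)."""
    if row["Age"] > 50 and not row["IsActiveMember"]:
        return "Exited"
    return "Retained"

def generate_adversarial_rows(rows, model, feature, target_label,
                              sample_size, tests_per_row, low, high, seed=0):
    """Sample rows not yet predicted as `target_label` and create
    `tests_per_row` copies of each with only `feature` changed."""
    rng = random.Random(seed)
    candidates = [r for r in rows if model(r) != target_label]
    sampled = rng.sample(candidates, min(sample_size, len(candidates)))
    pairs = []
    for row in sampled:
        for _ in range(tests_per_row):
            new_row = dict(row)  # copy, so all other features stay constant
            new_row[feature] = rng.randint(low, high)  # assumed integer range
            pairs.append((row, new_row))
    return pairs

# Toy dataset: 30 customers, all initially predicted as "Retained".
rows = [{"Age": age, "IsActiveMember": age % 2 == 0} for age in range(20, 50)]
pairs = generate_adversarial_rows(rows, toy_model, "Age", "Exited",
                                  sample_size=30, tests_per_row=3,
                                  low=18, high=90)
print(len(pairs))  # 30 rows x 3 perturbations = 90 test cases
```

Each pair holds the original row and its perturbed copy, so the two predictions can later be compared to see whether the model flipped to the target label.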
Creating an adversarial test
Create an adversarial test where we perturb the age and try to flip the model’s prediction from Retained to Exited. The test should be based on 30 data samples drawn at random from our dataset and we should apply 3 perturbations per data sample.
Once you’ve defined the test parameters, you can click on Create on the right-hand part of the page.
After creating a test, you will notice that it is added to the table in the test panel on the test page. Click on Run to run the test.
After the test finishes running, you will see the number of passes and the number of failures. Passes are the data samples for which your model maintained its prediction despite the input perturbations; failures are the data samples for which it flipped its prediction.
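The pass and failure counts can be reproduced by hand with a few lines of Python. This is a minimal sketch, assuming one original and one perturbed prediction per test case; the `summarize` helper and the example predictions are hypothetical, not output from the platform.

```python
def summarize(original_preds, perturbed_preds):
    """Count a pass when the prediction is unchanged and a failure when it flips."""
    if not original_preds or len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must be non-empty and of equal length")
    passes = sum(o == p for o, p in zip(original_preds, perturbed_preds))
    failures = len(original_preds) - passes
    return {
        "passes": passes,
        "failures": failures,
        "pass_rate": passes / len(original_preds),
    }

# Made-up predictions for four test cases.
original = ["Retained", "Retained", "Retained", "Retained"]
perturbed = ["Retained", "Exited", "Retained", "Exited"]
result = summarize(original, perturbed)
print(result)  # {'passes': 2, 'failures': 2, 'pass_rate': 0.5}
```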
We notice that our churn classifier only passed 47% of the tests, meaning that there were age perturbations that made our model change its prediction. Click on Open to have a look at the results.
What we see is that by perturbing the age, while keeping the remaining feature values constant, we were able to flip the model’s predictions multiple times.
After running such a test, one possibility is downloading all the generated rows, merging them with the training set and retraining the model. To download the data generated for a test, you can click on Download test, in the lower right corner. You will be taken to a screen where you can inspect the data to double-check if it makes sense and then confirm the download.
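As a rough illustration of the merging step, the sketch below appends generated rows to a training set while skipping exact duplicates. The format of the downloaded file is not described here, so the sketch assumes both datasets have already been loaded as lists of dictionaries with the same columns; the `merge_rows` helper and the `Exited` label column are hypothetical. It also assumes each generated row carries a label you have verified, which is exactly what the inspection screen is for.

```python
def merge_rows(training_rows, generated_rows):
    """Append generated rows to the training rows, skipping exact duplicates."""
    # A row's identity is its full set of (column, value) pairs.
    seen = {tuple(sorted(row.items())) for row in training_rows}
    merged = list(training_rows)
    for row in generated_rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged

training = [
    {"Age": 35, "Balance": 1200.0, "Exited": 0},
    {"Age": 61, "Balance": 300.0, "Exited": 1},
]
generated = [
    {"Age": 35, "Balance": 1200.0, "Exited": 0},  # duplicate of a training row
    {"Age": 58, "Balance": 1200.0, "Exited": 0},  # new perturbed row
]
augmented = merge_rows(training, generated)
print(len(augmented))  # 3
```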
The age example we used is merely illustrative, because maybe the model should indeed flip its prediction if we changed the age. However, there are many real-life situations where the model’s predictions should be invariant to certain input feature variations.
For example, in a model that assesses the credit risk of a user, the model’s predictions should change depending on the user’s income. But should it vary for different users’ genders, all other features being equal? What about for distinct users’ ethnicities?
It certainly shouldn’t.
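An invariance requirement like this can be written down as an explicit check. The sketch below is an assumption-laden illustration: `credit_model` is a deliberately biased, hypothetical scorer, and the `income` and `gender` fields and the `invariance_violations` helper are invented for the example.

```python
def credit_model(applicant):
    """Hypothetical scorer that (wrongly) penalizes one gender."""
    score = applicant["income"] / 1000
    if applicant["gender"] == "F":
        score -= 5  # the undesirable dependence we want to detect
    return "low_risk" if score >= 50 else "high_risk"

def invariance_violations(model, rows, feature, values):
    """Return rows whose prediction changes when only `feature` is varied."""
    violations = []
    for row in rows:
        preds = {v: model({**row, feature: v}) for v in values}
        if len(set(preds.values())) > 1:  # more than one distinct prediction
            violations.append((row, preds))
    return violations

applicants = [
    {"income": 52000, "gender": "M"},
    {"income": 80000, "gender": "F"},
    {"income": 30000, "gender": "M"},
]
violations = invariance_violations(credit_model, applicants, "gender", ["M", "F"])
print(len(violations))  # 1: only the applicant near the decision boundary flips
```

Only applicants close to the decision boundary reveal the dependence, which is one reason to test many samples rather than a handful.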
Playing with adversarial tests
Feel free to play with adversarial tests. Create new tests and apply perturbation to other input features to see if you can detect undesirable behaviors of our churn classifier.
- Catch bugs before shipping the model, while the cost of fixing such bugs is significantly lower;
- Increase stakeholders’ trust in the model;
- Augment the training set to increase model robustness.