In the previous parts of the tutorial, we’ve been exploring the Report, which contains powerful tools for model validation and debugging. Now, we will step back and return to the project page to explore another aspect of error analysis: testing.
Returning to the Project page
Please return to the project page to continue the tutorial.
Let’s briefly talk about testing.
Test-driven development is common practice in software engineering. In ML, a field not that far away, tests are not as common as they should be.
Testing in ML (if done at all) usually consists of a single engineer writing a script to check a few cases that came up during an ad hoc error analysis. However, thorough testing goes a long way in ensuring model quality, helping practitioners catch mistakes proactively rather than retroactively.
Unbox offers a test suite to stress-test your models in multiple ways.
In this part of the tutorial, we will create an invariance test. To check out the other testing possibilities, refer to the testing page.
On the project page, notice that there is a block with all the tests created for that particular project. If you have been following the tutorial, yours will not display any tests yet.
To create your first test, click on the Create tests button in the upper right corner of the Tests block.
You will be redirected to the test creation page. The first thing you’ll see, at the top of the page, is the list of test categories to select from. For now, for tabular data, we offer Invariance and Confidence tests.
Reach out
If you would like to use other testing frameworks, feel free to reach out so that we can accommodate your needs!
Your first test will be an invariance test for our banking chatbot model.
Invariance tests are extremely powerful. Models should remain invariant under certain scenarios, i.e., their predictions should not change when an input is modified in a way that preserves its meaning. Invariance tests leverage synthetic data to verify model robustness.
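To make the idea concrete, here is a minimal sketch of an invariance check in plain Python. It is not Unbox's implementation: `toy_intent_model` is a hypothetical keyword-based stand-in for the chatbot, and `is_invariant` is an illustrative helper name.

```python
from typing import Callable, List


def toy_intent_model(text: str) -> str:
    """Hypothetical stand-in for the chatbot: a keyword-based intent classifier."""
    lowered = text.lower()
    if "card" in lowered:
        return "card_issue"
    if "transfer" in lowered:
        return "transfer"
    return "other"


def is_invariant(predict: Callable[[str], str], original: str, variants: List[str]) -> bool:
    """Return True if every variant gets the same prediction as the original."""
    expected = predict(original)
    return all(predict(variant) == expected for variant in variants)


original = "My card was declined at the store"
paraphrases = [
    "The store declined my card",
    "My payment did not go through at the shop",  # same meaning, no "card" keyword
]

print(is_invariant(toy_intent_model, original, paraphrases[:1]))  # True
print(is_invariant(toy_intent_model, original, paraphrases))      # False
```

The second paraphrase keeps the meaning but drops the keyword the toy model relies on, so the prediction flips. That is exactly the kind of brittleness an invariance test is meant to surface.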
First, select Invariance in the Category panel on the test page. After selecting it, the parameters control panel should appear below it.
Let’s go through each parameter for our first test:
- Type: the crucial step in invariance tests is generating synthetic data that manifests the kind of invariance that the model should exhibit. Invariance tests at Unbox are based on CheckList, a testing methodology for NLP models proposed by Ribeiro et al. To check out all invariance possibilities, refer to the testing page;
- Sample size: controls the number of tests that will be generated. It is set by two values:
  - number of tests: there are two options for this parameter. One possibility is defining an integer, which corresponds to the number of samples that we will draw at random from our dataset to apply the modifications. Alternatively, we can specify a tag at the bottom of the page, which directly determines the samples that will be used;
  - number of tests per row: the number of perturbations generated for each selected row.
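The way the two sample-size values interact can be sketched as follows. This is only an illustration under stated assumptions: `build_test_cases` and `add_typo` are hypothetical names, and a simple character swap stands in for whichever perturbation type you select (paraphrasing, for instance, would require a dedicated generator).

```python
import random
from typing import Callable, List, Tuple


def add_typo(text: str, rng: random.Random) -> str:
    """Illustrative perturbation: swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def build_test_cases(
    rows: List[str],
    perturb: Callable[[str, random.Random], str],
    n_rows: int,
    n_per_row: int,
    seed: int = 0,
) -> List[Tuple[str, List[str]]]:
    """Draw n_rows rows at random and create n_per_row perturbations of each."""
    rng = random.Random(seed)
    sampled = rng.sample(rows, k=min(n_rows, len(rows)))
    return [(row, [perturb(row, rng) for _ in range(n_per_row)]) for row in sampled]


# Hypothetical dataset of user messages.
dataset = [f"How do I check the balance of account {i}?" for i in range(100)]

cases = build_test_cases(dataset, add_typo, n_rows=30, n_per_row=3)
print(len(cases))                                   # 30 sampled rows
print(sum(len(variants) for _, variants in cases))  # 90 perturbed inputs in total
```

With 30 rows and 3 perturbations per row, the model is evaluated on 90 synthetic inputs, each compared against the prediction for its original row.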
Creating an invariance test
Create an invariance test to assess model invariance to paraphrases. The test should be based on 30 data samples drawn at random from our dataset and we should create 3 variations per data sample.
Once you’ve defined the test parameters, click on Create on the right-hand side of the page.
After creating a test, you will notice that it is added to the table in the test panel on the test page. Click on Run to run the test.
After the test finishes running, you will be able to see information such as the number of passes and the number of failures. Passes are the data samples for which your model maintained its predictions despite the input perturbations. Failures are the data samples for which your model flipped its prediction.
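As a rough sketch of how such counts could be computed, the snippet below tallies passes and failures per data sample. It assumes that a sample passes only if all of its perturbations keep the original prediction; the tutorial does not specify the exact rule Unbox applies, and `toy_model` and `tally` are hypothetical names.

```python
from typing import Callable, List, Tuple


def toy_model(text: str) -> str:
    """Hypothetical stand-in classifier based on a single keyword."""
    return "balance" if "balance" in text.lower() else "other"


def tally(
    predict: Callable[[str], str],
    cases: List[Tuple[str, List[str]]],
) -> Tuple[int, int]:
    """Count samples whose perturbations all keep (pass) or change (fail) the prediction."""
    passes = failures = 0
    for original, variants in cases:
        expected = predict(original)
        if all(predict(variant) == expected for variant in variants):
            passes += 1
        else:
            failures += 1
    return passes, failures


cases = [
    ("What is my balance?", ["Tell me my balance", "Show my account balance"]),
    ("Check my balance please", ["How much money do I have?"]),  # prediction flips
    ("Hello there", ["Hi there"]),
]

passes, failures = tally(toy_model, cases)
rate = passes / (passes + failures)
print(f"{passes} passed, {failures} failed, pass rate {rate:.0%}")  # 2 passed, 1 failed, pass rate 67%
```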
We notice that our banking chatbot model passed 63% of the tests, meaning that there were some paraphrases that made our model change its prediction. Click on Open to have a look at the results.
We see that by paraphrasing sentences that our model classifies correctly, we were able to confuse it multiple times.
After running such a test, one possibility is downloading all the generated rows, merging them with the training set and retraining the model. To download the data generated for a test, you can click on Download test, in the lower right corner. You will be taken to a screen where you can inspect the data to double-check if it makes sense and then confirm the download.
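The merge step might look like the sketch below. It makes several assumptions that the tutorial does not confirm: that both files are CSVs, that they share hypothetical `text` and `label` columns, and that each generated row carries the label of the row it was derived from. Inline strings stand in for the real files so the example runs on its own.

```python
import csv
import io
from typing import Dict, List

# Inline stand-ins for the training file and the downloaded test data (hypothetical contents).
TRAIN_CSV = """text,label
What is my balance?,balance
I lost my card,card_issue
"""

GENERATED_CSV = """text,label
Could you tell me my balance?,balance
I lost my card,card_issue
My card has gone missing,card_issue
"""


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_unique(train: List[Dict[str, str]], generated: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Append generated rows to the training rows, skipping texts already present."""
    seen = {row["text"] for row in train}
    merged = list(train)
    for row in generated:
        if row["text"] not in seen:
            merged.append(row)
            seen.add(row["text"])
    return merged


augmented = merge_unique(read_rows(TRAIN_CSV), read_rows(GENERATED_CSV))
print(len(augmented))  # 4: two original rows plus two new paraphrases
```

Skipping exact duplicates keeps the augmented set from over-weighting sentences the model has already seen. It is still worth reviewing the generated rows before retraining, as suggested above.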
Playing with invariance tests
Feel free to play with invariance tests. Create new tests and apply other variations to the input sentences to see if you can detect undesirable behaviors of our chatbot.
Testing your model this way helps you:
- Catch bugs before shipping the model, while the cost of fixing such bugs is significantly lower;
- Increase stakeholders’ trust in the model;
- Augment the training set to increase model robustness.