Now you are ready for the fun part!
In this tutorial, we will explore the development of a chatbot for banking using Unbox.
Let’s say that we work at a bank and when clients have questions, they send messages via chat to customer support. You know that the customer support department would benefit greatly from a chatbot to either respond automatically to frequently asked questions or at least to automatically label the kind of inquiry the clients are making so that the messages can be directed to the correct person within the bank.
Equipped with your ML knowledge, you want to train a model that assigns each client message to a category representing the kind of question being asked. For example, a message such as “I ordered my card a couple of weeks ago and haven’t received it yet. When can I expect it?” belongs to the class card_delivery_estimate. A message like “Tell me how to reset the passcode” belongs to the class passcode_forgotten. There are many more classes, as the clients’ inquiries can be quite diverse.
This multi-class classifier can be quite useful for different teams inside our organization and hopefully, if our model is good enough, we can improve the bank’s customer support rating.
As a data scientist or ML engineer, it’s all in your hands now.
Let’s train a model to see what happens.
To make your life easier, here is the link to a Colab notebook where you have everything you’ll need to follow this tutorial.
We are going to use the 'banking77' dataset available on Hugging Face, which contains excerpts of banking chat messages that fall into one of 77 categories. We also took the liberty of writing all the code that loads the dataset, tokenizes the messages, splits the dataset into training and validation sets, and trains a logistic regression (which is our model of choice). We added comments in the notebook to guide you throughout this process.
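If you are curious about what the tokenization and splitting steps boil down to, here is a minimal sketch in plain Python. It is not the notebook's code: the notebook most likely relies on library utilities, and the helper names, the 80/20 split, and the toy messages below are assumptions made for illustration.

```python
import random
import re


def tokenize(message):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed and split them into two lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy (text, label) pairs standing in for the real dataset.
rows = [
    ("When will my card arrive?", "card_delivery_estimate"),
    ("Tell me how to reset the passcode", "passcode_forgotten"),
    ("I forgot my passcode", "passcode_forgotten"),
    ("My card has not been delivered yet", "card_delivery_estimate"),
    ("How long does card delivery take?", "card_delivery_estimate"),
]

training_rows, validation_rows = train_validation_split(rows)
print(tokenize(rows[1][0]))
print(len(training_rows), len(validation_rows))
```

Fixing the seed keeps the split reproducible, so the validation set you analyze later is the same one the model was evaluated on.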
Running the notebook cells
Please run the notebook cells up to the point where we evaluate the model’s performance on the validation set. How is our model doing? Do you see the accuracy?
Our model’s accuracy on the validation set is almost 85%. Pretty good, huh?
Despite their popularity, aggregate metrics, such as accuracy, can be very misleading. They are a good first metric to look at, but they help little to answer questions such as:
- How does our model perform for different groups of the data? For example, what’s the performance for messages where the overarching theme is the user’s credit card? What about messages from users who complain about refunds?
- Are there common errors our model is making that could be easily fixed if we had a little bit more data?
- Are there biases hidden in our model?
- Why is our model making predictions like this? Is it doing something reasonable or simply over-indexing on certain tokens and stopwords?
The list of questions we can ask is virtually infinite and staring at the accuracy won’t get us very far. Furthermore, notice that from a business perspective, the answers to these questions might be very relevant, so you need to be confident that your model is coherent enough to answer them.
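To see how an aggregate metric can hide a weak spot, here is a small sketch that compares overall accuracy with accuracy computed per class. The labels and predictions are made up for illustration and are not results from the notebook; the helper names are also our own.

```python
from collections import defaultdict


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def per_class_accuracy(labels, predictions):
    """Accuracy computed separately for each true class."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for y, p in zip(labels, predictions):
        totals[y] += 1
        hits[y] += int(y == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Made-up labels and predictions, for illustration only.
labels = ["card_delivery_estimate"] * 8 + ["passcode_forgotten"] * 2
predictions = ["card_delivery_estimate"] * 8 + [
    "card_delivery_estimate",
    "passcode_forgotten",
]

print(accuracy(labels, predictions))
print(per_class_accuracy(labels, predictions))
```

In this toy case the overall accuracy is 0.9, yet only half of the passcode_forgotten messages are classified correctly. That is exactly the kind of gap a single number does not reveal.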
The only way to start getting the answers we need before we ship the model is by systematically conducting error analysis.
The first step is giving the model and the validation set a new home: the Unbox platform. To upload models and datasets to Unbox, we are going to use our API. You will be modifying the notebook we provided to call the API and auto-magically load and deploy the dataset and the model.
When you call our API, it is critical that we know who is calling us, so that we can upload the model and dataset to the correct Unbox account.
Therefore, before uploading anything to Unbox, you need to instantiate the client with your API key.
Instantiating the client
Create a new cell in the notebook we provided, right after the model evaluation part. In that cell, we will instantiate the Unbox client and you will replace ‘YOUR_API_KEY_HERE’ with your API key.

    import unboxapi
    client = unboxapi.UnboxClient('YOUR_API_KEY_HERE')
If you don’t know what your API key is, or if you get a ModuleNotFoundError when trying to import unboxapi, check out the installation part of the tutorial and verify that unboxapi is successfully installed.
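If you would rather not paste your API key directly into a notebook cell, one option is to read it from an environment variable. This is only a sketch: the variable name UNBOX_API_KEY and the load_api_key helper are our own assumptions, not something the unboxapi package defines.

```python
import os


def load_api_key(variable_name="UNBOX_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    api_key = os.environ.get(variable_name, "").strip()
    if not api_key or api_key == "YOUR_API_KEY_HERE":
        raise RuntimeError(
            f"Set the {variable_name} environment variable to your API key."
        )
    return api_key


# Usage in the notebook (requires unboxapi to be installed):
# import unboxapi
# client = unboxapi.UnboxClient(load_api_key())
```

Failing early with a clear message is handy here, since a forgotten placeholder key would otherwise only surface once you try to upload something.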
Now that we have instantiated the Unbox client with the correct API key, let’s briefly talk about uploading the model.
The logistic regression we trained in the notebook is a scikit-learn model. Currently, we support models from the following frameworks: TensorFlow, scikit-learn, PyTorch, Hugging Face, FastText, Rasa, and XGBoost.
Let us know if you use a different framework!
To be able to upload our model to Unbox, we first need to package it into a predict_proba function. This function needs to receive the model object and the model’s input as arguments, and it should output an array-like with class probabilities. There are other optional arguments you can use to apply any necessary transformations, but as long as the predict_proba function receives the model and its inputs as arguments and outputs class probabilities, it is compatible with Unbox.
For scikit-learn models, this is basically a wrapper around the predict_proba method, which receives an array-like of shape (n_samples, n_features) as input and outputs an array-like with class probabilities of shape (n_samples, n_classes).
Therefore, in our case, the predict function simply looks like this:
    def predict_function(model, text_list):
        return model.predict_proba(text_list)
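Before uploading, it can be worth checking locally that your predict function honors this contract: one row of probabilities per input, one probability per class, and each row summing to 1. The sketch below does that with a hypothetical helper, check_predict_function, and a stand-in UniformModel class that only exists for the demonstration. Neither is part of unboxapi or scikit-learn; in the notebook you would pass your trained model instead.

```python
import math


def check_predict_function(predict_function, model, sample_texts, class_names):
    """Raise ValueError if the output is not one probability row per input."""
    probabilities = predict_function(model, sample_texts)
    if len(probabilities) != len(sample_texts):
        raise ValueError("expected one row of probabilities per input")
    for row in probabilities:
        if len(row) != len(class_names):
            raise ValueError("expected one probability per class")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
            raise ValueError("each row of probabilities should sum to 1")
    return True


class UniformModel:
    """Hypothetical stand-in for a trained model; not part of any library."""

    def __init__(self, n_classes):
        self.n_classes = n_classes

    def predict_proba(self, text_list):
        # Assign the same probability to every class for every input.
        return [[1.0 / self.n_classes] * self.n_classes for _ in text_list]


def predict_function(model, text_list):
    return model.predict_proba(text_list)


class_names = ["card_delivery_estimate", "passcode_forgotten"]
sample_texts = ["Where is my card?", "I forgot my passcode"]
check_predict_function(predict_function, UniformModel(2), sample_texts, class_names)
```

Running the same check with a couple of raw messages and your real model is a quick way to confirm that the model accepts text directly, as the predict function above assumes.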
Now that we have our model’s predict function, we are ready to upload it to Unbox. The model upload is done with the add_model method:

    from unboxapi.tasks import TaskType
    from unboxapi.models import ModelType

    # Upload the trained model together with its predict function
    unbox_model = client.add_model(
        function=predict_function,
        model=model,
        model_type=ModelType.sklearn,
        task_type=TaskType.TextClassification,
        class_names=label_list,
        name="Banking Classifier",
        description="this is my sklearn banking model"
    )
    # Inspect the uploaded model (note: unbox_model, not the scikit-learn model)
    unbox_model.to_dict()
There are other optional arguments you can pass when uploading a model, but the above is enough for our purposes. For a complete reference on the add_model method, check our API reference page.
It’s time to upload our dataset as well.
In our example, the validation set is a single pandas data frame. That’s the data frame that we will upload, which can be done with the add_dataframe method:

    from unboxapi.tasks import TaskType

    # Upload the validation set as a dataset
    dataset = client.add_dataframe(
        df=validation_set,
        class_names=label_list,
        label_column_name="label_code",
        text_column_name="text",
        task_type=TaskType.TextClassification,
        name="Banking Validation",
        description="my banking validation dataset"
    )
    dataset.to_dict()
There are other optional arguments you can pass when uploading a dataset. For a complete reference on the add_dataframe method, check our API reference page.
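A quick sanity check on the data before uploading can save a round trip. The sketch below assumes that the label_code column holds integer indices into label_list and that the text column holds non-empty strings; both are assumptions inferred from the argument names, not something stated by the API. It works on plain records, such as the output of validation_set.to_dict("records"), and find_invalid_rows is a hypothetical helper of our own.

```python
import numbers


def find_invalid_rows(records, class_names,
                      label_column_name="label_code",
                      text_column_name="text"):
    """Return (row index, problem) pairs for rows that look malformed."""
    problems = []
    for index, record in enumerate(records):
        label = record.get(label_column_name)
        text = record.get(text_column_name)
        is_index = isinstance(label, numbers.Integral) and not isinstance(label, bool)
        if not is_index or not 0 <= label < len(class_names):
            problems.append((index, "label is not a valid index into class_names"))
        if not isinstance(text, str) or not text.strip():
            problems.append((index, "text is missing or empty"))
    return problems


# Toy records standing in for validation_set.to_dict("records").
class_names = ["card_delivery_estimate", "passcode_forgotten"]
records = [
    {"text": "When will my card arrive?", "label_code": 0},
    {"text": "", "label_code": 1},
    {"text": "I forgot my passcode", "label_code": 5},
]

print(find_invalid_rows(records, class_names))
```

An empty list means every row passed both checks. With the toy records above, the second row is flagged for its empty text and the third for a label outside the range of class_names.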
After following the previous steps, if you log in to Unbox, you should be able to see the model and the dataset that you just uploaded.
Click on Models under Registry, on the sidebar, to check that the Banking Classifier model is indeed there.
Click on Datasets under Registry, on the sidebar, to check that the Banking Validation dataset is indeed there.
If both are there, you are good to move on to the next part of the tutorial!
If you encountered errors while running the previous steps, here are some common issues worth double-checking:
- check if you installed the most recent version of unboxapi. The current version is 0.0.2. You can check which version you have installed by opening your shell and typing:
$ pip show unboxapi
- verify that you imported TaskType and that you are passing the correct model type and task type as arguments;
- verify that you are passing all other arguments correctly, as in the code samples we provided.
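If you prefer to run the version check from inside the notebook rather than from the shell, the standard library can report the installed version of a package. This is a small sketch; installed_version is a helper name of our own.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("unboxapi")
if version is None:
    print("unboxapi is not installed")
else:
    print(f"unboxapi {version} is installed")
```

This needs Python 3.8 or later, which is when importlib.metadata was added to the standard library.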
If you need a more comprehensive reference on the API methods, feel free to check out our API reference page.