You can update data previously streamed to the Openlayer platform. This is usually needed when:

  • The ground truths for the streamed data were not available at inference time but became available later.
  • You want to attach human feedback to a request, but that feedback was not available at inference time.

This guide shows how to use Openlayer SDKs to update previously published data.

How to update data

Every row streamed to Openlayer has an inference_id, a unique identifier of the row. You can provide the inference_id at stream time; if you don’t, Openlayer assigns unique IDs to your rows.

You must use the inference_id to specify the rows you want to update.
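If you provide your own IDs at stream time, storing them next to your records makes later updates easier. The following is a minimal sketch, not part of the Openlayer SDK: the helper name with_inference_ids is hypothetical, rows are assumed to be plain dictionaries, and the UUID format is an assumption (the only stated requirement is that IDs are unique).

```python
import uuid


def with_inference_ids(rows):
    """Return copies of the rows, adding a unique 'inference_id' where one is absent.

    Hypothetical helper for illustration; it is not an Openlayer API.
    """
    tagged = []
    for row in rows:
        tagged_row = dict(row)  # copy, so the caller's rows are left unchanged
        # Keep an ID that was already provided; otherwise generate a UUID string.
        tagged_row.setdefault("inference_id", str(uuid.uuid4()))
        tagged.append(tagged_row)
    return tagged
```

Keeping the generated IDs in your own storage means you can look them up later when ground truths or feedback arrive.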

Suppose you want to add a column called label with ground truths, and your data is in a pandas DataFrame similar to:

Python
>>> df
            inference_id  label
0             d56d2b2c      0
1             3b0b2521      1
2             8c294a3a      0
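A DataFrame of this shape could be built as follows. This is a minimal sketch that assumes the ground truths were collected in a dictionary keyed by inference_id; the ground_truths variable is hypothetical, and the values mirror the example above.

```python
import pandas as pd

# Hypothetical ground truths gathered after inference, keyed by inference_id.
ground_truths = {"d56d2b2c": 0, "3b0b2521": 1, "8c294a3a": 0}

# One row per inference_id, with the ground truth in the "label" column.
df = pd.DataFrame(
    {
        "inference_id": list(ground_truths.keys()),
        "label": list(ground_truths.values()),
    }
)
```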

First, you need to retrieve the inference pipeline object with:

Python
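import openlayer

# Authenticate with your Openlayer API key.
client = openlayer.OpenlayerClient("YOUR_API_KEY_HERE")

# Load the project that contains the inference pipeline.
project = client.load_project(name="Churn prediction")

# Load the inference pipeline that the data was streamed to.
inference_pipeline = project.load_inference_pipeline(
    name="production",
)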

Then, you can update the data specified by the inference IDs with:

Python
inference_pipeline.update_data(
    df=df,
    inference_id_column_name='inference_id',
    ground_truth_column_name='label',
)
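Because rows are matched by inference_id, it can help to check the DataFrame before calling update_data. The sketch below is an assumption about what is worth checking, not documented Openlayer behavior: the helper check_update_frame is hypothetical and only uses pandas.

```python
import pandas as pd


def check_update_frame(df, id_column="inference_id", label_column="label"):
    """Return a list of problems found in df; an empty list means none were found.

    Hypothetical pre-flight check; it is not part of the Openlayer SDK.
    """
    problems = []
    # Both columns must exist before their values can be inspected.
    for column in (id_column, label_column):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    if problems:
        return problems
    if df[id_column].isna().any():
        problems.append("some rows have no inference_id")
    if df[id_column].duplicated().any():
        problems.append("duplicate inference_id values")
    if df[label_column].isna().any():
        problems.append("some rows have no label")
    return problems
```

For example, you might call update_data only when check_update_frame(df) returns an empty list, and log the reported problems otherwise.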