The current landscape of MLOps

Thibaut
26 min read · Jan 20, 2022

Find your way in the jungle of MLOps solutions


This is not a new story: once machine learning projects reach a critical size, they become difficult to manage with traditional tools.

Here are some of the challenges MLOps has to address, and some solutions that can be deployed to meet them.

Handling the Data

Creation

In many real-world applications, the data is not simply available as a clean dataset on Kaggle or Hugging Face.

Commonly, the data needs to be extracted and compiled from various enterprise tools, such as the ERP or the CRM. Data pipelines are typically built to automate this extraction as much as possible. Enterprise environments keep changing, so these pipelines need to be updated and redeployed on a regular basis.
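As an illustration of what such a pipeline does, here is a minimal sketch of an extract-and-join step. It assumes, hypothetically, that the ERP exposes orders and the CRM exposes customer segments, both reduced here to in-memory lists; in a real setup `extract_erp` and `extract_crm` (illustrative names) would call the tools' exports or APIs.

```python
# Hypothetical records standing in for ERP and CRM exports.
ERP_ORDERS = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5},
    {"order_id": 3, "customer_id": "C9", "amount": 10.0},  # no matching CRM entry
]
CRM_CUSTOMERS = [
    {"customer_id": "C1", "segment": "industry"},
    {"customer_id": "C2", "segment": "retail"},
]


def extract_erp():
    """Extract step for the ERP (here: a copy of the hypothetical records)."""
    return [dict(o) for o in ERP_ORDERS]


def extract_crm():
    """Extract step for the CRM (here: a copy of the hypothetical records)."""
    return [dict(c) for c in CRM_CUSTOMERS]


def transform(orders, customers):
    """Left-join orders with customer attributes; unmatched orders get segment None."""
    by_id = {c["customer_id"]: c for c in customers}
    rows = []
    for order in orders:
        customer = by_id.get(order["customer_id"], {})
        rows.append({**order, "segment": customer.get("segment")})
    return rows


def run_pipeline():
    """Chain the steps so the whole pipeline can be re-run when the sources change."""
    return transform(extract_erp(), extract_crm())


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Keeping unmatched orders (rather than dropping them) is a design choice: it lets a later quality check notice that the CRM is missing customers.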

When the data needs to be updated often, it is critical to save and version these pipelines and the data they produce, and then to measure the quality of the result.
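A minimal sketch of that idea, assuming the dataset fits in memory as a list of records: a manifest ties a pipeline version (the `PIPELINE_VERSION` tag is hypothetical) to a content hash of the data it produced and to a few simple quality metrics. Dedicated data-versioning tools do this far more completely; this only shows the principle.

```python
import hashlib
import json

PIPELINE_VERSION = "1.4.0"  # hypothetical version tag of the extraction code


def dataset_fingerprint(rows):
    """SHA-256 of the serialized rows: same content in the same order gives the same hash."""
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def quality_report(rows, required_fields):
    """Row count and share of missing (None or absent) values per required field."""
    n = len(rows)
    missing = {
        field: (sum(1 for r in rows if r.get(field) is None) / n if n else 0.0)
        for field in required_fields
    }
    return {"row_count": n, "missing_ratio": missing}


def build_manifest(rows, required_fields):
    """Record which pipeline version produced which data, and how good it looked."""
    return {
        "pipeline_version": PIPELINE_VERSION,
        "data_sha256": dataset_fingerprint(rows),
        "quality": quality_report(rows, required_fields),
    }


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "segment": "industry"},
        {"order_id": 2, "segment": None},
    ]
    print(json.dumps(build_manifest(rows, ["order_id", "segment"]), indent=2))
```

Storing one such manifest per pipeline run makes it possible to tell, after the fact, which version of the data a model was trained on and whether its quality drifted between refreshes.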

Another very common task in creating a dataset is data labeling, which is often done by other specialists.

Let’s say that we work for a factory that produces valves. We want to automate visual inspection using a camera and a YOLO model for defect detection. An engineer has to be paid to label thousands of photos of valves, specifying the location and the type of each defect. This data is then fed to the model as training and test sets.
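To make the engineer's output concrete: YOLO-style training labels are commonly stored as one text line per object, `class_id x_center y_center width height`, with coordinates normalized to [0, 1] by the image size. The sketch below converts a pixel bounding box drawn in a labeling tool into such a line. The defect classes in `DEFECT_CLASSES` are hypothetical; a real project defines its own list.

```python
# Hypothetical defect classes for the valve example.
DEFECT_CLASSES = {"scratch": 0, "crack": 1, "porosity": 2}


def to_yolo_line(defect_type, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO-style label line.

    Output format: class_id x_center y_center width height,
    all coordinates normalized by the image width and height.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit in a {img_w}x{img_h} image")
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{DEFECT_CLASSES[defect_type]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 640x480 photo with one crack annotated by the engineer.
    print(to_yolo_line("crack", (100, 50, 300, 250), 640, 480))
    # prints "1 0.312500 0.312500 0.312500 0.416667"
```

Rejecting boxes that fall outside the image catches a common class of labeling mistakes before they reach the training set.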

Data labeling can be a painful and tedious task, often involving debates between experts and the occasional storm in a teacup. The resulting dataset becomes a critical asset for the company: it is easy to find a model that works, but it is not easy to gather the data to train it.
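One way to make those expert debates measurable, offered here as an assumption rather than something this article prescribes, is to have two annotators label the same photo and compare their boxes with intersection over union (IoU). Pairs that disagree on the defect type, or overlap too little, are sent back for review. The 0.5 threshold is a hypothetical starting point.

```python
AGREEMENT_THRESHOLD = 0.5  # hypothetical cut-off; tune per project


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def needs_review(box_a, box_b, label_a, label_b, threshold=AGREEMENT_THRESHOLD):
    """Flag a pair of annotations when experts disagree on the defect type or its location."""
    return label_a != label_b or iou(box_a, box_b) < threshold


if __name__ == "__main__":
    # Same defect type, but the boxes only overlap by a third: flagged for review.
    print(needs_review((0, 0, 10, 10), (5, 0, 15, 10), "crack", "crack"))
```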

There are a great many data labeling tools and services; we will showcase only a few.


Thibaut

Publications in English & French about Data Science, Artificial Intelligence, and Innovation.