Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Sort By:

Encrypt Your Dataset and Train Your Model with It Directly

July 27, 2023

Encrypt Your Dataset and Train Your Model with It Directly


When we deal with dataset for creating AI models, we need to consider sensitive information managed and stored online in the cloud or on connected devices. Unsecured datasets can be vulnerable to unauthorized access, theft, and misuse, particularly when processed for machine learning workloads. Certain fields, such as industrial or medical sectors, face exceptionally high risks when their data is exposed to these potential threats. For example, if a dataset used to train a detection model for identifying factory process errors is leaked, it can expose sensitive factory process technology. This highlights the importance of safeguarding datasets at every stage, from data storage to model training.

Dataset Management Framework (Datumaro) offers a dataset encryption feature for AI model training. With Datumaro, you can encrypt datasets of any computer vision data format into the DatumaroBinary format. This encrypted dataset can remain encrypted as far as it is needed for decryption. By combining the encrypted dataset with OpenVINO training extensions™, you can use it directly for model training without decryption. Whenever needed, you can use Datumaro once again to decrypt the dataset and convert it back to any major computer vision data format, such as VOC, COCO, or YOLO. Please refer to another posting data_convert for data convert.

Encrypt Your Dataset Using Datumaro

Datumaro provides two ways to encrypt a dataset: CLI and Python API. First, you need to install Datumaro on your system. Please refer to the installation guide here for detailed instructions. Once you have completed the installation of Datumaro, let's first look at the CLI usage. You can encrypt a dataset using the datum convert CLI command as follows:

datum convert -i [input-dataset-path] -o [output-dataset-path] -f datumaro_binary -- --save-media --encryption

The necessary user inputs for this command are as follows:

  1. -i <input-dataset-path>: Enter the path to the dataset you want to encrypt in <input-dataset-path>.
  2. -o <output-dataset-path>: Enter the path where the encrypted dataset will be produced in <output-dataset-path>.

NOTE:: (Optional) You can additionally specify the data format of your input dataset by entering the -if <input-dataset-format> argument. In most cases, Datumaro can automatically infer the data format of the input dataset, but it might fail. In such cases, you can use the datum detect --show-rejections <input-dataset-path> command to identify the cause of the failure while inferring the data format.

NOTE:: The --save-media argument is a flag that allows you to convert your media files (e.g., images) as well. If this argument is not provided, the encrypted media will not be included in the output directory and only the encrypted annotations are included in the output directory.

Next, let's take a look at how to encrypt a dataset using the Python API. Please examine the following code snippet:

from datumaro import Dataset 

dataset = Dataset.import_from(path="[input-dataset-path]")

You import the dataset by specifying the path of the input dataset in the import_from function as path="<input-dataset-path>". Then, to export the dataset, you specify the path of the output dataset in the save_dir="<output-dataset-path>" of the export function. Similarly, you also need to provide the encryption=True and format="datumaro_binary" keyword arguments as in the CLI example. A more detailed end-to-end example for this can be found in a Jupyter notebook. Please refer to this link for more information.

So far, all the examples have used the datumaro_binary (DatumaroBinary) format for the exported dataset. Currently, the dataset encryption feature is only supported for the datumaro_binary format. DatumaroBinary is a Datumaro's own data format that stores annotation data in binary representation. It is much faster and storage efficient compared to string-based datasets such as COCO based on JSON. For more detailed information about DatumaroBinary, please refer to this link.

How Datumaro Encrypts Your Dataset?

Datumaro uses the Fernet symmetric encryption recipe provided by the cryptography library to encrypt the dataset. Fernet is built on top of a number of standard cryptographic primitives such as AES or HMAC, and hence Fernet guarantees that a message encrypted cannot be manipulated or read without the key. Please refer to this link for detailed information.

When encrypting the dataset, Datumaro generates a secret key through Fernet and saves it as a txt file at the following path: <output-dataset-path>/secret_key.txt. The secret key generated at this path is a 50-characters string, which consists of a randomly generated 32-bytes string encoded in base64, with the prefix datum- added.

cat [output-dataset-path]/secret_key.txt

# A secret key will be randomly generated.

If you have checked the secret key in this file, you must ensure that it is not in the same location with the dataset. If this secret key is uncovered, an attacker would be able to access the contents of the encrypted dataset. Additionally, this secret key is required when training models using OpenVINO training extensions™ with the encrypted dataset or when decrypting it later. Therefore, you should be careful not to lose this secret key.

The following table briefly shows how the data is encrypted. The binary representation of the data is encrypted, so that the following image cannot be seen by the image viewer.

Train Your Model with the Encrypted Dataset Using OpenVINO Training Extensions™

OpenVINO training extensions™ is a tool that allows convenient training of computer vision models and accelerated inference on Intel® devices by exporting trained models to OpenVINO Intermediate Representation (IR) through a CLI. Within the OpenVINO ecosystem, Datumaro is integrated with OpenVINO training extensions™ as a dataset interface. Therefore, the encrypted dataset can be directly used for model training through OpenVINO training extensions™. For detailed installation instructions of OpenVINO training extensions™, please refer to the following link.

Next, let's explore how to use the encrypted dataset directly for model training through the CLI command.

otx train [template] --train-data-roots [encrypted-dataset-path] --val-data-roots [encrypted-dataset-path] --encryption-key [secret-key]

The user inputs required for this command are as follows:

  1. --train-data-roots <encrypted-dataset-path> and --val-data-roots <encrypted-dataset-path>: Specify the path to the encrypted dataset by replacing <encrypted-dataset-path>. Since the DatumaroBinary format uses the same root directory for both the training and validation subsets, both arguments should have the same value.
  2. --encryption-key <secret-key>: Provide the secret key corresponding to the encrypted dataset in <secret-key>. This is the 50-character string with the datum- prefix described in the previous section.

NOTE:: <template> is the name of the model template provided by OpenVINO training extensions™. A model template is a recipe for a deep learning model for a specific computer vision task. To explore all the model templates supported by OpenVINO training extensions™, you can use the otx find CLI command or refer to this link.

Decrypt the Encrypted Dataset Using Datumaro

If you want to utilize the encrypted dataset in another AI workload, you need to decrypt the encrypted data. This process reverses the dataset encryption using Datumaro, and encryption-decryption preserves all the information without loss. Similar to the previous section, decryption can be done using the CLI or Python API. Let's first look at decryption using the CLI.

datum convert -i [encrypted-dataset-path] -o [output-dataset-path] -f [output-data-format] --encryption-key [secret-key] --save-media

You can use the same datum convert command as before. However, specify the path to the encrypted dataset as the input dataset path (-i <encrypted-dataset-path>), and provide the secret key, which is a 50-character string with the datum- prefix described in the previous section, as the <secret-key> argument for --encryption-key <secret-key>. Additionally, you can choose any data format supported by Datumaro as the output data format. To learn more about the data formats supported by Datumaro, refer to this link.

Next, let's see how decryption can be done using Python API.

from datumaro import Dataset

dataset = Dataset.import_from(

Similar to the CLI method, provide the path to the encrypted dataset and the secret key as arguments to the import_from function. For the export function, specify the output dataset path and the output data format.


This post introduced dataset encryption feature provided by Datumaro. It demonstrated how to encrypt a dataset using Datumaro and train a model with the encrypted dataset using OpenVINO training extensions™. Whenever needed you can decrypt it with Datumaro for other AI projects and training frameworks. You can refer to the end-to-end Jupyter notebook example provided on this blog post here for step-by-step guide. The features introduced in this post are available in Datumaro version 1.4.0 or higher and OpenVINO training extensions™ version 1.4.0 or higher.

Datumaro offers a range of useful features for managing datasets besides the dataset encryption feature. You can find examples of other Datumaro features, such as noisy label detection during training with OpenVINO training extensions™, in the Jupyter examples directory. For more information about Datumaro and its capabilities, you can visit the Datumaro documentation page. If you have any questions or requests about using Datumaro, feel free to open an issue here.


Make Your Own YOLOv8 OpenVINO™ Model from Any Data Format with Datumaro

May 3, 2023

Authors: Vinnam Kim, Wonju Lee, Mark Byun, Minje Park

Dataset formats and trainer


OpenVINO™ provides an easy way to deploy your model with the best inference performance on any Intel hardwares. However, to train your own model for deployment you need to prepare a training framework and dataset. Fortunately, there are many ready-to-use training frameworks and implementations. Then, what about the dataset? A specific training framework requires a specific data format, but there are many data formats in the world. For example, in object detection tasks there are data formats such as YOLO, COCO, and Pascal VOC that are widely used. These formats have different directory structures and annotation file formats as well as different extensions such as txt, json, and, xml, respectively. It's tedious task to convert dataset from one format to another whenever you adopt different training framework.

Let's assume you choose Detectron2, which only supports COCO format datasets. If your dataset is formatted as VOC, you have to convert it into COCO format. Below, we compare the directory structures and annotation file formats of both datasets, VOC and COCO. These datasets have distinct formats and you need to implement codes for format conversion at each time of handling different formats. Of course, this is not technically challenging but this may require tedious code work and debugging for several days. It won't be good to repeat this process if you intend to add more datasets with different formats.

Directory structures of VOC and COCO (Click me!)

└─ Dataset/
   ├── Annotations/
   │     ├── ann1.xml
   │     ├── ann2.xml
   │     └── ...
   ├── JPEGImages/
   │    ├── img1.jpg
   │    ├── img2.jpg
   │    └── ...
   ├── SegmentationClass/
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   ├── SegmentationObject/
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   └── ImageSets/
        ├── Main/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        ├── Layout/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        ├── Action/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        └── Segmentation/
            ├── test.txt
            ├── train.txt
            └── ...

└─ Dataset/
    ├── images/
    │   ├── train/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   └── val/
    │       ├── img1.jpg
    │       ├── img2.jpg
    │       └── ...
    └── annotations/
        ├── instances_train.json
        └── ...
Annotation file formats of VOC and COCO (Click me!)
VOC (xml) COCO (json)
<?xml version="1.0" encoding="UTF-8"?>
  "licenses": [
      "name": "",
      "id": 0,
      "url": ""
  "info": {
    "contributor": "",
    "date_created": "",
    "description": "",
    "url": "",
    "version": "",
    "year": ""
  "categories": [
      "id": 1,
      "name": "cat",
      "supercategory": ""
      "id": 2,
      "name": "dog",
      "supercategory": ""
  "images": [
      "id": 5,
      "width": 10,
      "height": 5,
      "file_name": "img1.jpg",
      "license": 0,
      "flickr_url": "",
      "coco_url": "",
      "date_captured": 0
  "annotations": [
      "id": 1,
      "image_id": 5,
      "category_id": 2,
      "segmentation": [],
      "area": 3.0,
      "bbox": [
      "iscrowd": 0

Dataset Management Framework (Datumaro) is a framework that provides Python API and CLI tools to convert, transform, and analyze datasets. Among the many features of Datumaro, we would like to introduce the data format conversion feature on this blog, which is one of the fundamental feature for handling many datasets with different training frameworks. Datumaro supports the import and export of over 40 computer vision data formats (please take a look at supported formats for details!). This means that you can easily change your data format through Datumaro. If your model training framework can only read specific formats, don't worry. Use Datumaro and convert it!

Train YOLOv8 model and export it to OpenVINO™ model

  • Prepare dataset
  • Convert dataset with Datumaro
  • Train with YOLOv8 and export to OpenVINO™ IR

YOLOv8 is a well-known model training framework for object detection and tracking, instance segmentation, image classification, and pose estimation tasks. It provides simple CLI commands to train, test, and export a model to OpenVINO™ Intermediate Representation (IR). However, the data format consumed by YOLOv8 is slightly different from the YOLO format itself. Datumaro named it refers to it as YOLO-Ultralytics format. As you can see here, it requires a special meta file to indicate annotation files for each subset and subset files to list subset image files. It further requires them to be placed in an appropriate directory structure. It can be very tedious to go through these details and implement dataset preprocessing when you want to train a model on your custom dataset.

On this blog, we provide an end-to-end example that covers the complete process of converting your dataset, training a model with the converted dataset, and exporting the trained model to OpenVINO™ IR. We understand that dataset conversion can be a tricky process, especially if you have annotated and built your own dataset. Therefore, we will provide an example of converting the dataset created by the popular CVAT annotation tool. By following our step-by-step guide, you will be able to convert your data format easily and accelerate the inference of your trained model with OpenVINO™.

Prepare dataset

In this section, we introduce the steps to export the project annotated by CVAT for the following workflows. You can skip this section if your dataset is formatted as a different data format and is ready to be imported by Datumaro.

CVAT project for object detection task with train, val, and test subsets

NOTE: We used the cats-and-dogs dataset for this example. You can find the reference for this dataset here.

NOTE: You should have three subsets in your project: "train", "val", and "test" (optional). If your dataset has different subset names, you have to rename them. You can do this by using Datumaro's MapSubsets transform.

We export this project to CVAT for images 1.1 data format. Datumaro can import this data format and export it to YOLO-Ultralytics format which can be consumed by YOLOv8.

Export CVAT project to CVAT for images 1.1 data format

Export CVAT project to CVAT for images 1.1 data format. After exporting the dataset, extract it to the cvat_dataset directory.

ls yolo_v8_dataset

You can see the following directory structure:

annotations.xml  images

Convert your dataset using Datumaro

You can convert the dataset located in cvat_dataset using Datumaro's CLI command as follows. For a detailed explanation of the input arguments, see here.

datum convert -i cvat_dataset -if cvat -f yolo_ultralytics -o yolo_v8_dataset -- --save-media

NOTE: If your dataset is not CVAT for images 1.1 format, you can replace -if cvat with the different input format as -if INPUT_FORMAT. Use datum detect CLI command to figure out what format your dataset is.

After the conversion, you can see that yolo_v8_dataset directory is created.

ls yolo_v8_dataset

This directory is structured as follows.

data.yaml  images  labels  test.txt  train.txt  val.txt

Train with YOLOv8 Trainer and Export to OpenVINO™ IR

In this section, we will train the YOLOv8 detector with the dataset converted in the previous section. To train a YOLOv8 detector, please execute the following command.

yolo detect train model=yolov8n.pt data=$(realpath yolo_v8_dataset/data.yaml) project=my-project

NOTE: We use data=$(realpath yolo_v8_dataset/data.yaml) to convert the relative path yolo_v8_dataset/data.yaml to the absolute path. This is because YOLOv8 needs the absolute path for the custom dataset.

After the training, the following command enables testing on the test dataset.

yolo detect val model=my-project/train/weights/best.pt data=$(realpath yolo_v8_dataset/data.yaml) split=test

Lastly, we will export your YOLOv8 detector to OpenVINO™ IR for inference acceleration on Intel devices.

yolo detect export model=my-project/train/weights/best.pt format=openvino

Using this command, the exported IR is created at this directory path, my-project/train/weights/best_openvino_model.

ls my-project/train/weights/best_openvino_model
best.bin  best.xml  metadata.yaml


This post provided an example of training a YOLOv8 detector on an arbitrary data format by utilizing the data format conversion feature of Datumaro and exporting the model to OpenVINO™ IR. You can refer to the executable Jupyter notebook example provided on this blog post here for step-by-step guide. Datumaro offers a range of useful features for managing datasets beyond data format conversion. You can find examples of other Datumaro features, such as noisy label detection during training with OpenVINO™ Training Extensions, in the Jupyter examples directory. For more information about Datumaro and its capabilities, you can visit the Datumaro documentation page. If you have any questions or requests about using Datumaro, feel free to open an issue here.


OpenVINO™ optimize Fairseq S2T model

December 14, 2022

OpenVINO™ Optimize Fairseq S2T Model


Fairseq is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

There are 2 steps to generate model ready for OpenVINO™ acceleration:

1. Use torch.export.onnx function convert the “.pt” model to “.onnx” model;

2. Use OpenVINO™ MO toolkit convert the “.onnx” model to “IR” model.

Figure 1. OpenVINO™ optimize Fairseq model workflow

The following graph is the Fairseq framework inference workflow, it defines the model structure by “Model Config”, composes “Model Definition List” through multiple subgraph models, and dynamically loads the submodules in the model inference runtime.

Such as in the S2T task, model consists of two parts: Encoder and Decoder.
·       Encoder is for extracting feature information from audio file.
·       Decoder is for decoding the feature information to generate text information.

Fairseq Inference workflow

The length of audio information will affect the length of the feature information, and the length of the feature information will affect the Decoder submodule loop’s times. Therefore, the structure of the S2T model is dynamically defined according to the length of the input audio.

Figure 2. Fairseq S2T model inference workflow

To optimize Fairseq framework model there’re 4 challenges need to be solved:
-       Fairseq define submodules for various function, include variable in model layer define.
-       Model structure is dynamically loaded in runtime and can’t export a whole torch model graph.
-       Encoder and Decoder part models’ input shapes are dynamic, depending on input data size.
-       Decoder part loop times depends by input sequence lengths.

OpenVINO™ optimize Fairseq workflow

So that we should use some optimization tricks to solve these problems, to make sure the pipeline optimized by OpenVINO™.
-       Divide model into Encoder and Decoder two parts, and separately export to onnx model,
-       Because of the model structure define by input seq_len, should export dynamic shape onnx model.
-       Convert onnx to IR model by OpenVINO™ MO toolkit.
-       Replace the Fairseq S2T task pipeline Encoder and Decoder into IR model.
-       Loading Inference Engine to run pipeline the pipeline on OpenVINO™.

Figure 3. OpenVINO™ optimize Fairseq S2T model workflow


-       Fairseq is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks
-       OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference which can boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common task.
-       Python version >=3.8
-       PyTorch version >=1.10.0
Reference: GitHub: Fairseq-OpenVINO

Quick Start Demo

Step 1. Install fairseq and requirement

# Install Fairseq
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
#(Optional) Install the latest stable release (0.10.x)
pip install fairseq

#Install OpenVINO™
Reference: Install OpenVINO by source code for Linux
Reference: Install OpenVINO by release package

Step 2. Download audio file and pre-train model file

In this blog we refer the “S2T Example: STon CoVoST” as sample, Preparation dataset and pre-train model can follow the Fairseq original step. Also, you can use “torch audio” to convert audio file to build customer dataset.

import torchaudio
# load tensor from file
waveform, sample_rate = torchaudio.load('foo.arm') 
# save tensor to file
torchaudio.save('foo_save.wav', waveform, sample_rate)

Step 3. Modify code to export onnx

Torch model export to onnx, We should adjust the contents in fairseq/sequence_generator.py +781 line "self.save_onnx = True" , +782 line "self.openvino_engine = False" The encoder.onnx and decoder.onnx will save in models

python fairseq_cli/interactive.py datasets/en/ --config-yaml config_st_en_de.yaml --task speech_to_text --path models/covost2_fr_en_st_transformer_s.pt --max-tokens 50000 --beam 1

Encoder part model export to dynamic onnx

Figure 4. Encoder part model export to dynamic onnx

Decoder part model export to dynamic onnx

Figure 5. Decoder part model export to dynamic onnx

Step 4. Convert Model to IR

Convert encoder.onnx and decoder.onnx to encoder.xml and decoder.xml

# Convert encoder onnx to IR
mo -m encoder.onnx --input "onnx::Transpose_0[-1,-1,-1],src_lengths[-1]"
# Convert decoder onnx to IR
mo -m decoder.onnx --input "prev_output_tokens[-1,-1],onnx::MatMul_1[-1,-1,-1]"

Step 5. OpenVINO™ Inference Engine optimize S2T pipeline

OpenVINO™ Inference S2T pipeline We should adjust the contents in fairseq/sequence_generator.py +781 line "self.save_onnx = False" , +782 line "self.openvino_engine =True" Use the converted the model to run OpenVINO™ Inference S2T pipeline.

python fairseq_cli/interactive.py datasets/en/ --config-yaml config_st_en_de.yaml --task speech_to_text --path models/covost2_fr_en_st_transformer_s.pt --max-tokens 50000 --beam 1

OpenVINO™ Inference Engine initialization

Figure 6. OpenVINO™ Inference Engine Initialization

Encoder part inference by OpenVINO™

Figure 7. Encoder part inference by OpenVINO™
Figure 8. Encoder part inference by OpenVINO™

Decoder part inference by OpenVINO™

Figure 9. Decoder part inference by OpenVINO™
Figure 10. Decoder part inference by OpenVINO™

Inference Result

Figure 11. Inference Result

Build OpenVINO on CentOS 7 with Script

December 7, 2022

Authors: Xiake Sun, Su Yang

We provide a script to help users easily install and test OpenVINO™ on CentOS 7.6. This script is a practice and supplement to the Building OpenVINO™ on CentOS 7 Guide in the OpenVINO™ wiki. The installation of OpenVINO™ 2022.1.0 and 2022.2.0 has been verified under the following eight configurations of CentOS 7.6 (GCC:7.3.1, 8.3.1 and Python: 3.6,3.7, 3.8, 3.9). The script could also be used for the newer version of OpenVINO with some modifications in the future.

CentOS 7 is an open-source server operating system released by the CentOS project. It was officially released on July 7, 2014. CentOS 7 is an enterprise-level Linux distribution, which is redistributed from the free and open-source code of RedHat, and its stability is trustworthy. The CentOS community and RedHat have announced that CentOS 8 will end on December 31, 2021, and CentOS 7 will end on June 30, 2024. Therefore, CentOS 7 is still the mainstream operating system of most cloud platforms.

However, CentOS 7 default GCC 4.8.5 and CMake 2.8.12 does not meet the OpenVINO software requirement for building on a Linux system. Since OpenVINO™ release 2022.1, the binary and python wheel installation packages will not be provided on CentOS 7. To help users with the verification and integration of the OpenVINO™ new version, Here we provide the installation guide and script.

Script Usage

1.Download OpenVINO™  and Script:

git clone https://github.com/openvinotoolkit/openvino.git -b 2022.2.0 && cd openvino
wget https://github.com/yangsu2022/openvino/blob/centos7-install-script/scripts/centos7_install_guide.sh -P ./script
chmod +x ./script/centos7_install_guide.sh

2.Usage Example:

./scripts/centos_install_guide.sh -g 8 -p 7 -c ~ -m resnet-50-pytorch -b true

3.Script Execution Time:

The environment configuration and dependency library download time are dependent on the network environment and will not be downloaded repeatedly. Compile time depends on the hardware. Every time the script is run, the user needs clean up the previous compilation-related folders (install, build, temp) and recompile.


Usage: [-g <7 or 8>] [-p <6, 7, 8 or9>] [-c <Cmake Dir>] [-m <evaluation model>] [-b benchmark_app C++ <true or false>]


The script steps are similar to the Building OpenVINO™ on CentOS 7 Guide. Here are the contents with some comments. For details, please refer to the script and wiki guide.

0. Install system dependencies setup on CentOS 7.6 and install Anaconda

To test more Python version, we use Anaconda instead of Miniconda. Anaconda silent installation is recommended to avoid manually agreeing to the certificate and restarting bash (the script part is as shown below).

1. Download CMake

Download the pre-built CMake 3.18.4 and add environment variables: no need to change the existing CMake version of the system and no need to compile.

2. Install devtoolset and setup environment

Because CentOS7 was released too early, the default GCC version is 4.8.5, which does not support the compilation of OpenVINO™. Using devtoolset can install a higher version of GCC without changing the system GCC version.

The library Wheel is one of Python dependencies. Wheel 0.37.1 is recommended for OpenVINO 2022.1.0 and 2022.2.0. The higher version of Wheel is not suitable.

3. Build OV with CMake

Create a Python environment with conda and check the name of Python library and header folder.

# download prebuild TBB instead of using system's

4. Install Python Wheels

Install the two generated wheels.(runtime and development tool)

5.Model evaluation with benchmark_app and OpenVINO™ CLI usage

Install the specified version of dependencies, for ONNX 1.12 not support python3.6.

pip install protobuf==3.16.0 onnx==1.11.0

Quickly download and convert models from OpenVINO™ Toolkit - Open Model Zoo repository.

C++/Python benchmark_app: The C++ version requires additional compilation, with the Python version by default.

In the end, the script will show the OpenVINO™ usage example as below.

Congratulation! centos7-install-guide is finished.
Here is an OV usage example on centos7:
source /opt/rh/devtoolset-8/enable
export PATH="~/anaconda3/bin:$PATH"
conda activate py37
benchmark_app -m ~/ov_models/public/resnet-50-pytorch/FP32/resnet-50-pytorch.xml -d CPU


Use Metrics to Scale Model Serving Deployments in Kubernetes

December 6, 2022

In this blog you will learn how to set up horizontal autoscaling in Kubernetes using inference performance metrics exposed by OpenVINO™ Model Server. This will enable efficient scaling of model serving pods for inference on Intel® CPUs and GPUs.

Figure 1. Visualization of autoscaling OpenVINO Model Server using Prometheus metrics in Kubernetes

Why use custom metrics?

OpenVINO™ Model Server provides high performance AI inference on Intel CPUs and GPUs that can be scaled in Kubernetes. However, when it comes to automatic scaling in Kubernetes, the Horizontal Pod Autoscaler by default, relies on CPU utilization and memory usage metrics only. Although resource consumption indicates how busy the application is, it does not clearly say whether serving provides expected quality of service to the clients or not. Since OpenVINO Model Server exposes performance metrics, we can automatically scale based on service quality rather than resource utilization.

The first metric that comes to mind when thinking about service performance is the duration of request processing, otherwise known as latency. For example, mean or median over a specified period or latency percentiles. OpenVINO Model Server provides such metrics but setting autoscaling based on latency requires specific knowledge about each model and the environment where the inference is running in order to properly set thresholds that trigger scaling.

While autoscaling based on latency works and may be a good choice when you have model-specific knowledge, we will instead focus on a more generic metric using ovms_requests_streams_ratio. Let’s dive into what this means.

In the equation above:

  • currently_processed_requests - number of inference requests to a model being processed by the service at a given time.
  • execution_streams_number – number of execution streams. (When a model is loaded on the device, its computing units are divided into streams. Each stream independently handles inference requests, meaning that the number of streams defines how many inferences can be run on the device in parallel. Note that the more streams there are, the less powerful they are, so we get more throughput at a cost of higher minimal latency / inference time.)

In this equation, for any model exceeding a value of 1 indicates that requests are starting to queue up. Setting the autoscaler threshold for the ovms_requests_streams_ratio metric is somewhat of an arbitrary decision that should be made by a cluster administrator. Setting the threshold too low will result in underutilization of nodes and setting it too high will force the system to work with insufficient resources for extended periods of time. Now that we have chosen a metric for autoscaling, let’s start setting it up.

Deploy Model Server with Autoscaling Metrics

First, we need to create a deployment of OpenVINO Model Server in Kubernetes. To do this, follow instructions to install the OpenVINO Operator in your Kubernetes cluster. Then create a configuration where we can specify the model to be served and enable metrics:

apiVersion: v1 
kind: ConfigMap 
  name: ovms-config 
  namespace: default 
  ovms_config.json: | 
    "model_config_list": [ 
            "config": { 
                 "name": "resnet50-int8", 
                 "base_path": "gs://ovms-public-eu/resnet50-binary" 
                 "enable": true 

Create ConfigMap:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/ovms_config.yaml

With the configuration in place, we can deploy OpenVINO Model Server instance:

apiVersion: intel.com/v1alpha1 
kind: ModelServer 
  name: demo 
  image_name: 'openvino/model_server:2022.2' 
    grpc_port: 8080 
    rest_port: 8081 
    single_model_mode: false 
    config_configmap_name: 'ovms-config' 
    config_path: '/config/ovms_config.json' 
    file_system_poll_wait_seconds: 0 
    log_level: INFO 
    replicas: 1

Create ModelServer resource:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/ovms.yaml

Deploy and Configure Prometheus

Next, we need to read serving metrics and expose them to the Horizontal Pod Autoscaler. To do this we will deploy Prometheus to collect serving metrics and the Prometheus Adapter to expose them to the autoscaler.  

Deploy Prometheus Monitoring Tool

Let’s start with Prometheus. In the example below we deploy a simple Prometheus instance via the Prometheus Operator. To deploy the Prometheus Operator, run the following command:

kubectl create -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.59.2/bundle.yaml

Next, we need to configure role-based access control to give Prometheus permission to access the Kubernetes API:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/prometheus_rbac.yaml

The last step is to create a Prometheus instance by deploying Prometheus resource:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/prometheus.yaml 

If the deployment was successful, a Prometheus service should be running on port 9090. You can set up a port forward for this service, enabling access to the web interface via localhost on your machine:

kubectl port-forward svc/prometheus-operated 9090:9090

Now, when you open http://localhost:9090 in a browser you should see the Prometheus user interface. Next, we need to expose the Model Server to Prometheus by creating a ServiceMonitor resource:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/service_monitor.yaml

Once it’s ready, you should see a demo-ovms target in the Prometheus UI:

Figure 2. Prometheus User Interface with demo-ovms target

Now that the metrics are available via Prometheus, we need to expose them to the Horizonal Pod Autoscaler. To do this, we deploy the Prometheus Adapter.

Deploy Prometheus Adapter

Prometheus Adapter can be quickly installed via helm or step-by-step via kubectl. For the sake of simplicity, we will use helm3. Before deploying the adapter, we will prepare a configuration that tells it how to expose the ovms_requests_streams_ratio metric:  

apiVersion: v1 
kind: ConfigMap 
  name: adapter-config 
  namespace: default 
  config.yaml: |+ 
    - seriesQuery: 'ovms_current_requests' 
            resource: namespace 
            resource: pod 
        matches: "ovms_current_requests" 
        as: "ovms_requests_streams_ratio" 
      metricsQuery: avg(avg_over_time(ovms_current_requests{<<.LabelMatchers>>}[1m]) 
 / avg_over_time(ovms_streams{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)

Create a ConfigMap:

kubectl apply –f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/prometheus_adapter_config.yaml

Now that we have a configuration, we can install the adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update 

helm install --set 'prometheus.url=http://prometheus-operated.default.svc' --set 'rules.existing=adapter-config' prometheus-adapter prometheus-community/prometheus-adapter

Keep checking until custom metrics are available from the API:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq 
  "kind": "APIResourceList", 
  "apiVersion": "v1", 
  "groupVersion": "custom.metrics.k8s.io/v1beta1", 
  "resources": [ 
      "name": "namespaces/ovms_requests_streams_ratio", 
      "singularName": "", 
      "namespaced": false, 
      "kind": "MetricValueList", 
      "verbs": [ 
      "name": "pods/ovms_requests_streams_ratio", 
      "singularName": "", 
      "namespaced": true, 
      "kind": "MetricValueList", 
      "verbs": [ 

Once you see the output above, you can configure the Horizontal Pod Autoscaler to use these metrics.

Set up Horizontal Pod Autoscaler

As mentioned previously, we will set up autoscaling based on the ovms_requests_streams_ratio metric and target an average value of 1. This will try to keep all streams busy all the time while preventing requests from queueing up. We will set minimum and maximum number of replicas to 1 and 3, respectively, and the stabilization window for both upscaling and downscaling to 120 seconds:

kind: HorizontalPodAutoscaler 
apiVersion: autoscaling/v2 
  name: ovms-hpa 
    apiVersion: intel.com/v1alpha1 
    kind: ModelServer 
    name: demo 
  minReplicas: 1 
  maxReplicas: 3 
  - type: Pods 
        name: ovms_requests_streams_ratio 
        type: AverageValue 
        averageValue: 1 
      stabilizationWindowSeconds: 120 
      stabilizationWindowSeconds: 120

Create HorizontalPodAutoscaler:

kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/main/examples/hpa_custom_metrics/ovms_hpa.yaml 

Once deployed, you can generate some load for your model and see the results. Below you can see how Horizontal Pod Autoscaler scales the number of replicas by checking its status:

kubectl describe hpa ovms-hpa
Autoscaler Status

This data can also be visualized with a Grafana dashboard:

Metrics Visualized on Grafana Dashboard

As you can see, with OpenVINO Model Server metrics, you can quickly set up inferencing system with monitoring and autoscaling for any model. Moreover, with custom metrics, you can set up autoscaling for inference on any Intel CPUs and GPUs.  

See also: