Deploying deep-learning capabilities to edge devices can present security challenges, such as ensuring inference integrity and providing copyright protection for your deep-learning models. This blog shows a simple method that uses a cryptographic algorithm to protect a model on disk. Model encryption, decryption, and authentication are not provided by OpenVINO but can be implemented with third-party tools (e.g., OpenSSL). In this example, we use the AES-128-CBC algorithm in OpenSSL to demonstrate model cryptography.
As shown in the image below, the mechanism has two parts:
The first is to encrypt the plain IR model into an encrypted model.
The second is to use the same key and IV that were used for encryption to decrypt the model at model loading time.
Step 1: Encrypt model
Make sure OpenSSL and Boost are installed, for example on Ubuntu:
$ sudo apt install openssl libboost-dev
Then use the command line to encrypt the model with the OpenSSL AES-128-CBC algorithm. In this simple example, the same value is used for the key and IV; it is the hexadecimal representation of the string "openvino encrypt". You can use an online str2hex tool to generate the hex representation of your string password.
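For example, a minimal sketch of the encryption step, assuming the IR files are named model.xml and model.bin (the key and IV below are the hex representation of "openvino encrypt"; add -d to the same command to decrypt):

$ openssl enc -aes-128-cbc -in model.xml -out model.xml.enc -K 6f70656e76696e6f20656e6372797074 -iv 6f70656e76696e6f20656e6372797074
$ openssl enc -aes-128-cbc -in model.bin -out model.bin.enc -K 6f70656e76696e6f20656e6372797074 -iv 6f70656e76696e6f20656e6372797074

For the second part, the sketch below decrypts the encrypted files into memory with OpenSSL and hands the buffers to the OpenVINO runtime. It is illustrative only: the original example implements this in C++ with Boost, and the file names and helper function here are placeholders.

import subprocess
import numpy as np
import openvino as ov

KEY = IV = "6f70656e76696e6f20656e6372797074"  # hex of "openvino encrypt"

def openssl_decrypt(path):
    # Decrypt an AES-128-CBC encrypted file into memory, never writing the plain model back to disk.
    cmd = ["openssl", "enc", "-d", "-aes-128-cbc", "-in", path, "-K", KEY, "-iv", IV]
    return subprocess.run(cmd, capture_output=True, check=True).stdout

core = ov.Core()
xml_buffer = openssl_decrypt("model.xml.enc")
bin_buffer = openssl_decrypt("model.bin.enc")
weights = ov.Tensor(np.frombuffer(bin_buffer, dtype=np.uint8).copy())
model = core.read_model(model=xml_buffer.decode(), weights=weights)
compiled_model = core.compile_model(model, "CPU")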
This blog only provides an example of model encryption with OpenSSL, and this method can only protect your model on disk. For full in-memory protection, you can refer to technologies like the OpenVINO™ Security Add-on, which uses a virtual machine to provide an isolated environment for security-sensitive operations, and Intel® SGX (Software Guard Extensions), which allows developers to split a computer's memory into private, predefined, highly secure areas called enclaves that better protect sensitive information.
Latent Consistency Models (LCMs) are the next generation of generative models after Latent Diffusion Models (LDMs). While LDMs like Stable Diffusion can achieve outstanding generation quality, they often suffer from the slowness of the iterative image denoising process. LCM is an optimized version of LDM. Inspired by Consistency Models (CMs), Latent Consistency Models enable swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion. Consistency Models are a new family of generative models that enables one-step or few-step generation. More details about the proposed approach and models can be found using the following resources: project page, paper, original repository.
This article demonstrates a C++ application of the LCM model with Intel’s OpenVINO™ C++ API on Linux systems. In terms of model inference performance and accuracy, the C++ pipeline is well aligned with the Python implementation.
To leverage efficient inference with OpenVINO™ runtime on Intel platforms, the original model should be converted to OpenVINO™ Intermediate Representation (IR).
from optimum.intel.openvino import OVLatentConsistencyModelPipeline
model = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
model.save_pretrained("ov_lcm_model")
Tokenizer
OpenVINO Tokenizers is an extension that adds text processing operations to the OpenVINO Inference Engine. In addition, the OpenVINO Tokenizers project has a tool to convert a HuggingFace tokenizer into OpenVINO IR tokenizer and detokenizer models: it provides the convert_tokenizer function that accepts a tokenizer Python object and returns an OpenVINO Model object:
from transformers import AutoTokenizer
from openvino_tokenizers import convert_tokenizer
from openvino import compile_model, save_model
hf_tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)  # path or HF model id of the source tokenizer, e.g. "SimianLuo/LCM_Dreamshaper_v7"
ov_tokenizer_encoder = convert_tokenizer(hf_tokenizer)
save_model(ov_tokenizer_encoder, "ov_tokenizer.xml")
Note: Currently, OpenVINO Tokenizers can be run on CPU devices only.
Note: The tutorial assumes that the current working directory is <openvino.genai repo>/image_generation/lcm_dreamshaper_v7/cpp and that all paths are relative to this folder.
Let’s prepare a Python environment and install dependencies:
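A typical setup looks like the following; the environment name and the requirements file path are illustrative, so use the dependency list provided in the repository:

python3 -m venv openvino_lcm_cpp
source openvino_lcm_cpp/bin/activate
pip install -r requirements.txt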
Now we can use the script scripts/convert_model.py to download and convert models:
cd scripts
python convert_model.py -lcm "SimianLuo/LCM_Dreamshaper_v7" -t FP16
C++ Pipeline
Pipeline flow
Let’s now talk about the logical structure of the LCM model pipeline.
Just like the classic Stable Diffusion pipeline, the LCM pipeline consists of three important parts:
- A text encoder to create a condition to generate an image from a text prompt.
- U-Net for step-by-step denoising of the latent image representation.
- Autoencoder (VAE) for decoding the latent space to an image.
The pipeline takes as input a latent image representation and a text prompt transformed into a text embedding via CLIP’s text encoder. The initial latent image representation is generated using a random noise generator. LCM uses the guidance scale to obtain time-step conditional embeddings as input for the diffusion process, while in Stable Diffusion it is used for scaling the output latents.
Next, the U-Net iteratively denoises the random latent image representations while being conditioned on the text embeddings. The output of the U-Net, being the noise residual, is used to compute a denoised latent image representation via a scheduler algorithm. LCM introduces its own scheduling algorithm that extends the denoising procedure introduced by denoising diffusion probabilistic models (DDPMs) with non-Markovian guidance. The denoising process is repeated a given number of times to retrieve step-by-step better latent image representations. When complete, the latent image representation is decoded by the decoder part of the variational autoencoder.
The C++ implementations of the scheduler algorithm and LCM pipeline are available at the following links: LCM Scheduler, LCM Pipeline.
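To make the flow above concrete, here is a condensed Python-style sketch of the loop. It is illustrative only: the function and object names are placeholders, and the actual logic lives in the C++ LCM Scheduler and LCM Pipeline linked above.

import numpy as np

def lcm_generate(tokenizer, text_encoder, unet, vae_decoder, scheduler,
                 prompt, guidance_embedding, num_steps=4):
    # 1. Text prompt -> text embedding (the condition for generation).
    text_embedding = text_encoder(tokenizer(prompt))

    # 2. The initial latent image representation is random noise.
    latents = np.random.randn(1, 4, 64, 64).astype(np.float32)

    # 3. Iterative denoising: the U-Net predicts the noise residual, conditioned on
    #    the text embedding and the guidance-scale (time-step conditional) embedding,
    #    and the LCM scheduler computes the next, less noisy latents.
    denoised = latents
    for t in scheduler.timesteps(num_steps):
        noise_pred = unet(latents, t, text_embedding, guidance_embedding)
        latents, denoised = scheduler.step(noise_pred, t, latents)

    # 4. Decode the final latent representation into an image with the VAE decoder.
    return vae_decoder(denoised)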
The main idea for enabling LoRA weights is to add the weights to the OpenVINO LCM models at runtime, before compiling the Unet/text_encoder models. The method is to extract the LoRA weights from the safetensors file, find the corresponding weights in the Unet/text_encoder models, and insert the LoRA bias weights. The common approach to adding LoRA weights looks like this:
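As a rough illustration of the weight math only (not the actual C++ implementation; LoRA key naming varies between file formats, and the strength value here is hypothetical), the per-layer update that gets inserted is alpha * (lora_up @ lora_down) added to the matching base weight:

import numpy as np
from safetensors.numpy import load_file

lora_state = load_file("soulcard.safetensors")  # LoRA tensors keyed by layer name
alpha = 0.8  # hypothetical LoRA strength

for down_key in (k for k in lora_state if "lora_down" in k):
    up_key = down_key.replace("lora_down", "lora_up")
    down, up = lora_state[down_key], lora_state[up_key]
    # delta W = alpha * (up @ down); this delta is added to the corresponding
    # Unet/text_encoder constant in the OpenVINO model before compilation.
    delta = alpha * (up.reshape(up.shape[0], -1) @ down.reshape(down.shape[0], -1))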
And finally we’re ready to run the LCM demo. By default the positive prompt is set to: “a beautiful pink unicorn”.
Please note that the quality of the resulting image depends on the quality of the random noise generator, so there is a difference between output images generated by the C++ noise generator and the PyTorch generator. Use the option -r to read the PyTorch-generated noise from the provided text files for alignment with the Python pipeline.
Note: Run ./lcm_dreamshaper -h to see all the available demo options
Let’s try to run the application in a few modes:
Read the NumPy latent input and noise for the scheduler instead of generating them with the C++ standard library, for alignment with the Python pipeline: ./lcm_dreamshaper -r
Generate an image with the C++ standard library generated latent and noise: ./lcm_dreamshaper
Generate an image with the Soulcard LoRA and the PyTorch-generated latent and noise: ./lcm_dreamshaper -r -l path/to/soulcard.safetensors
The OpenVINO™ Benchmark Application estimates deep learning inference performance on supported devices for synchronous and asynchronous modes.
NOTE: This guide describes the usage of the C++ implementation of the Benchmark Tool. For the Python implementation, refer to the Benchmark Python Tool page. The Python version is recommended for benchmarking models used in Python applications, and the C++ version is recommended for benchmarking models used in C++ applications.
In this tutorial, we will guide you through building and running the C++ implementation of the Benchmark Tool on Ubuntu with OpenVINO™ 2023.1.0 release and demonstrate its usage by benchmarking the Inception (GoogleNet) V3 deep learning model. The following steps outline the process:
Download and Convert the Model
Install OpenVINO™ Runtime
Build OpenVINO™ C++ Runtime Samples
Run the Benchmark Application
The benchmark application works with models in the OpenVINO™ IR (.xml and .bin), ONNX (.onnx), TensorFlow (*.pb), TensorFlow Lite (*.tflite) and PaddlePaddle (*.pdmodel) formats. Make sure to convert your models if necessary (see "Model conversion to OpenVINO™ IR format" step below).
Requirements
Before getting started, ensure that you have the following requirements in place:
Ubuntu 18.04 or higher
CMake version 3.10 or higher
Step 1: Install OpenVINO™
To get started, first install OpenVINO™ Runtime C++ API.
Download and set up the OpenVINO™ Runtime archive file for Linux for your system. The following steps describe the installation process for an Ubuntu 20.04 x86_64 system:
1. Download the archive file, extract the files, rename the extracted folder, and move it to the desired path:
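For example (the archive URL and extracted folder name are placeholders; use the ones from the official download page):

curl -L <openvino_2023.1.0_archive_url> --output openvino_2023.1.0.tgz
tar -xf openvino_2023.1.0.tgz
sudo mkdir -p /opt/intel
sudo mv <extracted_folder> /opt/intel/openvino_2023.1.0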
2. Install required system dependencies on Linux. To do this, OpenVINO provides a script in the extracted installation directory. Run the following command:
cd /opt/intel/openvino_2023.1.0
sudo -E ./install_dependencies/install_openvino_dependencies.sh
3. For simplicity, it is useful to create a symbolic link as below:
cd /opt/intel
sudo ln -s openvino_2023.1.0 openvino_2023
4. Set OpenVINO™ environment variables. Open a terminal window and run the setupvars.sh script to temporarily set your environment variables. If your <INSTALL_DIR> is not /opt/intel/openvino_2023, use the correct one instead:
source /opt/intel/openvino_2023/setupvars.sh
Step 2: Build OpenVINO™ C++ Runtime Samples
In the existing terminal window where the OpenVINO™ environment is set up, navigate to the /opt/intel/openvino_2023.1.0/samples/cpp directory and run the ./build_samples.sh script:
cd /opt/intel/openvino_2023.1.0/samples/cpp
./build_samples.sh
As a result of a successful build, you'll get the message with a path to the sample binaries:
...
[100%] Linking CXX executable ../intel64/Release/benchmark_app
[100%] Built target benchmark_app
[100%] Built target ie_samples
Build completed, you can find binaries for all samples in the /home/user/openvino_cpp_samples_build/intel64/Release subfolder.
NOTE: You can also use the -b option to specify the sample build directory and -i to specify the sample install directory, for example:
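The directories below are arbitrary example paths:

./build_samples.sh -b /home/user/ov_samples/build -i /home/user/ov_samples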
NOTE: The build_samples.sh script will build all the samples in the /opt/intel/openvino_2023.1.0/samples/cpp folder. Remove the other samples from the folder if you want to build only a few samples or only the benchmark_app.
Step 3: Run the Benchmark Application
NOTE: You can use your own model for benchmarking or, if necessary, download a demo model using the Model Downloader. You can find pre-trained models from either public models or Intel’s pre-trained models in the OpenVINO™ Open Model Zoo. Following are the steps to install the tools and obtain the IR for the Inception (GoogleNet) V3 PyTorch model:
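A typical sequence looks like this; omz_downloader and omz_converter come with the openvino-dev Python package, and the extras and flags shown here may vary between releases:

pip install "openvino-dev[onnx,pytorch]"
omz_downloader --name googlenet-v3-pytorch
omz_converter --name googlenet-v3-pytorch --precisions FP32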
The googlenet-v3-pytorch IR files will be located at: <CURRENT_DIRECTORY>/public/googlenet-v3-pytorch/FP32
Navigate to the samples binaries folder and run the benchmark_app with the following command:
cd /home/user/openvino_cpp_samples_build/intel64/Release
./benchmark_app -m path/to/public/googlenet-v3-pytorch/FP32/googlenet-v3-pytorch.xml
By default, the application will load the specified model onto the CPU and perform inferencing on batches of randomly generated data inputs for 60 seconds. As it loads, it prints information about the benchmark parameters. When benchmarking is completed, it reports the minimum, average, and maximum inferencing latency and the average throughput.
NOTE: You can use images from the media files collection available at test_data and infer with specific input data using the -i argument to benchmark_app.
You may be able to improve benchmark results beyond the default configuration by configuring some of the execution parameters for your model. Please find other options for configuring execution parameters here: Benchmark C++ Tool Configuration Options
Model conversion to OpenVINO™ IR format
You can use OpenVINO™ Model Converter to convert your model to Intermediate Representation (IR) when necessary:
1. Install OpenVINO™ for Python which includes the necessary components for utilizing the OpenVINO™ Model Converter.
NOTE: Ensure you install the same version of the OpenVINO™ Runtime Package for Python as the OpenVINO™ Runtime C++ API installed in Step 1.
pip install "openvino>=2023.1.0"
2. To convert the model to IR, run Model Converter:
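For example, for an ONNX model (the file name is a placeholder for your own model; ovc writes the resulting .xml and .bin IR files):

ovc your_model.onnx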
GroundingDINO introduces a language-guided query selection module to enhance object detection using input text. This module selects relevant features from image and text inputs and uses them as decoder queries. In this blog, we provide the OpenVINO™ optimization for GroundingDINO on Intel® platforms.
The public GroundingDINO project is referenced from: GroundingDINO
The GroundingDINO model structure is shown in the picture below:
OpenVINO™ backend on GroundingDINO
In this project, you do not need to download OpenVINO™ and build the library with the GroundingDINO project manually. The project is already fully integrated with the OpenVINO™ runtime library for downloading, program compiling, and linking.
At present, this repository has been optimized and validated with the OpenVINO™ 2023.1.0.dev20230811 version. Check which operating systems support the OpenVINO™ runtime library directly:
Ubuntu 22.04 long-term support (LTS), 64-bit (Kernel 5.15+)
Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
Ubuntu 18.04 long-term support (LTS) with limitations, 64-bit (Kernel 5.4+)
Windows* 10
Windows* 11
macOS* 10.15 and above, 64-bit
Red Hat Enterprise Linux* 8, 64-bit
Step 1: Install system dependencies and set up the environment