OpenVINO™ optimization of Latent Diffusion Models (LDM) for super-resolution
Introduction
A computer vision approach called image super-resolution aims to increase the resolution of low-resolution images so that they are clearer and more detailed. Applications for super-resolution include the processing of medical images, surveillance footage, and satellite images.
Figure 1: Super-resolution effect display
The LDM (Latent Diffusion Models) Super Resolution model, a deep learning-based approach to photo super-resolution, was developed by the Hugging Face Research team. It builds on the residual network (ResNet) architecture, a type of convolutional neural network (CNN) created to address the issue of vanishing gradients in deep neural networks.
Diffusion models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, diffusion models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the diffusion model to generate data by simply passing randomly sampled noise through the learned denoising process.
Figure 2: Diffusion models can be used to generate images from noise
A diffusion model is a latent variable model which maps to the latent space using a fixed Markov chain. This chain gradually adds noise to the data in order to obtain the approximate posterior.
Figure 3: The Markov chain manifested for image data
Ultimately, the image is asymptotically transformed to pure Gaussian noise. The goal of training a diffusion model is to learn the reverse process. By traversing backward along this chain, we can generate new data.
Figure 4: The process of generating new image data by diffusion
Requirements
- Optimum Intel is the interface between the HuggingFace Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures. Intel Neural Compressor is an open-source library enabling the use of the most popular compression techniques such as quantization, pruning, and knowledge distillation.
- OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference which can boost deep learning performance in computer vision, automatic speech recognition, natural language processing, and other common tasks.
(Optional) Install the latest stable release via pip:
pip install openvino openvino-dev
pip install "optimum[openvino,nncf]"
Step 2. Convert the original model to OpenVINO IR model
First, run the HuggingFace pipeline; it will automatically download the models. We then need to convert them from PyTorch -> ONNX -> IR to enable the models with OpenVINO.
Figure 6: OpenVINO workflow for enabling a HuggingFace (PyTorch-based) model
The LDM (Latent Diffusion Models) Super Resolution model has two sub-models: unet and vqvae. We should convert each of them into an IR model.
Figure 7: OpenVINO-enabled super-resolution pipeline workflow
The reference source code for model conversion is shown below; we also provide the script in the GitHub repo: ov-ldm4x-model-convert.py
Initialize parameters and the ov-pipeline
Figure 8: Initialize parameters and the ov-pipeline
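For orientation, a minimal sketch of this initialization is shown below; the diffusers LDMSuperResolutionPipeline class and the CompVis/ldm-super-resolution-4x-openimages checkpoint are assumptions, since the original script may load the models differently.

from diffusers import LDMSuperResolutionPipeline
from openvino.runtime import Core

# Download the HuggingFace LDM super-resolution pipeline (assumed checkpoint id).
model_id = "CompVis/ldm-super-resolution-4x-openimages"
pipe = LDMSuperResolutionPipeline.from_pretrained(model_id)

unet = pipe.unet      # denoising network to convert to IR
vqvae = pipe.vqvae    # VQ-VAE used to decode latents back to an image
core = Core()         # OpenVINO runtime entry point used later for inference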
Convert the unet sub-model to IR
Figure 9: Convert the unet sub-model to IR
Convert the vqvae sub-model to IR
Figure 10: Convert the vqvae sub-model to IR
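Continuing the sketch above, the same PyTorch -> ONNX -> IR pattern applies to both sub-models; the dummy input shapes, the wrapper module, and the file names below are assumptions, not the exact values used in ov-ldm4x-model-convert.py.

import torch
from openvino.tools import mo
from openvino.runtime import serialize


class UnetWrapper(torch.nn.Module):
    """Return a plain tensor so torch.onnx.export can trace the diffusers UNet."""
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep):
        return self.unet(sample, timestep, return_dict=False)[0]


# Assumed dummy shapes: noisy latents concatenated with the low-resolution image.
dummy_sample = torch.randn(1, 6, 512, 512)
dummy_timestep = torch.tensor(1)

torch.onnx.export(UnetWrapper(pipe.unet), (dummy_sample, dummy_timestep), "unet.onnx",
                  opset_version=16, input_names=["sample", "timestep"],
                  output_names=["noise_pred"])

ov_unet = mo.convert_model("unet.onnx")     # ONNX -> OpenVINO model in memory
serialize(ov_unet, "unet.xml", "unet.bin")  # write IR to disk; repeat for vqvae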
Step 3. Build OpenVINO super-resolution pipeline
The main function of the LDM (Latent Diffusion Models) Super Resolution OpenVINO pipeline is outlined below; the whole pipeline script is provided in the GitHub repo: ov-ldm4x-pipeline.py
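To give a sense of what that main function does, here is a heavily abridged sketch; the scheduler settings, tensor shapes, device choice, and IR file names are assumptions carried over from the sketches above, so refer to ov-ldm4x-pipeline.py for the actual implementation.

import numpy as np
import torch

# Compile the converted IR models (device choice is an assumption).
compiled_unet = core.compile_model("unet.xml", "CPU")
compiled_vqvae = core.compile_model("vqvae.xml", "CPU")

scheduler = pipe.scheduler
scheduler.set_timesteps(100)              # assumed number of denoising steps

low_res = torch.randn(1, 3, 512, 512)     # placeholder for the preprocessed input image
latents = torch.randn(1, 3, 512, 512)     # assumed latent shape

for t in scheduler.timesteps:
    unet_input = torch.cat([latents, low_res], dim=1)
    noise_pred = compiled_unet([unet_input.numpy(), np.array(t, dtype=np.int64)])[
        compiled_unet.output(0)]
    latents = scheduler.step(torch.from_numpy(noise_pred), t, latents).prev_sample

# Decode the final latents back to a high-resolution image with the converted VQ-VAE.
image = compiled_vqvae([latents.numpy()])[compiled_vqvae.output(0)]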
Authors: Vinnam Kim, Wonju Lee, Mark Byun, Minje Park
Introduction
OpenVINO™ provides an easy way to deploy your model with the best inference performance on any Intel hardware. However, to train your own model for deployment you need to prepare a training framework and dataset. Fortunately, there are many ready-to-use training frameworks and implementations. Then, what about the dataset? A specific training framework requires a specific data format, but there are many data formats in the world. For example, in object detection tasks there are data formats such as YOLO, COCO, and Pascal VOC that are widely used. These formats have different directory structures and annotation file formats as well as different extensions such as txt, json, and xml, respectively. It's a tedious task to convert a dataset from one format to another whenever you adopt a different training framework.
Let's assume you choose Detectron2, which only supports COCO format datasets. If your dataset is formatted as VOC, you have to convert it into COCO format. Below, we compare the directory structures and annotation file formats of both datasets, VOC and COCO. These datasets have distinct formats, and you need to implement code for format conversion each time you handle a different format. Of course, this is not technically challenging, but it may require tedious code work and debugging for several days. It won't be good to repeat this process if you intend to add more datasets with different formats.
Dataset Management Framework (Datumaro) is a framework that provides Python API and CLI tools to convert, transform, and analyze datasets. Among the many features of Datumaro, we would like to introduce the data format conversion feature in this blog, which is one of the fundamental features for handling many datasets with different training frameworks. Datumaro supports the import and export of over 40 computer vision data formats (please take a look at supported formats for details!). This means that you can easily change your data format through Datumaro. If your model training framework can only read specific formats, don't worry. Use Datumaro and convert it!
Train YOLOv8 model and export it to OpenVINO™ model
Prepare dataset
Convert dataset with Datumaro
Train with YOLOv8 and export to OpenVINO™ IR
YOLOv8 is a well-known model training framework for object detection and tracking, instance segmentation, image classification, and pose estimation tasks. It provides simple CLI commands to train, test, and export a model to OpenVINO™ Intermediate Representation (IR). However, the data format consumed by YOLOv8 is slightly different from the YOLO format itself. Datumaro refers to it as the YOLO-Ultralytics format. As you can see here, it requires a special meta file to indicate the annotation files for each subset and subset files to list the subset image files. It further requires them to be placed in an appropriate directory structure. It can be very tedious to go through these details and implement dataset preprocessing when you want to train a model on your custom dataset.
In this blog, we provide an end-to-end example that covers the complete process of converting your dataset, training a model with the converted dataset, and exporting the trained model to OpenVINO™ IR. We understand that dataset conversion can be a tricky process, especially if you have annotated and built your own dataset. Therefore, we will provide an example of converting the dataset created by the popular CVAT annotation tool. By following our step-by-step guide, you will be able to convert your data format easily and accelerate the inference of your trained model with OpenVINO™.
Prepare dataset
In this section, we introduce the steps to export the project annotated by CVAT for the following workflows. You can skip this section if your dataset is formatted as a different data format and is ready to be imported by Datumaro.
CVAT project for object detection task with train, val, and test subsets
NOTE: We used the cats-and-dogs dataset for this example. You can find the reference for this dataset here.
NOTE: You should have three subsets in your project: "train", "val", and "test" (optional). If your dataset has different subset names, you have to rename them. You can do this by using Datumaro's MapSubsets transform.
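For example, a renaming pass with the Datumaro Python API might look like the sketch below; the transform name "map_subsets" and its mapping argument are assumptions drawn from Datumaro's plugin naming, so double-check them against the linked documentation.

import datumaro as dm

# Import the exported CVAT dataset and rename a non-standard subset to "val".
dataset = dm.Dataset.import_from("cvat_dataset", "cvat")
dataset = dataset.transform("map_subsets", mapping={"validation": "val"})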
We export this project to CVAT for images 1.1 data format. Datumaro can import this data format and export it to YOLO-Ultralytics format which can be consumed by YOLOv8.
Export CVAT project to CVAT for images 1.1 data format
Export CVAT project to CVAT for images 1.1 data format. After exporting the dataset, extract it to the cvat_dataset directory.
ls cvat_dataset
You can see the following directory structure:
annotations.xml images
Convert your dataset using Datumaro
You can convert the dataset located in cvat_dataset using Datumaro's CLI command as follows. For a detailed explanation of the input arguments, see here.
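The command might look like the following sketch; the output format identifier yolo_ultralytics and the trailing --save-media extra argument are assumptions, so please check them against the linked argument reference.

datum convert -i cvat_dataset -if cvat -f yolo_ultralytics -o yolo_v8_dataset -- --save-media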
NOTE: If your dataset is not CVAT for images 1.1 format, you can replace -if cvat with the different input format as -if INPUT_FORMAT. Use datum detect CLI command to figure out what format your dataset is.
After the conversion, you can see that the yolo_v8_dataset directory is created.
Train with YOLOv8 Trainer and Export to OpenVINO™ IR
In this section, we will train the YOLOv8 detector with the dataset converted in the previous section. To train a YOLOv8 detector, please execute the following command.
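A representative training command might be the following; the base weights yolov8n.pt, the epoch count, and the project name are assumptions chosen to match the output paths used below.

yolo detect train model=yolov8n.pt data=$(realpath yolo_v8_dataset/data.yaml) epochs=100 project=my-project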
NOTE: We use data=$(realpath yolo_v8_dataset/data.yaml) to convert the relative path yolo_v8_dataset/data.yaml to the absolute path. This is because YOLOv8 needs the absolute path for the custom dataset.
After the training, the following command enables testing on the test dataset.
yolo detect val model=my-project/train/weights/best.pt data=$(realpath yolo_v8_dataset/data.yaml) split=test
Lastly, we will export your YOLOv8 detector to OpenVINO™ IR for inference acceleration on Intel devices.
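The export command might look like this; the model path follows the training output used above.

yolo export model=my-project/train/weights/best.pt format=openvino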
With this command, the exported IR is created in the directory my-project/train/weights/best_openvino_model.
ls my-project/train/weights/best_openvino_model
best.bin best.xml metadata.yaml
Conclusion
This post provided an example of training a YOLOv8 detector on an arbitrary data format by utilizing the data format conversion feature of Datumaro and exporting the model to OpenVINO™ IR. You can refer to the executable Jupyter notebook example provided with this blog post here for a step-by-step guide. Datumaro offers a range of useful features for managing datasets beyond data format conversion. You can find examples of other Datumaro features, such as noisy label detection during training with OpenVINO™ Training Extensions, in the Jupyter examples directory. For more information about Datumaro and its capabilities, you can visit the Datumaro documentation page. If you have any questions or requests about using Datumaro, feel free to open an issue here.
Pre-trained transformer models are widely deployed for various NLP tasks such as text classification, question answering, and generation. The recent trend is that models continue to scale while yielding improved performance. However, the growth of transformers also leads to a large amount of compute resources and energy needed for deployment. The goal of model compression is to simplify the original model without significantly diminished accuracy. Pruning, quantization, and knowledge distillation are the three most popular model compression techniques for deep learning models. Pruning is a technique for reducing the size of a model to improve efficiency or performance. By reducing the number of bits needed to represent data, quantization can significantly reduce storage and computational requirements. Knowledge distillation involves training a small model to imitate the behavior of a larger model.
Figure 1: End-to-End Joint Optimization in One Pipeline
OpenVINO™ Neural Network Compression Framework (NNCF) provides Joint Pruning, Quantization and Distillation (JPQD) as a single joint-optimization pipeline to improve transformer inference performance by applying pruning, quantization, and distillation in parallel during transfer learning of a pretrained transformer. JPQD alleviates the developer complexity of sequentially applying different compression techniques, resulting in an optimized model with significant efficiency improvement while preserving good task accuracy. The output of JPQD is a structurally pruned, quantized model in OpenVINO™ IR, which is ready to deploy with OpenVINO™ runtimes optimized on Intel platforms. Optimum Intel provides a simple API to integrate JPQD into the training pipeline for Hugging Face Transformers.
JPQD of BERT-base Model with Optimum Intel
In this blog, we introduce how to apply JPQD to BERT-base model on GLUE benchmark for SST-2 text classification task.
Figure 2: Sparsity level of BERT-base model in two major stages
Figure 2 shows the sparsity level of BERT-base model over the optimization lifecycle, including two major stages:
Unstructured sparsification: In the first stage, model weights are gradually sparsified at the granularity specified by "sparse_structure_by_scopes". The BertAttention layers (Multi-Head Attention: MHA) will be sparsified in 32x32 block size, while BertIntermediate and BertOutput layers (Feed-Forward Network: FFN) will be sparsified by row or column, respectively. The first stage serves as a warmup stage defined by the parameters “warmup_start_epoch” and “warmup_end_epoch”. The “importance_regularization_factor” defines the regularization factor on weight importance scores. The factor stays at zero before the warmup stage, gradually increases during warmup, and finally stays at a fixed value after warmup. Users might need some heuristics to find a satisfactory trade-off between sparsity and task accuracy.
Structured masking and fine-tuning: The first warmup stage produces an unstructured sparsified model. Currently, unstructured-sparsity-optimized inference is only supported on 4th Gen Intel® Xeon® Scalable Processors with OpenVINO 2022.3 or a later version; for details, please refer to Accelerate Inference of Sparse Transformer Models with OpenVINO™ and 4th Gen Intel® Xeon® Scalable Processors. However, it is possible to discard some sparse structures entirely from the model to save compute and memory footprint. NNCF provides a mechanism to achieve structured masking with “enable_structured_masking”: true, where it automatically resolves the structured masking between dependent layers and rewinds the sparsified parameters that do not participate in acceleration for task modeling. As Figure 2 shows, the sparsity level drops after “warmup_end_epoch” due to structured masking, and the model continues to be fine-tuned.
Known limitation: currently structured pruning with movement sparsity only supports BERT, Wav2vec2, and Swin family of models. See here for more information.
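For concreteness, a hedged sketch of what such a JPQD compression configuration might look like is shown below; the epoch numbers, regularization factor, and scope patterns are illustrative assumptions rather than the exact values used in the referenced examples.

compression_config = [
    {
        "algorithm": "movement_sparsity",
        "params": {
            "warmup_start_epoch": 1,
            "warmup_end_epoch": 4,
            "importance_regularization_factor": 0.02,
            "enable_structured_masking": True,
        },
        # MHA weights pruned in 32x32 blocks; FFN weights pruned per row/column.
        "sparse_structure_by_scopes": [
            {"mode": "block", "sparse_factors": [32, 32], "target_scopes": "{re}.*BertAttention.*"},
            {"mode": "per_dim", "axis": 0, "target_scopes": "{re}.*BertIntermediate.*"},
            {"mode": "per_dim", "axis": 1, "target_scopes": "{re}.*BertOutput.*"},
        ],
        "ignored_scopes": ["{re}.*NNCFEmbedding.*", "{re}.*pooler.*", "{re}.*classifier.*"],
    },
    # 8-bit quantization-aware training applied alongside movement sparsity.
    {"algorithm": "quantization"},
]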
For distillation, the teacher model can be loaded with the transformers API, e.g., a BERT-large pre-trained model from the Hugging Face Hub. OVTrainingArguments extends transformers' TrainingArguments with distillation hyperparameters, i.e., distillation weight and temperature, for ease of use. The snippet below shows how we load a teacher model and create training arguments with OVTrainingArguments. Subsequently, the teacher model, together with the instantiated OVConfig and OVTrainingArguments, is fed to OVTrainer. The rest of the pipeline is identical to native transformers training, while internally the training is applied with pruning, quantization, and distillation.
from transformers import AutoModelForSequenceClassification, default_data_collator
from optimum.intel import OVConfig, OVTrainer, OVTrainingArguments

# model, tokenizer, dataset, compute_metrics, compression_config, save_dir and
# teacher_model_or_path are assumed to be defined earlier in the training script.

# Load teacher model
teacher_model = AutoModelForSequenceClassification.from_pretrained(
    teacher_model_or_path)

ov_config = OVConfig(compression=compression_config)

trainer = OVTrainer(
    model=model,
    teacher_model=teacher_model,
    args=OVTrainingArguments(save_dir, num_train_epochs=1.0, do_train=True,
                             do_eval=True, distillation_temperature=3,
                             distillation_weight=0.9),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    ov_config=ov_config,
    task="text-classification",
)
# Train the model like usual, internally the training is applied with pruning, quantization and distillation
train_result = trainer.train()
metrics = trainer.evaluate()
# Export the quantized model to OpenVINO IR format and save it
trainer.save_model()
Besides, NNCF provides JPQD examples of other tasks, e.g., question answering. Please refer to the examples provided here.
End-to-End JPQD of BERT-base Demo
Set up Python environment with necessary dependencies.
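A typical environment setup might look like the following; the extra datasets and evaluate packages are assumptions based on the GLUE fine-tuning workflow.

pip install "optimum[openvino,nncf]" datasets evaluate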
All JPQD configurations and results are saved in the ./jpqd-bert-base-ft-$TASK_NAME directory. An optimized OpenVINO™ IR is generated for efficient inference on Intel platforms.
BERT-base Performance Evaluation and Accuracy Verification on Xeon
Table 1: BERT-base model for text classification task performance and accuracy verification results
Table 1 shows the performance evaluation and accuracy verification results of the BERT-base model for the text classification task on 4th Gen Intel® Xeon® Scalable Processors. The BERT-base FP32 model serves as the baseline. BERT-base INT8 (QAT) refers to the model optimized with the 8-bit quantization method only. BERT-base INT8 (JPQD) refers to the model optimized with the pruning, quantization, and distillation method.
Here we use benchmark_app with the performance hint "throughput" to evaluate model performance with input sequence length = 128.
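A representative invocation might look like this; the IR path (with TASK_NAME=sst2) and the input tensor names are assumptions based on the output directory described above.

benchmark_app -m jpqd-bert-base-ft-sst2/openvino_model.xml -d CPU -hint throughput -data_shape input_ids[1,128],attention_mask[1,128],token_type_ids[1,128]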
As the results show, BERT-base INT8 (QAT) already reaches a 2.39x compression rate and a 3.17x performance gain without a significant accuracy drop (1.3%) on SST-2 compared with the baseline. BERT-base INT8 (JPQD) further increases the compression rate to 5.24x and reaches a 4.19x performance improvement while keeping a minimal accuracy drop (<1%) on SST-2 compared with the baseline.
Table 2: BERT-base model for question answering task performance and accuracy verification results
With proper fine-tuning, JPQD can even improve model accuracy while increasing performance at the same time. Table 2 shows the performance evaluation and accuracy verification results of the BERT-base model for the question answering task on 4th Gen Intel® Xeon® Scalable Processors. BERT-base INT8 (JPQD) increases the compression rate to 5.15x and reaches a 4.25x performance improvement while improving the Exact Match (1.35%) and F1 score (1.15%) metrics on SQuAD compared with the FP32 baseline.
Figure 3: Visualization of BERT-base model parameter counts per layer after JPQD
Figure 3 shows the visualization of parameter counts per layer in the BERT-base model optimized by JPQD for the text classification task. You can see that the fully connected layers are actually “dense”, while most MHA (Multi-Head Attention) layers are much sparser compared to the original model.
Figure 4: BERT-base Multi-Head Attention (MHA) per layer after JPQD
Figure 4 shows the MHA head counts per layer in the BERT-base model optimized by JPQD for the text classification task, where active (blue) refers to the remaining MHA head counts and pruned (grey) refers to the removed MHA head counts. Instead of pruning uniformly across all MHA heads in the transformer layers, we observed that JPQD tends to preserve the weights in the lower layers while heavily pruning the highest layers, similar to the experimental results from Movement Pruning (Sanh et al., 2020).
Conclusion
In this blog, we introduced the Joint Pruning, Quantization, and Distillation (JPQD) method to accelerate transformer inference on Intel platforms. Here are three key takeaways:
Optimum Intel provides a simple API to integrate JPQD into the training pipeline to enable pruning, quantization, and distillation in parallel during transfer learning of a pre-trained transformer. An optimized OpenVINO™ IR is generated for efficient inference on Intel architectures.
The BERT-base INT8 (JPQD) model for the text classification task reaches a 5.24x compression rate, leading to a 4.19x performance improvement on 4th Gen Intel® Xeon® Scalable Processors while keeping a minimal accuracy drop (<1%) on SST-2 compared with the BERT-base FP32 model.
The BERT-base INT8 (JPQD) model for the question answering task reaches a 5.15x compression rate and achieves a 4.25x performance improvement on 4th Gen Intel® Xeon® Scalable Processors while improving the Exact Match (1.35%) and F1 score (1.15%) metrics on SQuAD compared with the BERT-base FP32 model.
OpenVINO™ Model Server (OVMS) is a high-performance system for serving models that uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO™ for inference execution. It is implemented in C++ for scalability and optimized for deployment on Intel architectures.
Directed Acyclic Graph (DAG) is an OVMS feature that controls the execution of an entire graph of interconnected models defined within the OVMS configuration. The DAG scheduler makes it possible to create a pipeline of models for execution in the server with a single client request.
During the pipeline execution, it is possible to split a request with multiple batches into a set of branches with a single batch each. Internally, the OVMS demultiplexer divides the data, processes the branches in parallel, and combines the results.
The custom node feature in OVMS simplifies linking deep learning models into a complete pipeline. A custom node can be used to implement all operations on the data which cannot be handled by a neural network model. It is represented by a C++ dynamic library implementing the OVMS API defined in custom_node_interface.h.
Super-Resolution Pipeline Workflow
Figure 1 shows the super-resolution pipeline as a flowchart, where we use "demultiply_count=3" without loss of generality. The whole pipeline starts with input data arriving at the Request node via gRPC calls. Batched input data with 5D shape (3,1,3,270,480) is split into single-batch branches by the DAG demultiplexer. Each single batch of data is fed into a custom node for image preprocessing. The two outputs of the custom node serve as inputs for model A inference. In the end, all inference results are gathered as output C, which is sent by the Response node to the client via gRPC calls.
Figure 1: Super-Resolution Pipeline Workflow in OpenVINO™ Model Server
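For reference, a minimal gRPC client for such a pipeline, sketched with the ovmsclient package, might look like the following; the service address, pipeline name, input name, and the NHWC layout of the request tensor are assumptions, and the actual values come from the pipeline configuration described below.

import numpy as np
from ovmsclient import make_grpc_client

# Connect to the model server's gRPC endpoint (address and port are assumptions).
client = make_grpc_client("localhost:9000")

# Three RGB frames batched together; the DAG demultiplexer splits them into
# three single-batch branches on the server side.
images = np.zeros((3, 1, 270, 480, 3), dtype=np.uint8)

# Pipeline and input names are placeholders taken from this blog's description.
results = client.predict(inputs={"image": images}, model_name="super_resolution_pipeline")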
Here is an example configuration for the super-resolution pipeline deployed with OVMS.
“pipeline_config_list”: contains the super-resolution pipeline information. Data enters from the “request” node and flows to “sr_preprocess_node” for image preprocessing; the two generated outputs serve as inputs to “super_resolution_node” for inference, and the gathered inference results are returned by the “response” node.
"demultiply_count": acceptable input data batch size when Demultiplexing in DAG feature enabled, “demultiply_count” with value -1 means OVMS can accept dynamic batch input data.
“model_config_list”: contains the basic configuration for the super-resolution deep learning model and the OpenVINO™ CPU plugin configuration.
"nireq": set number of infer requests used in OVMS server for deep learning model
"NUM_STREAMS": set number of streams used in the CPU plugin
"INFERENCE_PRECISION_HINT": option to select preferred inference precision in CPU plugin. We can set "INFERENCE_PRECISION_HINT":bf16 on the Xeon platform that supports BF16 precision, such as the 4th Gen Intel® Xeon® Scalable processor (formerly codenamed Sapphire Rapids). Otherwise, we should set "INFERENCE_PRECISION_HINT":f32 as the default value.
“custom_node_library_config_list”: contains the name and path of the custom node dynamic library
Image Preprocessing with libvips in Custom Node
In this blog, we use the single-image-super-resolution model from the Open Model Zoo for the super-resolution pipeline. The model requires two inputs according to the model specification. The first input is the original image (shape [1,3,270,480]). The second input is a 4x resized image with bicubic interpolation (shape [1,3,1080,1920]). The expected color space of both input images is BGR. Therefore, image preprocessing of the input image is required.
Figure 2: Custom Node for Image Preprocessing in the Super-Resolution Pipeline
Figure 2 shows the custom node designed for image preprocessing in the super-resolution pipeline. The custom node takes the original input image as input data. First, the input data is assigned to output 1 without modification. In addition, the input data is resized 4x with bicubic interpolation and assigned as output 2. The two outputs are passed to the model node for inference. For image processing in the custom node, we utilize libvips – an open-source image processing library that is designed to be fast and efficient with low memory usage. Please see the detailed custom node implementation in super_resolution_nhwc.cpp.
Although libvips is very efficient for image processing operations with low memory usage, it does not provide functionality for the layout (NCHW->NHWC) and color space (RGB->BGR) conversions required by the super-resolution model inputs. Instead, we can integrate layout and color space conversion into the model using the OpenVINO™ Preprocessing API.
Integrate Preprocessing with OpenVINO™ Preprocessing API
OpenVINO™ Preprocessing API allows adding custom preprocessing steps into the execution graph of OpenVINO™ models.
Here is sample code to integrate the layout (NCHW->NHWC) and color space (RGB->BGR) conversions into the super-resolution model with the OpenVINO™ Preprocessing API.
from openvino.runtime import Core, Layout, Type, serialize
from openvino.preprocess import ColorFormat, PrePostProcessor
core = Core()
input_tensor_name_1 = "0"
model_path = "./super_resolution/1/single-image-super-resolution-1032.xml"
model = core.read_model(model_path)
ppp = PrePostProcessor(model)
# Input 1
ppp.input(input_tensor_name_1).tensor().set_element_type(Type.u8)
ppp.input(input_tensor_name_1).tensor().set_color_format(ColorFormat.RGB)
ppp.input(input_tensor_name_1).tensor().set_layout(Layout('NHWC'))
...
model = ppp.build()
serialize(model,
'./super_resolution_model_preprocessed/1/single-image-super-resolution-1032.xml',
'./super_resolution_model_preprocessed/1/single-image-super-resolution-1032.bin')
In the code snippet above, we first load the original model and initialize the PrePostProcessor object with it. Then we modify the model's first input element type to “uint8”, change the color format from the default “BGR” to “RGB”, and set the layout from “NCHW” to “NHWC”. In the end, we build a new model and serialize it to disk. The whole model preprocessing can be done offline; please find the details in model_preprocess.py.
Build Model Server Docker Image for Super-Resolution Pipeline
Build OVMS docker image with custom node
git clone https://github.com/sammysun0711/model_server.git -b super_resolution_demo
cd model_server
IMAGE_TAG_SUFFIX=-sr make docker_build
Copy the compiled custom node library to the “models” directory
Figure 5 shows the inference result of the super-resolution model (shape 1080x1920).
Figure 5: Super-Resolution Model Inference Result (1080x1920)
Conclusion
In this blog, we demonstrated an end-to-end super-resolution pipeline deployment with OpenVINO™ Model Server. The whole pipeline takes dynamically batched images (RGB, NHWC) as input, demultiplexes them into single-batch data, preprocesses them with a custom node, runs inference with a super-resolution model, and finally sends the gathered inference results to the client.
This blog provides the following examples that utilize OpenVINO™ Model Server and OpenVINO™ features:
Enable OVMS DAG demultiplexing feature
Provide custom node for image preprocessing using libvips
Provide sample code for integrating preprocessing into the model with OpenVINOTM Preprocessing API.
Support super-resolution end-to-end pipeline with image preprocessing and model inference with OVMS DAG scheduler