C++ Pipeline for Stable Diffusion v1.5 with Pybind for Lora Enabling

Authors: Fiona Zhao, Xiake Sun, Su Yang

The purpose is to demonstrate the use of C++ native OpenVINO API.

For model inference performance and accuracy, the pipelines of C++ and python are well aligned.

Source code github: OV_SD_CPP.

Step 1: Prepare Environment

Setup in Linux:

C++ pipeline loads the Lora safetensors via Pybind

conda create -n SD-CPP python==3.10
conda activate SD-CPP
conda install numpy safetensors pybind11 

C++ Dependencies:

  • OpenVINO: Tested with OpenVINO 2023.1.0.dev20230811 pre-release
  • Boost: Install with sudo apt-get install libboost-all-dev for LMSDiscreteScheduler's integration
  • OpenCV: Install with sudo apt install libopencv-dev for image saving

Notice:

SD Preparation in two steps above could be auto implemented with build_dependencies.sh in the scripts directory.

cd scripts
chmod +x build_dependencies.sh
./build_dependencies.sh

Step 2: Prepare SD model and Tokenizer Model

  • SD v1.5 model:

Refer this link to generate SD v1.5 model, reshape to (1,3,512,512) for best performance.

With downloaded models, the model conversion from PyTorch model to OpenVINO IR could be done with script convert_model.py in the scripts directory.

python -m convert_model.py -b 1 -t <INT8|FP16|FP32> -sd Path_to_your_SD_model

Lora enabling with safetensors, refer this blog.

SD model dreamlike-anime-1.0 and Lora soulcard are tested in this pipeline.

  • Tokenizer model:
  1. The script convert_sd_tokenizer.py in the scripts dir could serialize the tokenizer model IR
  2. Build OpenVINO extension:
git clone https://github.com/apaniukov/openvino_contrib/  -b tokenizer-fix-decode

Refer to PR OpenVINO custom extension ( new feature still in experiments )

  1. read model with extension in the SD pipeline

Step 3: Build Pipeline

source /Path_to_your_OpenVINO_package/setupvars.sh
conda activate SD-CPP
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Step 4: Run Pipeline

./SD-generate -p <posPromp> -n <negPrompt> -d <device> -s <seed> --height <output image> --width <output image> --log <use logger> -c <use cache> -e <useOVExtension> -r <readNPLatent> -m <modelPath> -t <type of model IR> -l <lora.safetensors> -a <alpha> -h <help>

Usage: OV_SD_CPP [OPTION...]

  • -p, --posPrompt arg Initial positive prompt for SD (default: cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting)
  • -n, --negPrompt arg Default negative prompt is empty with space (default: )
  • -d, --device arg AUTO, CPU, or GPU (default: CPU)
  • -s, --seed arg Number of random seed to generate latent (default: 42)
  • --height arg height of output image (default: 512)
  • --width arg width of output image (default: 512)
  • --log arg Generate logging into log.txt for debug
  • -c, --useCache Use model caching
  • -e, --useOVExtension Use OpenVINO extension for tokenizer
  • -r, --readNPLatent Read numpy generated latents from file
  • -m, --modelPath arg Specify path of SD model IR (default: /YOUR_PATH/SD_ctrlnet/dreamlike-anime-1.0)
  • -t, --type arg Specify precision of SD model IR (default: FP16_static)
  • -l, --loraPath arg Specify path of lora file. (*.safetensors). (default: /YOUR_PATH/soulcard.safetensors)
  • -a, --alpha arg alpha for lora (default: 0.75)
  • -h, --help Print usage

Example:

Positive prompt: cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting.

Negative prompt: (empty, here couldn't use OV tokenizer, check the issues for details).

Read the numpy latent instead of C++ std lib for the alignment with Python pipeline.

  • Generate image without lora
./SD-generate -r -l ""
Fig. 1 without Lora
  • Generate image with Soulcard Lora
./SD-generate -r
Fig. 2 with Lora
  • Generate the debug logging into log.txt
./SD-generate --log

Benchmark:

The performance and image quality of C++ pipeline are aligned with Python.

To align the performance with Python SD pipeline, C++ pipeline will print the duration of each model inferencing only.

For the diffusion part, the duration is for all the steps of Unet inferencing, which is the bottleneck.

For the generation quality, be careful with the negative prompt and random latent generation.

Limitation:

  • Pipeline features:
- Batch size 1
- LMS Discrete Scheduler
- Text to image
  • Program optimization: now parallel optimization with std::for_each only and add_compile_options(-O3 -march=native -Wall) with CMake
  • The pipeline with INT8 model IR not improve the performance
  • Lora enabling only for FP16
  • Random generation fails to align, C++ random with MT19937 results is differ from numpy.random.randn(). Hence, please use -r, --readNPLatent for the alignment with Python
  • OV extension tokenizer cannot recognize the special character, like “.”, ”,”, “”, etc. When write prompt, need to use space to split words, and cannot accept empty negative prompt. So use default tokenizer without config -e, --useOVExtension, when negative prompt is empty

Setup in Windows 10 with VS2019:

1. Python env: Setup Conda env SD-CPP with the anaconda prompt terminal

2. C++ dependencies:

  • OpenVINO and OpenCV:

Download and setup Environment Variable: add the path of bin and lib (System Properties -> System Properties -> Environment Variables -> System variables -> Path )

  • Boost:

- Download from sourceforge

- Unzip

- Setup: bootstrap.bat

- Build: b2.exe

- Install: b2.exe install

Installed boost in the path C:/Boost, add CMakeList with "SET(BOOST_ROOT"C:/Boost")"

3. Setup of conda env SD-CPP and Setup OpenVINO with setupvars.bat

4. CMake with build.bat like:

rmdir /Q /S build
mkdir build
cd build
cmake -G "Visual Studio 16 2019" -A x64 ^
 -DCMAKE_BUILD_TYPE=Release ^
      ..
cmake --build . --config Release
cd ..

5. Setup of Visual Studio with release and x64, and build: open .sln file in the build Dir

6. Run the SD_generate.exe