Model Enable

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Sort By:

Enable OpenVINO™ Optimization for WeNet

February 8, 2023


The WeNet model provides two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition which is widely used with various HW platforms. In this blog, we provide the OpenVINO™ optimization for WeNet on Intel® platforms.

The public WeNet project is referenced from: wenet-e2e/wenet

The WeNet model can be considered as a pipeline which is split into 3 parts for decoder, CTC and encoder. Refer the model structure in below picture:

WeNet model processing flow

We implement the wrapper function of Automatic Speech Recognition (ASR) model class with OpenVINO™ runtime API programming for these 3 models’ data preparation and inference. Please refer the integrated OpenVINO™ optimization in official project: wenet-e2e/wenet/runtime/openvino

OpenVINO™backend on WeNet

In this project, you do not require to download OpenVINO™ and build the library with WeNet project manually. It’s already fully integrated with OpenVINO™ runtime library for downloading, program compiling and linking. If your operating system is not one of OpenVINO™ runtime library supported, the script will download OpenVINO™ source from Github, and build with CPU plugin to support.

At present, this repository already optimized and validated by OpenVINO™ 2022.3.0 version. Check the operating system which can support OpenVINO™ runtime library directly:

  • Windows* 10
  • CentOS 7, Red Hat* Enterprise Linux* 8
  • Ubuntu* 18.04, 20.04
  • Debian 9.13 for X86
  • macOS* 10.15
git clone
cd wenet

Step 1: Get pretrained ONNX model (Optional)

If you already have the exported ONNX model for WeNet test, you can skip this step.

For users to get pretrained model from WeNet project, you can refer this link:

Export to 3 ONNX models, including encoder.onnx, ctc.onnx and decoder.onnx by export_onnx_cpu script.

python -m wenet.bin.export_onnx_cpu \
  --config ${model_path}/train.yaml \
  --checkpoint ${model_path}/ \
  --chunk_size 16 \
  --output_dir ${onnx_dir}\
  --num_decoding_left_chunks -1

Step 2: Convert ONNX model to OpenVINO™ Intermediate Representation (IR)

Make sure your python environment already installed OpenVINO™ runtime library.

pip install openvino

Convert these three ONNX models into IR by OpenVINO™ Model Optimizer command:

mo --input_model ${onnx_dir}/encoder.onnx --input chunk,att_cache,cnn_cache --input_shape [1,-1,80],[12,4,-1,128],[12,1,256,7] --output_dir ${openvino_dir} 
mo --input_model ${onnx_dir}/ctc.onnx --input_shape [1,-1,256] --output_dir ${openvino_dir}
mo --input_model ${onnx_dir}/decoder.onnx --input hyps,hyps_lens,encoder_out --input_shape [-1,-1],[-1],[1,-1,256] --output_dir ${openvino_dir}

Step 3: Build WeNet with OpenVINO™ backend

Please refer system requirement to check if the hardware platform available by OpenVINO™. It will download and install OpenVINO™ library during the CMake configuration.

cd ./runtime/openvino
mkdir build && cd build
make --jobs=$(nproc --all)

Some users may cannot easily download OpenVINO™ binary package from server due to firewall or proxy issue. If you failed to download by CMake script, you can download OpenVINO™ package by your selves and put the package to below path:


If you already have OpenVINO™ runtime which is manually built before the WeNet building, you can put the runtime library to below path:


Step 4: Simple inference test

You may run the inference test like below with the speech input audio file (.wav) and model unit file (.txt):

./bin/decoder_main \
    --chunk_size 16 \
    --wav_path ${wav_path} \
    --openvino_dir ${openvino_dir} \
    --unit_path ${unit_path}

The information of OpenVINO™ integration and results will be print out:

    Version : 2022.3.0
    Build   : 2022.3.0-9052-9752fafe8eb-releases/2022/3

Get Encoder input chunk
Get Encoder input offset
Get Encoder input att_cache
Get Encoder input cnn_cache
test 如果你尝试了就会知道这是个很有趣的例子

Extend OpenVINO™ to run PyTorch models with custom operations

February 1, 2023

Authors: Anna Likholat, Nico Galoppo

The OpenVINO™ Frontend Extension API lets you register new custom operations to support models with operations that OpenVINO™ does not support out-of-the-box. This article explains how to export the custom operation to ONNX, add support for it in OpenVINO™, and infer it with the OpenVINO™ Runtime.

The full implementation of the examples in this article can be found on GitHub in the openvino_contrib.

Export a PyTorch model to ONNX

Let's imagine that we have a PyTorch model which includes a new complex multiplication operation created by user (this operation was taken from DIRECT):

def complex_multiplication(input_tensor: torch.Tensor, other_tensor: torch.Tensor) -> torch.Tensor:
    assert_complex(input_tensor, complex_last=True)
    assert_complex(other_tensor, complex_last=True)
    complex_index = -1
    real_part = input_tensor[..., 0] * other_tensor[..., 0] - input_tensor[..., 1] * other_tensor[..., 1]
    imaginary_part = input_tensor[..., 0] * other_tensor[..., 1] + input_tensor[..., 1] * other_tensor[..., 0]
    multiplication =
    return multiplication

class MyModel(nn.Module):
    def __init__(self):
      super(MyModel, self).__init__()

    def forward(self, x, y):
      return complex_multiplication(x, y)

We'd like to export the model to ONNX and preserve complex multiplication operations as single fused nodes in the ONNX model graph, so that we can replace those nodes with custom OpenVINO operations down the line. If we were to export MyModel which directly calls the function above from its forward method, then onnx.export() would inline the PyTorch operations into the graph. This can be observed in the figure of the exported ONNX model below.

To prevent inlining of native PyTorch functions during ONNX export, we can wrap the function in a sub-class of torch.autograd.Function and define a static symbolic method. This method should return ONNX operators that represent the function's behavior in ONNX. For example:

class ComplexMul(torch.autograd.Function):
    def symbolic(g, input_tensor, other_tensor, is_conj = True):
      return g.op("ComplexMultiplication", input_tensor, other_tensor, is_conj_i=int(is_conj))

    def forward(ctx, input_tensor, other_tensor):
      return complex_multiplication(input_tensor, other_tensor)

class MyModel(nn.Module):
    def __init__(self):
      super(MyModel, self).__init__()
      self.complex_mul = ComplexMul()

    def forward(self, x, y):
      return self.complex_mul.apply(x, y)

You can find the full implementation of the wrapper class here:

So now we're able to export the model with custom operation nodes to ONNX representation. You can reproduce this step with the script:

cd modules/custom_operations
python -m examples.complex_mul.export_model

The resulting ONNX model graph now has a single ComplexMultiplication node, as illustrated below:

Enable custom operation for OpenVINO with Extensibility Mechanism

Now we can proceed with adding support for the ComplexMultiplication operation in OpenVINO. We will create an extension library with the custom operation for OpenVINO. As described in the Custom OpenVINO Operations docs, we start by deriving a custom operation class from the ov::op::Op base class, as in complex_mul.hpp.

1. Implement Operation Constructors

Implement the default constructor and constructors that optionally take the operation inputs and attributes as parameters. (code)

2. Override methods

2.1 validate_and_infer_types() method

Validates operation attributes and calculates output shapes using attributes of the operation: complex_mul.cpp.

2.2 clone_with_new_inputs() method

Creates a copy of the operation with new inputs: complex_mul.cpp.

2.3 has_evaluate() method

Defines the contstraints for evaluation of this operation: complex_mul.cpp.

2.4 evaluate() method

Implementation of the custom operation: complex_mul.cpp

3. Create an entry point

Create an entry point for the extension library with the OPENVINO_CREATE_EXTENSIONS() macro, the declaration of an extension class might look like the following:


        // Register operation itself, required to be read from IR

        // Register operaton mapping, required when converted from framework model format

This is implemented for the ComplexMultiplication operation in ov_extension.cpp.

4. Configure the build

Configure the build of your extension library using CMake. Here you can find the template of such script:

set(TARGET_NAME "complex_mul_extension")

find_package(OpenVINO REQUIRED)

set(SRC complex_mul.cpp ov_extension.cpp)

add_library(${TARGET_NAME} MODULE ${SRC})

target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime)

Also see an example of the finished CMake script for module with custom extensions here: CMakeLists.txt.

5. Build the extension library

Next we build the extension library using CMake. As a result, you'll get a dynamic library - on Linux it will be called, after the TARGET_NAME defined in the CMakeLists.txt above.

cd user_ie_extensions
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release && cmake --build . --parallel 4

Deploy and run the custom model

You can deploy and run the exported ONNX model with custom operations directly with the OpenVINO Python API. Before we load the model, we load the extension library into the OpenVINO Runtime using the add_extension() method.

from openvino.runtime import Core
core = Core()

# Load extension library

Now you're ready to load the ONNX model, and infer with it. You could load the model from the ONNX file directly using the read_model() method:


Alternatively, you can convert to an OpenVINO IR model first using Model Optimizer, while pointing at the extension library:

mo --input_model model.onnx --extension /path/to/

Note that in this case, you still need to load the extension library with the add_extension() method prior to loading the IR into your Python application.

The complete sequence of exporting, inferring, and testing the OpenVINO output against the PyTorch output can be found in the custom_ops test code.

See Also


OpenVINO™ optimize Fairseq S2T model

December 14, 2022

OpenVINO™ Optimize Fairseq S2T Model


Fairseq is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

There are 2 steps to generate model ready for OpenVINO™ acceleration:

1. Use torch.export.onnx function convert the “.pt” model to “.onnx” model;

2. Use OpenVINO™ MO toolkit convert the “.onnx” model to “IR” model.

Figure 1. OpenVINO™ optimize Fairseq model workflow

The following graph is the Fairseq framework inference workflow, it defines the model structure by “Model Config”, composes “Model Definition List” through multiple subgraph models, and dynamically loads the submodules in the model inference runtime.

Such as in the S2T task, model consists of two parts: Encoder and Decoder.
·       Encoder is for extracting feature information from audio file.
·       Decoder is for decoding the feature information to generate text information.

Fairseq Inference workflow

The length of audio information will affect the length of the feature information, and the length of the feature information will affect the Decoder submodule loop’s times. Therefore, the structure of the S2T model is dynamically defined according to the length of the input audio.

Figure 2. Fairseq S2T model inference workflow

To optimize Fairseq framework model there’re 4 challenges need to be solved:
-       Fairseq define submodules for various function, include variable in model layer define.
-       Model structure is dynamically loaded in runtime and can’t export a whole torch model graph.
-       Encoder and Decoder part models’ input shapes are dynamic, depending on input data size.
-       Decoder part loop times depends by input sequence lengths.

OpenVINO™ optimize Fairseq workflow

So that we should use some optimization tricks to solve these problems, to make sure the pipeline optimized by OpenVINO™.
-       Divide model into Encoder and Decoder two parts, and separately export to onnx model,
-       Because of the model structure define by input seq_len, should export dynamic shape onnx model.
-       Convert onnx to IR model by OpenVINO™ MO toolkit.
-       Replace the Fairseq S2T task pipeline Encoder and Decoder into IR model.
-       Loading Inference Engine to run pipeline the pipeline on OpenVINO™.

Figure 3. OpenVINO™ optimize Fairseq S2T model workflow


-       Fairseq is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks
-       OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference which can boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common task.
-       Python version >=3.8
-       PyTorch version >=1.10.0
Reference: GitHub: Fairseq-OpenVINO

Quick Start Demo

Step 1. Install fairseq and requirement

# Install Fairseq
git clone
cd fairseq
pip install --editable ./
#(Optional) Install the latest stable release (0.10.x)
pip install fairseq

#Install OpenVINO™
Reference: Install OpenVINO by source code for Linux
Reference: Install OpenVINO by release package

Step 2. Download audio file and pre-train model file

In this blog we refer the “S2T Example: STon CoVoST” as sample, Preparation dataset and pre-train model can follow the Fairseq original step. Also, you can use “torch audio” to convert audio file to build customer dataset.

import torchaudio
# load tensor from file
waveform, sample_rate = torchaudio.load('foo.arm') 
# save tensor to file'foo_save.wav', waveform, sample_rate)

Step 3. Modify code to export onnx

Torch model export to onnx, We should adjust the contents in fairseq/ +781 line "self.save_onnx = True" , +782 line "self.openvino_engine = False" The encoder.onnx and decoder.onnx will save in models

python fairseq_cli/ datasets/en/ --config-yaml config_st_en_de.yaml --task speech_to_text --path models/ --max-tokens 50000 --beam 1

Encoder part model export to dynamic onnx

Figure 4. Encoder part model export to dynamic onnx

Decoder part model export to dynamic onnx

Figure 5. Decoder part model export to dynamic onnx

Step 4. Convert Model to IR

Convert encoder.onnx and decoder.onnx to encoder.xml and decoder.xml

# Convert encoder onnx to IR
mo -m encoder.onnx --input "onnx::Transpose_0[-1,-1,-1],src_lengths[-1]"
# Convert decoder onnx to IR
mo -m decoder.onnx --input "prev_output_tokens[-1,-1],onnx::MatMul_1[-1,-1,-1]"

Step 5. OpenVINO™ Inference Engine optimize S2T pipeline

OpenVINO™ Inference S2T pipeline We should adjust the contents in fairseq/ +781 line "self.save_onnx = False" , +782 line "self.openvino_engine =True" Use the converted the model to run OpenVINO™ Inference S2T pipeline.

python fairseq_cli/ datasets/en/ --config-yaml config_st_en_de.yaml --task speech_to_text --path models/ --max-tokens 50000 --beam 1

OpenVINO™ Inference Engine initialization

Figure 6. OpenVINO™ Inference Engine Initialization

Encoder part inference by OpenVINO™

Figure 7. Encoder part inference by OpenVINO™
Figure 8. Encoder part inference by OpenVINO™

Decoder part inference by OpenVINO™

Figure 9. Decoder part inference by OpenVINO™
Figure 10. Decoder part inference by OpenVINO™

Inference Result

Figure 11. Inference Result