Optimizing a Media-AI Pipeline on Intel® Data Center GPU Flex Series with Intel® DL Streamer and Docker

Introduction

This blog shows how to use Intel® DL Streamer to build a complete Media-AI pipeline, including video access, media decode, AI inference, media encode, and result export. The pipeline is accelerated by OpenVINO™ and optimized to run on an Intel® Data Center GPU Flex Series discrete GPU (referred to below as the Flex dGPU).

Requirements

- DL Streamer
Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework is an easy way to construct media analytics pipelines using the Intel® Distribution of OpenVINO™ Toolkit. It leverages the open source media framework GStreamer to provide optimized media operations and the Deep Learning Inference Engine from the OpenVINO™ Toolkit to provide optimized inference.

- OpenVINO
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It can boost deep learning performance in computer vision, automatic speech recognition, natural language processing, and other common tasks.

- Docker (Optional)
Docker is an open-source platform that enables developers to build, deploy, run, update, and manage containers—standardized, executable components that combine application source code with the operating system (OS) libraries and dependencies required to run that code in any environment.

Install DL-Streamer and OpenVINO™ via Docker

Images for Intel® Data Center GPU Flex Series

Images 2022.2.0-ubuntu20-gpu419.40* are intended for Intel® Data Center GPU Flex Series and include:

1. Intel® DL Streamer 2022.2-release
2. OpenVINO™ Toolkit 2022.2.0
3. Drivers for Intel® Data Center GPU Flex Series, driver version 419.40

Four images are listed below. Images with the -devel suffix additionally contain samples and development files. Images with the -dpcpp suffix additionally contain the Intel® oneAPI DPC++/C++ Compiler.

Runtime image that includes GStreamer* Pipeline Framework elements

docker pull intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40


Developer image that builds on the runtime image and contains samples, development files, and a model downloader.

docker pull intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40-devel

Runtime image including elements built with Intel® oneAPI DPC++/C++ Compiler

docker pull intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40-dpcpp


Developer image for elements built with Intel® oneAPI DPC++/C++ Compiler

docker pull intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40-dpcpp-devel

Taking the “2022.2.0-ubuntu20-gpu419.40” image as an example, pull it from Docker Hub as follows:

docker pull intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40
Figure 1. Pulling the image from Docker Hub
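
The four tags differ only in their suffix, so the pull can be scripted. The following is a minimal sketch, not an official tool: the helper name dlstreamer_image and the VARIANT and DRY_RUN variables are hypothetical and exist only for this illustration. By default the script just prints the command it would run.

```shell
#!/bin/sh
# Base tag taken from the image list above.
BASE_TAG="2022.2.0-ubuntu20-gpu419.40"

# dlstreamer_image (hypothetical helper): print the full image name for a
# variant suffix: "" (runtime), "devel", "dpcpp" or "dpcpp-devel".
dlstreamer_image() {
    variant="$1"
    if [ -n "$variant" ]; then
        echo "intel/dlstreamer:${BASE_TAG}-${variant}"
    else
        echo "intel/dlstreamer:${BASE_TAG}"
    fi
}

IMAGE="$(dlstreamer_image "${VARIANT:-}")"

# DRY_RUN (hypothetical switch) defaults to 1, so nothing is pulled
# unless the script is started with DRY_RUN=0.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker pull ${IMAGE}"
else
    docker pull "${IMAGE}"
fi
```

For example, running the script with VARIANT=devel DRY_RUN=0 would pull the developer image.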

DL-Streamer Media-AI pipeline quick start example

With the prerequisites installed, this section gives a basic introduction to building a DL Streamer pipeline with an object detection model (yolov4).

Step 1. Download the video and the yolov4-tf model files

Download the video by opening the following link in your browser:
https://www.pexels.com/photo/5325136/download

Download the yolov4-tf model:

git clone https://github.com/dlstreamer/pipeline-zoo-models.git


Step 2. Start the Docker container and copy the files into it

Create and enter the Docker container:

docker run -it --device /dev/dri/ --user root --rm intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40

Open another terminal and copy the video and the model into the container. The container ID is shown by docker ps.

docker cp pexels-george-morina-5325136.mp4 <Docker CONTAINER ID>:/home/dlstreamer
docker cp pipeline-zoo-models/storage/yolo-v4-tf_INT8 <Docker CONTAINER ID>:/home/dlstreamer
Figure 2. Copy video and IR model into docker
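
The container ID lookup and the two copies can also be scripted from the second terminal. This is a sketch under assumptions: find_container and copy_inputs are hypothetical helper names, and the lookup relies on the ancestor filter of docker ps to find a running container created from the image used above. If several such containers are running, check the ID by hand instead.

```shell
#!/bin/sh
IMAGE="intel/dlstreamer:2022.2.0-ubuntu20-gpu419.40"
DEST="/home/dlstreamer"

# find_container (hypothetical helper): print the ID of the first running
# container created from the given image, or nothing if there is none.
find_container() {
    docker ps -q --filter "ancestor=$1" | head -n 1
}

# copy_inputs (hypothetical helper): copy the video and the model directory
# into the container whose ID or name is passed as $1.
copy_inputs() {
    docker cp pexels-george-morina-5325136.mp4 "$1:$DEST" &&
    docker cp pipeline-zoo-models/storage/yolo-v4-tf_INT8 "$1:$DEST"
}

if command -v docker >/dev/null 2>&1; then
    CID="$(find_container "$IMAGE")"
    if [ -n "$CID" ]; then
        copy_inputs "$CID"
    else
        echo "no running container from $IMAGE" >&2
    fi
else
    echo "docker not found" >&2
fi
```

Run it from the directory that holds the downloaded video and the cloned pipeline-zoo-models repository.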


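Step 3. Run an object detection Media-AI pipeline

The following command runs the object detection Media-AI pipeline on the Flex dGPU inside the Docker container. Replace /path/to with the location of the files inside the container (/home/dlstreamer if you followed Step 2).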

gst-launch-1.0 filesrc location=/path/to/pexels-george-morina-5325136.mp4 ! decodebin ! vaapipostproc ! gvadetect model=/path/to/yolo-v4-tf_INT8/yolo-v4-tf_INT8.xml model_proc=/path/to/yolo-v4-tf_INT8/yolo-v4-tf_INT8.json device=GPU batch-size=32 pre-process-backend=vaapi-surface-sharing ! queue ! gvatrack tracking-type=short-term-imageless ! gvafpscounter ! fakesink sync=false
Figure 3. DL-streamer run pipeline on the dGPU
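
Because the command is long, it can help to assemble it from variables and check the input files first. The sketch below is one way to do that. VIDEO, MODEL_DIR, BATCH, build_pipeline, and check_inputs are hypothetical names, and the default paths assume the files were copied to /home/dlstreamer as in Step 2. If an input is missing, the script only prints the command instead of running it.

```shell
#!/bin/sh
# Assumed locations: the files were copied to /home/dlstreamer in Step 2.
VIDEO="${VIDEO:-/home/dlstreamer/pexels-george-morina-5325136.mp4}"
MODEL_DIR="${MODEL_DIR:-/home/dlstreamer/yolo-v4-tf_INT8}"
BATCH="${BATCH:-32}"

# build_pipeline (hypothetical helper): print the gst-launch-1.0 command line
# with the same elements and properties as the command above.
build_pipeline() {
    echo "gst-launch-1.0 filesrc location=$VIDEO ! decodebin ! vaapipostproc !" \
         "gvadetect model=$MODEL_DIR/yolo-v4-tf_INT8.xml" \
         "model_proc=$MODEL_DIR/yolo-v4-tf_INT8.json device=GPU" \
         "batch-size=$BATCH pre-process-backend=vaapi-surface-sharing !" \
         "queue ! gvatrack tracking-type=short-term-imageless !" \
         "gvafpscounter ! fakesink sync=false"
}

# check_inputs (hypothetical helper): fail early with a readable message
# when the video or one of the model files is missing.
check_inputs() {
    for f in "$VIDEO" "$MODEL_DIR/yolo-v4-tf_INT8.xml" "$MODEL_DIR/yolo-v4-tf_INT8.json"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
}

if check_inputs; then
    # Paths must not contain spaces, since the command is re-parsed by eval.
    eval "$(build_pipeline)"
else
    build_pipeline   # print the command only
fi
```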

To encode the detection result and save it as a video file, use the following command:

gst-launch-1.0 filesrc location=/path/to/pexels-george-morina-5325136.mp4 ! decodebin ! vaapipostproc ! gvadetect model=/path/to/yolo-v4-tf_INT8/yolo-v4-tf_INT8.xml model_proc=/path/to/yolo-v4-tf_INT8/yolo-v4-tf_INT8.json device=GPU batch-size=32 pre-process-backend=vaapi-surface-sharing ! queue ! gvatrack tracking-type=short-term-imageless ! gvafpscounter ! vaapipostproc ! vaapih265enc rate-control=cbr bitrate=4096 ! filesink location=./encoded_video_track.265 sync=false

The encoded video file is saved in the container and can be copied out from a new terminal. The path below assumes the pipeline was started from /home/dlstreamer.

docker cp <Docker CONTAINER ID>:/home/dlstreamer/encoded_video_track.265 .
Figure 4. DL-streamer yolov4 pipeline result
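
A quick sanity check on the copied file is to compare its size with what the encoder settings imply. The sketch below assumes that bitrate=4096 is expressed in kbit/s, which is how the GStreamer VA-API encoders document this property. The 60-second duration is a hypothetical placeholder, and estimate_bytes is a helper name introduced here.

```shell
#!/bin/sh
# Assumption: bitrate=4096 in the encode pipeline is in kbit/s.
BITRATE_KBPS="${BITRATE_KBPS:-4096}"
# Hypothetical clip length in seconds; replace it with the real duration.
DURATION_S="${DURATION_S:-60}"

# estimate_bytes (hypothetical helper): kbit/s * 1000 / 8 gives bytes per
# second, multiplied by the duration in seconds.
estimate_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

echo "expected size: about $(estimate_bytes "$BITRATE_KBPS" "$DURATION_S") bytes"

# Compare with the real file when it is present in the current directory.
if [ -f encoded_video_track.265 ]; then
    echo "actual size:   $(wc -c < encoded_video_track.265) bytes"
fi
```

With constant bitrate the real size will not match exactly, but a file that is far smaller than the estimate usually means the pipeline stopped early.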

PS. About the DL Streamer pipeline elements used above

decodebin: Auto-magically constructs a decoding pipeline using available decoders and demuxers via auto-plugging.

vaapipostproc: Applies various post-processing algorithms to VA surfaces, for example scaling, deinterlacing (bob, motion-adaptive, motion-compensated), noise reduction, or sharpening.

gvadetect: Performs object detection on a full-frame or region of interest (ROI) using object detection models such as YOLO v3-v5, MobileNet-SSD, Faster-RCNN etc. Outputs the ROI for detected objects.

gvatrack: Performs object tracking using zero-term, zero-term-imageless, or short-term-imageless tracking algorithms. Zero-term tracking assigns unique object IDs and requires object detection to run on every frame. Short-term tracking allows objects to be tracked between frames, thereby reducing the need to run object detection on each frame. Imageless tracking forms object associations based on the movement and shape of objects, and it does not use image data.

gvafpscounter: Measures frames per second across multiple streams in a single process.
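
Before running a pipeline, it can be useful to confirm that these elements are actually installed in the container. The sketch below uses the --exists option of gst-inspect-1.0. The names ELEMENTS, GST_INSPECT, and check_elements are hypothetical, and vaapih265enc is included because the encode pipeline uses it.

```shell
#!/bin/sh
# Elements described above, plus the encoder from the encode pipeline.
ELEMENTS="decodebin vaapipostproc gvadetect gvatrack gvafpscounter vaapih265enc"
# Tool used for the lookup; overridable for testing (hypothetical variable).
GST_INSPECT="${GST_INSPECT:-gst-inspect-1.0}"

# check_elements (hypothetical helper): report for each element whether
# GStreamer can find it; returns non-zero if any element is missing.
check_elements() {
    status=0
    for e in $ELEMENTS; do
        if "$GST_INSPECT" --exists "$e" >/dev/null 2>&1; then
            echo "ok       $e"
        else
            echo "missing  $e"
            status=1
        fi
    done
    return "$status"
}

check_elements || echo "some elements are missing (or gst-inspect-1.0 is not installed)" >&2
```

Running gst-inspect-1.0 with an element name and no other option, for example gst-inspect-1.0 gvadetect, prints that element's properties and their defaults.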

Tuning Tips

Refer to the platform-specific use cases supported by OpenVINO™ and use the device profiling API to tune the performance of your inference program across CPU, iGPU, and dGPU. Profiling also helps developers find the parts of a pipeline that have room for performance improvement.
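
One possible starting point for such a comparison is the OpenVINO™ benchmark_app tool, which measures the inference throughput of a model on a chosen device. The sketch below only assumes that benchmark_app is available in your environment, which may not be the case in the runtime image. MODEL, DEVICES, SECONDS_PER_RUN, RUN, and bench_cmd are hypothetical names. By default the script prints the commands, and with RUN=1 it executes them.

```shell
#!/bin/sh
# Model path placeholder, as in the pipeline commands above.
MODEL="${MODEL:-/path/to/yolo-v4-tf_INT8/yolo-v4-tf_INT8.xml}"
# Devices to compare. On a host with more than one GPU, OpenVINO names them
# GPU.0, GPU.1, ...; which index is the dGPU depends on the system.
DEVICES="${DEVICES:-CPU GPU}"
# Measurement time per device, in seconds.
SECONDS_PER_RUN="${SECONDS_PER_RUN:-15}"

# bench_cmd (hypothetical helper): print the benchmark_app command for one device.
bench_cmd() {
    echo "benchmark_app -m $MODEL -d $1 -hint throughput -t $SECONDS_PER_RUN"
}

for dev in $DEVICES; do
    cmd="$(bench_cmd "$dev")"
    if [ "${RUN:-0}" = "1" ] && command -v benchmark_app >/dev/null 2>&1; then
        echo "== $dev =="
        $cmd            # unquoted on purpose: split into command and arguments
    else
        echo "$cmd"     # dry run: print the command only
    fi
done
```

These numbers cover inference only. The full pipeline also includes decode, post-processing, and encode, so the gvafpscounter output of the real pipeline remains the figure to compare in the end.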