Make Your Own YOLOv8 OpenVINO™ Model from Any Data Format with Datumaro

Authors: Vinnam Kim, Wonju Lee, Mark Byun, Minje Park

Dataset formats and trainer

Introduction

OpenVINO™ provides an easy way to deploy your model with the best inference performance on any Intel hardwares. However, to train your own model for deployment you need to prepare a training framework and dataset. Fortunately, there are many ready-to-use training frameworks and implementations. Then, what about the dataset? A specific training framework requires a specific data format, but there are many data formats in the world. For example, in object detection tasks there are data formats such as YOLO, COCO, and Pascal VOC that are widely used. These formats have different directory structures and annotation file formats as well as different extensions such as txt, json, and, xml, respectively. It's tedious task to convert dataset from one format to another whenever you adopt different training framework.

Let's assume you choose Detectron2, which only supports COCO format datasets. If your dataset is formatted as VOC, you have to convert it into COCO format. Below, we compare the directory structures and annotation file formats of both datasets, VOC and COCO. These datasets have distinct formats and you need to implement codes for format conversion at each time of handling different formats. Of course, this is not technically challenging but this may require tedious code work and debugging for several days. It won't be good to repeat this process if you intend to add more datasets with different formats.

Directory structures of VOC and COCO (Click me!)
VOC COCO

└─ Dataset/
   ├── Annotations/
   │     ├── ann1.xml
   │     ├── ann2.xml
   │     └── ...
   ├── JPEGImages/
   │    ├── img1.jpg
   │    ├── img2.jpg
   │    └── ...
   ├── SegmentationClass/
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   ├── SegmentationObject/
   │    ├── img1.png
   │    ├── img2.png
   │    └── ...
   │
   └── ImageSets/
        ├── Main/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        ├── Layout/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        ├── Action/
        │   ├── test.txt
        |   ├── train.txt
        |   └── ...
        └── Segmentation/
            ├── test.txt
            ├── train.txt
            └── ...

└─ Dataset/
    ├── images/
    │   ├── train/
    │   │   ├── img1.jpg
    │   │   ├── img2.jpg
    │   │   └── ...
    │   └── val/
    │       ├── img1.jpg
    │       ├── img2.jpg
    │       └── ...
    └── annotations/
        ├── instances_train.json
        └── ...
Annotation file formats of VOC and COCO (Click me!)
VOC (xml) COCO (json)
<?xml version="1.0" encoding="UTF-8"?>
<annotation>
  <folder>VOC2007</folder>
  <filename>img1.jpg</filename>
  <size>
    <width>20</width>
    <height>10</height>
    <depth>3</depth>
  </size>
  <segmented>1</segmented>
  <object>
    <name>cat</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>1</xmin>
      <ymin>2</ymin>
      <xmax>3</xmax>
      <ymax>4</ymax>
    </bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>4</xmin>
      <ymin>5</ymin>
      <xmax>6</xmax>
      <ymax>7</ymax>
    </bndbox>
    <part>
      <name>head</name>
      <bndbox>
        <xmin>5.5</xmin>
        <ymin>6</ymin>
        <xmax>7.5</xmax>
        <ymax>8</ymax>
      </bndbox>
    </part>
    <actions>
      <other>1</other>
      <jumping>0</jumping>
      <phoning>1</phoning>
      <playinginstrument>0</playinginstrument>
      <reading>1</reading>
      <ridingbike>0</ridingbike>
      <ridinghorse>1</ridinghorse>
      <running>0</running>
      <takingphoto>1</takingphoto>
      <usingcomputer>0</usingcomputer>
      <walking>1</walking>
    </actions>
  </object>
</annotation>
{
  "licenses": [
    {
      "name": "",
      "id": 0,
      "url": ""
    }
  ],
  "info": {
    "contributor": "",
    "date_created": "",
    "description": "",
    "url": "",
    "version": "",
    "year": ""
  },
  "categories": [
    {
      "id": 1,
      "name": "cat",
      "supercategory": ""
    },
    {
      "id": 2,
      "name": "dog",
      "supercategory": ""
    },
  ],
  "images": [
    {
      "id": 5,
      "width": 10,
      "height": 5,
      "file_name": "img1.jpg",
      "license": 0,
      "flickr_url": "",
      "coco_url": "",
      "date_captured": 0
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 5,
      "category_id": 2,
      "segmentation": [],
      "area": 3.0,
      "bbox": [
        2.0,
        2.0,
        3.0,
        1.0
      ],
      "iscrowd": 0
    }
  ]
}

Dataset Management Framework (Datumaro) is a framework that provides Python API and CLI tools to convert, transform, and analyze datasets. Among the many features of Datumaro, we would like to introduce the data format conversion feature on this blog, which is one of the fundamental feature for handling many datasets with different training frameworks. Datumaro supports the import and export of over 40 computer vision data formats (please take a look at supported formats for details!). This means that you can easily change your data format through Datumaro. If your model training framework can only read specific formats, don't worry. Use Datumaro and convert it!

Train YOLOv8 model and export it to OpenVINO™ model

  • Prepare dataset
  • Convert dataset with Datumaro
  • Train with YOLOv8 and export to OpenVINO™ IR

YOLOv8 is a well-known model training framework for object detection and tracking, instance segmentation, image classification, and pose estimation tasks. It provides simple CLI commands to train, test, and export a model to OpenVINO™ Intermediate Representation (IR). However, the data format consumed by YOLOv8 is slightly different from the YOLO format itself. Datumaro named it refers to it as YOLO-Ultralytics format. As you can see here, it requires a special meta file to indicate annotation files for each subset and subset files to list subset image files. It further requires them to be placed in an appropriate directory structure. It can be very tedious to go through these details and implement dataset preprocessing when you want to train a model on your custom dataset.

On this blog, we provide an end-to-end example that covers the complete process of converting your dataset, training a model with the converted dataset, and exporting the trained model to OpenVINO™ IR. We understand that dataset conversion can be a tricky process, especially if you have annotated and built your own dataset. Therefore, we will provide an example of converting the dataset created by the popular CVAT annotation tool. By following our step-by-step guide, you will be able to convert your data format easily and accelerate the inference of your trained model with OpenVINO™.

Prepare dataset

In this section, we introduce the steps to export the project annotated by CVAT for the following workflows. You can skip this section if your dataset is formatted as a different data format and is ready to be imported by Datumaro.

CVAT project for object detection task with train, val, and test subsets

NOTE: We used the cats-and-dogs dataset for this example. You can find the reference for this dataset here.

NOTE: You should have three subsets in your project: "train", "val", and "test" (optional). If your dataset has different subset names, you have to rename them. You can do this by using Datumaro's MapSubsets transform.

We export this project to CVAT for images 1.1 data format. Datumaro can import this data format and export it to YOLO-Ultralytics format which can be consumed by YOLOv8.

Export CVAT project to CVAT for images 1.1 data format

Export CVAT project to CVAT for images 1.1 data format. After exporting the dataset, extract it to the cvat_dataset directory.

ls yolo_v8_dataset

You can see the following directory structure:

annotations.xml  images

Convert your dataset using Datumaro

You can convert the dataset located in cvat_dataset using Datumaro's CLI command as follows. For a detailed explanation of the input arguments, see here.

datum convert -i cvat_dataset -if cvat -f yolo_ultralytics -o yolo_v8_dataset -- --save-media

NOTE: If your dataset is not CVAT for images 1.1 format, you can replace -if cvat with the different input format as -if INPUT_FORMAT. Use datum detect CLI command to figure out what format your dataset is.

After the conversion, you can see that yolo_v8_dataset directory is created.

ls yolo_v8_dataset

This directory is structured as follows.

data.yaml  images  labels  test.txt  train.txt  val.txt

Train with YOLOv8 Trainer and Export to OpenVINO™ IR

In this section, we will train the YOLOv8 detector with the dataset converted in the previous section. To train a YOLOv8 detector, please execute the following command.

yolo detect train model=yolov8n.pt data=$(realpath yolo_v8_dataset/data.yaml) project=my-project

NOTE: We use data=$(realpath yolo_v8_dataset/data.yaml) to convert the relative path yolo_v8_dataset/data.yaml to the absolute path. This is because YOLOv8 needs the absolute path for the custom dataset.

After the training, the following command enables testing on the test dataset.

yolo detect val model=my-project/train/weights/best.pt data=$(realpath yolo_v8_dataset/data.yaml) split=test

Lastly, we will export your YOLOv8 detector to OpenVINO™ IR for inference acceleration on Intel devices.

yolo detect export model=my-project/train/weights/best.pt format=openvino

Using this command, the exported IR is created at this directory path, my-project/train/weights/best_openvino_model.

ls my-project/train/weights/best_openvino_model
best.bin  best.xml  metadata.yaml

Conclusion

This post provided an example of training a YOLOv8 detector on an arbitrary data format by utilizing the data format conversion feature of Datumaro and exporting the model to OpenVINO™ IR. You can refer to the executable Jupyter notebook example provided on this blog post here for step-by-step guide. Datumaro offers a range of useful features for managing datasets beyond data format conversion. You can find examples of other Datumaro features, such as noisy label detection during training with OpenVINO™ Training Extensions, in the Jupyter examples directory. For more information about Datumaro and its capabilities, you can visit the Datumaro documentation page. If you have any questions or requests about using Datumaro, feel free to open an issue here.