Károly, István Artúr and Nádas, Imre and Galambos, Péter (2024) Synthetic Multimodal Video Benchmark (SMVB): Utilizing Blender for rich dataset generation. In: IEEE 22nd World Symposium on Applied Machine Intelligence and Informatics : SAMI 2024 : Proceedings. IEEE, Danvers (MA), pp. 65-70. ISBN 9798350317190; 9798350317206
Text: 08_Synthetic_Multimodal_Video_Benchmark_SMVB_Utilizing_Blender_for_rich_dataset_generation.pdf - Published Version (4MB). Restricted to Repository staff only.
Abstract
Deep Learning methods for visual tasks have seen significant improvements in accuracy and resilience. To enhance their performance, many approaches now leverage multiple modalities or train models for various tasks concurrently. Datasets encompassing rich annotations and multiple modalities enable training and assessing a broader spectrum of methods and facilitate the development of more intricate models. However, a dilemma arises concerning dataset diversity and the richness of annotations. Datasets that support multiple tasks tend to be constrained to specific domains, while datasets encompassing diverse scenarios often concentrate on a single task with a single annotation type. Synthetic data offers a potential resolution to this challenge, combining the advantages of both approaches. However, current synthetic datasets are also subject to a similar trade-off, either having various annotations and modalities but limited domains (e.g., traffic scenarios) or extensive data variation with limited support for different tasks. This paper introduces a novel annotation approach for generating synthetic datasets by expanding upon the Blender Annotation Tool. The enhanced approach can automatically produce ground-truth data of segmentation masks, depth maps, surface normals, and optical flow in synthetic 3D scenes within Blender after minimal manual setup. Utilizing this extended annotation method, we have created the initial subset of a synthetic benchmark dataset known as the Synthetic Multimodal Video Benchmark. This illustrates that synthetic datasets can be generated with substantial diversity and rich annotations by harnessing freely available 3D scenes from a wide array of domains on the internet.
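The record does not include code, but the abstract's core idea (exporting segmentation masks, depth maps, surface normals, and optical flow as ground truth from Blender scenes) maps directly onto Blender's built-in render passes. The sketch below is a minimal, hypothetical illustration of that mechanism using Blender's Python API (bpy); it is not the authors' Blender Annotation Tool extension, and the output path, slot names, and per-object index assignment are illustrative assumptions.

```python
# Illustrative sketch (not the authors' Blender Annotation Tool): enable the
# Blender render passes corresponding to the modalities named in the abstract
# and write them to one multilayer OpenEXR file per frame.
# Must be run inside Blender's Python environment (bpy is only available there).
import bpy

scene = bpy.context.scene
view_layer = bpy.context.view_layer

# The Vector (optical flow) pass is only produced by the Cycles engine.
scene.render.engine = 'CYCLES'

# Ground-truth passes: depth, surface normals, motion vectors (optical flow),
# and per-object index (decodable into instance segmentation masks).
view_layer.use_pass_z = True
view_layer.use_pass_normal = True
view_layer.use_pass_vector = True
view_layer.use_pass_object_index = True

# Give every mesh object a distinct pass index so the IndexOB pass
# can be turned into instance segmentation masks afterwards.
for i, obj in enumerate(o for o in scene.objects if o.type == 'MESH'):
    obj.pass_index = i + 1  # 0 is left for the background

# Route the passes through the compositor into a multilayer EXR per frame.
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

render_layers = tree.nodes.new('CompositorNodeRLayers')
file_output = tree.nodes.new('CompositorNodeOutputFile')
file_output.base_path = '/tmp/synthetic_benchmark_frames'  # assumed output location
file_output.format.file_format = 'OPEN_EXR_MULTILAYER'
file_output.format.color_depth = '32'  # keep full float precision for depth/flow

# One output slot per modality; slot names are illustrative.
for slot_name, pass_name in [('Image', 'Image'),
                             ('Depth', 'Depth'),
                             ('Normal', 'Normal'),
                             ('Flow', 'Vector'),
                             ('InstanceID', 'IndexOB')]:
    if slot_name not in file_output.file_slots:
        file_output.file_slots.new(slot_name)
    tree.links.new(render_layers.outputs[pass_name],
                   file_output.inputs[slot_name])

# Render the scene's configured frame range; each frame yields one EXR
# containing the RGB image plus all ground-truth layers.
bpy.ops.render.render(animation=True)
```

After such a render, each EXR can be loaded with any OpenEXR reader and the layers split back into per-modality arrays; this is the standard Blender route for the kind of automatic multimodal ground truth the paper describes, though the paper's actual pipeline and annotation format may differ.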
| Item Type: | Book Section |
|---|---|
| Subjects: | T Technology / applied and engineering sciences > T2 Technology (General) / engineering sciences in general |
| SWORD Depositor: | MTMT SWORD |
| Depositing User: | MTMT SWORD |
| Date Deposited: | 29 Sep 2024 16:44 |
| Last Modified: | 29 Sep 2024 16:44 |
| URI: | https://real.mtak.hu/id/eprint/206361 |