How can genAI and synthetic data 4X computer vision performance?

Submitted by Style Pass

2024-11-05 12:00:02

AI model development is often marked by a common pain point: the need for a large quantity of high-quality data. In computer vision, this challenge is even more pronounced, because acquiring adequate visual data is often more time-consuming and difficult than in other fields. Real-world datasets are typically unbalanced: the distribution of relevant classes is frequently skewed, leaving some classes and scenarios underrepresented and hard for models to learn. These issues create significant difficulties for computer vision engineers during data-driven development, since addressing them would require a comprehensive tool covering all data-related tasks.
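To make the class imbalance described above concrete, here is a minimal sketch of how it could be measured. It assumes annotations in the common YOLO text format, one object per line starting with an integer class id. The helper names `class_counts` and `imbalance_ratio` and the example labels are hypothetical and are not taken from the case study.

```python
from collections import Counter
from typing import Iterable


def class_counts(label_lines: Iterable[str]) -> Counter:
    """Count object instances per class id in YOLO-style label lines.

    Assumes each non-blank line looks like:
    "<class_id> <x_center> <y_center> <width> <height>"
    """
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        counts[int(parts[0])] += 1
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Ratio of the most frequent class count to the least frequent one."""
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: class 0 appears three times, class 1 only once.
example_labels = [
    "0 0.50 0.50 0.10 0.10",
    "0 0.20 0.30 0.05 0.08",
    "0 0.70 0.60 0.12 0.09",
    "1 0.40 0.40 0.20 0.20",
]
counts = class_counts(example_labels)
ratio = imbalance_ratio(counts)
print(dict(counts), ratio)
```

A ratio far above 1 suggests that the rarest class is underrepresented and may be a candidate for additional (real or synthetic) samples.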

DiffuseDrive [1] addresses the data problem in computer vision. It is an automatic visual data service designed to provide meaningful insights into available data, identify data gaps, and fill those gaps with photorealistic, domain-specific, and labeled (both visual and textual) synthetic data. This case study demonstrates the improvement in the precision, recall, and mAP metrics of an industry-standard object detection and classification model (YOLOv5 [2]). A fourfold improvement was achieved by using the DiffuseDrive visual data service on a real-world, unbalanced aerial dataset with underrepresented classes.
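For readers less familiar with the metrics named above, the following sketch shows the standard definitions of precision and recall from detection counts, and mAP as the mean of per-class average precision (AP) values. The function names and all numbers are illustrative assumptions, not results from this case study, and the per-class AP values are taken as given rather than computed from a precision-recall curve.

```python
from typing import Mapping


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth objects that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_per_class: Mapping[str, float]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Illustrative counts only (not from the study):
# 8 true positives, 2 false positives, 8 missed objects.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)

# Illustrative per-class AP values with hypothetical class names.
m = mean_average_precision({"class_a": 0.5, "class_b": 0.25})
print(p, r, m)
```

Because mAP averages over classes without weighting by frequency, a poorly detected rare class pulls the score down as much as a common one, which is why underrepresented classes matter for this metric.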

Computer vision development often results in products that interact with the real world. In many industries, the environment in which computer-vision-enabled products are used is controlled, as in a factory. However, in some industries, such as autonomous driving, aerial autonomy (e.g., drones), or industrial, civil, and defense robotics, the environments can be complex, uncontrollable, and dynamic. This is why an aerial dataset was chosen as the subject of our study: improvements achieved in a model trained on such data can be extrapolated to other industries with more controlled environments as well.