Simon Ruber:
Impact of Domain-Based Data Sampling Approaches on the Performance of Object Detection Models.
Abstract
Deep learning-based models have become essential to progress in object detection, but they depend heavily on the data with which they are trained. How this training data should best be composed to help the model learn the patterns in the data is not yet fully understood. One subtopic of this area, investigated in this thesis, is the relevance of image domains. Image domains (e.g., time of day or weather) and their associated classes influence the visual attributes of images and therefore lead to systematic differences between images of different domain classes. This work proposes a process to evaluate the relevance of image domain classes and to measure their impact on object detection models. Furthermore, the impact of image domain-based sampling on model performance is evaluated. The BDD100K dataset was used as the data source for the experiments. Cleaning and label validation processes were developed to prepare the dataset. The relevance of an image domain class and the impact of domain-based sampling were tested with the YOLOv5s-P6 model. Twelve image domain classes, belonging to three image domains (weather, time of day, and scene), were investigated. Ten of the twelve image domain classes are considered relevant to the performance of the object detection model. Three model groups, trained with stratified sampled data, were tested against models trained with randomly sampled data. Stratified sampling was not superior in any of the comparisons conducted. Instead, the object size distribution in the training data had a significant impact on model performance.
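
The difference between the two sampling strategies compared above can be illustrated with a minimal sketch. It is not the thesis's implementation: the record layout, the `timeofday` field name, the toy class proportions, and the choice of equal allocation per domain class are all assumptions made for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_sample(records, n, seed=0):
    """Draw n records uniformly at random, ignoring domain classes."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def stratified_sample(records, n, key, seed=0):
    """Draw up to n records with an equal share per domain class.

    Equal allocation is an assumption of this sketch; a proportional
    allocation would be an equally valid form of stratification.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    per_class = n // len(groups)
    sample = []
    for cls in sorted(groups):
        members = groups[cls]
        # A class with fewer images than its share contributes all it has.
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Toy metadata (illustrative only): 80 daytime images and 20 night images.
records = [
    {"name": f"img_{i}", "timeofday": "daytime" if i < 80 else "night"}
    for i in range(100)
]

random_counts = Counter(r["timeofday"] for r in random_sample(records, 20))
stratified_counts = Counter(
    r["timeofday"] for r in stratified_sample(records, 20, "timeofday")
)
print("random:", dict(random_counts))
print("stratified:", dict(stratified_counts))
```

With random sampling, the class shares in the sample tend to follow the imbalance of the source data, whereas the stratified variant fixes the share of each domain class in advance.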