Imbalanced data are a non-trivial problem in deep learning. High variability in the number of samples per category can bias learning procedures towards classes with high cardinality at the expense of classes with few instances. To overcome this limitation, common strategies balance the data using resampling techniques. The cardinality of over-represented categories is often lowered by sample deletion, which reduces the data space from which the model can learn. This paper introduces a new approach based on data balancing without sample deletion, which reduces bias in instance localization and classification tasks. The method is a multi-stage pipeline that starts with data cleaning and data filtering steps and ends with the actual data balancing process, during which samples of over-represented classes are not deleted but divided into multiple sub-classes. In this way, the model can learn from a balanced data distribution in which some classes have a high correlation factor. To evaluate the effectiveness of the method in real-life scenarios, a case study in the field of precision agriculture has been developed, motivated by the fact that publicly available datasets for pest classification often reflect the real-world imbalanced distribution of pests, making the task challenging. Two models for the localization and recognition of pests belonging to several species are also presented. The results obtained support the method’s validity, as performance in both the detection and classification tasks exceeds that of state-of-the-art methods. The general nature of the balancing technique may make the approach useful in other application fields.
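
The idea of balancing by splitting rather than deleting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how samples are assigned to sub-classes or how sub-class predictions are mapped back, so the random assignment, the `target_size` default (the size of the rarest class), the `label#k` naming scheme, and the helper names `split_majority_classes` and `merge_prediction` are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def split_majority_classes(labels, target_size=None, seed=0):
    """Relabel samples so that large classes are split into sub-classes.

    No sample is dropped. Returns (new_labels, parent_of), where
    parent_of maps every new label back to its original class.
    """
    counts = Counter(labels)
    if target_size is None:
        # Assumption: aim for sub-classes about as large as the rarest class.
        target_size = min(counts.values())
    rng = random.Random(seed)

    indices_by_label = defaultdict(list)
    for i, label in enumerate(labels):
        indices_by_label[label].append(i)

    new_labels = list(labels)
    parent_of = {}
    for label, indices in indices_by_label.items():
        n_sub = max(1, round(len(indices) / target_size))
        if n_sub == 1:
            parent_of[label] = label  # class is already close to the target
            continue
        # Assumption: samples are assigned to sub-classes at random.
        rng.shuffle(indices)
        for k in range(n_sub):
            sub_label = f"{label}#{k}"  # hypothetical naming scheme
            parent_of[sub_label] = label
            for i in indices[k::n_sub]:
                new_labels[i] = sub_label
    return new_labels, parent_of


def merge_prediction(predicted_label, parent_of):
    """Map a predicted sub-class back to its original class."""
    return parent_of.get(predicted_label, predicted_label)


# Toy label list with made-up class names and counts.
labels = ["aphid"] * 90 + ["moth"] * 30 + ["beetle"] * 28
new_labels, parent_of = split_majority_classes(labels)
print(Counter(new_labels))
```

With this toy input, the 90 `aphid` samples are relabelled into three sub-classes of 30 samples each, while `moth` and `beetle` keep their labels, so every sample remains available for training. At inference time, `merge_prediction` collapses a predicted sub-class back to its parent class.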

Improving Classification Performance by Addressing Dataset Imbalance: A Case Study for Pest Management / Longo, Antonello; Rizzi, Maria; Guaragnella, Cataldo. - In: APPLIED SCIENCES. - ISSN 2076-3417. - Electronic. - 15:10(2025). [10.3390/app15105385]

Improving Classification Performance by Addressing Dataset Imbalance: A Case Study for Pest Management

Longo, Antonello; Rizzi, Maria; Guaragnella, Cataldo
2025

Files in this item:
applsci-15-05385.pdf (open access)
Type: Publisher's version
License: Creative Commons
Size: 7.6 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/287240