Optimizing Satellite Imagery Datasets for Enhanced Land/Water Segmentation / Scarpetta, Marco; De Palma, Luisa; Di Nisio, Attilio; Spadavecchia, Maurizio; Affuso, Paolo; Giaquinto, Nicola. - In: SENSORS. - ISSN 1424-8220. - ELECTRONIC. - 25:6(2025). [10.3390/s25061793]
Optimizing Satellite Imagery Datasets for Enhanced Land/Water Segmentation
Scarpetta, Marco; De Palma, Luisa; Di Nisio, Attilio; Spadavecchia, Maurizio; Affuso, Paolo; Giaquinto, Nicola
2025-01-01
Abstract
This paper presents an automated procedure for optimizing datasets used to train deep learning models for land/water segmentation. The proposed method uses the Normalized Difference Water Index (NDWI) with a variable threshold to automatically assess the quality of the annotations associated with multispectral satellite images. By systematically identifying and excluding low-quality samples, the method improves dataset quality and, in turn, model performance. Experimental results on two publicly available datasets, SWED and SNOWED, show that deep learning models trained on the optimized datasets outperform those trained on the baseline datasets, with an increase of up to 10% in mean intersection over union despite the reduced dataset size. The presented methodology is therefore a promising, scalable solution for improving the quality of datasets for environmental monitoring and other remote sensing applications.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
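The annotation check summarized in the abstract can be illustrated with a minimal sketch. It assumes the standard NDWI definition, (Green - NIR) / (Green + NIR), and scores each sample by the best mean intersection over union between a thresholded NDWI map and the annotated water mask over a sweep of thresholds. The function names, the threshold grid, and the `min_score` cutoff are hypothetical choices made for illustration; the abstract does not state the exact criterion used in the paper.

```python
import numpy as np


def ndwi(green, nir, eps=1e-9):
    """Standard NDWI: (Green - NIR) / (Green + NIR); eps avoids division by zero."""
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)


def iou(a, b):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return np.logical_and(a, b).sum() / union


def annotation_score(green, nir, water_mask, thresholds=None):
    """Best two-class mean IoU between thresholded NDWI and the annotation.

    Assumption: the threshold grid below is illustrative, not the paper's.
    """
    if thresholds is None:
        thresholds = np.linspace(-0.5, 0.5, 21)
    index = ndwi(green, nir)
    water_mask = water_mask.astype(bool)
    best = 0.0
    for t in thresholds:
        pred = index > t  # pixels classified as water at this threshold
        score = 0.5 * (iou(pred, water_mask) + iou(~pred, ~water_mask))
        best = max(best, score)
    return best


def filter_samples(samples, min_score=0.8):
    """Keep (green, nir, water_mask) samples whose score reaches min_score.

    Assumption: min_score is a hypothetical cutoff chosen for this sketch.
    """
    return [s for s in samples if annotation_score(*s) >= min_score]
```

Sweeping the threshold rather than fixing it lets the check tolerate scene-to-scene variation in NDWI values, so a sample is flagged only when no threshold in the grid reproduces its annotation well.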