This doctoral research covers the study, design, and development of intelligent systems for automatic diagnostic support in biomedical and industrial imaging. A unified methodological framework based on deep learning underlies both domains, with an emphasis on robustness and reliability when data are limited or noisy. In the biomedical domain, a mask-guided generative architecture, named CALIMAR-GAN, is developed for metal artifact reduction in computed tomography (CT). The model preserves anatomical structures while improving realism metrics (e.g., Fréchet Inception Distance), including on real clinical data, and generalizes better than paired supervised learning strategies. Building on this approach, a conditional generative framework (SG2Pix) is designed for photoacoustic image reconstruction directly from sinograms. The approach investigates different encoding strategies (reshaped, windowed, and Gramian Angular Field representations) as well as hybrid inputs that combine sinograms with back-projected (BP) images to embed physics-informed priors. A separate supervised learning framework based on U-Net architectures is then proposed for quantitative photoacoustic imaging, exploiting multi-wavelength data for blood oxygenation (sO₂) estimation and vascular segmentation. The results indicate that incorporating physics-informed priors improves both the reconstruction accuracy and the perceptual realism of the oxygenation maps. Finally, a real-time ultrasound (US) pipeline is implemented that streams B-mode frames from a US system to both a workstation and a HoloLens 2 headset. This system integrates deep-learning segmentation and automatic volumetric kidney measurement based on ellipsoid fitting via principal component analysis, and offers hands-free and voice-based interaction for clinician-in-the-loop use.
In the industrial domain, a comprehensive survey of more than 220 studies on deep learning for surface-defect inspection is conducted. A two-dimensional taxonomy is introduced to relate recognition tasks to learning paradigms, highlighting open challenges concerning data scarcity, explainability, and real-time applicability. Building on these insights, a systematic one-shot learning study is carried out, comparing a foundation model (DINOv2) with conventional CNN and ResNet18 architectures under multiple training regimes, including pure one-shot scenarios, scenarios with data augmentation, and scenarios informed by the defect-free (good) class. The experiments highlight complementary strengths: ResNet18 is more robust in genuine low-data settings, whereas DINOv2 achieves superior performance when richer supervision or contextual cues are available, confirming the potential of foundation models for adaptive and data-efficient industrial inspection.
The dissertation is organized into two main parts: the first focuses on the biomedical domain, addressing artifact reduction, photoacoustic reconstruction, oxygenation estimation, and real-time US segmentation with augmented-reality visualization; the second focuses on the industrial domain, comprising a comprehensive literature survey on surface-defect inspection and experimental analyses of one-shot defect classification with foundation and convolutional architectures.
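
As an illustration of the Gramian Angular Field encoding named in the abstract, the following is a minimal sketch of the standard definition applied to a single 1D signal: the signal is rescaled to [-1, 1], each sample is mapped to an angle via arccos, and the image is built from pairwise sums (summation field) or differences (difference field) of those angles. The abstract does not state how SG2Pix applies the encoding to sinograms, so treating one transducer channel as the input is an assumption, and the function names and toy values are hypothetical.

```python
import math


def rescale_to_unit_interval(series):
    """Linearly rescale a sequence of numbers to the range [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant signal has no dynamic range, so map it to 0.
        return [0.0 for _ in series]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]


def gramian_angular_field(series, kind="summation"):
    """Encode a 1D signal as a Gramian Angular Field matrix (list of rows)."""
    scaled = rescale_to_unit_interval(series)
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    if kind == "summation":
        # Summation field: entry (i, j) is cos(phi_i + phi_j).
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":
        # Difference field: entry (i, j) is sin(phi_i - phi_j).
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")


# Toy stand-in for one channel of a sinogram (hypothetical values).
channel = [0.0, 0.5, 1.0]
gasf = gramian_angular_field(channel)
gadf = gramian_angular_field(channel, kind="difference")
```

For this three-sample channel the rescaled values are -1, 0, and 1, so the summation field is [[1, 0, -1], [0, -1, 0], [-1, 0, 1]] up to floating-point rounding. A signal of length N always yields an N×N image, which is why the encoding pairs naturally with image-to-image generative models.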
Study, design, and development of intelligent systems for automatic diagnostics support / Scardigno, Roberto Maria. - Electronic. - (2026).
Study, design, and development of intelligent systems for automatic diagnostics support
SCARDIGNO, ROBERTO MARIA
2026
| File | Size | Format | |
|---|---|---|---|
| 38 ciclo-SCARDIGNO Roberto Maria.pdf (open access; type: doctoral thesis; license: Creative Commons) | 15.59 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

