Development and validation of a machine learning prognostic model based on an epigenomic signature in patients with pancreatic ductal adenocarcinoma / Zaccaria, Gian Maria; Altini, Nicola; Mongelli, Valentina; Marino, Francescomaria; Bevilacqua, Vitoantonio. - In: INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS. - ISSN 1386-5056. - (2025). [10.1016/j.ijmedinf.2025.105883]
Development and validation of a machine learning prognostic model based on an epigenomic signature in patients with pancreatic ductal adenocarcinoma
Zaccaria, Gian Maria (Conceptualization);
Altini, Nicola (Conceptualization);
Mongelli, Valentina (Formal Analysis);
Marino, Francescomaria (Investigation);
Bevilacqua, Vitoantonio (Conceptualization)
2025
Abstract
Background. In Pancreatic Ductal Adenocarcinoma (PDAC), current prognostic scores do not fully capture the biological heterogeneity of the disease. Although multi-omics approaches to PDAC are emerging, methylation data remain underexploited.

Materials and Methods. We analyzed CpG sites from two publicly available datasets: TCGA-PAAD, used as the discovery set, and CPTAC-PDA, used as the external test set. Single mutations and the co-mutation of the KRAS and TP53 genes were defined as targets, and differentially methylated CpG sites (DMCs) were detected for each target. We trained and validated Random Forest (RF) models to predict each target, using the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall curve (AUPRC) as performance metrics. We then performed consensus clustering on the DMCs to identify novel patient profiles. Finally, we trained and validated a combination of eXtreme Gradient Boosting (XGB) and tree models to select an epigenomic prognostic determinant.

Results. From the 598 extracted DMCs, an RF model predicted KRAS and TP53 co-mutation on the external test set with an AUROC of 0.77 and an AUPRC of 0.87. Consensus clustering identified four patient clusters (C1, C2, C3, and C4). The C4 cluster captured a subgroup of patients with more favorable Overall Survival (OS) than the other clusters. The XGB model perfectly discriminated C4 from the other clusters on the discovery set. In both cohorts, patients were stratified into two risk groups according to the methylation level of cg16854533, which was identified as the most important CpG site.

Conclusion. We analyzed methylation data to develop a classifier for TP53 and KRAS mutational status. Four prognostic clusters were identified, and a prognostic model based on a single CpG site was validated in an independent cohort. Our results indicate that the proposed use of methylation data facilitates risk stratification in PDAC.
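The two performance metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state how AUROC and AUPRC were computed, so the functions below are assumptions: AUROC uses the rank-based (Mann-Whitney) formulation, and AUPRC is approximated by step-wise average precision. The labels and scores are hypothetical toy values, not data from the study.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPRC: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Consume every sample sharing the current score (one threshold).
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j samples are predicted positive at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Hypothetical toy example: 1 = co-mutated, 0 = not co-mutated.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.3, 0.2]

toy_auroc = auroc(y_true, scores)
toy_auprc = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # chance-level AUPRC for a random scorer
```

The two metrics have different chance levels: a random scorer yields an AUROC of about 0.5, whereas its AUPRC is about the prevalence of the positive class. An AUPRC is therefore best read alongside the proportion of positive cases in the cohort it was computed on.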