Distributed analytics for big data: A survey / Berloco, Francesco; Bevilacqua, Vitoantonio; Colucci, Simona. - In: NEUROCOMPUTING. - ISSN 0925-2312. - PRINT. - 574:(2024). [10.1016/j.neucom.2024.127258]

Distributed analytics for big data: A survey

Berloco, Francesco; Bevilacqua, Vitoantonio; Colucci, Simona
2024-01-01

Abstract

In recent years, constant and rapid growth of information has characterized digital applications in most real-life scenarios. Thus, a new information asset, namely Big Data, has been defined and has led to different challenges, mainly related to data storage, management and analysis. Focusing on the last of these challenges, several Big Data analytics techniques have been developed, based on Machine Learning and Deep Learning paradigms. When dealing with Big Data, traditional approaches often take a long time to produce even a single predictive model, due to their extremely high demand for computational resources. Approaches specifically designed for Big Data are required to overcome these computational issues. Most solutions rely on deploying Big Data analytics infrastructures on a cluster of machines and/or on parallelization techniques. When deployment and parallelization are applied to Machine Learning and Deep Learning, we refer to Distributed Machine Learning and Distributed Deep Learning, respectively. Here we discuss the main principles and features of Distributed Machine Learning and Distributed Deep Learning frameworks. The main contribution of this work is a survey of solutions proposed in the literature, through the investigation of selected features and capabilities. In particular, the survey provides a comparative analysis according to the following classification criteria: implemented parallelization technique, supporting device, supported architecture, implemented communication mode, working mode, and class of algorithms. The paper also gives an overview of the most commonly used criteria and metrics for evaluating the performance of the analyzed frameworks; finally, some emerging and promising optimization techniques are reviewed separately from our classification.
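To make the idea of applying a parallelization technique to Machine Learning more concrete, the sketch below illustrates one common technique, data parallelism, combined with a synchronous communication mode. It is an illustrative assumption and is not taken from the survey: the dataset is split into shards, each worker computes a gradient on its own shard, and the gradients are averaged (the role an all-reduce plays in real frameworks) before every model update. The names `local_gradient`, `split` and `train_data_parallel`, the one-parameter linear model, and the learning rate are all hypothetical. Threads stand in for cluster nodes here; in CPython they do not give true CPU parallelism, whereas a real framework would place workers on separate machines or devices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sample = Tuple[float, float]  # hypothetical (x, y) training pair


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Partition the dataset round-robin into one shard per worker."""
    return [data[i::n_workers] for i in range(n_workers)]


def train_data_parallel(data: List[Sample], n_workers: int = 4,
                        lr: float = 0.01, steps: int = 200) -> float:
    """Synchronous data-parallel gradient descent (illustrative sketch only)."""
    shards = split(data, n_workers)
    w = 0.0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(steps):
            # Each simulated worker computes a gradient on its own shard.
            grads = list(pool.map(lambda s: local_gradient(w, s), shards))
            # Synchronous step: wait for all workers, average their gradients,
            # then update the shared model. With equal-sized shards this equals
            # the full-batch gradient.
            w -= lr * sum(grads) / len(grads)
    return w


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # y = 3x, 8 samples
    print(train_data_parallel(data))
```

With 8 samples and 4 workers each shard holds 2 samples, so the averaged gradient matches the single-machine gradient and the weight converges toward 3. An asynchronous communication mode would instead let each worker push its update without waiting for the others, trading consistency for less idle time.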

Use this identifier to cite or link to this document: https://hdl.handle.net/11589/264721