POLITECNICO DI BARI - Catalogo dei prodotti della Ricerca

Artificial intelligence applications are becoming increasingly popular and are producing better results in many areas of research. The quality of the results depends on the quantity of data and its information content. In recent years, the amount of data available has increased significantly, but this does not always mean more information and therefore better results. The aim of this work is to evaluate the effects of a new data preprocessing method for machine learning. This method was designed for sparce matrix approximation, and it is called semi-pivoted QR approximation (SPQR). To best of our knowledge, it has never been applied to data preprocessing in machine learning algorithms. This method works as a feature selection algorithm, and in this work, an evaluation of its effects on the performance of an unsupervised clustering algorithm is proposed. The obtained results are compared to those obtained using, as preprocessing algorithm, principal component analysis (PCA). These two methods have been applied to various publicly available datasets. The obtained results show that the SPQR algorithm can achieve results comparable to those obtained using PCA without introducing any transformation of the original dataset.

Data Analysis for Information Discovery / Amato, A., Di Lecce, V.. - In: APPLIED SCIENCES. - ISSN 2076-3417. - ELETTRONICO. - 13:6(2023). [10.3390/app13063481]

Data Analysis for Information Discovery

Amato, Alberto;Di Lecce, Vincenzo

2023

Abstract

Artificial intelligence applications are becoming increasingly popular and are producing better results in many areas of research. The quality of the results depends on the quantity of data and its information content. In recent years, the amount of data available has increased significantly, but this does not always mean more information and therefore better results. The aim of this work is to evaluate the effects of a new data preprocessing method for machine learning. This method was designed for sparce matrix approximation, and it is called semi-pivoted QR approximation (SPQR). To best of our knowledge, it has never been applied to data preprocessing in machine learning algorithms. This method works as a feature selection algorithm, and in this work, an evaluation of its effects on the performance of an unsupervised clustering algorithm is proposed. The obtained results are compared to those obtained using, as preprocessing algorithm, principal component analysis (PCA). These two methods have been applied to various publicly available datasets. The obtained results show that the SPQR algorithm can achieve results comparable to those obtained using PCA without introducing any transformation of the original dataset.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Rivista
	
				APPLIED SCIENCES
			
	Codice DOI
	
				https://dx.doi.org/10.3390/app13063481
			
	Citazione
	
				Data Analysis for Information Discovery / Amato, A., Di Lecce, V.. - In: APPLIED SCIENCES. - ISSN 2076-3417. - ELETTRONICO. - 13:6(2023). [10.3390/app13063481]
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
2023_Data_Analysis_for_Information_Discovery_pdfeditoriale.pdf accesso aperto Tipologia: Versione editoriale Licenza: Creative commons Dimensione 1.65 MB Formato Adobe PDF Visualizza/Apri	1.65 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11589/263941

Citazioni

2

2

social impact