Interpretability of BERT Latent Space through Knowledge Graphs / Anelli, V. W.; Biancofiore, G. M.; De Bellis, A.; Di Noia, T.; Di Sciascio, E.. - (2022), pp. 3806-3810. (Paper presented at the 31st ACM International Conference on Information and Knowledge Management, CIKM 2022, held at the Westin Peachtree Plaza Hotel, USA, in 2022) [10.1145/3511808.3557617].
Interpretability of BERT Latent Space through Knowledge Graphs
Anelli V. W.; Biancofiore G. M.; De Bellis A.; Di Noia T.; Di Sciascio E.
2022-01-01
Abstract
The advent of pretrained language models has reshaped the way natural language is handled, improving the quality of the systems that rely on them. BERT played a crucial role in revolutionizing the Natural Language Processing (NLP) area. However, the deep learning framework it implements lacks interpretability. Thus, recent research efforts have aimed to explain what BERT learns from the text sources exploited to pre-train its linguistic model. In this paper, we analyze the latent vector space resulting from the BERT context-aware word embeddings. We focus on assessing whether regions of the BERT vector space hold an explicit meaning attributable to a Knowledge Graph (KG). First, we prove the existence of explicitly meaningful areas through the Link Prediction (LP) task. Then, we demonstrate that these regions are linked to explicit ontology concepts of a KG by learning classification patterns. To the best of our knowledge, this is the first attempt at interpreting the linguistic knowledge BERT learns through a KG, relying on its pretrained context-aware word embeddings.
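As a minimal sketch of the Link Prediction probe the abstract refers to (this is an illustration, not the authors' implementation): given embedding vectors for entities and a relation, a translation-style scorer such as TransE ranks candidate tails for a (head, relation, ?) query, and a well-ranked true tail suggests the region of the space encodes that KG relation. The entity names, vectors, and the TransE scoring choice below are all assumptions for illustration; in the paper's setting the vectors would come from BERT's context-aware word embeddings.

```python
# Illustrative TransE-style link-prediction probe over toy embeddings.
# All names and vectors here are made up for the sketch.
import numpy as np

def transe_score(head, relation, tail):
    """Score a (head, relation, tail) triple: higher means more plausible."""
    return -np.linalg.norm(head + relation - tail)

# Toy 4-d "embeddings"; in the paper these would be BERT context-aware vectors.
entities = {
    "Paris":  np.array([1.0, 0.0, 0.0, 0.0]),
    "France": np.array([1.0, 1.0, 0.0, 0.0]),
    "Berlin": np.array([0.0, 0.0, 1.0, 0.0]),
}
capital_of = np.array([0.0, 1.0, 0.0, 0.0])  # hypothetical relation vector

# Rank candidate tails for ("Paris", capital_of, ?); if the space encodes
# the relation, the true tail "France" should score highest.
ranking = sorted(
    entities,
    key=lambda e: transe_score(entities["Paris"], capital_of, entities[e]),
    reverse=True,
)
print(ranking[0])  # → France
```

The same idea scales to a full KG: train the relation vectors on known triples, then measure how well held-out triples are ranked, which is the LP evaluation the abstract uses as evidence of explicitly meaningful regions.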