Assessing Lexical Semantic Regularities in Portuguese Word Embeddings.

Authors

DOI:

https://doi.org/10.9781/ijimai.2021.02.006

Keywords:

Natural Language Processing, Semantics, Word Embeddings, Semantic Lexicon

Supporting Agencies

This is an extended version of a paper describing the creation of the TALES dataset and its usage for assessing three models of static word embeddings [47], presented at the ECAI 2020 Workshop on Hybrid Intelligence for Natural Language Processing Tasks. Besides some more detailed discussions, the main additions are the inclusion of the BERT models in our experiment and a more thorough inspection of the results, which complements our discussion of the utility of this kind of approach for enriching computational lexical resources. We would like to thank the reviewers for their useful comments, which led to a better version of this manuscript.

Abstract

Models of word embeddings are often assessed by solving syntactic and semantic analogies. Among the latter, we are interested in relations that one would find in lexical-semantic knowledge bases like WordNet, which are also covered by some analogy test sets for English. This paper studies how well pretrained Portuguese word embeddings capture such relations. For this purpose, we created a new test, dubbed TALES, focused exclusively on Portuguese lexical-semantic relations acquired from lexical resources. With TALES, we analyse the performance of methods previously used for solving analogies on different models of Portuguese word embeddings. Accuracies were clearly below the state of the art for analogies of other kinds, which shows that TALES is a challenging test. This is mainly due to the nature of lexical-semantic relations: many instances share the same argument, which allows for several correct answers, sometimes too many to all be included in the dataset. We further inspect the results of the best-performing combination of method and model and find that some acceptable answers had been considered incorrect. This was mainly due to the lack of coverage by the source lexical resources, and it suggests that word embeddings may be a useful source of information for enriching those resources, something we also discuss.
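
As a rough illustration of the kind of evaluation the abstract describes, the following minimal Python sketch solves one analogy and checks the answer against a set of accepted words. The abstract does not name the methods used, so the 3CosAdd-style vector-offset score here is an assumption (it is one commonly used analogy method). The toy vectors, the example words, and the function names `solve_analogy` and `is_correct` are all hypothetical and are not taken from TALES or from any real model.

```python
import math

# Toy vectors invented for this illustration; they do not come from any real model.
TOY_VECTORS = {
    "cão":    [1.0, 0.0, 0.0],
    "animal": [1.0, 0.0, 1.0],
    "rosa":   [0.0, 1.0, 0.0],
    "flor":   [0.0, 1.0, 1.0],
    "planta": [0.0, 0.8, 1.2],
    "mesa":   [1.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, a_star, b, vectors, k=1):
    """Rank answers to 'a is to a_star as b is to ?' with a 3CosAdd-style score.

    The score of a candidate c is cos(c, a_star) - cos(c, a) + cos(c, b).
    The three question words are excluded from the candidates.
    """
    def score(word):
        c = vectors[word]
        return (cosine(c, vectors[a_star])
                - cosine(c, vectors[a])
                + cosine(c, vectors[b]))

    candidates = [w for w in vectors if w not in (a, a_star, b)]
    return sorted(candidates, key=score, reverse=True)[:k]


def is_correct(predictions, accepted):
    """A question counts as solved if any prediction is an accepted answer."""
    return any(p in accepted for p in predictions)


# Hypernymy-style question: "cão" is to "animal" as "rosa" is to ?
top1 = solve_analogy("cão", "animal", "rosa", TOY_VECTORS)
print(top1)
print(is_correct(top1, {"planta"}))          # gold set listing only one hypernym
print(is_correct(top1, {"flor", "planta"}))  # gold set listing both hypernyms
```

In this toy setting the top-ranked word is a plausible answer, yet it is scored as incorrect when the accepted set lists only one of the valid hypernyms. That mirrors, in miniature, the coverage issue the abstract attributes to the source lexical resources.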

Published

2021-03-01

How to Cite

Gonçalo Oliveira, H., Sousa, T., and Alves, A. (2021). Assessing Lexical Semantic Regularities in Portuguese Word Embeddings. International Journal of Interactive Multimedia and Artificial Intelligence, 6(5), 34–46. https://doi.org/10.9781/ijimai.2021.02.006