Multi-sense Embeddings Using Synonym Sets and Hypernym Information from WordNet
DOI: https://doi.org/10.9781/ijimai.2020.07.001

Keywords: Hypernym Path, Multi-sense Embeddings, Word Embeddings, Word Similarity, Synonym Sets

Abstract
Word embedding approaches have increased the efficiency of natural language processing (NLP) tasks. Traditional word embeddings, though robust for many NLP activities, do not handle polysemy. Tasks involving semantic similarity between concepts need to understand relations such as hypernymy and synonym sets to produce effective word embeddings. The outcomes of any expert system are affected by its text representation. Systems that understand the senses, contexts, and definitions of concepts while deriving vector representations overcome the drawbacks of single-vector representations. This paper presents a novel approach to handling polysemy by generating multi-sense embeddings using the synonym sets and hypernym information of words. The embeddings of a word are derived by capturing its information at different levels, from sense to context and definitions. The proposed sense embeddings obtained prominent results when tested on word similarity tasks. The approach was evaluated on nine benchmark datasets, on which it outperformed several state-of-the-art systems.
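The abstract names the WordNet signals the approach draws on: synonym sets, hypernym paths, and gloss definitions for each sense of a word. As a minimal illustrative sketch (not the paper's implementation), the snippet below uses NLTK's WordNet interface to collect exactly that per-sense information; the function name `sense_information` and the example word are our own choices.

```python
# Minimal sketch (not the paper's implementation) of collecting the
# per-sense WordNet information the approach builds on: synonym sets,
# hypernym paths, and gloss definitions for every sense of a word.
# Requires: pip install nltk; then run nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def sense_information(word):
    """Return synonyms, hypernym paths, and the gloss for each sense of `word`."""
    senses = []
    for synset in wn.synsets(word):
        senses.append({
            "sense": synset.name(),  # e.g. 'bank.n.01'
            # The lemmas of a synset form its synonym set.
            "synonyms": [lemma.name() for lemma in synset.lemmas()],
            # Each hypernym path runs from the root ('entity.n.01') down to this sense.
            "hypernym_paths": [[s.name() for s in path]
                               for path in synset.hypernym_paths()],
            # The gloss: a short dictionary-style definition of the sense.
            "definition": synset.definition(),
        })
    return senses

if __name__ == "__main__":
    for sense in sense_information("bank")[:2]:
        print(sense["sense"], "->", sense["synonyms"])
```

A separate vector per synset could then be composed from this information, for example by combining pre-trained vectors of a sense's synonyms, hypernyms, and gloss words; this is one common way such signals are turned into multi-sense embeddings, and is offered here only as an assumption about how the pieces fit together.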