A Topic Modeling Guided Approach for Semantic Knowledge Discovery in e-Commerce

Authors

  • V. S. Anoop, Indian Institute of Information Technology and Management, Kerala
  • S. Asharaf, Indian Institute of Information Technology and Management, Kerala

DOI:

https://doi.org/10.9781/ijimai.2017.03.014

Keywords:

Web Mining, e-commerce, Graphs, Semantic Web, Text Mining, Latent Dirichlet Allocation

Abstract

The task of mining large unstructured text archives, extracting useful patterns, and organizing them into a knowledgebase has attracted considerable attention because of its wide range of immediate business applications. Businesses therefore demand new and efficient algorithms for extracting potentially useful patterns from heterogeneous data sources that produce huge volumes of unstructured data. Because they can bring out hidden themes from large text repositories, topic modeling algorithms have attracted significant attention in the recent past. This paper proposes an efficient and scalable method, guided by topic modeling, for extracting concepts and relationships from e-commerce product descriptions and organizing them into a knowledgebase. Semantic graphs can be generated from such a knowledgebase, on which a meaning-aware product discovery experience can be built for potential buyers. Extensive experiments using the proposed unsupervised algorithms on e-commerce product descriptions collected from the open web show that the proposed method outperforms some existing methods for extracting concepts and relationships, making efficient knowledgebase construction possible.

Published

2017-12-01

How to Cite

Anoop, V. S. and Asharaf, S. (2017). A Topic Modeling Guided Approach for Semantic Knowledge Discovery in e-Commerce. International Journal of Interactive Multimedia and Artificial Intelligence, 4(6), 40–47. https://doi.org/10.9781/ijimai.2017.03.014