An Efficient Probabilistic Methodology to Evaluate Web Sources as Data Source for Warehousing

Authors

H. Sharan Sinha, S. Kumar Choudhary, and V. Kumar Solanki

DOI:

https://doi.org/10.9781/ijimai.2023.02.012

Keywords:

Mean-Standard-Deviation (MSD) Method, Multi-Criteria Decision Method (MCDM), Probabilistic Method, Standard Deviation of Score, Web Source

Abstract

The Internet is the largest source of data, and the demands of data analytics have pushed data warehousing from conventional, structured data warehouses toward complex web data warehouses. The dynamic and complex nature of the web creates several difficulties when web data are integrated into a conventional warehouse. Multi-Criteria Decision Making (MCDM) is a prominent mechanism for selecting the best data to store in a data warehouse. This article proposes a method for selecting web sources as data sources for a web data warehouse. The method is based on a probabilistic analysis of the Simple Additive Weighting (SAW) and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) methods, and it copes more efficiently with the dynamic and complex nature of the web. The selection results of both methods (SAW and TOPSIS) are analyzed to evaluate the probability of selection of each score (1–9) for each feature. These probability values are then used to determine the probability of selection of subsequent web sources. The same probability values are also used to derive the mean score and the standard deviation of the scores of each feature across the selected web sources. These statistics in turn fix the standard score of each feature for selecting web sources. The standard score is a parameter of the proposed Mean-Standard-Deviation (MSD) method, which checks the suitability of each web source individually, whereas other MCDM methods assess sources only on a comparative basis. Once the standard score of each feature has been computed from its mean and standard deviation, the proposed method avoids the cost of repeated comparisons. The score of each feature of a new web source is compared only with that feature's standard score, which reduces the computational cost and selects web sources faster.
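The MSD steps summarized in the abstract can be illustrated with a short Python sketch. This sketch is not the authors' implementation. It assumes that the feature scores (on the 1–9 scale) of the web sources already selected by SAW and TOPSIS are given. It estimates the probability P(s) of each score per feature from relative frequencies. It then derives the mean μ = Σ s·P(s) and standard deviation σ = √(Σ (s − μ)²·P(s)) from those probabilities, and checks a new source feature by feature against a threshold. The abstract does not state how the standard score is obtained from μ and σ. The rule `mean - k * sd`, with a hypothetical parameter `k`, is therefore an assumption. The feature names and scores below are also hypothetical and illustrative, not data from the paper.

```python
from collections import Counter
from math import sqrt

SCORES = range(1, 10)  # features are rated on a 1-9 scale, as in the abstract


def score_probabilities(scores):
    """Empirical probability P(s) of each score 1-9 for one feature,
    estimated from the scores of the already-selected web sources."""
    counts = Counter(scores)
    n = len(scores)
    return {s: counts[s] / n for s in SCORES}


def mean_and_sd(probs):
    """Mean and standard deviation of a score distribution:
    mu = sum(s * P(s)), sd = sqrt(sum((s - mu)^2 * P(s)))."""
    mu = sum(s * p for s, p in probs.items())
    var = sum((s - mu) ** 2 * p for s, p in probs.items())
    return mu, sqrt(var)


def standard_scores(selected, k=1.0):
    """Per-feature standard score (threshold), computed once.
    ASSUMPTION: standard score = mean - k * sd; the paper's exact rule
    is not given in the abstract. `selected` must be non-empty."""
    thresholds = {}
    for feature in selected[0]:
        probs = score_probabilities([src[feature] for src in selected])
        mu, sd = mean_and_sd(probs)
        thresholds[feature] = mu - k * sd
    return thresholds


def msd_accept(source, thresholds):
    """Check one new source on its own: one comparison per feature,
    accepted only if every feature meets its standard score."""
    return all(source[f] >= t for f, t in thresholds.items())


# Hypothetical feature scores of sources already selected by SAW/TOPSIS.
selected = [
    {"accuracy": 8, "freshness": 7, "coverage": 6},
    {"accuracy": 7, "freshness": 7, "coverage": 8},
    {"accuracy": 9, "freshness": 6, "coverage": 7},
    {"accuracy": 8, "freshness": 8, "coverage": 7},
]
thresholds = standard_scores(selected)

print(msd_accept({"accuracy": 8, "freshness": 7, "coverage": 7}, thresholds))  # True
print(msd_accept({"accuracy": 7, "freshness": 9, "coverage": 9}, thresholds))  # False
```

Because the thresholds are computed only once, screening each new source costs one comparison per feature. The full set of candidates does not need to be re-ranked, which is the saving the abstract describes.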

Published

2023-03-01

How to Cite

Sharan Sinha, H., Kumar Choudhary, S., and Kumar Solanki, V. (2023). An Efficient Probabilistic Methodology to Evaluate Web Sources as Data Source for Warehousing. International Journal of Interactive Multimedia and Artificial Intelligence, 8(1), 95–104. https://doi.org/10.9781/ijimai.2023.02.012