LDR 02594nas a2200241 4500
008 2023 d
260 ## $c 03/2023
653 10 $a Mean-Standard-Deviation (MSD) Method
653 10 $a Multi-Criteria Decision Method (MCDM)
653 10 $a Probabilistic Method
653 10 $a Standard Deviation of Score
653 10 $a Web Source
100 1# $a Hariom Sharan Sinha
700 1# $a Saket Kumar Choudhary
700 1# $a Vijender Kumar Solanki
245 00 $a An Efficient Probabilistic Methodology to Evaluate Web Sources as Data Source for Warehousing
856 ## $u https://www.ijimai.org/journal/sites/default/files/2023-02/ijimai8_1_9_0.pdf
300 ## $a 95-104
490 0# $v 8
520 3# $a The Internet is the largest source of data, and the requirements of data analytics have pushed data warehousing from the structured, conventional Data Warehouse toward the complex Web Data Warehouse. The dynamic and complex nature of the web poses various complexities when web data are synthesized into a conventional warehouse. Multi-Criteria Decision Making (MCDM) is a prominent mechanism for selecting the best data to store in the data warehouse. In this article, a method based on the probabilistic analysis of the SAW and TOPSIS methods is proposed for selecting web sources as data sources for a web data warehouse. This method deals more efficiently with the dynamic and complex nature of the web. Here, the selection results of both methods (SAW and TOPSIS) are analysed to evaluate the probability of selecting each score (1-9) for each feature. From these probability values, the probability of selecting the next web sources is determined. Moreover, using the same probability values, the mean and standard deviation of the scores of each feature of the selected web sources are deduced; these are then used to fix the standard score of each feature for the selection of web sources.
The standard score is a parameter of the proposed Mean-Standard-Deviation (MSD) method that checks the suitability of each web source individually, whereas other methods do so on a comparative basis. Once the standard score of each feature has been computed from that feature's mean and standard deviation, the proposed method avoids the cost of repeated comparison operations: the standard score of each feature is compared only with the score of the corresponding feature of each subsequent web source. This reduces the computational cost and also selects web sources faster.
022 ## $a 1989-1660
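The individual (non-comparative) suitability check summarized in the 520 abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the score probabilities are derived from SAW and TOPSIS or how the standard score is fixed from the mean and standard deviation. This sketch therefore assumes (a) the probability of each 1-9 score is its relative frequency among previously selected sources, and (b) the standard score is the mean minus k standard deviations, with a hypothetical k = 1. The function names and the feature names `accuracy` and `freshness` are likewise hypothetical.

```python
import math
from collections import Counter

SCORES = range(1, 10)  # feature scores on the 1-9 scale named in the abstract


def score_probabilities(selected_scores):
    """Estimate P(score = s) for s in 1..9 for one feature.

    Assumption: relative frequency of each score among previously
    selected web sources (the paper derives these from SAW/TOPSIS).
    """
    if not selected_scores:
        raise ValueError("need at least one selected source")
    counts = Counter(selected_scores)
    total = len(selected_scores)
    return {s: counts[s] / total for s in SCORES}


def mean_and_sd(probs):
    """Mean and standard deviation of a discrete score distribution."""
    mean = sum(s * p for s, p in probs.items())
    var = sum((s - mean) ** 2 * p for s, p in probs.items())
    return mean, math.sqrt(var)


def standard_scores(history, k=1.0):
    """Hypothetical rule: standard score = mean - k * sd, per feature.

    Computed once; later checks reuse these thresholds.
    """
    thresholds = {}
    for feature, scores in history.items():
        mean, sd = mean_and_sd(score_probabilities(scores))
        thresholds[feature] = mean - k * sd
    return thresholds


def is_suitable(candidate, thresholds):
    """Accept a new source if each feature score meets its standard score.

    One comparison per feature; no comparison against other sources.
    """
    return all(candidate[f] >= t for f, t in thresholds.items())


if __name__ == "__main__":
    history = {
        "accuracy": [7, 8, 8, 9, 6],
        "freshness": [5, 6, 7, 6, 6],
    }
    thresholds = standard_scores(history)
    print(is_suitable({"accuracy": 8, "freshness": 6}, thresholds))  # True
    print(is_suitable({"accuracy": 6, "freshness": 7}, thresholds))  # False
```

With this sample history the thresholds come out at about 6.58 for `accuracy` and 5.37 for `freshness`. After they are computed, each new candidate costs one comparison per feature, which is the kind of saving the abstract attributes to checking sources individually rather than comparatively.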