TKU-PSO: An Efficient Particle Swarm Optimization Model for Top-K High-Utility Itemset Mining.

Simen Carstensen; Jerry Chun Wei Lin

doi:10.9781/ijimai.2024.01.002

Authors

Simen Carstensen University of Bergen
Jerry Chun Wei Lin Western Norway University of Applied Sciences

DOI:

https://doi.org/10.9781/ijimai.2024.01.002

Keywords:

Data Mining, Evolutionary Computation, Fitness Estimation, Particle Swarm Optimization, Threshold-Raising Strategy, Top-k High-Utility Itemset

Abstract

Top-k high-utility itemset mining (top- HUIM) is a data mining procedure used to identify the most valuable patterns within transactional data. Although many algorithms are proposed for this purpose, they require substantial execution times when the search space is vast. For this reason, several meta-heuristic models have been applied in similar utility mining problems, particularly evolutionary computation (EC). These algorithms are beneficial as they can find optimal solutions without exploring the search space exhaustively. However, there are currently no evolutionary heuristics available for top-k HUIM. This paper addresses this issue by proposing an EC-based particle swarm optimization model for top-k HUIM, which we call TKU-PSO. In addition, we have developed several strategies to relieve the computational complexity throughout the algorithm. First, redundant and unnecessary candidate evaluations are avoided by utilizing explored solutions and estimating itemset utilities. Second, unpromising items are pruned during execution based on a thresholdraising concept we call minimum solution fitness. Finally, the traditional population initialization approach is revised to improve the model’s ability to find optimal solutions in huge search spaces. Our results show that TKU-PSO is faster than state-of-the-art competitors in all datasets tested. Most notably, existing algorithms could not complete certain experiments due to excessive runtimes, whereas our model discovered the correct solutions within seconds. Moreover, TKU-PSO achieved an overall accuracy of 99.8% compared to 16.5% with the current heuristic approach, while memory usage was the smallest in 2/3 of all tests.

Downloads

Download data is not yet available.

References

1]R. Agrawal, T. Imieliński, A. Swami, “Mining association rules between sets of items in large databases,” ACM SIGMOD Record, vol. 22, pp. 207– 216, 1993, doi: https://doi.org/10.1145/170036.170072.

H. Yao, H. J. Hamilton, “Mining itemset utilities from transaction databases,” Data & Knowledge Engineering, vol. 59, pp. 603–626, 2006, doi: https://doi.org/10.1016/j.datak.2005.10.004.

P. Fournier-Viger, J. Chun-Wei Lin, T. Truong-Chi, R. Nkambou, A Survey of High Utility Itemset Mining, pp. 1–45. Cham: Springer International Publishing, 2019.

H.-J. Choi, C. H. Park, “Emerging topic detection in twitter stream based on high utility pattern mining,” Expert Systems with Applications, vol. 115, pp. 27–36, 2019, doi: https://doi.org/10.1016/j.eswa.2018.07.051.

H. Q. Vu, G. Li, R. Law, “Discovering highly profitable travel patterns by high-utility pattern mining,” Tourism Management, vol. 77, p. 104008, 2020, doi: https://doi.org/10.1016/j.tourman.2019.104008.

M. Iqbal, M. N. Setiawan, M. I. Irawan, K. M. N. K. Khalif, N. Muhammad, M. K. B. M. Aziz, “Cardiovascular disease detection from high utility rare rule mining,” Artificial Intelligence in Medicine, vol. 131, p. 102347, 2022, doi: https://doi.org/10.1016/j.artmed.2022.102347.

C. W. Wu, B.-E. Shie, V. S. Tseng, P. S. Yu, “Mining top-k high utility itemsets,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, p. 78–86.

D. Fogel, “What is evolutionary computation?,” IEEE Spectrum, vol. 37, no. 2, pp. 26–32, 2000, doi: https://doi.org/10.1109/6.819926.

J. Kennedy, R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.

J. C.-W. Lin, L. Yang, P. Fournier-Viger, J. M.-T. Wu, T.-P. Hong, L. S.-L. Wang, J. Zhan, “Mining high-utility itemsets based on particle swarm optimization,” Engineering Applications of Artificial Intelligence, vol. 55, pp. 320–330, 2016, doi: https://doi.org/10.1016/j.engappai.2016.07.006.

H. Rezk, J. Arfaoui, M. R. Gomaa, “Optimal parameter estimation of solar pv panel based on hybrid particle swarm and grey wolf optimization algorithms,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 6, no. 6, pp. 145–155, 2021, doi: https://doi.org/10.9781/ijimai.2020.12.001.

K. K. Verma, B. M. Singh, “Deep multi- model fusion for human activity recognition using evolutionary algorithms,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 7, no. 2, pp. 44–58, 2021, doi: https://doi.org/10.9781/ijimai.2021.08.008.

Y. Liu, W.-k. Liao, A. Choudhary, “A two-phase algorithm for fast discovery of high utility itemsets,” in Advances in Knowledge Discovery and Data Mining, 2005, pp. 689–695.

C. F. Ahmed, S. K. Tanbeer, B.-S. Jeong, Y.-K. Lee, “Efficient tree structures for high utility pattern mining in incremental databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1708–1721, 2009, doi: https://doi.org/10.1109/TKDE.2009.46.

V. S. Tseng, C.-W. Wu, B.-E. Shie, P. S. Yu, “Up-growth: An efficient algorithm for high utility itemset mining,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, p. 253–262.

U. Yun, H. Ryang, K. H. Ryu, “High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates,” Expert Systems with Applications, vol. 41, no. 8, pp. 3861–3878, 2014, doi: https://doi.org/10.1016/j.eswa.2013.11.038.

M. Liu, J. Qu, “Mining high utility itemsets without candidate generation,”in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012, p. 55–64.

P. Fournier-Viger, C.-W. Wu, S. Zida, V. S. Tseng, “Fhm: Faster high-utility itemset mining using estimated utility co-occurrence pruning,” in Foundations of Intelligent Systems, 2014, pp. 83–92.

S. Krishnamoorthy, “Pruning strategies for mining high utility itemsets,” Expert Systems with Applications, vol. 42, no. 5, pp. 2371–2381, 2015, doi: https://doi.org/10.1016/j.eswa.2014.11.001.

P. Wu, X. Niu, P. Fournier-Viger, C. Huang, B. Wang, “Ubp-miner: An efficient bit based high utility itemset mining algorithm,” Knowledge- Based Systems, vol. 248, p. 108865, 2022, doi: https://doi.org/10.1016/j.knosys.2022.108865.

]J. Liu, K. Wang, B. C. Fung, “Mining high utility patterns in one phasewithout generating candidates,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1245–1257, 2016, doi: https://doi.org/10.1109/TKDE.2015.2510012.

S. Zida, P. Fournier Viger, J. C.-W. Lin, C.-W. Wu, V. Tseng, “EFIM: a fast and memory efficient algorithm for high-utility itemset mining,” Knowledge and Information Systems, vol. 51, pp. 595–625, 2017, doi: https://doi.org/10.1007/s10115-016-0986-0.

H. Ryang, U. Yun, “Top-k high utility pattern mining with effective threshold raising strategies,” Knowledge- Based Systems, vol. 76, pp. 109–126, 2015, doi: https://doi.org/10.1016/j.knosys.2014.12.010.

V. S. Tseng, C.-W. Wu, P. Fournier-Viger, P. S. Yu, “Efficient algorithms for mining top-k high utility itemsets,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 1, pp. 54–67, 2016, doi: https://doi.org/10.1109/TKDE.2015.2458860.

Q.-H. Duong, B. Liao, P. Fournier-Viger, T.-L. Dam, “An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies,” Knowledge-Based Systems, vol. 104, pp. 106–122, 2016, doi: https://doi.org/10.1016/j.knosys.2016.04.016.

K. Singh, S. S. Singh, A. Kumar, B. Biswas, “TKEH: an efficient algorithmfor mining top-k high utility itemsets,” Applied Intelligence, vol. 49, no. 3, pp. 1078–1097, 2019, doi: https://doi.org/10.1007/s10489-018- 1316-x.

J. Liu, X. Zhang, B. C. Fung, J. Li, F. Iqbal, “Opportunistic mining of top-n high utility patterns,” Information Sciences, vol. 441, pp. 171–186, 2018, doi: https://doi.org/10.1016/j.ins.2018.02.035.

S. Krishnamoorthy, “Mining top-k high utility itemsets with effective threshold raising strategies,” Expert Systems with Applications, vol. 117, pp. 148–165, 2019, doi: https://doi.org/10.1016/j.eswa.2018.09.051.

X. Han, X. Liu, J. Li, H. Gao, “Efficient top-k high utility itemset miningon massive data,” Information Sciences, vol. 557, pp. 382–406, 2021, doi: https://doi.org/10.1016/j.ins.2020.08.028.

M. Ashraf, T. Abdelkader, S. Rady, T. F. Gharib, “TKN: an efficient approach for discovering top-k high utility itemsets with positive or negative profits,” Information Sciences, vol. 587, pp. 654–678, 2022, doi: https://doi.org/10.1016/j.ins.2021.12.024.

C. Zhang, Z. Du, W. Gan, P. S. Yu, “Tkus: Mining top-k high utility sequential patterns,” Information Sciences, vol. 570, pp. 342–359, 2021, doi: https://doi.org/10.1016/j.ins.2021.04.035.

W. Song, C. Zheng, C. Huang, L. Liu, “Heuristically mining the top-k high-utility itemsets with cross- entropy optimization,” Applied Intelligence, pp. 1–16, 2021, doi: https://doi.org/10.1007/s10489-021-02576- z.

J. C.-W. Lin, L. Yang, P. Fournier-Viger, T.-P. Hong, M. Voznak, “A binary pso approach to mine high-utility itemsets,” Soft Computing, vol. 21, no. 17, p. 5103–5121, 2017, doi: https://doi.org/10.1007/s00500-016-2106-1.

W. Song, C. Huang, “Mining high utility itemsets using bio-inspired algorithms: A diverse optimal value framework,” IEEE Access, vol. 6, pp. 19568–19582, 2018, doi: https://doi.org/10.1109/ACCESS.2018.2819162.

W. Fang, Q. Zhang, H. Lu, J. C.-W. Lin, “High-utility itemsets mining based on binary particle swarm optimization with multiple adjustment strategies,” Applied Soft Computing, vol. 124, p. 109073, 2022, doi: https://doi.org/10.1016/j.asoc.2022.109073.

S. Kannimuthu, K. Premalatha, “Discovery of high utility itemsets using genetic algorithm with ranked mutation,” Applied Artificial Intelligence, vol. 28, no. 4, pp. 337–359, 2014, doi: https://doi.org/10.1080/08839514.2014.891839.

Q. Zhang, W. Fang, J. Sun, Q. Wang, “Improved genetic algorithm for high-utility itemset mining,” IEEE Access, vol. 7, pp. 176799–176813, 2019, doi: https://doi.org/10.1109/ACCESS.2019.2958150.

J. M.-T. Wu, J. Zhan, J. C.-W. Lin, “An aco-based approach to mine high-utility itemsets,” Knowledge- Based Systems, vol. 116, pp. 102–113, 2017, doi: https://doi.org/10.1016/j.knosys.2016.10.027.

W. Song, C. Huang, “Discovering high utility itemsets based on the artificial bee colony algorithm,” in Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, vol. 10939, 2018, pp. 3–14.

W. Song, J. Li, C. Huang, “Artificial fish swarm algorithm for mining high utility itemsets,” in Advances in Swarm Intelligence - 12th International Conference, vol. 12690, 2021, pp. 407–419.

M. S. Nawaz, P. Fournier-Viger, U. Yun, Y. Wu, W. Song, “Mining high utility itemsets with hill climbing and simulated annealing,” ACM Transactions on Management Information Systems, vol. 13, no. 1, 2021, doi: https://doi.org/10.1145/3462636.

R. Rubinstein, “The cross-entropy method for combinatorial and continuous optimization,” Methodology and computing in applied probability, vol. 1, no. 2, pp. 127–190, 1999, doi: https://doi.org/10.1023/A:1010091220143.

P. Fournier-Viger, J. C.-W. Lin, A. Gomariz, T. Gueniche, A. Soltani, Z. Deng, H. T. Lam, “The SPMF open-source data mining library version 2,” in Machine Learning and Knowledge Discovery in Databases - European Conference, vol. 9853, 2016, pp. 36–40.