Comparative Analysis of Building Insurance Prediction Using Some Machine Learning Algorithms.
DOI:
https://doi.org/10.9781/ijimai.2022.02.005
Keywords:
Machine Learning, Prediction, Regression
Abstract
In finance and management, insurance is a product that reduces or eliminates, in whole or in part, the loss caused by different risks. Various factors affect house insurance claims, and some of them, including specific features of the house, contribute to the formulation of insurance policies. Bringing Machine Learning (ML) into the field of insurance would enable seamless formulation of insurance policies with better performance while also saving time. Various classification algorithms are used here, since they have a long history and have been modified over time for better functionality. To illustrate the performance of each ML algorithm used, we analyzed an insurance dataset drawn from a Zindi Africa competition, which is said to come from Olusola Insurance Company in Lagos, Nigeria. This study therefore compares the performance of Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), Kernel Support Vector Machine (kSVM), Naïve Bayes (NB), and Random Forest (RF) models on this dataset. Their performance is assessed using accuracy, precision, recall, and F1 score, all derived from the confusion matrix. LR and kSVM both achieved an accuracy of 78%, but kSVM outperformed LR in precision (70.8% versus 64.8%), showing that kSVM offered the best result.
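The four metrics named in the abstract can all be read off a binary confusion matrix. The sketch below is a minimal illustration, not the authors' code: the `ConfusionMatrix` class, the toy labels, and the two prediction vectors are hypothetical and do not come from the Zindi dataset. It shows how two models can tie on accuracy yet differ in precision, which is the kind of comparison the abstract reports for LR and kSVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts; label 1 is the positive class."""
    tp: int  # predicted 1, actually 1
    fp: int  # predicted 1, actually 0
    fn: int  # predicted 0, actually 1
    tn: int  # predicted 0, actually 0

    @classmethod
    def from_labels(cls, y_true, y_pred):
        """Tally the four cells from paired true and predicted 0/1 labels."""
        tp = fp = fn = tn = 0
        for actual, predicted in zip(y_true, y_pred):
            if predicted == 1 and actual == 1:
                tp += 1
            elif predicted == 1:
                fp += 1
            elif actual == 1:
                fn += 1
            else:
                tn += 1
        return cls(tp, fp, fn, tn)

    @staticmethod
    def _ratio(numerator, denominator):
        # Return 0.0 instead of raising when a metric is undefined.
        return numerator / denominator if denominator else 0.0

    def accuracy(self):
        total = self.tp + self.fp + self.fn + self.tn
        return self._ratio(self.tp + self.tn, total)

    def precision(self):
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self):
        return self._ratio(self.tp, self.tp + self.fn)

    def f1(self):
        p, r = self.precision(), self.recall()
        return self._ratio(2 * p * r, p + r)


# Hypothetical toy data: 1 = claim filed, 0 = no claim.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
PRED_A = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious model: few positives
PRED_B = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # bolder model: one false alarm

for name, pred in (("model A", PRED_A), ("model B", PRED_B)):
    cm = ConfusionMatrix.from_labels(Y_TRUE, pred)
    print(
        f"{name}: accuracy={cm.accuracy():.3f} precision={cm.precision():.3f} "
        f"recall={cm.recall():.3f} f1={cm.f1():.3f}"
    )
```

On this toy data both models reach the same accuracy while their precision and recall differ, which is why a comparison based on accuracy alone can hide meaningful differences between classifiers.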






