Atta Yaw Agyeman, Samuel Gbli Tetteh, 2024. "Technical Evaluation of Machine Learning Models: An Empirical Study" ESP International Journal of Advancements in Computational Technology (ESP-IJACT) Volume 2, Issue 1: 1-9.
The proliferation of diverse data sources has changed how decisions are made across the globe, and the growth in available data has opened new opportunities for applying machine learning. In particular, domains such as disease detection and economic analysis have been transformed by machine learning algorithms. Meanwhile, the incidence of breast cancer continues to rise in both developed and developing nations, posing significant challenges to healthcare systems worldwide. This study brings these trends together by comparing major machine learning models for classifying breast cancer tissue. Using the Wisconsin Breast Cancer Dataset, it evaluates how well each algorithm distinguishes benign from malignant tissue. The models compared are Logistic Regression, Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), two variants of Support Vector Machine (SVM), one with a Radial Basis Function (RBF) kernel and one with a linear kernel, a Decision Tree Classifier, and Random Forest (RF). The findings show that Random Forest and the SVM variants achieve strong classification accuracy, and that these models also show superior precision, recall, and F1-score in breast cancer tissue classification.
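The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common convention of treating "malignant" as the positive class; the function name `evaluate`, the `BinaryMetrics` container, and the sample labels are hypothetical and are not taken from the study or its results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive="malignant"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels for illustration only; not data from the study.
y_true = ["malignant", "malignant", "malignant", "benign",
          "benign", "benign", "benign", "benign"]
y_pred = ["malignant", "malignant", "benign", "benign",
          "benign", "benign", "malignant", "benign"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a diagnostic setting, recall on the malignant class is often the metric watched most closely, since a false negative corresponds to a missed malignancy, whereas accuracy alone can look favourable when benign cases dominate.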
Keywords: Support Vector Machine (SVM), Logistic Regression, Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), Decision Tree Classifier, Random Forest (RF).