Data Classifying to Diagnose Diabetes Risk Using Data Mining Techniques

Nopparat Nonsiri, Ratree Manassila, Krit Somkanta

Abstract

This research aims to build classification models for diagnosing diabetes risk using four data mining techniques: Naïve Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree. The training and test sets were built from data on diabetic patients at Somdej Phra Yuparat Hospital, Ban Dung, obtained through a retrospective review of diabetes medical records: 1,435 records with 16 attributes. Model accuracy was then estimated using 10-fold cross validation. The decision tree method performed best, with an accuracy of 93.73%, followed by Naïve Bayes at 88.92%, K-Nearest Neighbor at 86.97%, and Support Vector Machine at 86.13%. The decision tree was therefore the most effective of the four methods compared. The authors attribute this to its being a distribution-free (nonparametric) method that does not depend on assumptions about the probability distribution, and to its ability to handle high-dimensional data accurately. The model is suitable as the basis for a classification system for diagnosing diabetes risk, to support medical decision-making.
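The 10-fold cross validation procedure mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say which tool or hyperparameters were used, so the helper names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`, `make_toy_data`), the choice of 3 neighbors, and the synthetic two-feature data are all assumptions made for illustration. The hospital data set itself is not reproduced here.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x
    (squared Euclidean distance). k=3 is an illustrative assumption."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, predict, k=10, seed=0):
    """Hold out each fold once, train on the remaining folds, and return
    the fraction of held-out records that were classified correctly."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)


def make_toy_data(n=200, seed=1):
    """Synthetic two-class data (hypothetical, NOT the hospital records):
    two Gaussian clusters centered at (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 3.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    acc = cross_val_accuracy(X, y, knn_predict, k=10)
    print(f"10-fold CV accuracy (toy data, 3-NN): {acc:.2%}")
```

Here accuracy is pooled over all held-out records, so every record is tested exactly once. Averaging the ten per-fold accuracies is an equally common convention and gives a very similar figure when the folds are of nearly equal size. Any other classifier with the same `predict(train_X, train_y, x)` signature could be passed to `cross_val_accuracy` to compare methods on the same folds.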

DOI: 10.14416/j.kmutnb.2022.10.004

ISSN: 2985-2145

The Journal of King Mongkut's University of Technology North Bangkok