Type 2 Diabetes Mellitus Automated Risk Detection Based on UAE National Health Survey Data: A Framework for the Construction and Optimization of Binary Classification Machine Learning Models Based on Dimensionality Reduction
Machine Learning (ML) research, both general and domain-specific, has increased greatly. ML in bioinformatics and epidemiology in particular has grown drastically, powered by the proliferation of Electronic Medical Records (EMR) in healthcare systems worldwide and by the efficiency of new programmatic and computational tools supporting Artificial Intelligence (AI) applications. Motivated by the unprecedented increase in diabetes, and specifically Type 2 Diabetes Mellitus (T2DM), this research makes two contributions. The first is a comprehensive ML framework for constructing high-accuracy diagnostic binary classification models to predict T2DM in the United Arab Emirates based on a STEPS-style National Health Survey. The second is the design and construction of a Logistic Regression (LR) binary classification model with an accuracy of 87% and an F1-score of 89%. Special consideration was given to data pre-processing and to dimensionality reduction techniques such as Chi-Squared (CS) and Recursive Feature Elimination (RFE) in order to progressively improve the performance of the proposed models. LR with the reduced feature set obtained from the intersection of the CS and RFE selections proved to be the best model among the tested algorithms. This model can be used in a clinical setting as a decision support system, or for public health awareness as an informal risk prediction system. Many people have difficulty accessing diagnostic healthcare services for reasons including, but not limited to, economic and regional factors. ML-based informal diagnostic and decision support systems can provide a first line of detection to alert patients to potential disease risk. An early alert of T2DM risk from a free ML tool can help patients and healthcare workers manage the disease as early as possible, reducing the risk of complications and the financial burden.
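The feature selection idea described above (keeping only the features chosen by both CS and RFE, then fitting LR on that subset) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the survey data, and the feature count `K`, the min-max scaling step, and all variable names are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the survey data (assumption: the real data is not used here).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The chi-squared test requires non-negative inputs, so scale to [0, 1].
# The scaler is fitted on the training split only to avoid leakage.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hypothetical number of features kept by each selector.
# With 20 features, K = 12 guarantees the two selections share at least 4.
K = 12

# Filter method: rank features by chi-squared score against the label.
cs_mask = SelectKBest(chi2, k=K).fit(X_train_s, y_train).get_support()

# Wrapper method: recursively drop the weakest features of an LR model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=K)
rfe_mask = rfe.fit(X_train_s, y_train).get_support()

# Reduced feature set: indices selected by both methods.
selected = np.flatnonzero(cs_mask & rfe_mask)

# Final LR model trained on the intersected feature set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s[:, selected], y_train)
pred = model.predict(X_test_s[:, selected])

acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
print(f"features kept: {len(selected)}, accuracy: {acc:.2f}, F1: {f1:.2f}")
```

The scores printed by this sketch come from synthetic data and say nothing about the 87% accuracy and 89% F1-score reported above. Taking the intersection favors features that both a filter method and a wrapper method agree on, at the cost of possibly discarding features that only one of them ranks highly.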