Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Shirin Pinjari, Nilesh Vani
DOI Link: https://doi.org/10.22214/ijraset.2023.56214
The timely detection of diabetes is of paramount importance for improving patient outcomes and reducing the overall healthcare burden. This research paper delves into the use of a wide range of machine-learning algorithms to achieve accurate diabetes detection. By leveraging diverse methods including K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Decision Tree, Random Forest, AdaBoost with Random Forest, and AdaBoost with Logistic Regression, this study enhances our understanding of how these algorithms perform in the field of diabetes diagnosis. The paper presents comprehensive experimental results and comparative analyses, shedding light on the strengths and limitations of each algorithm within this critical medical domain. The findings contribute to the ongoing effort to develop effective tools for the early detection and management of diabetes, ultimately benefiting both patients and the healthcare system.
I. INTRODUCTION
Diabetes mellitus (DM) is a metabolic disorder marked by persistently high blood glucose levels, accompanied by disruptions in the metabolism of carbohydrates, fats, and proteins. It has three primary types: Type 1 DM, Type 2 DM, and gestational diabetes.
Type 1 DM arises from the body's inability to produce insulin, necessitating insulin injections or the use of an insulin pump. This variant was formerly termed "insulin-dependent diabetes mellitus" (IDDM) or "juvenile diabetes."
Type 2 DM arises from insulin resistance, a condition in which cells do not respond properly to insulin, often accompanied by insufficient insulin production. It was formerly referred to as "non-insulin-dependent diabetes mellitus" (NIDDM) or "adult-onset diabetes."
The third major form, gestational diabetes, occurs when pregnant individuals without a prior diabetes diagnosis experience elevated blood glucose levels. It can precede the development of Type 2 diabetes mellitus. In 2000, an estimated 171 million individuals worldwide were affected by diabetes, about 2.8% of the global population. Type 2 diabetes was the most prevalent type globally. Data from 2007 showed that the five countries with the highest numbers of diagnosed diabetes cases were India (40.9 million), China (38.9 million), the United States (19.2 million), Russia (9.6 million), and Germany (7.4 million) [1].
With the increasing volume of unstructured diabetic data originating from the healthcare industry and various other sources, there is a pressing need to organize and quantify this data effectively. Technological advancements have made it possible to amalgamate robust diabetic data sharing and electronic communication systems, enhancing access to healthcare services for patients at all levels of care. This necessitates the consolidation of all patient data into a single repository. The deployment of a Health Information Exchange (HIE) serves as a solution, as it can collect clinical information from disparate sources and integrate it into a unified patient health record accessible securely by all care providers. Predictive Analysis, a method employing techniques from data mining, statistics, and game theory, leverages historical and current data alongside statistical and analytical models to forecast future events. In the healthcare sector, big data analytics can play a vital role in making significant predictions and informed decisions.
This paper introduces the use of predictive analysis algorithms within a Hadoop/MapReduce environment to predict prevalent diabetes types, associated complications, and appropriate treatment methods. Through this analysis, the system aims to offer an efficient approach to diagnosing and caring for patients, emphasizing factors such as affordability and availability to achieve better patient outcomes. Diabetes mellitus is a chronic metabolic disorder whose global prevalence has surged in recent decades. When diabetes goes undiagnosed or is inadequately managed, the consequences are profound: severe health complications and heightened mortality rates.
The field of machine learning, renowned for its capacity to decipher intricate data patterns, has emerged as a promising tool for timely disease detection and accurate diagnosis. This research zeroes in on the potential of a diverse spectrum of machine learning algorithms, harnessed to precisely pinpoint cases of diabetes. The central objective of this study is to assess the efficacy of diverse algorithms in the realm of diabetes diagnosis. Our investigation delves into an extensive array of algorithms, encompassing K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Decision Tree, Random Forest, AdaBoost with Random Forest, and AdaBoost with Logistic Regression. Each algorithm brings forth a distinctive classification approach, and we intend to foster understanding through a comprehensive comparative analysis of their respective advantages and application scopes in the context of diabetes detection.
II. RELATED WORK
P. Yasodha and M. Kannan [2] apply classification techniques to various types of datasets to determine whether an individual has diabetes. Their diabetic-patient dataset is sourced from a hospital warehouse and comprises 249 instances with seven attributes. The instances are categorized into two groups, namely blood tests and urine tests. In the work by N. Niyati Gupta, A. Rawal, and V. Narasimhan [3], the primary objective is to assess the accuracy, sensitivity, and specificity of various classification methods. The study also compares the results of these methods across different software tools, including WEKA, RapidMiner, and MATLAB, using the same three measures. The authors applied the JRip, Jgraft, and BayesNet algorithms in their analysis.
Both papers appear to focus on utilizing classification techniques and machine learning algorithms to address the issue of diabetes diagnosis and evaluation of classification model performance. The first paper uses a specific dataset from a hospital, while the second paper explores the performance of various classification methods across different software platforms.
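The three measures used in [3] can be computed directly from the counts of a binary confusion matrix. The following is a minimal sketch, not code from either cited paper; the function name `classification_metrics` and the labels are made up for illustration, with 1 assumed to denote a diabetic case.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of true diabetic cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of non-diabetic cases correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels for illustration only (1 = diabetic, 0 = non-diabetic).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, sensitivity is usually the measure to watch, since a missed diabetic case (a false negative) tends to be costlier than a false alarm.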
III. PROPOSED METHODOLOGY
A. Data Collection
Gather a comprehensive dataset of individual medical records, with features such as age, gender, family history of diabetes, lifestyle factors (diet, exercise), and, most importantly, clinical measurements such as fasting blood glucose level, HbA1c level, and BMI.
B. Data Preprocessing
Handle missing data: Impute missing values using techniques like mean imputation or advanced imputation methods.
Normalize or standardize numerical features to bring them to a consistent scale.
Encode categorical variables into numerical values using techniques like one-hot encoding or label encoding.
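The three preprocessing steps above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the paper's pipeline: the column names, the sample values, and the helper names `impute_mean`, `standardize`, and `one_hot` are all hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no signal
    return [(v - mu) / sigma for v in column]


def one_hot(column):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(column))
    encoded = [[1 if v == c else 0 for c in categories] for v in column]
    return encoded, categories


# Hypothetical records: one numeric feature with a gap, one categorical.
glucose = [90.0, None, 150.0, 120.0]
gender = ["F", "M", "F", "M"]

glucose_filled = impute_mean(glucose)        # gap filled with 120.0
glucose_scaled = standardize(glucose_filled)
gender_encoded, gender_categories = one_hot(gender)

print(glucose_filled)
print(gender_categories, gender_encoded)
```

When the data are split into training and test sets, the imputation mean and the scaling statistics would normally be computed on the training portion only and then reused on the test portion, so that no information leaks from the test set.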
C. Feature Selection/Engineering
Select relevant features using techniques like feature importance ranking, correlation analysis, or domain knowledge.
Create new features that may be informative, such as BMI categories or a diabetes risk score.
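Both ideas in this step can be illustrated briefly: ranking features by the absolute Pearson correlation with the outcome, and deriving a BMI category. This is a sketch under assumptions, since the paper does not state which ranking criterion or cut-offs it uses; the data are invented, the helper names are hypothetical, and the BMI bands are the commonly used WHO adult cut-offs (18.5, 25, 30).

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_features(features, target):
    """Sort feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def bmi_category(bmi):
    """Engineered feature: WHO adult BMI band (assumed cut-offs)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Invented sample: six individuals, outcome 1 = diabetic.
features = {
    "glucose": [85, 90, 140, 160, 155, 95],
    "bmi": [22.0, 31.0, 27.5, 33.0, 24.0, 29.0],
}
target = [0, 0, 1, 1, 1, 0]

ranking = rank_features(features, target)
bmi_bands = [bmi_category(b) for b in features["bmi"]]
print(ranking)
print(bmi_bands)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is best treated as a first filter alongside domain knowledge rather than as a final selection rule.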
D. Model Selection
Choose appropriate machine learning algorithms for classification, such as logistic regression, decision trees, random forests, support vector machines, or deep learning models.
Experiment with different models to determine which one performs best on the specific dataset.
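One way to run such an experiment is to score every candidate model with the same cross-validation splits and compare mean accuracy. The sketch below is self-contained and deliberately small: it implements a toy K-Nearest Neighbors classifier and a majority-class baseline rather than the paper's full set of algorithms, and the class names, the `cross_val_accuracy` helper, and the data are all hypothetical. In practice a library implementation would normally be used instead.

```python
from collections import Counter


class MajorityClassifier:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class KNNClassifier:
    """K-Nearest Neighbors with squared Euclidean distance and majority vote."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            nearest = sorted(
                (sum((a - b) ** 2 for a, b in zip(row, train_row)), label)
                for train_row, label in zip(self.X_, self.y_)
            )[: self.k]
            votes = Counter(label for _, label in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def cross_val_accuracy(make_model, X, y, n_folds=3):
    """Mean accuracy over n_folds interleaved folds (row i goes to fold i % n_folds)."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        correct = sum(1 for i, p in zip(test_idx, preds) if y[i] == p)
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


# Invented, already-scaled two-feature data; 1 = diabetic.
X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.25],
     [0.85, 0.7], [0.3, 0.2], [0.7, 0.9], [0.25, 0.3]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0]

candidates = {"majority": MajorityClassifier, "knn": lambda: KNNClassifier(k=3)}
results = {name: cross_val_accuracy(factory, X, y) for name, factory in candidates.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Including a trivial baseline makes the comparison easier to interpret: a model that does not clearly beat the majority-class predictor has learned little from the features.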
E. Dataset Description
An understanding of the dataset is needed before building machine-learning models.
V. FUTURE SCOPE
The present study opens avenues for further research and development in the domain of diabetes detection using machine learning. Some potential directions for future investigations include:
In this study, we embarked on an exploration of various machine learning algorithms for the critical task of diabetes detection. Through extensive experimentation, we gained insights into the performance and applicability of K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Decision Tree, Random Forest, AdaBoost with Random Forest, and AdaBoost with Logistic Regression in diagnosing diabetes. Our findings revealed that each algorithm presents a unique approach to diabetes detection, with varying degrees of accuracy and interpretability. KNN exhibited competitive results by leveraging neighborhood information, while Logistic Regression provided a simpler yet effective model. Naive Bayes demonstrated its strengths in probabilistic modeling, while Linear Discriminant Analysis showcased its potential in capturing class separation. Decision Tree offered transparency in decision-making, and Random Forest excelled in ensemble-based classification. AdaBoost with Random Forest and AdaBoost with Logistic Regression showcased the power of boosting techniques in improving the performance of base classifiers.
[1] P. T. Katzmarzyk, C. L. Craig, and L. Gauvin, “Adiposity, physical fitness, and incident diabetes: The physical activity longitudinal study,” Diabetologia, vol. 50, no. 3, pp. 538–544, Mar. 2007.
[2] Z. Xu, X. Qi, A. K. Dahl, and W. Xu, “Waist-to-height ratio is the best indicator for undiagnosed type 2 diabetes,” Diabetic Med., vol. 30, no. 6, pp. e201–e207, Jun. 2013.
[3] R. N. Feng, C. Zhao, C. Wang, Y. C. Niu, K. Li, F. C. Guo, S. T. Li, C. H. Sun, and Y. Li, “BMI is strongly associated with hypertension and waist circumference is strongly associated with type 2 diabetes and dyslipidemia, in northern Chinese adults,” J. Epidemiol., vol. 22, no. 4, pp. 317–323, May 2012.
[4] A. Berber, R. Gómez-Santos, G. Fanghänel, and L. Sánchez-Reyes, “Anthropometric indexes in the prediction of type 2 diabetes mellitus, hypertension and dyslipidemia in a Mexican population,” Int. J. Obes. Relat. Metab. Disorders, vol. 25, no. 12, pp. 1794–1799, Dec. 2001.
[5] B. Balkau, D. Sapinho, A. Petrella, L. Mhamdi, M. Cailleau, D. Arondel, and M. A. Charles, D. E. S. I. R. Study Group, “Prescreening tools for diabetes and obesity-associated dyslipidemia: Comparing BMI, waist and waist-hip ratio. The D.E.S.I.R. Study,” Eur. J. Clin. Nutr., vol. 60, no. 3, pp. 295–304, Mar. 2006.
[6] I. S. Okosun, K. M. Chandra, S. Choi, J. Christman, G. E. Dever, and T. E. Prewitt, “Hypertension and type 2 diabetes comorbidity in adults in the United States: risk of overall and regional adiposity,” Obes. Res., vol. 9, no. 1, pp. 1–9, Jan. 2001.
[7] L. A. Sargeant, F. I. Bennett, T. E. Forrester, R. S. Cooper, and R. J. Wilks, “Predicting incident diabetes in Jamaica: the role of anthropometry,” Obes. Res., vol. 10, no. 8, pp. 792–798, Aug. 2002.
[8] N. T. Duc Son le, T. T. Hanh, K. Kusama, D. Kunii, T. Sakai, N. T. Hung, and S. Yamamoto, “Anthropometric characteristics, dietary patterns and risk of type 2 diabetes mellitus in Vietnam,” J. Amer. Coll. Nutr., vol. 24, no. 4, pp. 229–234, Aug. 2005.
[9] G. T. Ko, J. C. Chan, C. S. Cockram, and J. Woo, “Prediction of hypertension, diabetes, dyslipidemia or albuminuria using simple anthropometric indexes in Hong Kong Chinese,” Int. J. Obes. Relat. Metab. Disorders, vol. 23, no. 11, pp. 1136–1142, Nov. 1999.
[10] M. B. Snijder, P. Z. Zimmet, M. Visser, J. M. Dekker, J. C. Seidell, and J. E. Shaw, “Independent and opposite associations of waist and hip circumferences with diabetes, hypertension and dyslipidemia: The AusDiab study,” Int. J. Obes. Relat. Metab. Disorders, vol. 28, no. 3, pp. 402–409, Mar. 2004.
[11] B. J. Lee, B. Ku, J. Nam, D. D. Pham, and J. Y. Kim, “Prediction of fasting plasma glucose status using anthropometric measures for diagnosing type 2 diabetes,” IEEE J. Biomed. Health Inform., vol. 18, no. 2, pp. 555–561, Mar. 2014.
[12] L. de Koning, H. C. Gerstein, J. Bosch, R. Diaz, V. Mohan, G. Dagenais, S. Yusuf, and S. S. Anand, EpiDREAM Investigators, “Anthropometric measures and glucose levels in a large multi-ethnic cohort of individuals at risk of developing type 2 diabetes,” Diabetologia, vol. 53, no. 7, pp. 1322–1330, Jul. 2010.
[13] I. S. Okosun and J. M. Boltri, “Abdominal obesity, hypertriglyceridemia, hypertriglyceridemic waist phenotype and risk of type 2 diabetes in American adults,” Diabetes Metab. Syndrome, vol. 2, no. 4, pp. 273–281, Dec. 2008.
[14] Z. Yu, L. Sun, Q. Qi, H. Wu, L. Lu, C. Liu, H. Li, and X. Lin, “Hypertriglyceridemic waist, cytokines and hyperglycemia in Chinese,” Eur. J. Clin. Invest., vol. 42, no. 10, pp. 1100–1111, Oct. 2012.
[15] T. Du, X. Sun, R. Huo, and X. Yu, “Visceral adiposity index, hypertriglyceridemic waist and risk of diabetes: The China health and nutrition survey 2009,” Int. J. Obes. (Lond.), vol. 38, no. 6, pp. 840–847, Jun. 2014.
[16] M. Solati, A. Ghanbarian, M. Rahmani, N. Sarbazi, S. Allahverdian, and F. Azizi, “Cardiovascular risk factors in males with hypertriglycemic waist (Tehran lipid and glucose study),” Int. J. Obes. Relat. Metab. Disorders, vol. 28, no. 5, pp. 706–709, May 2004.
[17] I. Lemieux, A. Pascot, C. Couillard, B. Lamarche, A. Tchernof, N. Alméras, J. Bergeron, D. Gaudet, G. Tremblay, D. Prud’homme, A. Nadeau, and J. P. Després, “Hypertriglyceridemic waist: A marker of the atherogenic metabolic triad (hyperinsulinemia; hyper apolipoprotein B; small, dense LDL) in men?” Circulation, vol. 102, no. 2, pp. 179–184, Jul. 2000.
[18] L. B. Tankó, Y. Z. Bagger, G. Qin, P. Alexandersen, P. J. Larsen, and C. Christiansen, “Enlarged waist combined with elevated triglycerides is a strong predictor of accelerated atherogenesis and related cardiovascular mortality in postmenopausal women,” Circulation, vol. 111, no. 15, pp. 1883–1890, Apr. 2005.
[19] I. F. Gazi, T. D. Filippatos, V. Tsimihodimos, V. G. Saougos, E. N. Liberopoulos, D. P. Mikhailidis, A. D. Tselepis, and M. Elisaf, “The hypertriglyceridemic waist phenotype is a predictor of elevated levels of small, dense LDL cholesterol,” Lipids, vol. 41, no. 7, pp. 647–654, Jul. 2006.
[20] J. St-Pierre, I. Lemieux, M. C. Vohl, P. Perron, G. Tremblay, J. P. Després, and D. Gaudet, “Contribution of abdominal obesity and hypertriglyceridemia to impaired fasting glucose and coronary artery disease,” Amer. J. Cardiol., vol. 90, no. 1, pp. 15–18, Jul. 2002.
[21] P. Blackburn, I. Lemieux, N. Alméras, J. Bergeron, M. Côté, A. Tremblay, B. Lamarche, and J. P. Després, “The hypertriglyceridemic waist phenotype versus the national cholesterol education program-adult treatment panel III and international diabetes federation clinical criteria to identify high-risk men with an altered cardiometabolic risk profile,” Metabolism, vol. 58, no. 8, pp. 1123–1130, Aug. 2009.
Copyright © 2023 Shirin Pinjari, Nilesh Vani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET56214
Publish Date : 2023-10-18
ISSN : 2321-9653
Publisher Name : IJRASET