Heart Disease Prediction using Machine Learning

Authors: Raunak Verma, Shashank Tandon, Mr. Vinayak

DOI Link: https://doi.org/10.22214/ijraset.2022.42687

Abstract

The term "heart disease" refers to any condition that affects the heart. Cardiovascular disease (CVD) is the leading cause of death worldwide. CVD is a group of disorders of the heart and blood vessels and includes coronary heart disease, cerebrovascular disease, rheumatic heart disease and other conditions. According to the World Health Organization (WHO), an estimated 17.9 million people worldwide die each year from cardiovascular diseases. In India, the number of deaths due to heart disease has increased every year; studies show that from 2014 to 2019 the number of deaths from heart disease increased by 53%. Risk factors include personal and work habits and genetic predisposition. Harmful habits such as smoking, excessive alcohol and caffeine consumption, stress, and physical inactivity, as well as physiological factors such as obesity, high blood pressure, high blood cholesterol, and pre-existing heart conditions, are the main causes of heart disease. Over time, these risk factors cause changes in the heart and blood vessels that can lead to heart attacks and strokes. Early prediction of heart disease is therefore very important for preventing these dangerous events and other potential complications. Machine learning (ML) is a branch of AI that can help predict heart disease. In this research work, we use the UCI heart disease dataset with 14 attributes to predict heart disease. The main goal of this study is to use ML algorithms to improve heart disease prediction systems and to predict the disease in patients more accurately, thereby helping to reduce the number of deaths by alerting patients early.


I. INTRODUCTION

The heart is one of the most important organs of the human body. There are several types of heart disease. Heart disease has been a leading cause of death in the past few years, and according to the WHO, the number of deaths has risen sharply over that period. The disease is all the more worrisome because several well-known, apparently fit celebrities have died of heart disease in the past few years.

Therefore, the prediction of heart disease should be accurate so that patients can be informed and deaths can be avoided. In particular, these diseases often remain undetected until they are advanced, and thus become a major cause of death due to inadequate diagnosis. This is the main reason why there is a need to analyze which algorithms are suitable for predicting heart disease.

One of the most effective and efficient approaches is machine learning. Machine learning can be described as the process of discovering hidden, previously unknown, and potentially useful information in data. It is a large and deep field, and its scope and use are increasing day by day.

Machine learning includes various categories of classifiers, such as supervised learning, unsupervised learning, and ensemble learning, which are used to make predictions and to measure accuracy on a given dataset. We can use this knowledge to design cardiovascular disease prediction systems that could help many people.

This paper compares various machine learning algorithms, namely Random Forest, Support Vector Classifier, K-Nearest Neighbour, and Decision Tree, to find the most accurate model by analysing the heart disease dataset from the UCI repository. The purpose of this project is to test whether a patient is likely to be diagnosed with cardiovascular disease based on medical attributes such as gender, age, chest pain, fasting blood sugar, etc.

The dataset, which contains patient medical histories, is taken from the UCI repository. We use this dataset to predict whether a patient has heart disease or not. To predict this, the medical attributes of each patient are used to classify the risk of cardiovascular disease. These attributes are used to train four algorithms: Random Forest, Support Vector Classifier, K-Nearest Neighbour and Decision Tree. The best-performing algorithm is KNN, which gives an accuracy of 89%. The system ultimately determines whether a patient is at risk of cardiovascular disease; this approach can warn patients early and help prevent the disease from progressing.

II. LITERATURE REVIEW

Gomathi et al. used Naive Bayes and data mining techniques for decision-making to predict different types of diseases. They focused on predicting heart disease, diabetes and breast cancer. Results were derived from the confusion matrix.
Miranda et al. proposed a Naive Bayes classifier to predict heart disease. The authors considered a few important risk factors for determining heart disease.
Himanshu et al. briefly compared large and small data sets for predicting heart disease. They reported that a small data set takes less time to train, test and predict with the SVM and KNN algorithms. They also showed that some machine learning algorithms do not predict accurately on their own, even though they achieve good accuracy in combination.
Mohan et al. explained that the prediction of heart disease needs to be done very carefully and can be approached with different methods such as Naive Bayes, genetic algorithm, decision tree and KNN. They also proposed a hybrid algorithm and obtained 88% accuracy with it.
Avinash Golande et al. studied various ML algorithms that can be used to classify heart disease. The Decision Tree, KNN and K-Means algorithms were studied for classification and their accuracy was compared. The study concluded that the accuracy obtained by the Decision Tree was the highest, and that it could be made more efficient by combining different techniques with parameter tuning.

III. DATA SOURCE

A heart disease dataset from the UCI Machine Learning Repository was used for testing. The dataset contains 14 attributes: 8 categorical attributes and 6 numeric attributes. The description of the dataset is shown in the table.

Patients aged 29 to 79 years were selected from this dataset. Male patients are coded as gender 1 and female patients as gender 0. Four types of chest pain can be considered as indicators of heart disease. Typical angina is caused by decreased blood flow to the heart muscle due to narrowing of the coronary arteries. Atypical angina is a chest pain that occurs during mental or emotional stress. Non-anginal chest pain may be caused by a variety of factors and is often not due to an actual heart condition. The fourth type, asymptomatic, may not be a symptom of heart disease. The next attribute, trestbps, is the resting blood pressure reading. Chol is the cholesterol level. Fbs is the fasting blood sugar level; its value is 1 if fasting blood sugar is greater than 120 mg/dl and 0 otherwise. Restecg is the resting electrocardiographic result, thalach is the maximum heart rate achieved, exang is exercise-induced angina (1 if there is pain and 0 if there is no pain), oldpeak is the ST depression induced by exercise relative to rest, slope is the slope of the peak exercise ST segment, ca is the number of major vessels colored by fluoroscopy, thal records a defect category (normal, fixed defect, or reversible defect), and num is the class attribute. The class attribute takes the value 0 for patients without heart disease and 1 for patients diagnosed with heart disease.

IV. APPROACH METHODOLOGY

A. Classification Algorithms

Classification is a widely used supervised learning approach for predicting outcomes based on existing data. In this research paper, we propose a method for diagnosing heart disease using ML classification algorithms. The dataset is split into training and test sets in an 80:20 ratio, and the individual classifiers are trained on the training set. The effectiveness of each classifier is evaluated against the test set. The performance of the individual classifiers is described in the results section.
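The paper does not show how the 80:20 split was implemented. The following is a minimal sketch of one way to do it using only the Python standard library; the function name `train_test_split`, the seed, and the placeholder records are assumptions made for illustration, not the authors' code.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records standing in for patient rows (not real data).
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))  # prints: 80 20
```

Shuffling before splitting avoids any ordering in the file (for example, rows sorted by class) leaking into the train/test boundary.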

1. Random Forest: The Random Forest algorithm is a supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model. Random Forest is a classifier that builds a number of decision trees on various subsets of the given dataset and combines their outputs to improve predictive accuracy.

In this algorithm, several trees together form a forest. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction. In the random forest classifier, a greater number of trees generally gives higher accuracy. The three common methodologies are:

a. Forest-RI (random input selection)

b. Forest-RC (random linear combination)

c. Combination of Forest-RI and Forest-RC
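The two ingredients described above, training each tree on a different subset of the data and taking the majority vote of the trees, can be sketched in a few lines. This is an illustrative sketch only: the helper names (`bootstrap_sample`, `random_feature_subset`, `majority_vote`) are hypothetical, the tree training itself is omitted, and the vote list is made up.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, giving one tree's training subset."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, k, rng):
    """Random input selection: pick k candidate features at random for a split."""
    return rng.sample(feature_names, k)

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees.

    Ties resolve to whichever class appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
subset = bootstrap_sample(list(range(10)), rng)
candidates = random_feature_subset(["age", "chol", "thalach", "oldpeak"], 2, rng)

# Made-up predictions from five hypothetical trees for one patient.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))  # prints: 1
```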

2. The KNN Algorithm: In the K-NN algorithm, a data point whose class is unknown is taken, and the number of neighbors, k, is defined. The k nearest neighbors are then selected according to the smallest Euclidean distance between the data point and the labeled points. The data point is assigned to the class held by the majority of its k neighbors.
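The procedure just described (compute Euclidean distances, keep the k closest labeled points, take the majority class) can be sketched as follows. The function name `knn_predict`, the value k=3, and the two-dimensional toy points are assumptions for illustration; the paper does not state the value of k it used.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy two-feature points (not patient data): class 0 near (1, 1), class 1 near (5, 5).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints: 0
print(knn_predict(points, labels, (5.0, 5.0)))  # prints: 1
```

Because the distance is computed on raw values, attributes with large ranges (such as cholesterol) would dominate attributes with small ranges unless the features are scaled first.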

3. Support Vector Classifier: The SVC model represents the classes by a separating hyperplane in multidimensional space. The SVC constructs the hyperplane so as to minimize classification errors. In SVC, the dataset is divided into classes by finding the maximum marginal hyperplane (MMH), the hyperplane with the largest margin between the classes.
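MMH is commonly read as the maximum marginal hyperplane, and the idea can be written as an optimization problem. The formulation below is the standard soft-margin form, given here as a sketch; the symbols are assumptions, since the paper does not state its notation, kernel, or penalty parameter. Training points are $x_i$ with labels $y_i \in \{-1, +1\}$ (the dataset's classes 0 and 1 mapped to $-1$ and $+1$), the hyperplane is $w^\top x + b = 0$, $\xi_i$ are slack variables that allow misclassified points, and $C > 0$ weights the error penalty.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n .
```

The margin between the two classes has width $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^{2}$ maximizes the margin, while the second term penalizes training errors.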

4. Decision Tree Algorithm: The Decision Tree is a classification algorithm that works on both categorical and numerical data. It creates tree-like structures, is simple to use, and makes it easy to analyze data in tree-shaped graphs.

This algorithm partitions the data into two or more similar sets based on the most important attributes. First the entropy is computed for each attribute, and then the data is split on the attribute with the highest information gain, that is, the lowest resulting entropy. The results obtained are easy to read and interpret. Decision trees are often credited with good accuracy because they analyze the data as a tree graph, although in this study the Decision Tree scored lowest of the four classifiers (79%). A drawback is that the data may be split too many times, and only one attribute is tested at a time to make a decision.
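The entropy and information-gain computation mentioned above can be made concrete with a short sketch. The function names and the toy attribute values are assumptions for illustration; the definitions are the standard Shannon entropy and the gain obtained by splitting on a categorical attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(feature_values, labels):
    """Entropy of labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy example (not patient data): four records with a binary class.
classes = [1, 1, 0, 0]
feature_a = [1, 1, 0, 0]  # separates the classes perfectly
feature_b = [1, 0, 1, 0]  # carries no information about the class

print(information_gain(feature_a, classes))  # prints: 1.0
print(information_gain(feature_b, classes))  # prints: 0.0
```

A tree builder would split on `feature_a` first, since it yields the larger gain.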

V. RESULT AND ANALYSIS

The purpose of this research paper is to analyze the effectiveness of various machine learning algorithms and to identify the one that most accurately predicts a patient's risk of heart disease. The study applied Random Forest, Decision Tree, Support Vector Classifier, and K-Nearest Neighbor to the UCI dataset. The data was divided into training and test sets, the models were trained, and accuracy scores were computed using Python. The accuracy of each algorithm is presented in the table below.

Algorithm	Accuracy
K-Nearest Neighbor	89%
Support Vector Classifier	81%
Decision Tree Classifier	79%
Random Forest	82%
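The paper states that accuracy was computed in Python but does not show the code. As a minimal sketch, accuracy is the fraction of test records whose predicted class matches the true class; the function name `accuracy` and the five toy labels are assumptions, while the `reported` dictionary simply restates the values from the table above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels (not the study's data): four of five predictions are correct.
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # prints: 0.8

# Accuracy values as reported in the table above.
reported = {
    "K-Nearest Neighbor": 0.89,
    "Support Vector Classifier": 0.81,
    "Decision Tree Classifier": 0.79,
    "Random Forest": 0.82,
}
best = max(reported, key=reported.get)
print(best)  # prints: K-Nearest Neighbor
```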

Conclusion

The overall purpose is to describe the various ML techniques that are useful in predicting heart disease. Efficient and accurate prediction with a small number of attributes is the goal of this study. The data was pre-processed and then used in the models. K-Nearest Neighbor with 89%, Random Forest with 82% and Support Vector Classifier with 81% performed well, while the Decision Tree Classifier gave a slightly lower accuracy of 79%. This research can be extended by incorporating other ML techniques such as time series analysis, association rules, and combinations with other ensemble strategies. Considering the limitations of this study, there is a need for more complex models and combinations of models to achieve higher accuracy in the early prediction of heart disease. The proposed system is GUI-based, user-friendly, scalable, reliable and expandable. The proposed working model can also help reduce treatment costs by providing an initial diagnosis in time. The model can also serve as a training tool for medical students and as a soft diagnostic tool for physicians and cardiologists. General physicians can use this tool for the initial diagnosis of cardiac patients. There are many possible improvements that could be explored to improve the scalability and accuracy of this prediction system. As we have developed a generalized system, in future it can be used for the analysis of different data sets. The performance of the diagnosis could be improved significantly by handling multiple class labels in the prediction process, which is another promising direction for research.

References

[1] Seckeler MD, Hoke TR. The worldwide epidemiology of acute rheumatic fever and rheumatic heart disease. Clin Epidemiol. 2011;3:67.
[2] Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE. 2017;12(4):e0174944.
[3] Ramalingam VV, Dandapath A, Raja MK. Heart disease prediction using machine learning techniques: a survey. Int J Eng Technol. 2018;7(2.8):684–7.
[4] Pouriyeh S, Vahid S, Sannino G, De Pietro G, Arabnia H, Gutierrez J. A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: 2017 IEEE Symposium on Computers and Communications (ISCC). IEEE. p. 204–207.
[5] Mohan S, Thirumalai C, Srivastava G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 2019;7:81542–81554.
[6] Goel R, Jain A. Improved detection of kidney stone in ultrasound images using segmentation techniques. In: Kolhe M, Tiwari S, Trivedi M, Mishra K, editors. Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore; 2020. https://doi.org/10.1007/978-981-15-0694-9_58

Copyright

Copyright © 2022 Raunak Verma, Shashank Tandon, Mr. Vinayak. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET42687

Publish Date : 2022-05-14

ISSN : 2321-9653

Publisher Name : IJRASET
