Heart disease prediction is one of the most complicated tasks in the medical field. In the modern era, approximately one person dies per minute due to heart disease. Data science plays a crucial role in processing the huge amounts of data generated in healthcare. Because heart disease prediction is a complex task, there is a need to automate the prediction process, both to avoid the risks associated with it and to alert the patient well in advance. This project uses the heart disease clinical dataset available in the UCI Machine Learning Repository. The proposed work predicts the chance of heart disease and classifies a patient’s risk level by implementing different data mining techniques: Decision Tree, Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Random Forest, and Gradient Boosting. With the number of deaths due to heart disease increasing, it has become essential to develop a system that predicts heart disease effectively and accurately. The aim of the study was to find the best-performing machine learning algorithm for detecting heart disease, so this paper presents a comparative study of the accuracy scores of the algorithms listed above on the UCI dataset. The results indicate that Random Forest performs best among the classification algorithms used in this analysis, with an accuracy score of 85.24%, which can help health professionals predict heart disease effectively and efficiently.
I. INTRODUCTION
Heart disease has risen sharply in the modern age. Because doctors deal with precious human lives, it is very important that their results be accurate. An application was therefore developed that can predict vulnerability to heart disease given basic attributes such as age, gender, pulse rate, resting blood pressure, cholesterol, fasting blood sugar, resting electrocardiographic results, exercise-induced angina, ST depression, the slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, and maximum heart rate achieved. Doctors can use it to recheck and confirm their patient’s condition. Existing surveys have considered only 10 features for prediction, whereas this work takes 14 features into consideration. This paper also presents a comparative analysis of machine learning techniques such as Random Forest (RF), Logistic Regression, and Support Vector Machine (SVM) for classifying cardiovascular disease. In this comparison, Random Forest proved to be the most accurate and reliable algorithm and is therefore used in the proposed system.
Heart disease accounts for the largest share of deaths in the world. In 2012, around 17.5 million people died from heart disease, meaning that it accounted for 31% of all global deaths. Moreover, the heart disease death toll rises each year and is expected to grow to more than 23.6 million by 2030. Research from January 2017 showed that cardiovascular disease is the leading cause of death worldwide. Cardiovascular disease is considered the world's biggest killer; it has held the top position among the ten leading causes of death for the past 15 years and accounted for fifteen million deaths in 2015. Many lives could be saved by timely diagnosis.
Diagnosing the disease is therefore important, and it is also a very complicated task. Automating this process would overcome the problems with diagnosis. The use of machine learning in disease classification is common, and researchers are particularly interested in developing such systems for easier tracking and diagnosis of cardiovascular diseases. Because machine learning allows computer programs to learn from data, building a model that recognizes common patterns and makes decisions based on the gathered information, it is not hampered by the incompleteness of the medical database used. The proposed model gathers relevant data on all the factors related to heart disease and the parameters influencing it, trains on that data using the proposed machine learning algorithm, and predicts how strong the probability is that a patient will develop heart disease. The relationship with diabetes-related attributes is also considered in order to establish their effect.
The World Health Organization estimates that heart disease causes 12 million deaths worldwide each year, making it one of the main causes of illness and death among the world's population. The prediction of cardiovascular illness is one of the most important subjects in the field of data analysis. The prevalence of cardiovascular disease has been increasing rapidly worldwide in recent years. Numerous studies have been conducted in an effort to pinpoint the most crucial heart disease risk factors and to calculate the overall risk precisely. Because it can result in death without any overt symptoms, heart disease is sometimes known as the "silent killer." Early detection of cardiac disease helps high-risk individuals make decisions about lifestyle changes, which reduces complications. The vast amount of data produced by the healthcare industry has made machine learning an effective tool for prediction and decision-making. This study aims to predict future cases of heart disease by evaluating patient data with a machine learning algorithm that categorises whether or not a patient has heart disease. Although heart disease can manifest itself in various ways, there is a common set of basic risk factors that determine whether someone will ultimately be at risk. Machine learning is therefore well suited to heart disease prediction: data are gathered from many sources, classified with appropriate algorithms, and then analysed across different datasets.
II. METHODOLOGY
This paper analyses various machine learning algorithms, including K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest classifiers, which can help practitioners and medical analysts diagnose heart disease accurately. The work included reviewing journals, published papers, and recent cardiovascular disease data. The methodology provides a framework for the proposed model: it is a process whose steps transform the given data into recognized data patterns for the users' knowledge. The proposed methodology (Figure 1) consists of the following steps. The first step is data collection, the second extracts the significant values, and the third is preprocessing, in which the data are explored. Data preprocessing deals with missing values, data cleaning, and normalization, depending on the algorithms used. After preprocessing, a classifier is used to classify the preprocessed data; the classifiers used in the proposed model include KNN, Logistic Regression, and Random Forest. Finally, the model is evaluated on the basis of accuracy and performance using various performance metrics. In this way, an Effective Heart Disease Prediction System (EHDPS) has been developed using different classifiers. The model uses 14 medical parameters, such as chest pain, fasting blood sugar, blood pressure, cholesterol, age, and sex, for prediction.
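The preprocessing and classification steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation used in this work: the function names, the choice of median imputation and min-max scaling, the value k = 3, and the three-column toy records (age, resting blood pressure, cholesterol) are all assumptions made for the example, and the records are synthetic rather than taken from the UCI dataset.

```python
import math
from collections import Counter
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the known values in their column."""
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[medians[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def fit_min_max(rows):
    """Learn per-column (min, max) bounds from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, bounds):
    """Scale each column with the learned bounds (training data maps to [0, 1])."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training rows nearest to x."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy records (hypothetical): [age, resting blood pressure, cholesterol].
# None marks a missing value; label 1 = disease present, 0 = absent.
train_rows = [
    [63, 145, 233], [41, 130, 204], [67, 160, None],
    [37, 120, 215], [70, 150, 300], [45, None, 190],
]
train_y = [1, 0, 1, 0, 1, 0]
test_rows = [[65, 155, 280], [39, 125, 200]]
test_y = [1, 0]

# Step 1: handle missing values. Step 2: normalize. Step 3: classify. Step 4: evaluate.
train_clean = impute_median(train_rows)
bounds = fit_min_max(train_clean)
train_X = apply_min_max(train_clean, bounds)
test_X = apply_min_max(test_rows, bounds)

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print("hold-out accuracy:", accuracy(test_y, predictions))
```

The scaling bounds are learned from the training rows only and then reused for the test rows, so that no information from the held-out data leaks into preprocessing. Normalization matters for KNN in particular because the classifier relies on distances between feature vectors.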
III. CONCLUSION
The primary purpose of this study is to propose a model for predicting the development of heart disease. A further goal is to determine the best classification method for detecting the likelihood of cardiac disease. To support this work, six classification algorithms, including Logistic Regression, Decision Tree, and Random Forest, are compared at various levels of evaluation. Although these machine learning methods are widely utilised, predicting cardiac disease is a crucial task that requires the highest level of accuracy. Consequently, several levels and types of assessment strategy are used to evaluate the algorithms. This can help scientists and medical professionals select more reliable prediction methods.
Making forecasts and diagnosing ailments has never been simple for medical professionals when it comes to heart conditions. If heart disease is discovered in its early stages, people anywhere in the world can take the necessary action to treat it before it gets worse. The three main causes of heart disease, namely drinking alcohol, smoking cigarettes, and not exercising, have become serious issues in recent years. The health care industry has produced a substantial amount of data over time, which has made machine learning capable of providing effective outcomes in prediction and decision-making.
Our goal in this study is to identify the variables that most improve the accuracy of heart disease prediction. Evaluation criteria, namely accuracy, specificity, sensitivity, and area under the ROC curve, are employed to verify the efficacy of the proposed approach on a public dataset comprising patients of both genders. The primary benefits of applying machine learning to heart disease prediction are that it reduces the demands on doctors' time, is patient- and cost-friendly, and handles enormous amounts of data through feature selection and the random forest algorithm. Early diagnosis of cardiovascular disease can help with lifestyle modifications for high-risk patients, which can lower complications and be a significant medical milestone.
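For concreteness, the four evaluation criteria named above can be computed from a classifier's outputs as in the following sketch. It assumes binary labels (1 = disease present, 0 = absent) and a 0.5 decision threshold; the function names and the eight label/score pairs are hypothetical values chosen for illustration, not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and predicted probabilities of disease.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all cases classified correctly
sensitivity = tp / (tp + fn)                # share of diseased patients detected
specificity = tn / (tn + fp)                # share of healthy patients cleared
auc = roc_auc(y_true, scores)

print(accuracy, sensitivity, specificity, auc)
```

Sensitivity and specificity are reported alongside accuracy because a missed case of heart disease (a false negative) and a false alarm (a false positive) carry different clinical costs. The area under the ROC curve summarises performance across all possible thresholds rather than at a single one.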