Heart Disease Prediction Using Hybrid Model
ABSTRACT
Heart disease has emerged as a serious health concern worldwide because of its high mortality rate.
Detecting cardiovascular disorders such as heart attacks and coronary artery disease from routine clinical
data is a critical task, and early detection of heart disease may save many lives. The application of machine
learning techniques in the medical sector has advanced significantly. Many researchers have tried to predict
heart disease using standard classification algorithms with feature selection. Others have employed
hyperparameter optimization using GridSearchCV, and ensemble voting techniques have also been used. This
work proposes a machine learning approach to predict heart disease. The UCI heart disease dataset is used,
and data mining techniques including regression and classification are applied. Three machine learning
models are implemented with hyperparameter tuning: Random Forest (RF), Decision Tree (DT), and a Hybrid
Model (a hybrid of RF and DT). According to the experimental findings, the hybrid model predicts heart
disease with an accuracy of 85.61%, and we conclude that the hybrid model performs better than the standard
classification models evaluated.
Keywords: UCI Heart Disease Dataset, Decision Trees, Random Forest, Hybrid algorithm, Machine learning,
Hyperparameter optimization.
I. INTRODUCTION
Data mining helps in studying and understanding large volumes of data. It is used to extract information and
to decide whether to move forward with additional applications. Clustering, association rule mining, and
classification are the most often used data mining methods. These approaches can be implemented using a
wide variety of algorithms.
In the field of medical diagnostics, where computer analysis may reduce manual error and increase accuracy,
the use of machine learning is rapidly expanding. Machine learning techniques are used to predict diseases
such as heart disease, liver disease, diabetes, and tumors. Regression algorithms, such as Random Forest,
lasso, and logistic regression, have been employed in the medical sector.
According to survey results, cardiovascular diseases (CVD) account for close to 17 million fatalities annually.
Mortality can be decreased and many lives may be saved if the disease is identified early and patients take
their prescribed medications on schedule. This study uses an automated medical diagnosis method to predict
heart disease using machine learning.
Columns: Author, Year, Aim, Methodology, Result, Limitation, Dataset Used
Author: Palaniappan, Sellappan, and Rafiah Awang
Methodology: Combination of DT, NB, and NN data mining approaches.
Result: Each technique has a special advantage in achieving the specified mining goals.
A wide variety of recent works address heart disease analysis and prediction. A few examples are described
below.
Palaniappan, Sellappan, and Rafiah Awang [1] created a prototype Intelligent Heart Disease Prediction System
(IHDPS) by combining DT, NB, and NN data mining approaches. Results demonstrate that each technique has a
special advantage in achieving the specified mining goals.
Hashi, E.K. and Zaman [5] suggested the use of a cognitive strategy for heart disease prediction. Five machine
learning algorithms are considered for prediction in this work, and each is evaluated. To improve prediction
outcomes, a logistic model tree is applied.
Dr. M. Kavitha, G. Gnaneswar, R. Dinesh, Y. Rohith Sai, R. Sai Suraj [6] developed a hybrid model as a
combination of decision tree and random forest, which outperforms the individual models.
A. Lakshmanarao, A. Srisaila, T. Srinivasa Ravi Kiran [7] proposed an ensemble classifier model for heart
disease prediction. Various classifier techniques are applied together with sampling techniques, and a good
detection rate is achieved with the ensemble classifier.
III. METHODOLOGY
Data Sources
In this paper, the heart disease data from the UCI machine learning repository is processed. This dataset is
frequently used by machine learning researchers. It contains 303 examples, of which 164 have heart disease
and 139 are healthy, with around 14 clinical attributes.
Data cleaning: Data cleaning is the first and most crucial step in the project's methods and data models. An
organised dataset is built from the gathered data. After the fields are identified, duplicates are removed, and
missing values are filled in, the data is coded in accordance with the attribute domain values.
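The cleaning steps described above (removing duplicates and filling in missing values) can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the paper does not state how missing values are filled, so median imputation is an assumption, and the helper `clean_records`, the field names, and the sample rows are hypothetical.

```python
from statistics import median


def clean_records(records, numeric_fields):
    """Drop exact duplicate rows, then fill missing numeric values (None)
    with the median of the observed values in that column.

    ASSUMPTION: median imputation is chosen for illustration only.
    """
    # Remove exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified

    # Fill missing values column by column.
    for field in numeric_fields:
        observed = [r[field] for r in unique if r[field] is not None]
        fill_value = median(observed)
        for r in unique:
            if r[field] is None:
                r[field] = fill_value
    return unique


# Hypothetical sample rows (not taken from the UCI dataset).
rows = [
    {"age": 63, "chol": 233, "target": 1},
    {"age": 63, "chol": 233, "target": 1},   # exact duplicate
    {"age": 41, "chol": None, "target": 0},  # missing value
    {"age": 57, "chol": 354, "target": 1},
]
cleaned = clean_records(rows, numeric_fields=["age", "chol"])
```

Here the duplicate row is dropped and the missing `chol` value is replaced by the median of the two observed values.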
Hyperparameter Optimization: Selecting the best set of hyperparameters for a learning algorithm is known as
hyperparameter optimization or tuning. Hyperparameter optimization seeks a tuple of hyperparameters that
yields the model minimizing a predetermined loss function on given independent data.
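As a hedged illustration of hyperparameter optimization, the sketch below runs an exhaustive grid search with scikit-learn's GridSearchCV over a Random Forest. The search space, the 5-fold cross-validation, and the accuracy scoring are assumptions, since the paper does not list them, and the data is a synthetic stand-in with 13 features rather than the UCI dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 303 rows, 13 features (NOT the UCI dataset).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Hypothetical search space; the paper does not specify its grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Every combination in the grid is scored by 5-fold cross-validation,
# so each candidate is evaluated on data it was not fitted on.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_  # the selected hyperparameter tuple
best_score = search.best_score_    # mean cross-validated accuracy
```

Maximizing accuracy here plays the role of minimizing the loss function: the selected tuple is the one whose model scores best on the held-out folds.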
Classification algorithm:
The design of the proposed system for predicting heart disease using machine learning models is depicted in
Figure 3 and briefly explained below.
The dataset comes from UCI and is divided into training and testing sets. To fit the models, we used 70% of
the dataset as training data and the remaining 30% as test data for the prediction of heart disease. We used
the DT, RF, and Hybrid models. The models predict heart disease for the 30% test input, and the predicted
values are plotted and compared for accuracy.
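The 70/30 split and the three models can be sketched as below. The paper does not describe how DT and RF are combined into the hybrid, so soft voting with scikit-learn's VotingClassifier is used here purely as one plausible assumption. The stratified split, the hyperparameter values, and the synthetic stand-in data are also assumptions, so the accuracies this produces are not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 303 rows, 13 features (NOT the UCI dataset).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# 70% training / 30% testing; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Hypothetical hyperparameter values for illustration.
dt = DecisionTreeClassifier(max_depth=5, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)

# ASSUMPTION: the hybrid averages the class probabilities of DT and RF
# (soft voting). The paper's actual combination rule may differ.
hybrid = VotingClassifier(estimators=[("dt", dt), ("rf", rf)], voting="soft")

# Fit each model on the training set and score it on the held-out 30%.
accuracy = {}
for name, model in [("DT", dt), ("RF", rf), ("Hybrid", hybrid)]:
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))
```

Fixing `random_state` keeps the split and the models reproducible, so the three accuracies are compared on the same test rows.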
IV. RESULTS AND DISCUSSION
To strengthen the work and add novelty, we implemented a hybrid model of Decision Tree and Random Forest.
The results show that heart disease detection is effective using the hybrid model. The hyper-tuned Decision
Tree achieves around 75% accuracy, the hyper-tuned Random Forest achieves 82.4% accuracy, and the hybrid
model achieves 85.6% accuracy.
Table 2: Classification report of the models
3. Hybrid Model (DT+RF) 85.6% 84.5% 84.5% 85%
Figure 4: Classification report of Decision Tree
Figure 5: Classification report of Random Forest
The hyper-tuned Decision Tree achieves around 74% precision, 73% F1-score, and 74% recall, and the
hyper-tuned Random Forest achieves 84% precision, 83% F1-score, and 82% recall, whereas the hybrid model
achieves 84.5% precision, 84.5% F1-score, and 85% recall. Thus, the hybrid model leads on every reported
metric compared to the DT and RF models.
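For reference, the metrics compared above can be computed from predicted and true labels as in the following sketch. The helper `classification_metrics` and the toy labels are hypothetical. It reports metrics for the positive class only, whereas a classification report may instead show values averaged over classes.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration (not results from this study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics equal 0.75.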
V. CONCLUSION
Heart disease is one of the potentially fatal diseases prevalent around the world. The risk of the condition
increases as a result of changing lifestyles and a lack of physical activity. The medical sector offers a variety of
diagnostic procedures. However, machine learning is thought to be the best option in terms of accuracy. The
proposed approach employs a hybrid model that combines Decision Tree and Random Forest for the
prediction of heart disease. For this investigation, the heart disease data from the UCI machine learning
repository is used. We obtained a higher accuracy, 85.61%, using the hybrid model than with the individual
models for the prediction of heart disease. This study used a small dataset of 303 entries and only two
standard classification models, RF and DT. In future, the model can be further improved using a larger
dataset. Other models can also be employed and compared to find the most accurate one.
VI. REFERENCES
[1] “A Knowledge-Based Clinical Decision Support System Utilizing an Intelligent Ensemble Voting Scheme for
Improved Cardiovascular Disease Prediction.” IEEE Xplore, ieeexplore.ieee.org/document/9530429.
Accessed 15 Nov. 2022.
[2] “Efficient Medical Diagnosis of Human Heart Diseases Using Machine Learning Techniques With and
Without GridSearchCV.” IEEE Xplore, ieeexplore.ieee.org/abstract/document/9751602.
Accessed 15 Nov. 2022.