tejapolimetla050

------------------PROJECT------------------

Updated: Feb 7

H2O


HEART DISEASE PREDICTION

BY

MACHINE LEARNING

ABSTRACT:

Heart disease is a major cause of death worldwide, and many advanced technologies are used to treat it. A common problem in medical centers is that not all medical staff have the same knowledge and expertise, so they rely on their own judgment, which can lead to poor outcomes and sometimes to death.

To overcome these problems, heart disease can be predicted using machine learning algorithms and data mining techniques. These techniques play a vital role in making automatic diagnosis in hospitals easier.

Heart disease can be predicted by analyzing a patient’s various health parameters.

The main reason for approaching the problem through machine learning is the availability of many learning techniques and algorithms with which our model can produce accurate results.


INTRODUCTION:

ML plays a very important role in detecting hidden discrete patterns and thereby analyzing the given data. After the data has been analyzed, ML techniques help in heart disease prediction and early diagnosis.


Heart disease has been the leading cause of death in the world over the past 10 years.

Worldwide, 41% of deaths are caused by heart diseases.


Several different symptoms are associated with heart disease, which makes it difficult to diagnose quickly and accurately. Given symptom data, a machine can make the prediction more easily by using the appropriate algorithms and formulas.

MODEL WORKING PROCESS


DATA

We focus our work on three main datasets.

The first one comes from the 2017 PhysioNet Challenge, which consists of a collection of 8,528 recordings together with the corresponding labels (Normal, AF, Other and Noisy) given by expert cardiologists.


This dataset was recorded with AliveCor, a portable device able to record electrocardiograms.

The second one comes from the 2019 Tianchi Hefei High-Tech Cup ECG Human-Machine Intelligence Competition, and contains 20,019 observations across 5 different labels (Normal, Tachycardia, Bradycardia, Arrhythmia, AF) annotated by expert cardiologists.

The third one comes from Chapman University and Shaoxing People’s Hospital and contains 10,646 observations with the corresponding labels (Bradycardia, Normal, AF, Tachycardia) given by expert cardiologists.





METHODS:



In our work, we use the XGBoost algorithm to train our models and the Optuna optimization framework to tune their hyperparameters. Finally, we use SHAP to explain which features have the most explanatory power for each label present in the training data.
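
To make the tree-boosting idea behind XGBoost concrete, here is a minimal sketch in plain Python. It is not XGBoost itself and not our project code: it uses one-feature decision stumps instead of full trees and leaves out the regularization and second-order gradient information that XGBoost adds. The heart-rate values, labels, and function names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the threshold on a single feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left) if left else 0.0
        right_mean = sum(right) / len(right) if right else 0.0
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.5):
    """Fit stumps one after another, each on the residuals left so far."""
    scores = [0.0] * len(xs)  # raw scores (log-odds), starting at 0
    trees = []
    for _ in range(n_trees):
        # Negative gradient of the log loss: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        trees.append((threshold, left_value, right_value))
        for i, x in enumerate(xs):
            step = left_value if x <= threshold else right_value
            scores[i] += learning_rate * step
    return trees


def predict_proba(trees, x, learning_rate=0.5):
    """Sum the scaled outputs of all stumps and map the score to a probability."""
    score = sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in trees
    )
    return sigmoid(score)


# Hypothetical toy data: one feature (a heart-rate value) and a 0/1 label.
heart_rates = [55, 60, 62, 70, 75, 110, 120, 125, 130, 140]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = fit_boosting(heart_rates, labels)
```

Calling predict_proba(model, 58) gives a probability below 0.5 and predict_proba(model, 135) gives one above 0.5, because every stump pushes the score a little further in the direction of the remaining error.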


1) XGBoost: XGBoost is one of the most popular and best-performing tree-ensemble methods for binary classification. It is based on an iterative procedure (gradient boosting) that builds a large number of trees, with each new tree correcting the errors of the ones before it.


2) Optuna: Optuna is a leading optimization framework that uses the Tree-structured Parzen Estimator (TPE) to optimize an objective function over a defined parameter space.


3) SHAP: SHAP is a method for interpreting machine learning models through a game-theoretic approach. It helps to dig further into how the final prediction is made: it highlights the most important features and explains how they drive the results in an understandable way.
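
The game-theory idea behind SHAP can be sketched with exact Shapley values on a tiny model. The code below is only an illustration under our own assumptions: it does not use the SHAP library, the toy risk score and the feature names (age, cholesterol, resting heart rate) are hypothetical, and the brute-force loop over all feature orderings is feasible only for a handful of features. Practical SHAP implementations use faster approximations or tree-specific algorithms.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all feature orderings. Features not yet added keep their baseline."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            new_output = predict(current)
            contributions[feature] += new_output - previous_output
            previous_output = new_output
    return [c / len(orderings) for c in contributions]


# Hypothetical risk score with an interaction between age and resting heart rate.
def toy_model(features):
    age, cholesterol, resting_hr = features
    return 0.02 * age + 0.01 * cholesterol + 0.001 * age * resting_hr


patient = [60, 240, 90]      # the instance being explained
reference = [40, 200, 70]    # baseline "average" patient (made up)
values = shapley_values(toy_model, patient, reference)
```

The three values always add up to the difference between the model output for the patient and for the reference, which is what lets each one be read as that feature's share of the prediction.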


CONCLUSION


Correct prediction of heart disease can prevent threats to life, while incorrect prediction can prove fatal. In this paper, different machine learning and deep learning algorithms are applied to the UCI Machine Learning Heart Disease dataset, and their results are compared and analyzed.

We found that machine learning algorithms performed better in this analysis. Many researchers have previously suggested using ML where the dataset is not very large, which this paper supports. The methods used for comparison are the confusion matrix, precision, specificity, sensitivity, and F1 score. For the 13 features in the dataset, the K-Neighbors classifier performed best in the ML approach when data preprocessing was applied.
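
For reference, the comparison metrics named above can all be derived from the four confusion-matrix counts. This is a small sketch with made-up labels and predictions, not our actual results; the function names are ours.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up example: 4 patients with disease, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Sensitivity tells us how many sick patients the model catches, while specificity tells us how many healthy patients it correctly leaves alone, so both matter in a medical setting.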

The computational time was also reduced, which is helpful when deploying a model. We also found that the dataset should be normalized; otherwise, the trained model sometimes overfits and the accuracy achieved is not sufficient when the model is evaluated on real-world data, which can differ drastically from the dataset on which the model was trained.
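
One common way to normalize is z-score standardization, sketched below. This is an assumption on our part about what normalization could look like, not necessarily the exact preprocessing used in the project; the two columns (age and cholesterol) and their values are made up. The important detail is that the mean and standard deviation are computed on the training data only and then reused for new patients.

```python
import statistics


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(column) for column in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Rescale every value to (value - mean) / std using the fitted statistics."""
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Made-up training rows: [age, cholesterol]
train = [[63, 233], [41, 204], [56, 294], [67, 286]]
means, stds = fit_scaler(train)
scaled_train = transform(train, means, stds)

# A new patient is scaled with the training statistics, not its own.
new_patient = transform([[50, 250]], means, stds)
```

After scaling, every column has mean 0 and standard deviation 1 on the training data, so a distance-based model such as K-Neighbors is not dominated by the feature with the largest raw values.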

THIS IS A GROUP PROJECT. THE ENTIRE PROJECT WAS DONE WITH THE CONTRIBUTIONS OF MY TEAM MEMBERS AND OUR PROJECT GUIDE (D.SREE LAKSHIMI). ALL THE MEMBERS GAVE THEIR BEST FOR THIS PROJECT, AS IT IS OUR FINAL YEAR, AND I FEEL LUCKY TO HAVE THEM AS MY TEAM MEMBERS.

TEAM LEADER, DESIGNER, DATA ANALYST, PROGRAMMER:

DIVYA SRI

TEJA POLIMETLA

PATHURI SATHVIKA

TUMATI VIDYA CHARAN

