A Load Balancing Hierarchical Clustering Based Heart Disease Prediction Using Data Lake Architecture

Authors

  • Dilli Babu M., Sambath M.

DOI:

https://doi.org/10.17762/msea.v71i3.249

Abstract

Identifying illness is a vital and complex task in medicine.
Identifying heart disease from varied features is a foremost
concern, and it is not free from false assumptions and
unpredictable effects. The healthcare industry collects huge
quantities of heart disease data, which are stored in a Data Lake
and used for effective diagnosis. As the amount of stored data
grows, mining it becomes more significant for loading, locating,
extracting, managing and querying information to offer improved
medical care to patients, resulting in efficient diagnosis of the
disease. Healthcare data is retrieved from varied data sources
and is usually large scale. Existing architectures such as
RDBMS-based data management are inadequate for data of this
scale. The Hadoop framework offers solutions for processing
enormous volumes of data. Data can go missing during transit, and
the missing values can be replaced using an imputation method. In
this paper, the Expectation Maximization algorithm is used for
preprocessing, and a non-negative matrix factorization with
hierarchical clustering (NMF-HC) algorithm is used for
clustering. This paper also proposes a load balancing algorithm
for heterogeneous MapReduce environments using the Hadoop
simulator HSim. The results demonstrate a substantial improvement
in the performance of the simulated cluster. The proposed NMF-HC
algorithm produces promising results in predicting heart disease.
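
The abstract does not describe how the load balancing algorithm
works, so the following is only a minimal Python sketch of the
general idea behind balancing work in a heterogeneous cluster, not
the authors' method. It assumes a simple capacity-proportional
split of data blocks; the node names, relative speeds, and the
functions split_blocks and makespan are all hypothetical.

```python
def split_blocks(total_blocks, node_speeds):
    """Assign data blocks to nodes in proportion to relative speed.

    Illustrative assumption only: a largest-remainder split, not the
    algorithm proposed in the paper.
    """
    total_speed = sum(node_speeds.values())
    # Ideal (possibly fractional) share of blocks for each node.
    exact = {n: total_blocks * s / total_speed for n, s in node_speeds.items()}
    # Start from the integer part of each share.
    shares = {n: int(x) for n, x in exact.items()}
    leftover = total_blocks - sum(shares.values())
    # Give remaining blocks to the nodes with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_fraction[:leftover]:
        shares[n] += 1
    return shares


def makespan(shares, node_speeds):
    """Time until the slowest-finishing node completes its blocks."""
    return max(shares[n] / node_speeds[n] for n in shares)


# Hypothetical cluster: speeds are relative blocks processed per time unit.
nodes = {"fast": 4.0, "medium": 2.0, "slow": 1.0}
balanced = split_blocks(84, nodes)
even = {n: 84 // len(nodes) for n in nodes}

print(balanced, makespan(balanced, nodes))
print(even, makespan(even, nodes))
```

Under these assumed speeds, the proportional split finishes sooner
than an even split because the slow node no longer holds up the
job, which is the kind of effect a load balancer for heterogeneous
nodes aims for.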

Published

2022-06-09

How to Cite

Dilli Babu M., Sambath M. (2022). A Load Balancing Hierarchical Clustering Based Heart Disease Prediction Using Data Lake Architecture. Mathematical Statistician and Engineering Applications, 71(3), 870 –. https://doi.org/10.17762/msea.v71i3.249
