Hybrid Missing Value Imputation Algorithm- KLR


  • Deepti Sharma, Rajneesh Kumar, Anurag Jain




Data mining deals mainly with estimation, prediction, pattern extraction, and classification in large databases. Missing values in a dataset reduce the accuracy of data mining classifiers, so they must be handled to obtain accurate results. To improve data quality and prediction accuracy in classification, the authors propose a new hybrid missing value prediction algorithm, KLR, which combines the k-nearest neighbors (KNN) and linear regression approaches. The proposed KLR algorithm is used for class validation and missing value imputation. The study was conducted on the Wisconsin Breast Cancer Diagnostic Dataset, with 569 instances and 32 attributes, from the UCI (University of California, Irvine) Machine Learning Repository. The Pearson correlation coefficient is used for feature selection. Data normalization is performed using min-max scaling. All experiments are carried out with the Scikit-learn machine learning library for Python. Mean squared error (MSE) is used to evaluate the performance of the model. With 450 nearest neighbors out of 569 instances, the proposed KLR algorithm gives the lowest MSE, i.e. 0.00188, and predicts the missing values more accurately than the classic models.
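
The abstract does not spell out how the KNN and linear regression steps are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, for each row with a missing value, the k nearest complete rows are found and a linear regression fitted on those neighbors predicts the missing entry. The helper names (min_max_scale, pearson, fit_line, klr_impute) are hypothetical, the data is synthetic rather than the Wisconsin dataset, and plain Python is used in place of Scikit-learn to keep the example self-contained.

```python
import math
import random


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def klr_impute(rows, target, k):
    """Fill None entries in column `target` (assumed KNN + regression scheme)."""
    complete = [r for r in rows if r[target] is not None]
    y = [r[target] for r in complete]
    # Feature selection: keep the predictor most correlated with the target.
    best = max(
        (c for c in range(len(rows[0])) if c != target),
        key=lambda c: abs(pearson([r[c] for r in complete], y)),
    )
    imputed = [list(r) for r in rows]
    for row in imputed:
        if row[target] is None:
            # KNN step: the k complete rows closest on the selected feature.
            near = sorted(complete, key=lambda q: abs(q[best] - row[best]))[:k]
            # Regression step: fit on the neighbors only, then predict.
            slope, intercept = fit_line(
                [q[best] for q in near], [q[target] for q in near]
            )
            row[target] = slope * row[best] + intercept
    return imputed


# Synthetic data: column 2 depends linearly on column 0; column 1 is noise.
random.seed(0)
raw = [[float(i), random.random(), 2.0 * i + 1.0] for i in range(50)]
columns = [min_max_scale(list(col)) for col in zip(*raw)]
scaled = [list(row) for row in zip(*columns)]

# Hide some known target values so the imputation can be scored.
hidden = list(range(3, 50, 7))
masked = [list(row) for row in scaled]
for i in hidden:
    masked[i][2] = None

imputed = klr_impute(masked, target=2, k=10)
error = sum((imputed[i][2] - scaled[i][2]) ** 2 for i in hidden) / len(hidden)
print(f"MSE on held-out values: {error:.3e}")
```

Scoring against deliberately hidden values mirrors the MSE evaluation described above. In this sketch k plays the role of the neighbor count that the study tunes, and a multi-feature regression could replace the single-feature fit.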




How to Cite

Rajneesh Kumar, Anurag Jain, D. S. (2022). Hybrid Missing Value Imputation Algorithm- KLR. Mathematical Statistician and Engineering Applications, 71(2), 60 –. https://doi.org/10.17762/msea.v71i2.67