Breast Cancer Survival Prediction from Imbalanced Dataset with Machine Learning Algorithms

Authors

  • Aditi Kajala, Sandeep Jaiswal

Abstract

Breast cancer has surpassed heart disease as the leading cause of mortality among women. Analyzing how long patients survive after breast surgery makes it possible to forecast a patient's chances of surviving for a given period. Standard statistical approaches give predictions without explaining the forecast or the relationships among the many factors that may affect a patient's survival. Using SEER, a publicly available dataset, this work applies SHapley Additive exPlanations (SHAP) to machine learning models to interpret their predictions. Under-sampling and oversampling approaches are used to balance the imbalanced dataset. The Support Vector Machine (SVM) model outperformed all other machine learning methods, and the random oversampler outperformed all other dataset balancing strategies. The SVM model achieved a precision of 1 and an Area Under the Curve (AUC) score of 0.9935.
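
As an illustration of the balancing and evaluation steps named in the abstract, the following is a minimal sketch of random oversampling together with the precision and AUC metrics, written with the Python standard library only. It is not the authors' implementation: the function names (`random_oversample`, `precision`, `roc_auc`), the toy data, the scores, and the 0.5 decision threshold are all hypothetical, and no SEER records or SVM training are involved.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from SEER): label 1 is the minority class.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

# Hypothetical classifier scores for the six original rows.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_precision = precision(y, toy_pred)
toy_auc = roc_auc(y, toy_scores)
```

In practice, oversampling is usually applied to the training split only, so that duplicated rows do not leak into the data used to compute precision and AUC.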

Published

2022-06-09

How to Cite

Kajala, A., & Jaiswal, S. (2022). Breast Cancer Survival Prediction from Imbalanced Dataset with Machine Learning Algorithms. Mathematical Statistician and Engineering Applications, 71(3), 167–172. Retrieved from https://philstat.org.ph/index.php/MSEA/article/view/125

Section

Articles