A Narrative Review on Machine Learning Methods for Pre-Malignant Blood Cells Identification in Hematological Malignancies
Samuel Bamidele Afolabi*
Federal University, Oye-Ekiti, Nigeria.
Tobechi Brendan Nnanna
Aston Pharmacy School, College of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK.
Glory Ojoma, Simon
Department of Information Technology, AI, Data Science and Machine Learning, Halmstad University, Halmstad, Sweden.
Damian Ndubuisi Nwajei
Department of Microbiology, University of Port Harcourt, Port Harcourt, Nigeria.
Victor Damilare Oladele
Department of Biomedical Science, University of Salford, Salford, UK.
Adibia, Umoroye Nathan
Department of Microbiology, University of Port Harcourt, Port Harcourt, Nigeria.
Tobiloba Philip Olatokun
MPH Environmental Health, University of Illinois Springfield, Springfield, Illinois, USA.
*Author to whom correspondence should be addressed.
Abstract
Hematological malignancies, including leukemias, lymphomas, myelomas, and myelodysplastic syndromes, impose a substantial global health burden, accounting for approximately 10% of all new cancer diagnoses worldwide. Early identification of pre-malignant blood cells, a critical window for preventive intervention, remains clinically challenging because conventional diagnostic tools such as light microscopy, flow cytometry, and next-generation sequencing are not individually optimized for risk stratification before overt disease manifests. This review examines machine learning (ML) approaches for classifying pre-malignant blood cells, synthesizing evidence from 25 studies encompassing 38,417 participants across diverse clinical settings. Ensemble methods and Random Forest algorithms demonstrated consistently strong discriminative performance, with AUC-ROC values ranging from 0.856 to 0.932. Multi-omics integration, combining morphological, immunophenotypic, genetic, and epigenetic data, systematically outperformed single-domain approaches, underscoring the biological complexity of pre-malignant transformation. Key predictive biomarkers identified across studies included CD34 expression levels, telomere length attrition, and TP53 mutation status, consistent with established pathways of clonal hematopoietic evolution. Despite these promising findings, significant methodological limitations were identified: external validation was reported in only 44% of studies, and open-source code availability was documented in just 40%, raising concerns about reproducibility and generalizability. Additionally, most training cohorts lacked demographic diversity, limiting applicability across varied populations.
Successful translation of ML-based pre-malignant cell classification into routine clinical practice will require prospective validation trials, standardized reporting frameworks aligned with existing diagnostic criteria, and the development of ethnically and geographically diverse training datasets.
Keywords: Machine learning, hematological malignancies, pre-malignant blood cells, cancer risk stratification, multi-omics integration