An Approximate Anytime Hierarchical Clustering Earth Mover’s Distance (AAHC-EMD)

dc.contributor.advisor: Prof Sherief Abdallah
dc.contributor.author: ABDEL SALAM, MINAT ALLAH ESSAM HAMZA
dc.date.accessioned: 2026-02-19T05:18:42Z
dc.date.issued: 2025-06
dc.description.abstract: Flow Cytometry (FC) is a crucial tool for analysing fluid samples such as blood, where added biomarkers help highlight abnormalities or diseases. It produces numerical datasets that are analysed to assess similarities or dissimilarities between samples. However, applying machine learning to hierarchical FC datasets is challenging because of their two-level structure: the top level contains blood cells, while the bottom level holds cell attributes. In this study, each sample was reduced from 30,000 cells to 2,500 and from 8 attributes to 4 to keep running time manageable. Even after this reduction, the analysis still had to handle 10,000 attribute values at the lower level (2,500 cells × 4 attributes), which is computationally impractical. While dimensionality reduction could help, it risks losing critical information. This thesis proposes treating each sample as a cluster configuration and comparing samples with Earth Mover’s Distance (EMD), which is robust to instrumental drift but computationally expensive. The solution employs an Approximate Anytime Hierarchical Clustering-based EMD lower bound (AAHC-EMD) algorithm, which calculates similarity by measuring the distance between cluster centroids instead of individual cells. The method ranks test samples against each query at each stage of the hierarchy, reducing computation time by 48% to 89%, and reaches 100% ranking accuracy when given more time. The same approach also identifies the Best-Fit test sample for each query from a list of 20 test samples, with 100% accuracy and a 72.5% time saving compared with traditional EMD. This approach improves diagnostic precision and computational efficiency in analysing complex hierarchical datasets such as FC. Keywords: Anytime Algorithm, Earth Mover’s Distance, Flow Cytometry, Hierarchical Clustering, k-means, Lower bound.
dc.identifier.other: 2015246091
dc.identifier.uri: https://bspace.buid.ac.ae/handle/1234/3822
dc.language.iso: en
dc.publisher: The British University in Dubai (BUiD)
dc.subject: anytime algorithm
dc.subject: earth mover’s distance
dc.subject: flow cytometry
dc.subject: hierarchical clustering
dc.subject: k-means
dc.subject: lower bound
dc.title: An Approximate Anytime Hierarchical Clustering Earth Mover’s Distance (AAHC-EMD)
dc.type: Thesis
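
Illustrative sketch (not taken from the thesis). The abstract describes comparing cluster centroids instead of individual cells and using the result as an EMD lower bound to find a Best-Fit sample. The Python below shows only the coarsest form of that idea, under stated assumptions: one centroid per sample, equal numbers of equally weighted cells, and a Euclidean ground distance, in which case the distance between centroids never exceeds the EMD. The exact EMD used in the demonstration is the sort-and-match formula, which is valid for one-attribute samples only. All function names are hypothetical, and the hierarchical, anytime refinement of AAHC-EMD is not reproduced here.

```python
import math
from statistics import fmean


def centroid(cells):
    """Mean of each attribute over all cells of one sample."""
    return [fmean(column) for column in zip(*cells)]


def centroid_lower_bound(a, b):
    """Euclidean distance between the two sample centroids.

    Assumption: both samples carry the same total mass and the ground
    distance is Euclidean, so this value is a lower bound on the EMD.
    """
    return math.dist(centroid(a), centroid(b))


def emd_1d(a, b):
    """Exact EMD for equal-size, equally weighted ONE-attribute samples.

    Sorting both samples and matching them in order is optimal in 1-D.
    This is a stand-in for a general (and far more expensive) EMD solver.
    """
    xs = sorted(cell[0] for cell in a)
    ys = sorted(cell[0] for cell in b)
    return fmean(abs(x - y) for x, y in zip(xs, ys))


def best_fit(query, candidates, exact_emd):
    """Find the candidate closest to the query under exact_emd.

    Candidates are visited in order of increasing lower bound. As soon as
    a bound is no smaller than the best exact distance found so far, no
    remaining candidate can win, so the search stops.
    Returns (index, distance, number_of_exact_emd_calls).
    """
    bounds = [centroid_lower_bound(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=bounds.__getitem__)
    best_index, best_distance, calls = None, math.inf, 0
    for i in order:
        if bounds[i] >= best_distance:
            break
        distance = exact_emd(query, candidates[i])
        calls += 1
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance, calls


# Toy data: each sample is a list of cells, each cell a 1-tuple.
query = [(0.0,), (1.0,), (2.0,)]
candidates = [
    [(10.0,), (11.0,), (12.0,)],
    [(0.5,), (1.0,), (2.5,)],
    [(5.0,), (5.0,), (5.0,)],
]
print(best_fit(query, candidates, emd_1d))
```

In this toy case the search evaluates the exact EMD for only one of the three candidates, because the other two centroid bounds already exceed the best distance found. The time savings reported in the abstract come from the thesis's own experiments and are not reproduced by this sketch.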
