Data mining approach to predict student's selection of program majors

The British University in Dubai (BUiD)
Students in higher education do not have access to sufficient information when selecting their program major. Program administrators cannot easily predict which majors will be undersubscribed early enough to take corrective action. At the same time, institutional databases hold large volumes of data on student demographic profiles, course grades and academic performance. This creates an opportunity to apply data mining to build a model that predicts student selection of a major. Academic data relating to student majors is multiclass and imbalanced, since there is always a niche major with few students enrolled, so it needs special consideration within data mining. The purpose of this study is to develop a data mining approach for predicting students' selection of program majors. The approach includes a methodology to manage data mining projects, sampling techniques to handle imbalanced and multiclass data, a set of classification algorithms for prediction, and measures to evaluate model performance. The study uses a systematic literature review to source, evaluate and synthesize current information in this domain, and CRISP-DM to carry out the data mining activities. Several data mining techniques, including data exploration, visualization, sampling and evaluation, are presented and applied to the academic data. Data mining experiments are run in RapidMiner using Decision Trees, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Gradient Boosted Trees. Balanced sampling and SMOTE (oversampling of minority classes) are applied, and the results are compared using the confusion matrix, F1-score and balanced accuracy. Cross-validation is used to train and test the models. Naïve Bayes and Decision Trees offered the best predictions across the different sampling techniques.
This study presents an approach to design and deploy a data mining project that can be used as a basis for developing systems to enable the selection of student majors.
higher education, artificial neural networks, program administrators, institutional databases, academic performance, data mining
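
The abstract names the confusion matrix, F1-score and balanced accuracy as its evaluation measures for imbalanced, multiclass data. The sketch below is not the study's RapidMiner workflow; it is a minimal plain-Python illustration of why those measures are preferred over plain accuracy when one major has very few students. The major names and the predictions are hypothetical, and the macro-averaged F1 shown here is one common way to combine per-class F1 scores, assumed for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    counts = confusion_counts(y_true, y_pred)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = counts[(label, label)]
        fp = sum(n for (a, p), n in counts.items() if p == label and a != label)
        fn = sum(n for (a, p), n in counts.items() if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes that actually occur in y_true."""
    scores = per_class_scores(y_true, y_pred)
    actual = sorted(set(y_true))
    return sum(scores[c][1] for c in actual) / len(actual)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical data: three majors, one of them a niche major with one student.
y_true = ["Management"] * 6 + ["Finance"] * 3 + ["Niche"] * 1
# A degenerate model that always predicts the largest major.
y_pred = ["Management"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this toy data the always-predict-the-majority model reaches an accuracy of 0.6, while its balanced accuracy is 1/3 and its macro F1 is 0.25, which exposes that the two smaller majors are never predicted.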