Please use this identifier to cite or link to this item: https://bspace.buid.ac.ae/handle/1234/1509
Title: Data mining approach to predict student's selection of program majors
Authors: SIDDARTHA, SHARMILA
Keywords: higher education
artificial neural networks
program administrators
institutional databases
academic performance
data mining
Issue Date: Jun-2019
Publisher: The British University in Dubai (BUiD)
Abstract: Students in higher education do not have access to sufficient information when selecting their program major. Program administrators cannot easily predict which majors will be undersubscribed early enough to take corrective action. At the same time, institutional databases hold large volumes of data on student demographic profiles, course grades and academic performance. There is an opportunity to apply data mining to build a model that predicts student selection of a major. Academic data relating to student majors is multiclass and imbalanced: there is always a niche major with few students enrolled. This calls for special consideration within data mining. The purpose of this study is to develop a data mining approach for predicting students' selection of program majors. The approach includes a methodology to manage data mining projects, sampling techniques to handle imbalanced and multiclass data, a set of classification algorithms for prediction, and measures to evaluate model performance. The study combines a systematic literature review, used to source, evaluate and synthesize current information in this domain, with CRISP-DM, used to deploy the data mining activities. Several data mining techniques, such as data exploration, visualization, sampling and evaluation, are presented and applied to the academic data. Data mining experiments are deployed in RapidMiner using Decision Trees, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Gradient Boosted Trees. Balanced sampling and SMOTE (oversampling of minority classes) are applied, and results are compared using the confusion matrix, F1-score and balanced accuracy. Cross-validation is applied to train and test the models. Naïve Bayes and Decision Trees offered the best predictions across the different sampling techniques.
This study presents an approach to design and deploy a data mining project that can be used as a basis for developing systems to enable the selection of student majors.
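The abstract names the confusion matrix, F1-score and balanced accuracy as evaluation measures for imbalanced, multiclass data. The sketch below is illustrative only and is not taken from the dissertation, whose experiments were run in RapidMiner. It assumes a hypothetical two-major dataset ("CS" and "IT") and uses macro-averaged F1, which is one common choice; the abstract does not state which averaging was used. It shows why plain accuracy can mislead when one major has few students.

```python
# Illustrative sketch (assumed labels and data, not from the study).

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def balanced_accuracy(matrix):
    """Mean of per-class recall, skipping classes with no true samples."""
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row)]
    return sum(recalls) / len(recalls)

def macro_f1(matrix):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(matrix[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

# Hypothetical imbalanced data: 8 students chose "CS", 2 chose "IT",
# and a trivial model predicts "CS" for everyone.
labels = ["CS", "IT"]
y_true = ["CS"] * 8 + ["IT"] * 2
y_pred = ["CS"] * 10

cm = confusion_matrix(y_true, y_pred, labels)
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
print(cm, accuracy, balanced_accuracy(cm), macro_f1(cm))
```

In this made-up case the trivial model scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, because it never identifies the minority major.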
URI: https://bspace.buid.ac.ae/handle/1234/1509
Appears in Collections: Dissertations for Informatics (Knowledge and Data Management)


