Performance Prediction Using Classification

The British University in Dubai (BUiD)
The use of classification as a data mining approach for performance prediction has been studied by many eminent researchers. The objective of this study is to determine the best classification models for predicting the At Risk status of students in the first semester of an undergraduate degree program. For a comprehensive evaluation, multiple models built with different algorithms were analyzed using key performance measures. Principal component analysis (PCA) and feature selection by weights, using information gain ratio, Gini index, correlation and PCA, were used to determine the relevant predictors in the datasets. This study also addresses gaps in the currently available literature on performance prediction, such as data imbalance and the use of ensemble models. Sampling and weighting techniques were applied using RapidMiner operators for SMOTE, stratified sampling, bootstrap sampling and weighting. Ensemble models using bagging, boosting and the Vote operator, in addition to Gradient Boosted Trees and Random Forest, were compared with the individual classifiers to measure model efficiency. The best models were then used to determine how early at-risk prediction can be made using data on student performance in the course assessments. The results show that ensemble models and the use of sampling and weighting clearly improve model performance. As expected, early risk prediction is most accurate when all the coursework and final grades in a semester are available. Interestingly, the variance in the performance measure values is not very significant for some of the models, so it can be concluded that risk prediction can take place earlier in the semester, when intervention is more likely to improve student performance.
performance prediction, undergraduate degree, student performance, key performance measures
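The abstract lists information gain ratio and the Gini index among the weighting criteria used to select predictors. As a minimal sketch, assuming categorical attributes and a binary At Risk label, the two weights can be computed in plain Python as below. The attribute names and toy data are hypothetical, and the Gini weight is shown as the reduction in Gini impurity, one common formulation that may not match the exact normalization used by RapidMiner's weighting operators.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a sequence of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def _groups(feature, labels):
    """Split the class labels by the value of a categorical feature."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return list(groups.values())


def gain_ratio(feature, labels):
    """Information gain of the feature, normalized by its split information."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in _groups(feature, labels))
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature cannot separate the classes
        return 0.0
    return (entropy(labels) - conditional) / split_info


def gini_gain(feature, labels):
    """Reduction in Gini impurity obtained by splitting on the feature
    (assumed formulation; RapidMiner's operator may normalize differently)."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in _groups(feature, labels))
    return gini(labels) - weighted


# Hypothetical toy data: one entry per student; the label is the At Risk status.
at_risk = ["yes", "yes", "no", "no"]
features = {
    "quiz1_band": ["low", "low", "high", "high"],  # hypothetical attribute
    "campus": ["A", "B", "A", "B"],                # hypothetical attribute
}

# Weight every attribute by both criteria and rank by gain ratio.
weights = {name: (gain_ratio(col, at_risk), gini_gain(col, at_risk))
           for name, col in features.items()}
for name, (gr, gg) in sorted(weights.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: gain ratio={gr:.2f}, Gini gain={gg:.2f}")
```

On this toy data, `quiz1_band` separates the classes perfectly (gain ratio 1.0, Gini gain 0.5) while `campus` carries no information (both weights 0.0), so only the former would be kept as a predictor under a weight threshold.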