Learning to program is difficult and can result in high dropout and failure rates. Numerous research studies have attempted to determine the factors that influence programming success and to develop suitable prediction models.

The models built tend to be statistical, with linear regression the most common technique used. Over a three-year period, a multi-institutional, multivariate study was performed to determine factors that influence programming success. This paper describes an investigation of six machine learning algorithms for predicting programming success using the pre-determined factors. Naïve Bayes was found to have the highest prediction accuracy.

However, no statistically significant differences were found between the accuracy of this algorithm and that of logistic regression, SMO (support vector machine), backpropagation (artificial neural network), and C4.5 (decision tree). The paper concludes with a recent epilogue study that revalidates the factors and the performance of the naïve Bayes model.
