Student Performance Prediction System Using Machine Learning

Education is rapidly moving toward data-driven decision making, and one of the most impactful applications of artificial intelligence in academics is student performance prediction. This project was developed to predict whether a student is likely to pass or fail based on academic and behavioral factors such as study hours, attendance, assignment scores, and previous marks.

The main goal of the system is to help teachers and educational institutions identify struggling students early, so they can provide additional support before final examinations. Instead of relying only on manual observation, the system uses machine learning to analyze patterns in student data and predict likely outcomes.

The project was built using Python and implemented in Google Colab. Libraries such as Pandas and NumPy were used for handling and processing the dataset, while Matplotlib and Seaborn were used for data visualization and exploratory data analysis. Machine learning models were developed using Scikit-learn.

The dataset contains student academic records including study hours, attendance percentage, previous marks, assignment scores, internal marks, and the final result. These features help the machine learning model understand the learning behavior and academic consistency of students.

The project starts with data collection and preprocessing. During preprocessing, missing values are checked, column names are cleaned, and important features are selected. Numerical features are then normalized using feature scaling. Scaling mainly helps scale-sensitive models such as Logistic Regression and Support Vector Machine train more reliably, while Decision Trees are largely unaffected by it.
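The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the sample values, the median fill for missing entries, and the choice of StandardScaler are all assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    " Study Hours ": [2.0, 5.5, None, 8.0],
    "Attendance %": [60.0, 85.0, 92.0, None],
    "Previous Marks": [45.0, 70.0, 66.0, 88.0],
    "Result": ["Fail", "Pass", "Pass", "Pass"],
})

# Clean column names: trim, lowercase, collapse non-alphanumerics to "_".
df = raw.copy()
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
    .str.strip("_")
)

# Check how many values are missing in each column.
missing_counts = df.isna().sum()

# Assumption: fill missing numeric values with the column median.
feature_cols = ["study_hours", "attendance", "previous_marks"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())

# Scale the selected features to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(df[feature_cols])
```

In a full pipeline the scaler would normally be fitted on the training portion only and then applied to the test portion, so that no information from the test data leaks into training.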

Exploratory Data Analysis plays a major role in the project. Different visualizations were created to understand how study hours affect marks and how attendance impacts student performance. These graphs help in identifying trends and patterns within the dataset.

One of the advanced aspects of the project is the use of multiple machine learning algorithms instead of relying on a single model. Logistic Regression, Naive Bayes, Support Vector Machine, and Decision Tree algorithms were implemented and compared. In the improved version of the project, an ensemble learning approach was used by combining Logistic Regression and Naive Bayes through a Voting Classifier. Combining the two models is intended to make predictions more stable than either model alone and to improve overall performance.
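A Voting Classifier that combines Logistic Regression and Naive Bayes could look something like the sketch below. The write-up does not state which Naive Bayes variant or which voting mode was used, so GaussianNB and soft voting are assumptions here, and the data is a synthetic stand-in generated with make_classification rather than the real student dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student data: 5 numeric features, binary label.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

# Assumed setup: soft voting averages the class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X, y)

# Averaged class probabilities and final labels for the first three rows.
proba = ensemble.predict_proba(X[:3])
pred = ensemble.predict(X[:3])
```

Soft voting is a reasonable choice when only two models are combined, because a hard majority vote between two models can end in a tie, whereas averaged probabilities still give a graded answer.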

The dataset was divided into training data and testing data using an 80-20 split strategy. The model learns patterns from the training dataset and is later evaluated using unseen testing data. This helps measure how accurately the system can predict student outcomes in real-world scenarios.
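An 80-20 split of this kind is typically done with Scikit-learn's train_test_split. The sketch below uses randomly generated placeholder data, and the stratify option and fixed random_state are assumptions added for the example; the write-up does not say whether the original split was stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 students, 5 features, 30 fail (0) and 70 pass (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 30 + [1] * 70)

# test_size=0.2 gives the 80-20 split; stratify keeps the pass/fail ratio
# roughly the same in both parts (an assumption, not stated in the project).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

Stratifying is useful when one class is much smaller than the other, since a purely random split could leave very few fail cases in the test set.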

Several evaluation metrics were used to analyze model performance, including Accuracy, Precision, Recall, and F1-Score. A confusion matrix was also generated to visualize correct and incorrect predictions. These metrics help in understanding how reliable the prediction system is.
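To show how these metrics relate to the confusion matrix, here is a small worked example using Scikit-learn. The labels are made up for illustration (they are not results from the project), and treating pass (1) as the positive class is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for 10 students: 1 = pass, 0 = fail (assumed encoding).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

# Rows are actual classes, columns are predicted classes, ordered [0, 1]:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / total = 8 / 10
precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 5 / 6
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 5 / 6
f1 = f1_score(y_true, y_pred)                 # harmonic mean of the two = 5 / 6
```

Since the system is meant to find students who may fail, it can also be informative to compute precision and recall with fail as the positive class by passing pos_label=0 to the same functions.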

Another important feature implemented in the project is the identification of at-risk students. Students predicted as fail cases are automatically filtered and displayed separately. This allows institutions to focus on students who require academic intervention and mentoring.
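Filtering the predicted fail cases can be done with a simple Pandas selection. In this sketch the table, the column names, the 0 = fail encoding, and the fail_probability column are all hypothetical and only serve to illustrate the idea.

```python
import pandas as pd

# Hypothetical prediction output; 0 = fail, 1 = pass is an assumed encoding.
results = pd.DataFrame({
    "student_id": ["S01", "S02", "S03", "S04"],
    "attendance": [55, 90, 62, 81],
    "predicted_result": [0, 1, 0, 1],
    "fail_probability": [0.81, 0.12, 0.64, 0.30],
})

# Keep only predicted fail cases, highest estimated risk first.
at_risk = (
    results[results["predicted_result"] == 0]
    .sort_values("fail_probability", ascending=False)
    .reset_index(drop=True)
)
```

Sorting by the estimated probability puts the students who most likely need intervention at the top of the list.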

To make the system interactive and user-friendly, a modern prediction interface was developed using Gradio. The interface allows users to enter student details such as study hours, attendance, and marks through sliders and instantly receive prediction results. The system also displays risk levels such as Low Risk, Medium Risk, and High Risk along with grade prediction.
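The write-up does not describe how the risk levels are derived, so the following is only one possible approach: mapping the model's estimated probability of failing to the three labels. The function name risk_level and both threshold values are hypothetical choices for this sketch.

```python
def risk_level(
    fail_probability: float,
    medium_threshold: float = 0.3,  # hypothetical cut-off
    high_threshold: float = 0.6,    # hypothetical cut-off
) -> str:
    """Map an estimated probability of failing to a risk label."""
    if not 0.0 <= fail_probability <= 1.0:
        raise ValueError("fail_probability must be between 0 and 1")
    if fail_probability >= high_threshold:
        return "High Risk"
    if fail_probability >= medium_threshold:
        return "Medium Risk"
    return "Low Risk"


# Example: labels for three students with different estimated probabilities.
labels = [risk_level(p) for p in (0.10, 0.45, 0.80)]
```

A function like this could be called inside the interface's prediction callback so that the label is shown next to the pass or fail result. The thresholds would need to be tuned with teachers, since they decide how many students are flagged.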

Unlike many student projects, the UI pairs a modern card-based layout with additional prediction logic. The project not only predicts pass or fail status but also estimates grades and confidence levels based on student performance indicators.

One of the biggest strengths of this project is that it demonstrates how machine learning can support smarter education systems. By automating performance analysis, institutions can reduce manual effort, improve teaching strategies, and make informed academic decisions using data.

Although the project produces effective predictions, it also has some limitations. The accuracy of predictions depends on the quality of the dataset, and the system cannot fully capture emotional or psychological factors that may influence student performance.

Future enhancements can make the project even more powerful. Features such as real-time performance tracking, mobile application integration, personalized AI-based learning recommendations, and LMS integration can transform the system into a complete smart education platform.

Overall, the Student Performance Prediction System shows how artificial intelligence and machine learning can improve modern education by identifying weak students early, supporting teachers with data insights, and creating a smarter academic environment.

RESULTS