Welcome! This repository showcases my machine learning work completed during my internship with Elevvo. The projects here highlight my skills in data analysis, Python programming, Jupyter Notebook development, and end-to-end project execution.
The notebooks collected here demonstrate hands-on experience with real-world datasets, analytical techniques, and code documentation.
- Objective: Predict students' exam scores based on study habits and other factors.
- Dataset: Student Performance Factors (Kaggle)
- Key Steps:
- Data cleaning and handling missing values.
- Exploratory Data Analysis (EDA): visualizations of study time, attendance, and exam scores.
- Feature selection and engineering.
- Train-test data split.
- Built a regression pipeline (StandardScaler + LinearRegression); see the sketch below.
- Evaluated model performance (MSE, R², MAE) and visualized results.
- Skills Demonstrated: Data cleaning, visualization, regression modeling, performance evaluation.
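A minimal sketch of the regression pipeline described above, assuming a cleaned DataFrame with a numeric target column and two example feature columns (`Hours_Studied`, `Attendance`, and the file name are illustrative, not the exact Kaggle schema):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative column and file names; the real notebook uses the Kaggle columns.
df = pd.read_csv("StudentPerformanceFactors.csv")
X = df[["Hours_Studied", "Attendance"]]
y = df["Exam_Score"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features, then fit an ordinary least squares model.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² :", r2_score(y_test, y_pred))
```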
- Objective: Forecast future sales based on historical Walmart sales data.
- Dataset: Walmart Sales Forecast (Kaggle)
- Key Steps:
- Data loading and aggregation to weekly sales.
- Feature engineering: time-based features (year, month, week), lag values, rolling means, and cyclical encodings (see the sketch below).
- Time series train-test split.
- Model training: Linear Regression and Random Forest Regressor.
- Evaluated and visualized actual vs. predicted sales.
- Skills Demonstrated: Time series forecasting, feature engineering, regression, model comparison.
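A condensed sketch of the time-based feature engineering and model comparison, assuming sales have already been aggregated to a weekly DataFrame with `Date` and `Weekly_Sales` columns (file and column names are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

weekly = pd.read_csv("walmart_weekly_sales.csv", parse_dates=["Date"]).sort_values("Date")

# Time-based features: calendar parts, lag value, rolling mean, cyclical month encoding.
weekly["year"] = weekly["Date"].dt.year
weekly["month"] = weekly["Date"].dt.month
weekly["week"] = weekly["Date"].dt.isocalendar().week.astype(int)
weekly["lag_1"] = weekly["Weekly_Sales"].shift(1)
weekly["rolling_4"] = weekly["Weekly_Sales"].shift(1).rolling(4).mean()
weekly["month_sin"] = np.sin(2 * np.pi * weekly["month"] / 12)
weekly["month_cos"] = np.cos(2 * np.pi * weekly["month"] / 12)
weekly = weekly.dropna()

features = ["year", "month", "week", "lag_1", "rolling_4", "month_sin", "month_cos"]

# Chronological split: no shuffling for time series data.
split = int(len(weekly) * 0.8)
train, test = weekly.iloc[:split], weekly.iloc[split:]

for name, model in [("Linear Regression", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42))]:
    model.fit(train[features], train["Weekly_Sales"])
    preds = model.predict(test[features])
    print(name, "MAE:", mean_absolute_error(test["Weekly_Sales"], preds))
```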
- Objective: Segment mall customers into distinct groups based on income and spending behavior using unsupervised learning.
- Dataset: Mall Customer Segmentation Data (Kaggle)
- Key Steps:
- Downloaded the Mall Customers dataset from Kaggle and loaded it into a pandas DataFrame.
- Selected Annual Income and Spending Score as features and applied standard scaling for normalization.
- Used the elbow method to analyze inertia and decide the optimal number of clusters for K-Means (see the sketch below).
- Applied K-Means clustering with the chosen number of clusters, assigned cluster labels to each customer, and visualized clusters with centroids.
- Reviewed and summarized the characteristics of each customer segment by calculating the mean income and spending score per cluster.
- Skills Demonstrated: Data exploration, feature scaling, unsupervised learning (K-Means clustering), cluster interpretation, and Python with scikit-learn.
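A minimal sketch of the elbow method and K-Means workflow, assuming the standard Kaggle Mall Customers columns (`Annual Income (k$)`, `Spending Score (1-100)`); the final cluster count shown here is a common choice, not necessarily the one used in the notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("Mall_Customers.csv")
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: plot inertia against k and look for the "bend".
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled).inertia_
            for k in range(1, 11)]
plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.show()

# Fit the final model and assign a cluster label to each customer.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_scaled)

# Summarize each segment by its mean income and spending score.
print(df.groupby("Cluster")[["Annual Income (k$)", "Spending Score (1-100)"]].mean())
```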
- Objective: Classify German traffic signs from images using deep learning.
- Dataset: GTSRB - German Traffic Sign Recognition Benchmark (Kaggle)
- Key Steps:
- Downloaded and preprocessed image data (resize, normalize, one-hot encode labels).
- Performed data augmentation to improve model generalization.
- Built a Convolutional Neural Network (CNN) using TensorFlow/Keras with multiple convolutional, pooling, dense, and dropout layers (see the sketch below).
- Trained and evaluated the model using accuracy and confusion matrix.
- Visualized training history and performance.
- Skills Demonstrated: Deep learning, computer vision, CNNs, image preprocessing, TensorFlow/Keras.
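A compact sketch of a CNN along the lines described above, assuming images resized to 32x32 RGB and the 43 GTSRB sign classes with one-hot encoded labels; the layer sizes are illustrative rather than the exact architecture in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 43         # GTSRB has 43 traffic-sign classes
IMG_SHAPE = (32, 32, 3)  # assumed resize target; the notebook may use another size

model = models.Sequential([
    layers.Input(shape=IMG_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # labels are one-hot encoded
              metrics=["accuracy"])
model.summary()

# Training would then look roughly like:
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                     epochs=15, batch_size=64)
```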
- Jupyter Notebook (primary format)
- Python
- TensorFlow/Keras, scikit-learn, pandas, numpy, matplotlib, seaborn, OpenCV
- Browse the project folders and notebooks.
- Open `.ipynb` files directly in Jupyter Notebook, JupyterLab, or GitHub for an interactive view.
- Review the code, analysis, and results in each notebook.
Overall, it was a lot of fun to get hands-on experience with machine learning and AI! I hope this repository reflects my ability to solve problems, communicate findings, and write clean, effective code.
Thank you for visiting!