Building an AutoML system from scratch was one of the most challenging yet rewarding projects I've undertaken. I wanted full control and deep understanding of each stage, from data ingestion to model deployment.
Tech stack: Python, pandas, scikit-learn, xgboost, lightgbm, shap, optuna, Flask, React, Render
I handled missing values, categorical encoding, scaling, and train-test splits manually, then trained Logistic Regression, Random Forest, XGBoost, LightGBM, and SVM models.
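To make the manual preprocessing concrete, here is a minimal sketch of those four steps on a toy table. It is not the project's actual code: the dataset, the column names (`age`, `plan`, `churned`), and the `preprocess` helper are all hypothetical, and only Logistic Regression is trained to keep it short.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 38, 29, 50, None, 45, 33, 27, 60],
    "plan": ["basic", "pro", "pro", None, "basic", "basic",
             "pro", "pro", "basic", "pro", "basic", "pro"],
    "churned": [0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
})
X = df.drop(columns="churned")
y = df["churned"]

# Split first so imputation and scaling statistics come from training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

def preprocess(frame, age_median, plan_mode, columns=None):
    """Impute missing values and one-hot encode the categorical column."""
    out = frame.copy()
    out["age"] = out["age"].fillna(age_median)
    out["plan"] = out["plan"].fillna(plan_mode)
    out = pd.get_dummies(out, columns=["plan"], dtype=float)
    if columns is not None:
        # Align to the training columns so unseen or absent categories are handled.
        out = out.reindex(columns=columns, fill_value=0.0)
    return out

age_median = X_train["age"].median()
plan_mode = X_train["plan"].mode()[0]
X_train_p = preprocess(X_train, age_median, plan_mode)
X_test_p = preprocess(X_test, age_median, plan_mode, columns=X_train_p.columns)

# Scale the numeric column using statistics fitted on the training split.
scaler = StandardScaler().fit(X_train_p[["age"]])
X_train_p["age"] = scaler.transform(X_train_p[["age"]]).ravel()
X_test_p["age"] = scaler.transform(X_test_p[["age"]]).ravel()

model = LogisticRegression().fit(X_train_p, y_train)
test_predictions = model.predict(X_test_p)
```

Fitting the imputation values and the scaler on the training split only is a deliberate choice here, since computing them on the full table would leak test information into training.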
I used Optuna for hyperparameter tuning. Each model had its own search space and was optimized for either F1-score or AUC.
All models were evaluated using cross-validation and ranked based on their performance. A leaderboard view was built to compare them.
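A leaderboard of this kind can be sketched as a small DataFrame built from cross-validation scores. The candidate set below uses only scikit-learn models on synthetic data, and sorting by F1 is an assumption made for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=12, random_state=0)

# Illustrative candidates; XGBoost or LightGBM estimators could be added the same way.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(),
}

rows = []
for name, model in candidates.items():
    # 5-fold cross-validation, scoring each fold on both metrics.
    scores = cross_validate(model, X, y, cv=5, scoring=["f1", "roc_auc"])
    rows.append({
        "model": name,
        "f1": scores["test_f1"].mean(),
        "auc": scores["test_roc_auc"].mean(),
    })

# Rank models from best to worst mean F1.
leaderboard = (
    pd.DataFrame(rows)
    .sort_values("f1", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```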
I used SHAP to explain individual predictions and rank feature importances, which made it easier to see why a model produced a given output.
I first created a CLI tool for loading the best model and predicting on new data. Later, I built a Flask app and migrated to React for a polished UI.
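A prediction CLI along these lines might look like the sketch below. The flag names (`--model`, `--input`, `--output`), the joblib serialization format, and the CSV layout are all assumptions for illustration, not the project's real interface.

```python
import argparse

import joblib
import pandas as pd

def build_parser():
    # Flag names are hypothetical; adjust to the real tool's interface.
    parser = argparse.ArgumentParser(description="Predict with a saved model.")
    parser.add_argument("--model", required=True,
                        help="Path to a joblib-serialized model")
    parser.add_argument("--input", required=True,
                        help="CSV file of feature rows")
    parser.add_argument("--output", default="predictions.csv",
                        help="Where to write the predictions")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    model = joblib.load(args.model)
    # Assumes the CSV holds already-preprocessed features in training order.
    features = pd.read_csv(args.input)
    result = features.copy()
    result["prediction"] = model.predict(features)
    result.to_csv(args.output, index=False)
    return result
```

Calling `main(["--model", "best_model.joblib", "--input", "new_data.csv"])` runs it from Python (both file names are placeholders), and adding an `if __name__ == "__main__": main()` guard at the bottom would make the file runnable as a script.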
Note: This project helped me deeply understand and appreciate the internal mechanisms of modern AutoML systems.
You can explore the full code and project structure here: 🔗 github.com/custom-automl-repo