How I Built My Own AutoML System From Scratch

1st June, 2025

Building an AutoML system from scratch was one of the most challenging yet rewarding projects I've undertaken. I wanted full control and deep understanding of each stage, from data ingestion to model deployment.

✨ Why I Built My Own AutoML System

To understand what happens behind the scenes in AutoML tools
To control and customize every step
To enhance my portfolio with a real-world, modular project

🛠️ Tools & Technologies Used

Python, pandas, scikit-learn, xgboost, lightgbm, shap, optuna, Flask, React, Render

⚙️ System Architecture Overview

Data Ingestion
EDA
Preprocessing
Model Training
Hyperparameter Tuning
Evaluation & Leaderboard
Model Explainability (SHAP)
Prediction Interface

🏗️ Preprocessing and Modeling

I handled missing values, categorical encoding, scaling, and the train-test split manually, then trained models such as Logistic Regression, Random Forest, XGBoost, LightGBM, and SVM.
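The preprocessing and training steps can be sketched in a few lines. This is only an illustration, not the project's code: the project handled these steps manually, while the sketch leans on scikit-learn's `Pipeline` and `ColumnTransformer`. The synthetic dataset and its column names (`age`, `income`, `city`) are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with a few missing values injected.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["A", "B", "C"], n),
})
df.loc[df.index[::10], "age"] = np.nan    # missing numeric values
df.loc[df.index[::15], "city"] = np.nan   # missing categorical values
y = (df["income"] > 50_000).astype(int)   # toy binary target

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["city"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split before fitting so test rows never influence imputation or scaling.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Keeping the preprocessing inside the same pipeline as the classifier means the statistics used for imputation and scaling are learned from the training split only. Swapping `LogisticRegression` for another estimator leaves the rest unchanged.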

🔍 Hyperparameter Tuning

I used Optuna for hyperparameter tuning. Each model had its own search space, and each study optimized either F1-score or AUC.

📊 Evaluation and Leaderboard

All models were evaluated using cross-validation and ranked based on their performance. A leaderboard view was built to compare them.
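As a rough sketch of how such a leaderboard can be assembled, the example below scores a few candidate models with 5-fold cross-validation and sorts them by mean AUC. The candidate set, the fold count, and the column names (`mean_auc`, `std_auc`) are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for a real dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate models; scale-sensitive ones are wrapped with a scaler.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

rows = []
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="roc_auc")
    rows.append(
        {"model": name, "mean_auc": scores.mean(), "std_auc": scores.std()}
    )

# Best model first.
leaderboard = (
    pd.DataFrame(rows)
    .sort_values("mean_auc", ascending=False)
    .reset_index(drop=True)
)
print(leaderboard)
```

Reporting the standard deviation next to the mean helps show when two models are too close to rank with confidence.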

🧠 Model Explainability

I used SHAP to explain individual predictions and to show feature importances, which made it easier to see why a model produced a given output.

🖥️ CLI & Web Interface

I first created a CLI tool that loads the best model and predicts on new data. Later, I built a Flask app and then moved the UI to React for a more polished interface.
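A prediction CLI along these lines could be sketched as below. The flag names (`--model`, `--input`, `--output`), the use of joblib for serialization, and the CSV input format are assumptions for the example, since the post does not specify them.

```python
import argparse

import joblib
import pandas as pd


def main(argv=None):
    """Load a saved model, predict on a CSV file, and write the results."""
    parser = argparse.ArgumentParser(description="Predict with a saved model.")
    parser.add_argument(
        "--model", required=True, help="Path to a joblib-serialized estimator"
    )
    parser.add_argument(
        "--input", required=True, help="CSV file containing the feature columns"
    )
    parser.add_argument(
        "--output", default="predictions.csv", help="Where to write predictions"
    )
    args = parser.parse_args(argv)

    model = joblib.load(args.model)
    data = pd.read_csv(args.input)

    # Predict first, then attach the result as a new column.
    predictions = model.predict(data)
    result = data.assign(prediction=predictions)
    result.to_csv(args.output, index=False)
    return result
```

Saved as, say, `predict.py` with `main()` called under the usual `if __name__ == "__main__":` guard, it could be run as `python predict.py --model best_model.joblib --input new_data.csv`. Accepting `argv` as a parameter also makes the function easy to call from tests.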

🚧 Challenges Faced

Preprocessing edge cases (e.g., unseen categories)
Maintaining modularity
Optimizing tuning loops for speed
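The unseen-category case is worth a concrete example. One common way to handle it, shown here as an assumption rather than as the approach this project took, is scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"`, which encodes an unknown category as all zeros instead of raising an error. The city names are made up.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"city": ["Paris", "Berlin", "Paris"]})
new = pd.DataFrame({"city": ["Berlin", "Tokyo"]})  # "Tokyo" was never seen

# With handle_unknown="ignore", unseen categories become an all-zero row.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

encoded = encoder.transform(new).toarray()
print(encoder.categories_[0])  # learned categories: Berlin, Paris
print(encoded)                 # [[1. 0.] [0. 0.]]
```

With the default `handle_unknown="error"`, the same `transform` call would fail on "Tokyo", which is how this edge case tends to surface at prediction time.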

🚀 Future Enhancements

Add support for unstructured data
Integrate experiment tracking (MLflow/W&B)
Dockerize for scalable deployments

Note: This project helped me deeply understand and appreciate the internal mechanisms of modern AutoML systems.

📂 GitHub Repository

You can explore the full code and project structure here: 🔗 github.com/custom-automl-repo