Statistical Learning for Data Mining Project Portfolio
  • Home
  • Bias and Variance in Linear Regression
  • Ensemble Learning Techniques for Fair Classification
  • Property Valuations in a Mid-Sized U.S. City: A SHAP-Gain Feature Selection and ElasticNet-Ensembled Approach with Optuna-Tuned XGBoost
  • End-to-End Quant Pipeline for Equity Alphas: SHAP-Pruned Features with ElasticNet Ensembles and Optuna-Tuned Gradient Boosting

Srijith Reddy | Portfolio

Welcome to My Portfolio

Hi, I’m Srijith Reddy, a data science enthusiast currently exploring the intersection of statistics, optimization, and machine learning. This portfolio showcases three major projects from my Statistical Learning for Data Mining course, plus a summer research project in quantitative finance—each grounded in rigorous modeling, evaluation, and interpretation. These projects reflect not just applied techniques, but a careful balance between theory and performance.

You can also check out my resume at: srijith-reddy.github.io/resume


Featured Projects

Property Valuation Modeling

Used Ridge Regression, LightGBM, and XGBoost to predict 2019 assessed property values from structured historical real estate data. Extensive feature engineering (including SHAP-based selection) and ensemble stacking yielded an RMSE of ~36K on the test set.
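As a flavor of the ensemble-stacking step, here is a minimal pure-Python sketch of inverse-RMSE model blending. The project itself stacks Ridge, LightGBM, and XGBoost on SHAP-selected features; the helper names `stack_weights` and `blend` and the weighting rule below are illustrative, not the project's exact stacker:

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between truths and predictions.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def stack_weights(val_y, base_preds):
    # Inverse-RMSE weighting: models that predict the validation set
    # better receive proportionally larger blend weights.
    inv = [1.0 / rmse(val_y, preds) for preds in base_preds]
    total = sum(inv)
    return [w / total for w in inv]

def blend(weights, base_preds):
    # Weighted average of each model's prediction, per observation.
    return [sum(w * preds[i] for w, preds in zip(weights, base_preds))
            for i in range(len(base_preds[0]))]
```

Inverse-RMSE weighting is only a simple baseline; learned stacking weights typically come from fitting a meta-model on out-of-fold predictions.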

Fair NBA Draft Predictor

Developed a fairness-aware ensemble model (FairStacks) to predict NBA draft selection probabilities. Trained base models like Naive Bayes, LDA, SVM, and penalized logistic regressions, while optimizing a fairness-constrained loss to reduce TPR gap across school tiers.
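For concreteness, the fairness metric being targeted, the true-positive-rate (TPR) gap, can be computed in a few lines. This is a minimal sketch assuming binary labels and a group label per observation; the FairStacks optimization itself is not reproduced here:

```python
def tpr(y_true, y_pred, group, g):
    # True-positive rate within one group: P(pred = 1 | true = 1, group = g).
    pos = [p for t, p, gr in zip(y_true, y_pred, group) if gr == g and t == 1]
    return sum(pos) / len(pos)

def tpr_gap(y_true, y_pred, group):
    # Largest spread in TPR across groups; 0 means equal opportunity.
    rates = [tpr(y_true, y_pred, group, g) for g in set(group)]
    return max(rates) - min(rates)
```

Penalizing this gap in the ensemble's loss trades a little accuracy for predictions whose recall is more even across school tiers.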

Bias-Variance Tradeoff in Ridge Regression

Simulated the performance of Ridge versus OLS across linear and nonlinear data-generating processes. Illustrated the bias-variance decomposition with clean visualizations to explain when and why regularization lets Ridge outperform OLS.

End-to-End Quant Alpha Pipeline

Developed as a summer research project, this full-stack workflow builds and evaluates equity alpha factors across multiple time horizons. Integrated OHLCV, options, and macroeconomic data, and applied SHAP- and IC-based feature pruning, ElasticNet ensembles, and Optuna-tuned XGBoost/LightGBM models to forecast 1D/5D/21D forward returns. The pipeline was later extended with HMM-based regime detection and CVXPY portfolio optimization, achieving robust out-of-sample IC and Sharpe improvements.
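One building block worth spelling out is the information coefficient (IC): the rank correlation between a factor's values today and subsequent forward returns. A minimal pure-Python version (Spearman correlation, ignoring ties; the real pipeline uses library implementations) might look like:

```python
def ranks(xs):
    # Rank of each value within the list (0 = smallest); ties not handled.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def information_coefficient(signal, fwd_returns):
    # Rank IC: Pearson correlation of the two rank vectors.
    rs, rr = ranks(signal), ranks(fwd_returns)
    n = len(rs)
    ms, mr = sum(rs) / n, sum(rr) / n
    cov = sum((a - ms) * (b - mr) for a, b in zip(rs, rr))
    vs = sum((a - ms) ** 2 for a in rs)
    vr = sum((b - mr) ** 2 for b in rr)
    return cov / (vs * vr) ** 0.5
```

An IC of +1 means the factor perfectly rank-orders future returns; in practice, factors with a small but stable positive IC across the 1D/5D/21D horizons are what survive the pruning step.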


More of My Work

Outside structured coursework, I enjoy building and experimenting with applied machine learning and generative AI systems — exploring how models learn patterns, make predictions, and create new possibilities from data. My GitHub repositories feature ongoing work in quantitative finance, model interpretability, and intelligent automation, along with experiments in LLM-based and generative workflows.

About the Course

These projects were completed as part of STA 9890: Statistical Learning for Data Mining—a graduate-level course emphasizing predictive modeling, generalization error, model interpretability, and hands-on experimentation. The coursework combined theoretical insights with real-world data challenges.


Thanks for visiting!