Statistical Learning for Data Mining Project Portfolio
  • Home
  • Bias and Variance in Linear Regression
  • Ensemble Learning Techniques for Fair Classification
  • Forecasting Property Valuations in a Mid-Sized U.S. City: A SHAP-Gain Feature Selection and ElasticNet-Ensembled Approach with Optuna-Tuned XGBoost
  • End-to-End Quant Pipeline for Equity Alphas: SHAP-Pruned Features with ElasticNet Ensembles and Optuna-Tuned Gradient Boosting

Srijith Reddy | Portfolio

Welcome to My Portfolio

Hi, I’m Srijith Reddy, a data science enthusiast currently exploring the intersection of statistics, optimization, and machine learning. This portfolio showcases three major projects from my Statistical Learning for Data Mining course—each grounded in rigorous modeling, evaluation, and interpretation. These projects reflect not just applied techniques, but a careful balance between theory and performance.

You can also check out my resume at: srijith-reddy.github.io/resume


Featured Projects

Property Valuation Modeling

Used Ridge Regression, LightGBM, and XGBoost to predict 2019 assessed property values from structured historical real estate data. Extensive feature engineering (including SHAP-based selection) and ensemble stacking yielded a test-set RMSE of roughly $36K.
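The importance-based pruning step can be sketched as below. This is a minimal stand-in, not the project's actual pipeline: it uses scikit-learn's gain-style (impurity-based) feature importances on synthetic data in place of SHAP values computed on XGBoost/LightGBM, and the 1% threshold is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the structured real-estate feature matrix.
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Fit a boosted-tree model and rank features by gain-style importance.
model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
importance = model.feature_importances_

# Keep only features above a relative importance threshold (here 1%).
threshold = 0.01
selected = np.where(importance > threshold)[0]
X_pruned = X[:, selected]
```

Downstream models (Ridge, stacked ensembles) are then trained on `X_pruned` rather than the full feature set.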

Fair NBA Draft Predictor

Developed a fairness-aware ensemble model (FairStacks) to predict NBA draft selection probabilities. Trained base models including Naive Bayes, LDA, SVM, and penalized logistic regression, then optimized a fairness-constrained loss to reduce the TPR gap across school tiers.
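The fairness target above, the TPR gap, can be computed as follows. This is an illustrative sketch with toy data, where `group` plays the role of the school-tier attribute; the function name and example values are my own, not from the project.

```python
import numpy as np

def tpr_gap(y_true, y_pred, group):
    """Largest difference in true-positive rate between any two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)  # positives within this group
        tprs.append(y_pred[mask].mean())     # fraction correctly flagged
    return max(tprs) - min(tprs)

# Toy example: drafted (1) vs undrafted (0) players from two school tiers.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
gap = tpr_gap(y_true, y_pred, group)  # 0.75 vs 0.50 TPR -> gap of 0.25
```

A fairness-constrained loss penalizes this gap alongside the usual classification loss, pushing the stacked ensemble toward equal opportunity across tiers.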

Bias-Variance Tradeoff in Ridge Regression

Simulated performance of Ridge vs OLS across linear and nonlinear data-generating processes. Illustrated the bias-variance decomposition with clean visualizations to explain when and why Ridge outperforms OLS under regularization.
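The core simulation idea can be sketched as below: with the design fixed, draw fresh noise many times, refit both estimators, and decompose their error into squared bias and variance. The constants (sample size, penalty, noise level) are illustrative choices, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n, p, sigma, alpha = 50, 10, 2.0, 10.0
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
f = X @ beta  # true regression function (linear DGP)

ols_preds, ridge_preds = [], []
for _ in range(200):  # repeated draws of the noise
    y = f + rng.normal(scale=sigma, size=n)
    ols_preds.append(LinearRegression().fit(X, y).predict(X))
    ridge_preds.append(Ridge(alpha=alpha).fit(X, y).predict(X))

def bias2_var(preds):
    """Average squared bias and variance of predictions across replications."""
    preds = np.array(preds)
    mean_pred = preds.mean(axis=0)
    bias2 = ((mean_pred - f) ** 2).mean()
    var = preds.var(axis=0).mean()
    return bias2, var

ols_b2, ols_var = bias2_var(ols_preds)
ridge_b2, ridge_var = bias2_var(ridge_preds)
```

Ridge trades a small increase in squared bias for a larger reduction in variance, which is exactly when its total error beats OLS.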

End-to-End Quant Alpha Pipeline

Built a full-stack research pipeline for equity alphas, integrating OHLCV, options, and macroeconomic data. Applied SHAP- and IC-based feature pruning, ElasticNet ensembles, and Optuna-tuned XGBoost/LightGBM models to forecast 1D/5D/21D forward returns. Extended the workflow with HMM-based regime detection and CVXPY portfolio optimization, achieving robust out-of-sample IC and Sharpe improvements.
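The evaluation metric underlying both the pruning and the out-of-sample claims is the information coefficient (IC): the rank correlation between a signal and subsequent forward returns. A minimal sketch on synthetic data, with the signal strength chosen for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n = 250  # e.g. one cross-section of stocks

# A predictive signal, and forward returns that weakly load on it plus noise.
signal = rng.normal(size=n)
fwd_ret = 0.5 * signal + rng.normal(scale=1.0, size=n)

# Information coefficient: Spearman rank correlation of signal vs forward return.
ic, _ = spearmanr(signal, fwd_ret)
```

In a full pipeline this is computed per date and averaged; features whose mean IC is weak or unstable are pruned, and the same statistic is tracked out of sample at each horizon (1D/5D/21D).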


About the Course

These projects were completed as part of STA 9890: Statistical Learning for Data Mining—a graduate-level course emphasizing predictive modeling, generalization error, model interpretability, and hands-on experimentation. The coursework combined theoretical insights with real-world data challenges.


Thanks for visiting!