Adaptive and Scalable Nonparametric Estimation via Stochastic Optimization

Event Date: 

Friday, January 24, 2025 - 3:30pm to 4:45pm

Event Date Details: 

Friday, January 24th, 2025

Event Location: 

  • HSSB 1174

Event Price: 

FREE

Event Contact: 

Dr. Tianyu Zhang 

Carnegie Mellon University 

  • Department Seminar Series

Abstract

Nonparametric procedures are frequently employed in predictive and inferential modeling to relate random variables without imposing specific parametric forms. In supervised learning, for instance, our focus is often on the conditional mean function that links predictive covariates to a numerical outcome of interest. While many existing statistical learning methods estimate this function with optimal statistical performance, their computational cost often scales poorly as the sample size grows. This challenge is exacerbated in certain “online settings,” where data are collected continuously and estimates require frequent updates.

In this talk, I will discuss a class of nonparametric stochastic optimization methods. The estimates are constructed using stochastic gradient descent (SGD) over a function space of varying capacity. When this computational approach is combined with compact function approximation strategies (such as using eigenfunctions in a reproducing kernel Hilbert space), certain nonparametric estimators can attain both optimal statistical properties and minimal memory (space) cost. Additionally, I will introduce a rolling validation procedure, an online adaptation of cross-validation, designed for hyperparameter tuning. This model selection process integrates naturally with incremental SGD algorithms and adds only a negligible computational burden.
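
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the speaker's actual method. It assumes a cosine basis on [0, 1] in place of kernel eigenfunctions, a squared-error loss, a polynomial schedule for how many basis functions are active, and a simple form of rolling validation in which each new observation is scored before the model learns from it. All names (GrowingBasisSGD, growth_exponent, damping, step_scale, simulate_stream) and the simulated data are hypothetical.

```python
import math
import random


def basis(j, x):
    """Cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) * cos(pi * j * x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)


class GrowingBasisSGD:
    """One-pass SGD whose number of basis coefficients grows with the sample count."""

    def __init__(self, growth_exponent, step_scale=0.5, damping=1.0):
        self.growth_exponent = growth_exponent  # hypothetical capacity hyperparameter
        self.step_scale = step_scale            # assumed step-size constant
        self.damping = damping                  # assumed shrinkage of high-frequency updates
        self.coef = []                          # one coefficient per active basis function
        self.n_seen = 0
        self.rolling_loss = 0.0                 # cumulative squared error on unseen points

    def predict(self, x):
        return sum(b * basis(j, x) for j, b in enumerate(self.coef))

    def update(self, x, y):
        # Rolling validation: score the new point BEFORE learning from it.
        residual = y - self.predict(x)
        self.rolling_loss += residual ** 2
        self.n_seen += 1
        # Capacity grows slowly with the number of samples seen.
        n_basis = 1 + int(self.n_seen ** self.growth_exponent)
        while len(self.coef) < n_basis:
            self.coef.append(0.0)
        # Stochastic gradient step on the squared loss, damped for higher j.
        step = self.step_scale / math.sqrt(self.n_seen)
        for j in range(len(self.coef)):
            weight = (j + 1) ** (-2.0 * self.damping)
            self.coef[j] += step * weight * residual * basis(j, x)


def simulate_stream(n, seed=0):
    """Hypothetical data stream: smooth conditional mean plus Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.random()
        y = 1.0 + math.sqrt(2.0) * math.cos(math.pi * x) + rng.gauss(0.0, 0.1)
        yield x, y


# Run several candidate hyperparameters side by side on the same stream.
candidates = [GrowingBasisSGD(e) for e in (0.1, 0.25, 0.5)]
for x, y in simulate_stream(2000):
    for model in candidates:
        model.update(x, y)

# Select the candidate with the smallest cumulative rolling loss.
best = min(candidates, key=lambda m: m.rolling_loss)
print(best.growth_exponent, best.rolling_loss / best.n_seen)
```

In this sketch each observation is processed once and then discarded, so memory use is proportional to the number of active coefficients rather than to the sample size. The validation score reuses the residual that the SGD step already computes, which is one way the tuning step can add little extra work.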

Bio: 

Tianyu Zhang is a postdoctoral researcher in the Department of Statistics & Data Science at Carnegie Mellon University. He received his PhD in 2022 from the University of Washington. His current research interests are model selection, nonparametric statistical learning, and high-dimensional statistical problems motivated by genomic data.