Speaker: Bin Yu

Institution: UC Berkeley, Department of Statistics

Time: Monday, May 24, 2010 - 4:00pm

Location: RH 306

Information technology has enabled the collection of massive amounts of data in science, engineering, social science, finance, and beyond. Extracting useful information from massive, high-dimensional data is the focus of today's statistical research and practice. Following the broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention, and sparsity is being used as its proxy. Combining the virtues of regularization and sparsity, sparse modeling methods (e.g., the Lasso) have attracted much attention both for theoretical research and for data modeling.
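For concreteness (the notation below is generic and not taken from the abstract), the Lasso illustrates how regularization and sparsity are combined: it solves

\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \| y - X\beta \|_2^2 + \lambda \| \beta \|_1,

where X is the n x p design matrix and the L1 penalty with tuning parameter \lambda > 0 drives many coordinates of \hat{\beta} exactly to zero, yielding a sparse and hence more interpretable model.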

In this talk, I will discuss both the theory and practice of sparse modeling. First, I will present recent theoretical results bounding the L2-estimation error (when p >> n) for a class of M-estimation methods with decomposable penalties. As special cases, our results cover the Lasso, L1-penalized GLMs, the grouped Lasso, and low-rank sparse matrix estimation. Second, I will present ongoing research on "word-imaging," supported by an NSF-CDI grant. This project employs sparse logistic regression to derive a list of words (a "word-image") associated with a particular query word (e.g., "Microsoft") in paragraphs of New York Times articles. The validity of such a list is supported by the results of human-subject experiments comparing it with lists produced by some other methods.
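As a rough sketch of the framework mentioned above (the notation here is generic and not taken from the talk), the regularized M-estimators in question have the form

\hat{\theta} \in \arg\min_{\theta} \; \mathcal{L}_n(\theta) + \lambda_n \, \mathcal{R}(\theta),

where \mathcal{L}_n is a loss function built from n observations (e.g., the negative log-likelihood of a GLM) and \mathcal{R} is a decomposable penalty such as the L1 norm (Lasso), a group L1 norm (grouped Lasso), or the nuclear norm (low-rank matrices); the theoretical results bound the L2-estimation error \| \hat{\theta} - \theta^* \|_2 in terms of \lambda_n and structural properties of the true parameter \theta^*.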