On algorithm design for constrained optimization problems in machine learning

Speaker: 

Yue Xie

Institution: 

University of Hong Kong

Time: 

Thursday, June 20, 2024 - 3:00pm to 4:00pm

Location: 

306

In this talk, I will focus on the resolution of two important subclasses of constrained optimization: bound-constrained problems and linear programming. They are motivated by popular machine learning topics, including nonnegative matrix factorization and optimal transport (OT). For the former subclass, I will introduce a two-metric projection method that effectively exploits Hessian information of the objective function. This method inspires several algorithms, including a projected Newton-CG equipped with optimal worst-case complexity guarantees and an adaptive two-metric projection method designed to handle l1-norm regularization. For the linear programming formulation of OT, I will discuss random block coordinate descent (RBCD) methods. A direct advantage of these methods is memory savings; we demonstrate their efficiency by comparison with competitors, including the classical Sinkhorn algorithm.
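
As a point of reference for the comparison mentioned above, here is a minimal sketch of the classical Sinkhorn algorithm for entropically regularized OT, the baseline against which the RBCD methods are compared. The toy cost matrix, uniform marginals, and regularization parameter eps are illustrative assumptions, not the speaker's experimental setup; note that the sketch stores the full kernel matrix, which is exactly the memory cost that block coordinate methods aim to avoid.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT between histograms a and b with cost matrix C.

    Classical Sinkhorn iterations: alternately rescale the kernel
    K = exp(-C / eps) so the coupling matches both marginals.
    The full n-by-m kernel is kept in memory.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return P, np.sum(P * C)           # plan and transport cost

# Toy example: uniform histograms over two random point clouds.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 2)), rng.normal(loc=1.0, size=(60, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
P, cost = sinkhorn(a, b, C)
```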

Enhancing Model Efficiency: Applications of Tensor Train Decomposition in Machine Learning

Speaker: 

Eric Liu

Institution: 

SDSU and UCI

Time: 

Tuesday, May 28, 2024 - 3:00pm to 4:00pm

Location: 

RH 440R

The application of Tensor Train (TT) decomposition in machine learning models provides a promising approach to addressing challenges related to model size and computational complexity. TT decomposition, by breaking down high-dimensional weight tensors into smaller, more manageable tensor cores, allows for significant reductions in model size while maintaining performance. This presentation will explore how TT decomposition can be effectively used in different types of models.

TT decomposition is applied differently in recurrent models, Convolutional Neural Networks (CNNs), and Binary Neural Networks (BNNs). In recurrent models such as Long Short-Term Memory (LSTM) networks, large weight matrices are transformed into smaller, manageable tensor cores, reducing the number of parameters and the computational load. For CNNs, TT decomposition targets the convolutional layers, transforming convolutional filters into tensor cores to preserve spatial structure while significantly reducing parameters. In BNNs, TT decomposition is combined with weight binarization, resulting in extremely compact models that retain the essential information needed for accurate predictions even with minimal computational power and memory.
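
To make the core-factorization step concrete, here is a minimal TT-SVD sketch: a weight tensor is split into three-way cores by successive truncated SVDs. The fixed maximum rank and the toy reshaping of a 256 x 256 weight matrix into a 4-way tensor are illustrative assumptions, not the exact construction used in the talk.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into TT cores via sequential truncated SVD.

    Each core has shape (r_prev, mode_size, r_next); contracting all cores
    over the rank indices approximates the input tensor.
    """
    shape = tensor.shape
    d = len(shape)
    cores, r_prev = [], 1
    unfolding = tensor.reshape(shape[0], -1)
    for k in range(d - 1):
        U, S, Vt = np.linalg.svd(unfolding, full_matrices=False)
        r = min(max_rank, S.size)
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        unfolding = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(unfolding.reshape(r_prev, shape[-1], 1))
    return cores

# Toy example: reshape a 256 x 256 weight matrix into a 4-way tensor
# (16 x 16 x 16 x 16) and compress it with TT rank at most 8.
W = np.random.default_rng(0).normal(size=(256, 256))
cores = tt_svd(W.reshape(16, 16, 16, 16), max_rank=8)
n_params = sum(c.size for c in cores)   # far fewer than 256 * 256 entries
```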

The primary aim of this presentation is to explore the theoretical foundations and practical applications of TT decomposition, demonstrating how this technique optimizes various machine learning models. The findings suggest that TT decomposition can greatly enhance model efficiency and scalability, making it a valuable tool for a wide range of applications.

DeepParticle: learning multiscale PDEs with data generated from interacting particle methods

Speaker: 

Jack Xin

Institution: 

UCI

Time: 

Tuesday, April 30, 2024 - 3:00pm to 4:00pm

Location: 

440R

Multiscale time-dependent partial differential equations (PDEs) are challenging to compute by traditional mesh-based methods, especially when their solutions develop large gradients or concentrations at unknown locations. Particle methods, based on microscopic aspects of the PDEs, are mesh-free and self-adaptive, yet still expensive when a long-time or highly resolved computation is necessary.

We present DeepParticle, an approach integrating deep learning, optimal transport (OT), and interacting particle (IP) methods, to speed up the generation and prediction of PDE dynamics. We illustrate it through two case studies on transport in fluid flows with chaotic streamlines:

1) large-time front speeds of the Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) equation;

2) the Keller-Segel (KS) chemotaxis system modeling bacterial evolution in the presence of a chemical attractant.

Analysis of the FKPP equation reduces the problem to the computation of the principal eigenvalue of an advection-diffusion operator. A normalized Feynman-Kac representation makes possible a genetic IP algorithm that evolves an initially uniform particle distribution to a large-time invariant measure from which front speeds are extracted. The invariant measure is parameterized by a physical parameter (the Peclet number). We train a lightweight deep neural network with local and global skip connections to learn this family of invariant measures. The training data come from IP computations in three dimensions at a few sample Peclet numbers.
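
For illustration only, here is a minimal sketch of the kind of lightweight network with local and global skip connections described above: a small map that transports input particles conditioned on a scalar Peclet number. The layer widths, depth, and conditioning scheme are assumptions, not the speaker's architecture.

```python
import torch
import torch.nn as nn

class ParticleTransportNet(nn.Module):
    """Small MLP with local (residual) and global skip connections.

    Maps a 3D particle position x and a scalar parameter pe (Peclet number)
    to a transported position; applying it to uniform samples is meant to
    produce samples of the parameterized invariant measure.
    """
    def __init__(self, dim=3, width=64, n_blocks=3):
        super().__init__()
        self.inp = nn.Linear(dim + 1, width)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                          nn.Linear(width, width))
            for _ in range(n_blocks)
        )
        self.out = nn.Linear(width, dim)

    def forward(self, x, pe):
        h = torch.relu(self.inp(torch.cat([x, pe], dim=-1)))
        for block in self.blocks:
            h = h + block(h)          # local skip connection
        return x + self.out(h)        # global skip: predict a displacement

# Toy usage: transport 1024 uniform particles at Peclet number 5.0.
net = ParticleTransportNet()
x0 = torch.rand(1024, 3)
pe = torch.full((1024, 1), 5.0)
x1 = net(x0, pe)
```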

The training objective being minimized is a discrete Wasserstein distance from OT theory. The trained network predicts a more concentrated invariant measure at a larger Peclet number and also serves as a warm start to accelerate IP computation. The KS system is formulated as a McKean-Vlasov equation (the macroscopic limit) of a stochastic IP system. The DeepParticle framework extends to this setting and learns to generate various finite-time bacterial aggregation patterns.
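
As a concrete reference for the training objective, here is a minimal sketch of one standard way to evaluate a discrete 2-Wasserstein distance between two equal-size particle samples, by solving an optimal assignment over pairwise squared distances. The use of scipy's linear_sum_assignment and uniform weights are assumptions for illustration, not the exact loss used in DeepParticle.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein2(x, y):
    """Discrete 2-Wasserstein distance between two equal-size samples.

    With uniform weights on n points each, the optimal coupling is a
    permutation, so the distance reduces to an optimal assignment over
    squared pairwise costs.
    """
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared distances
    rows, cols = linear_sum_assignment(C)
    return np.sqrt(C[rows, cols].mean())

# Toy usage: distance between two 3D particle clouds of 200 points each.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 3))
b = rng.normal(loc=0.5, size=(200, 3))
print(wasserstein2(a, b))
```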
