It’s not the last day of the workshop, but it’s my last day here in snowy Minneapolis. I’ve had a fun time, and my trip is ending on a good note because of today’s good talks.
Michael Overton: Crouzeix’s Conjecture
Michael told us about Crouzeix’s conjecture which is an inequality, conjectured to be true by Michel Crouziex, relating the spectral norm of a polynomially transformed matrix to the maximum of that same polynomial over the field of values of that matrix, which is the set of complex numbers v^TAv ranging over all unit complex vectors v (here, I use T to denote conjugate transpose). This conjecture has little bearing on parsimonious modeling, but it’s an example where, to a surprising degree of success, nonsmooth analysis is used to prove properties of a function, called Crouzeix’s ratio, which suggest the verity of Crouzeix’s Conjecture and where nonsmooth BFGS seems to work quite well.
Lek-Heng Lim: Grothendieck Constant and Structured Matrix Computations
Lek-Heng wants to determine the minimal number of operations needed to, for example, multiply two matrices together, and after he finds the minimal number of operations, he’d like to give us an algorithm that achieves that number. We still lack an understanding of general matrix-matrix multiplication, but for structured matrices, for example, circulant matrices or Toeplitz matrices, Lek-Heng has the answer. The intriguing processes by which he arrives at these answers takes two steps: (1) represent the matrix-matrix multiplication rule as a bilinear map, which itself can be viewed as a tensor, and (2) compute the rank of that tensor, which allows you to both measure the complexity of the multiplication and provides an algorithm to compute the multiplication. You see, computing the rank of a tensor means simplifying it into a linear combination of simpler tensors, which suggests an algorithm. And the fewer the tensors used in this decomposition, the fewer the operations needed to compute that tensor.
Laura Balzano: Low-rank Matrix Completion under Monotonic Transformation
Laura presented an issue with matrix completion, which, briefly stated, is the task of partly observing a matrix, and then, filling in the unobserved entries of the matrix. We don’t want just any completion of the observed matrix entries, we want a low-rank one because…Occam’s Razor. But there is a problem with this approach: when we apply matrix completion in practice, our solution must be a matrix of integers, for example, each entry may correspond to a ranking (1 to 4 stars) that some user assigned some product in Netflix’s movie catalogue; but, Laura says, matrices of integers are not low-rank! So she changes the model. Laura’s new model assumes that a Lipschitz transformation of the matrix, not the original matrix, is low rank. Unfortunately, we do not often know the Lipschitz mapping in advance, so Laura proposes the scariest optimization problem of the workshop, one that requires optimizing over a set of Lipschitz monotonic functions. She provides a few heuristics, which don’t actually solve the problem under consideration, but provide interesting experimental results nonetheless.
Michael Friedlander: Level-set Methods for Convex Optimization
Michael showed us how to solve constrained optimization problems, for example, one in which we minimize a function f subject to another function g being less than sigma, by approximately solving a series of flipped optimization problems in which we minimize g subject to f being less than tau; the series of optimization problems result from varying tau, and ultimately, we’re searching for the tau for which the minimum value of g subject to f being smaller than tau is sigma, i.e., we want to tau to be the optimal value of the original optimization problem—that’s a root finding problem, hence the title of the talk. The novelty of this new approach, which has been building up in several of Michael’s and his coauthor’s older papers, is to make the root-finding algorithm practical, so they propose inexact newton and secant algorithms for finding the root.
Rachel Ward: Globally Optimal and Robust k-means Clustering via Convex Optimization
In her talk, Rachel revisited the k-means clustering problem. She works from the assumption that the minimizers of the k-means clustering problem are good clusterings. And from this assumption, she proposes a generative model of clustering, which neatly captures the types of clustering that those k-means minimizers should cluster appropriately, namely translates of isotropic distributions with sufficiently spread out cluster centers. Then, generative model in hand, she proposes a convex relaxation of the nonconvex k-means objective function, that has some nice theoretical properties, in particular, solutions to convex problem can provide certificates of optimality for the nonconvex k-means clustering problem. Rachel emphasizes that more work is left to be done, though, namely because her relaxation is a huge semi-definite optimization problem which will take a lot of time to solve.
René Vidal: Global Optimality in Matrix and Tensor Factorization, Deep Learning, and Beyond
René explained to us how deep learning and matrix factorization are a lot alike. In matrix factorization, we want to factor a matrix X approximately into product of matrices UV, but in (admittedly shallow) deep learning, we want to factor a matrix X into psi(U) V, where psi is some nonlinear function. And with this analogy, René proposed several deep learning models in which all local minimizers are global minimizers, by associating a convex problem to the notoriously nonconvex deep learning problem such that both problems have the same local minimizers. The key to the association is the analogy with matrix factorization; indeed, a similar correspondences is already known there. I think we optimizers have yet to assess the implications of René’s talk, who admits he doesn’t have the background to completely tackle the problem, but for sure, everyone sat on the edge of seats throughout the talk, frequently interrupting René with deep interest.