Day 1 at IMA Workshop on Optimization and Parsimonious Modeling

This week, I’m participating in the IMA workshop on Optimization and Parsimonious Modeling in Minneapolis. This is my first time at the IMA and in Minneapolis, and I’ve been pretty impressed by the university, which is laid out like a vast brick fortress, and by the surrounding area, which has plenty of restaurants that are good or at least familiar to me. The IMA, in particular, seems to be really good at hosting workshops: all events are perfectly organized, and you get the feeling that the workshop organizers are living up to IMA standards, not the other way around.

Bernd Sturmfels: Eigenvectors of Tensors

Today we learned from Bernd how to define the rank, eigenvectors, and singular values of tensors, those multidimensional arrays which, in the 3x3x3 case, Bernd delightfully calls Rubik’s cubes. He mostly restricted himself to symmetric tensors, which are like symmetric matrices in that all the dimensions are equal and the entries of the tensor are invariant under permutations of the indices. I found the definition of eigenvectors interesting because they are precisely the stationary points of certain polynomials minimized over a sphere. But to Bernd, the point of the talk was to convince all of us that the word variety is not scary.
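In case it helps, here is that connection spelled out in standard notation (my notation, not necessarily the one Bernd used). For a symmetric tensor $T$ of order 3, consider the cubic form it defines:

\[
  f(x) \;=\; \sum_{i,j,k} T_{ijk}\, x_i x_j x_k,
  \qquad T \in \mathbb{R}^{n \times n \times n} \text{ symmetric}.
\]

The Lagrange condition for a stationary point of $f$ on the unit sphere reads

\[
  \sum_{j,k} T_{ijk}\, x_j x_k \;=\; \lambda\, x_i \quad (i = 1, \dots, n),
  \qquad \|x\|_2 = 1,
\]

and the pairs $(\lambda, x)$ satisfying it are exactly the eigenpairs of $T$ (up to a harmless constant absorbed into $\lambda$).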

Aleksandr Aravkin: Conjugate Interior Point Method for Large-Scale Problems

Alex told us about interior point methods for minimizing piecewise linear-quadratic (PLQ) functions, a class of functions that behaves well under several operations: for example, the sum of two PLQs is a PLQ, conjugates of PLQs are PLQs, and smoothed PLQs (e.g., Moreau envelopes of PLQs) are PLQs. Alex and a number of collaborators implemented a general interior point solver for PLQs, called ipSolve, that exploits structural properties (for example, sparsity) of the matrices and vectors involved. Because it exploits this structure, Alex can solve fairly large problems (e.g., with 4 million unknowns) on his laptop. Not content with current methods for choosing model parameters, which often come down to trial and error, Alex and his collaborators add a final cherry on top, as he says, by fusing ipSolve with a hyperparameter selection tool.
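To make one of those closure properties concrete, here is a tiny numerical check (my own toy code, nothing to do with ipSolve): the Moreau envelope of the piecewise linear function |x| is the Huber function, which is again piecewise linear-quadratic.

```python
import numpy as np

# Toy check: the Moreau envelope of f(y) = |y|, with smoothing parameter t,
# equals the Huber function -- a PLQ function smoothing another PLQ function.
t = 0.5
xs = np.linspace(-3, 3, 13)
ys = np.linspace(-5, 5, 20001)  # fine grid for the inner minimization

# numerically evaluate e_t(x) = min_y |y| + (x - y)^2 / (2 t)
envelope = np.array([np.min(np.abs(ys) + (x - ys) ** 2 / (2 * t)) for x in xs])

# closed form of the Huber function
huber = np.where(np.abs(xs) <= t, xs ** 2 / (2 * t), np.abs(xs) - t / 2)

print(np.max(np.abs(envelope - huber)))  # ~0, up to grid error
```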

Discussion 1

René Vidal’s flight was cancelled due to weather, so his presentation was moved to Wednesday. In its place, all 60 or so participants at the IMA introduced themselves, giving their names and interests; those on the job market were given some extra time to advertise their work. My favorite introduction came from a professor (I didn’t catch his name) who said “I’m from Korea. Obviously I mean South Korea,” at which the room erupted in laughter. On second thought, though, it made me reflect on how awful things might be in the northern half of Korea.

Wotao Yin: Asynchronous Parallel Computing in Signal Processing and Machine Learning

Wotao Yin told us how, in the 1990s, CPU clock speed determined the price and quality of computers; in fact, such speeds used to appear in advertisements. He contrasted this with several current computer advertisements, in particular for the Microsoft Surface, which contain no clock speed or any hardware specs whatsoever. He told us that Moore’s law has led Intel to manufacture multicore processors almost exclusively, because single-core processing power has effectively stagnated. We can only make faster algorithms, he says, if we learn to use parallel computing effectively. But, he says, parallel computing is most effective when there is low overhead in the software and hardware layers.

One source of overhead is synchronization. I like to visualize synchronization with a game called the average-time relay: imagine two teams of five people running on a ten-lane track; lanes one through five hold team one, and lanes six through ten hold team two. The winning team is the one whose average individual finishing time is smallest. Team one is superstitious and has decided to place ten barriers on the track; at each barrier, the whole team must stop and wait for the slowest runner. They call this system synchronization. In contrast, all members of team two just run straight for the finish line. Who do you think will win?
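If you want numbers instead of intuition, here is a toy simulation of the relay (my own made-up model, with leg times drawn independently from an exponential distribution):

```python
import numpy as np

# Toy model of the average-time relay: five runners, ten legs per race.
# Team one waits for its slowest runner at every barrier; team two never waits.
rng = np.random.default_rng(1)
runners, legs, trials = 5, 10, 10_000

ratios = []
for _ in range(trials):
    times = rng.exponential(1.0, size=(runners, legs))  # per-runner, per-leg times
    # with barriers, every runner finishes at the sum of the per-leg maxima
    with_barriers = times.max(axis=0).sum()
    # without barriers, each runner simply adds up their own legs
    without_barriers = times.sum(axis=1).mean()
    ratios.append(with_barriers / without_barriers)

print(np.mean(ratios))  # roughly 2.3: the synchronized team is much slower
```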

Wotao says that if the team members are parallel subproblems of a larger parallelizable optimization algorithm, he’d put his money on the team without the barriers. And this is what he and others do in their paper on the ARock algorithm. It appears to work quite well in practice, and it motivated me to introduce SMART.
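To get a feel for what “no barriers” looks like in code, here is a minimal sketch of lock-free asynchronous coordinate updates (a toy example in the spirit of ARock, not their implementation): several threads solve a diagonally dominant linear system by Jacobi-style updates to a shared vector, each reading whatever possibly stale entries the others have written.

```python
import threading
import numpy as np

# Asynchronous coordinate updates with no locks and no barriers.
# (CPython's GIL means the threads only interleave rather than truly run in
# parallel, but the lock-free update pattern is the point of the sketch.)
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # strongly diagonally dominant
b = rng.standard_normal(n)
x = np.zeros(n)  # shared iterate, updated in place by all threads

def worker(seed, num_updates):
    local_rng = np.random.default_rng(seed)  # each thread gets its own generator
    for _ in range(num_updates):
        i = local_rng.integers(n)  # pick a coordinate at random
        # the read of x may be stale -- no one waits for anyone else
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

threads = [threading.Thread(target=worker, args=(s, 20_000)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("residual:", np.linalg.norm(A @ x - b))  # small despite the races
```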

Sébastien Bubeck: Revisiting Nesterov’s Acceleration

Nesterov’s acceleration of the gradient descent algorithm is mysterious to a lot of people, particularly because there appears to be little geometric intuition behind it. So Sébastien built a new accelerated gradient method based entirely on a geometric principle involving intersecting closed balls. The new algorithm nearly achieves the convergence rate of Nesterov’s method (because of an exact line search, there is a log^2(1/eps) factor instead of a log(1/eps) factor). I’ll leave it to Sébastien to explain the proof to you.
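For anyone who has not seen the method being accelerated, here is a minimal sketch (standard textbook algorithms, not Sébastien’s geometric variant) comparing plain gradient descent with Nesterov’s accelerated gradient on a strongly convex quadratic:

```python
import numpy as np

# Gradient descent vs. Nesterov's accelerated gradient on
# f(x) = 0.5 * x' A x - b' x, with A moderately ill-conditioned.
rng = np.random.default_rng(0)
n = 100
Q = rng.standard_normal((n, n))
A = Q.T @ Q / n + 1e-2 * np.eye(n)
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

evals = np.linalg.eigvalsh(A)
L, mu = evals.max(), evals.min()  # smoothness and strong convexity constants

def grad(x):
    return A @ x - b

# plain gradient descent with step size 1/L
x = np.zeros(n)
for _ in range(500):
    x = x - grad(x) / L

# Nesterov's method (constant-momentum variant for strongly convex problems)
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
y = z = np.zeros(n)
for _ in range(500):
    z_new = y - grad(y) / L
    y = z_new + beta * (z_new - z)
    z = z_new

print("gradient descent error:", np.linalg.norm(x - x_star))
print("nesterov error:        ", np.linalg.norm(z - x_star))
```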

Discussion 2

Most of the discussion surrounded Bernd’s talk on eigenvectors of tensors. The most interesting question came from Bubeck, who asked whether it is hard in general to find eigenvectors of a tensor whose entries are Gaussian. The answer appears to be unknown.
