SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm

Last night, I uploaded a new paper on the Stochastic Monotone Aggregated Root-Finding (SMART) algorithm. The algorithm excites me for a few reasons:

  1. SMART extends SAGA, SVRG, Finito, and SDCA, all of which solve problems like

    \displaystyle \text{minimize}_{x \in \mathbb{R}^d}\; \frac{1}{N} \sum_{i=1}^N f_i(x),

    to allow asynchronous parallel implementations, arbitrary block-coordinate updates, mini-batching, and importance sampling (see the first sketch after this list).

  2. SMART replaces function gradients, {\nabla f_i}, with black boxes, {S_i}, called operators, and arrives at the root-finding problem:

    \displaystyle \text{Find }x^\ast \in \mathbb{R}^d\text{ such that }\frac{1}{N} \sum_{i=1}^N S_i(x^\ast) = 0.

    For SMART to converge, these operators need only satisfy a weak property, which we call the coherence condition (see the second sketch after this list).

  3. Because SMART works with operators, it generates some new algorithms for large-scale optimization problems like

    \displaystyle \text{minimize}_{x \in \mathbb{R}^d}\; \frac{1}{M}\sum_{j=1}^M g_j(A_j x) + \frac{1}{N} \sum_{i=1}^N f_i(x),

    where the functions {g_j} are proximable and the maps {A_j} are linear. These problems are hot in machine learning right now (see the prox example below).
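
To make the finite-sum template in item 1 concrete, here is a minimal SAGA-style sketch, specialized to least squares. It illustrates the family of methods SMART extends, not pseudocode from the paper; the function name saga_least_squares, the synthetic data, and the step size are my own choices, assumed for illustration.

    import numpy as np

    def saga_least_squares(A, b, step, iters, rng):
        """SAGA on (1/N) sum_i 0.5*(a_i^T x - b_i)^2.

        Keeps a table of the most recent gradient of each f_i and
        steps along an unbiased, variance-reduced gradient estimate.
        Illustrative sketch; the step size is assumed small enough.
        """
        N, d = A.shape
        x = np.zeros(d)
        table = A * (A @ x - b)[:, None]        # table[i] = stored grad f_i
        avg = table.mean(axis=0)                # running average of the table
        for _ in range(iters):
            i = rng.integers(N)                 # sample one component uniformly
            g = A[i] * (A[i] @ x - b[i])        # fresh gradient of f_i at x
            x = x - step * (g - table[i] + avg) # unbiased for the full gradient
            avg = avg + (g - table[i]) / N      # refresh average, then the table
            table[i] = g
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 5))
    b = A @ np.ones(5) + 0.01 * rng.standard_normal(200)
    x = saga_least_squares(A, b, step=0.01, iters=20000, rng=rng)
    print(np.linalg.norm(A.T @ (A @ x - b)) / len(b))  # normal-equations residual; should be small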
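The operator view in item 2 admits an equally short sketch: keep a table of the most recent evaluation of each S_i, and take variance-reduced steps toward a root of the average. This is my own stripped-down reading of the idea, with none of the asynchrony, block coordinates, or sampling schemes the paper handles; the name smart_root, the step size, and the toy affine operators (built to be strongly monotone, so a coherence-type condition plausibly holds) are all assumptions.

    import numpy as np

    def smart_root(ops, d, step, iters, rng):
        """SAGA-style stochastic iteration on operators S_1, ..., S_N.

        Seeks x with (1/N) * sum_i S_i(x) = 0, storing the most recent
        evaluation of each operator. A sketch of the idea, not the
        paper's full algorithm.
        """
        N = len(ops)
        x = np.zeros(d)
        table = np.array([S(x) for S in ops])    # table[i] = last value of S_i
        avg = table.mean(axis=0)
        for _ in range(iters):
            i = rng.integers(N)
            s = ops[i](x)                        # one black-box operator call
            x = x - step * (s - table[i] + avg)  # variance-reduced root step
            avg = avg + (s - table[i]) / N
            table[i] = s
        return x

    # Toy operators: S_i(x) = M_i x - c_i with M_i = B_i^T B_i + I (positive definite).
    rng = np.random.default_rng(1)
    d, N = 5, 50
    Bs = rng.standard_normal((N, d, d))
    Ms = np.einsum('nij,nik->njk', Bs, Bs) + np.eye(d)
    cs = rng.standard_normal((N, d))
    ops = [lambda x, M=M, c=c: M @ x - c for M, c in zip(Ms, cs)]

    x = smart_root(ops, d, step=0.02, iters=40000, rng=rng)
    resid = np.mean([S(x) for S in ops], axis=0)
    print(np.linalg.norm(resid))  # should be near zero at a root of the average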
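Finally, "proximable" in item 3 means that the proximal operator, {\mathrm{prox}_{\gamma g}(v) = \text{argmin}_z\; g(z) + \frac{1}{2\gamma}\|z - v\|^2}, has a cheap closed form. The canonical example is the {\ell_1} norm, whose prox is soft-thresholding; the two-line sketch below (the name soft_threshold is mine) is the kind of black box that, combined with the linear maps {A_j}, can play the role of the operators {S_i}.

    import numpy as np

    def soft_threshold(v, gamma):
        """prox of gamma * ||.||_1: shrink each coordinate toward zero by gamma."""
        return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

    v = np.array([3.0, -0.2, 0.5, -1.5])
    print(soft_threshold(v, 1.0))  # large entries shrink by 1.0; small ones vanish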

In the coming weeks, I’ll devote some blog posts to implementations of SMART on problems like logistic regression, support vector machines, collaborative filtering, feasibility problems, and more. In the meantime, check out the paper; comments are welcome.

This material is based upon work supported by the National Science Foundation under Award No. 1502405. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.