Lehigh University participates in workshop on new developments in big data optimization algorithms, theory, applications and systems

After shopping at your favorite grocery store week after week, you finally earned a free turkey.

The cashier scanned your loyalty card at every checkout, rewarding you with points toward a holiday turkey or ham - while at the same time sending an itemization of everything you bought to a database.

The grocery analyzes the "big data" it collects from you and other shoppers to uncover hidden patterns, correlations and other insights. The result is smarter business moves, more efficient operations, higher profits and the promise of happier customers.

Researchers estimate that more than 2.5 exabytes (that's 2.5 billion gigabytes) of data are generated throughout the world every day. The use of loyalty cards, fitness trackers, web-based email, public services and social media - including every post, comment, like, photo and geotag - all contribute to this vast warehouse of information.

Big data involves not only collecting data, but storing, analyzing, searching, sharing, transferring, visualizing, querying and updating it. In fact, big data is so voluminous and complex that traditional ways of processing have proved inadequate. Hundreds or even thousands of computers running in parallel are needed for proper analyses.

To help address these computational bottlenecks, a team from the Industrial and Systems Engineering department at Lehigh University gathered with their colleagues at King Abdullah Univ. of Science and Technology in Saudi Arabia Feb. 5-7, 2018.

The KAUST Research Workshop on Optimization and Big Data brought researchers from across academia and industry to discuss big data optimization algorithms, theory, applications and systems.

Tamás Terlaky, the George N. and Soteria Kledaras '87 Endowed Chair professor, was the keynote speaker at KAUST. Terlaky opened the workshop with his presentation, "60 Years of Interior Point Methods: From Periphery to Glory."

Terlaky's keynote focused on a technique pioneered in 1984 known as Interior Point Method (IPM). This novel methodology ignited far-reaching, intensive research toward discovering effective ways to solve large-scale optimization problems such as those found in big data analytics.

"Increasingly, we are getting different kinds of solutions in optimization," Terlaky said. "Computation has become ubiquitous, and thanks also to the 'Interior Point Revolution' we have seen tremendous advances in computing."

The concepts of IPMs and "machine learning" - where computers acquire the ability to learn and make decisions - were first proposed in the '50s and were ahead of their time, Terlaky said. With computer technology still in its infancy, they failed to make any real impact. By the '80s, however, the stars aligned to make the IPM revolution possible.

Now that we are in the era of big data, Terlaky said, recent advances in computer and information technology both enables and requires revolutionary advances in machine learning methodologies. "History always repeats itself," Terlaky said. "You should learn from it."

Terlaky concluded the day's session with "A Polynomial-time Rescaled von Neumann Algorithm for Linear Feasibility Problems."

Furthering the discussion on optimization was Katya Scheinberg, Harvey E. Wagner Endowed Chair professor, with her presentation "Direct and Efficient Optimization of Prediction Error and AUC of Linear Classifiers." Also presenting was Martin Takáč, assistant professor, with "SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient."

Xi He, a fourth-year Ph.D. candidate, gave the poster presentation "An Inexact Regularized Stochastic Newton Method for Nonconvex Optimization." Takáč is his advisor.

Abstracts of the four talks and the poster presentation are given below.

**60 Years of Interior Point Methods: From Periphery to Glory
**

*Tamás Terlaky*

The basic concepts of Interior Point Methods (IPMs) were introduced by Frish in 1950s, and further developed in the 1960s, among others by Fiacco-McCormick (SUMT) and Dikin (Affince scaling). By the early 70s it was concluded that, mostly due to numerical instability, IPMs most probably will not be viable algorithms for solving large scale optimization problems. Karmarkar's 1984 paper and the subsequent "Interior Point Revolution" fundamentally changed the landscape of optimization. IPMs become the method of choice to solve large-scale linear optimization problems and new classes of conic and convex optimization problems become efficiently solvable. The new powerful algorithmic and software tools opened new areas of applications. In this talk we walk through the history of IPMs, highlight the scientific and computer technology advances that make the Interior Point revolution possible.

**A Polynomial-time Rescaled von Neumann Algorithm for Linear Feasibility Problems
**

*Tamás Terlaky*

The perceptron and von Neumann algorithms are known to be closely related, like duals. A deterministic rescaled version of the perceptron algorithm was proved to be polynomial by Pena and Soheil. Recently, Chubanov proposed a method which solves homogeneous linear equality systems with positive variables in polynomial time. Chubanov's method can be considered as a column-wise rescaling procedure. We adapt Chubanov's method to the von Neumann problem, and so we design a polynomial time column-wise rescaling von Neumann algorithm. This algorithm is the first variant of the von Neumann algorithm with polynomial complexity. Joint work with Dan Li and Kees Roos.

**Direct and Efficient Optimization of Prediction Error and AUC of Linear Classifiers
**

*Katya Scheinberg*

The predictive quality of most machine learning models is measured by expected prediction error or so-called Area Under the Curve (AUC). However, these functions are not used in the empirical loss minimization, because their empirical approximations are nonconvex and discontinuous, and more importantly have zero derivative almost everywhere. Instead, other loss functions are used, such as logistic loss. In this work, we show that in the case of linear predictors, and under the assumption that the data has normal distribution, the expected error and the expected AUC are not only smooth, but have well defined derivatives, which depend on the first and second moments of the distribution. We show that these derivatives can be approximated and used in empirical risk minimization, thus proposing a gradient-based optimization method for direct optimization of prediction error and AUC. Moreover, the proposed algorithm has no dependence on the size of the dataset, unlike logistic regression and all other well-known empirical risk minimization techniques.

**SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
**

*Martin Takáč*

In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to the finite-sum minimization problems. Different from the vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; when comparing to SAG/SAGA, SARAH does not require a storage of past gradients. The linear convergence rate of SARAH is proven under strong convexity assumption. We also prove a linear convergence rate (in the strongly convex case) for an inner loop of SARAH, the property that SVRG does not possess. The convergence rate for convex and non-convex case is also discussed. Numerical experiments demonstrate the efficiency of our algorithm. See the paper published in Proceedings of Machine Learning Research.

**An Inexact Regularized Stochastic Newton Method for Nonconvex Optimization
**

*Xi He*

*February 15, 2018*