Toward a more robust theory for the equations that underpin modern society.
In the age of Big Data and instant everything, says Martin Takáč, human activities are placing increasingly onerous demands on the invisible algorithms that serve them.
Algorithms are sets of rules or mathematical equations that instruct computers to solve problems. Algorithms today help us trade stocks, search the Internet and find the cheapest vacation deals. They feed us the news we want, guess our movie and music preferences, help us find dates and spouses, and help the government find us, or at least keep tabs on us without our knowing.
To perform these tasks, says Takáč, an assistant professor of industrial and systems engineering, computers must solve huge problems encompassing hundreds of terabytes of data. One terabyte is equal to 1,024 gigabytes. For perspective, a DVD holds less than 5 gigabytes.
To make sense of such a vast amount of data, says Takáč, algorithms must be able to harness the power of hundreds or thousands of computers simultaneously.
“To write an algorithm that can interpret Big Data, you need coding that can run in parallel on many distributed computers. Computers cannot think. You have to tell them step by step what to do.”
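The divide-and-combine pattern behind such parallel code can be sketched in a few lines. The example below is a toy illustration of the general idea, not one of Takáč's algorithms: the data is split into chunks, several workers process their chunks at the same time, and the partial results are merged at the end.

```python
# Toy sketch of parallel divide-and-combine (illustrative only):
# split the data, let workers process chunks simultaneously, merge results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker handles one slice of the data independently.
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::4] for i in range(4)]  # four disjoint slices

# Four workers run concurrently; in a truly distributed system,
# each chunk would live on a different machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)  # same answer as sum(data)
```

In a real Big Data setting the chunks would be spread across hundreds or thousands of machines, and the hard part, as Takáč notes, is designing the step-by-step coordination so the combined answer is provably correct.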
Takáč, who joined the faculty last year, is a theoretician with a holistic interest in algorithms—in assessing their reliability and performance, in explaining and implementing them, and in using them to find optimal solutions to problems with billions of variables.
“My approach is to develop a theory that enables you to design a better algorithm and then implement the algorithm so it can be run on a computer. It’s also necessary to have a theory that tells you that an algorithm will always work and how many steps it will need in order to find a good solution.
“We try to develop a theory to prove and explain each step. We use software to develop an algorithm, then we test it and compare it to other algorithms. It is a successful algorithm only if it passes all these tests. You can propose anything; it’s quite another thing to propose something that will actually work better than existing algorithms and, moreover, to prove that it works.”
Takáč came to Lehigh from the University of Edinburgh in the United Kingdom, where he completed his Ph.D. last year. While there, he worked with the Singapore University of Technology and Design on a computer vision problem. The challenge was to find the single theme, or topic, common to each image in a collection of 5,000 to 10,000 images, without having a human being look at the images and describe what they contained.
“We had no idea what the images showed, but we could assign them to categories. This would be like examining thousands of articles from The New York Times and writing algorithms that assign these articles to categories such as politics or sports without human intervention.
“We designed and patented an algorithm that enabled computers to ‘see’ and distinguish very quickly among thousands of features.”
More recently, Takáč has tackled another project—designing a distributed algorithm for classic machine learning problems such as spam filtering.
“Let’s say I need to go through thousands of emails in a short period of time, labeling them and assigning them to one category or another,” he says. “To accomplish this, I might need to track 90,000 different key words, or features. I would assign negative values to words like money and Viagra, say -1 for money and -100 for Viagra. Martin and meeting would both be assigned positive values of +20.
“An incoming email that does not mention meeting or money but does mention Viagra twice would thus have a value of -200, earning it a quick spam designation. The next email might mention meeting once; it therefore has a value of +20 and is not spam.
“Our task is to write an algorithm to find the values for every feature and then make predictions as to whether a large quantity of emails is spam or not.”
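The scoring scheme Takáč describes can be written down directly. This is a minimal sketch using the illustrative words and weights from his example; a real system would learn weights for all 90,000 features from labeled emails rather than setting them by hand.

```python
# Illustrative feature weights from the example above: negative words
# suggest spam, positive words suggest legitimate mail. A real classifier
# would learn these values from data.
WEIGHTS = {
    "money": -1,
    "viagra": -100,
    "martin": +20,
    "meeting": +20,
}

def score(email: str) -> int:
    """Sum the weight of every known word, counting repeated mentions."""
    return sum(WEIGHTS.get(word, 0) for word in email.lower().split())

def is_spam(email: str) -> bool:
    """Classify an email as spam when its total score is negative."""
    return score(email) < 0

print(score("buy viagra now viagra"))   # -200, flagged as spam
print(score("see you at the meeting"))  # +20, not spam
```

Finding good values for every feature is the machine learning problem; once the weights are fixed, classifying a new email is just the fast summation shown here.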
Takáč’s papers have been published in Mathematical Programming, the International Journal of Numerical Analysis and Modeling, Advances in Neural Information Processing Systems, Operations Research Proceedings, the Proceedings of the International Conference on Machine Learning and other journals and proceedings.
In 2012, Takáč won the Best Student Paper Award from the INFORMS (Institute for Operations Research and the Management Sciences) Computing Society.
In 2013, he received the 16th Leslie Fox Prize (second prize) from the Institute of Mathematics and its Applications for a paper he wrote with his Ph.D. adviser, Peter Richtárik of the University of Edinburgh. The prize, which is awarded once every two years, honors numerical analysts under the age of 31 for mathematical and algorithmic brilliance in tandem with presentational skills.