Amin Karbasi won the 2019 NSF CAREER Award: link
ABSTRACT: The difficulty of searching through a massive amount of data in order to quickly make an informed decision is one of today’s most ubiquitous challenges. Many scientific and engineering models feature data with inherently discrete characteristics, where discrete means that the data takes on a finite set of possible values. Examples of such data range from phrases in text to objects in an image. Similarly, nearly all aspects of data science involve discrete tasks such as data summarization and model explanation. As computational methods pervade all aspects of science and engineering, it is of great importance to understand which discrete formulations can be solved efficiently and how to do so. Many of these problems are notoriously hard, and even those that are theoretically solvable may be tractable only for small amounts of data. However, the problems of practical interest are often much better behaved and possess inherent structure that allows them to be solved more efficiently. This CAREER award aims to substantially advance the frontiers of large-scale discrete optimization in data science and machine learning by developing fundamentally new algorithms. This project will also provide a number of educational opportunities such as outreach to local high school and middle school students through Yale’s Pathways to Science program.
Just as convexity has been a celebrated and well-studied condition under which continuous optimization is tractable, submodularity is a condition under which discrete objectives can be optimized efficiently. While current research in submodular optimization has led to fundamental breakthroughs in discrete mathematical programming, there is still a large gap between the theory and the limitations of the existing algorithms used by practitioners in the real world. In particular, most of the existing submodular optimization methods fail miserably when faced with the numerous sources of uncertainty inherent in machine learning tasks, from noise in the data to variability of the true objective. Moreover, submodularity is too strong an assumption for a variety of novel machine learning applications, necessitating the development of completely new algorithms. In order to lift current provable methods out of the sterile lab environment and scale them into the messy real world, it is important to carefully reexamine their limitations, consider more realistic but less perfect conditions, and develop correspondingly robust yet scalable algorithms. This CAREER project presents a research plan for designing, analyzing, and evaluating new approaches to robust submodular optimization at a massive scale, leading to solutions for a broad array of optimization problems of significant practical importance. Furthermore, it addresses generalizations of submodular functions that widely broaden the applicability of these methods, moving to a realm beyond submodularity. The research directions in this project have deep and far-reaching societal benefits, as robust and scalable computational methods play a central role in nearly every scientific and industrial venture in today’s information age.
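To give a concrete sense of what submodular optimization looks like in practice, the sketch below shows the classic greedy algorithm of Nemhauser, Wolsey, and Fisher (1978) for maximizing a monotone submodular function under a cardinality constraint, which achieves a (1 − 1/e) approximation guarantee. The coverage objective and the `greedy_max` helper are illustrative assumptions, not drawn from the award itself; the project described above targets precisely the settings (noise, scale, beyond-submodular objectives) where this simple greedy baseline breaks down.

```python
# Sketch of greedy maximization of a monotone submodular function
# (here: set coverage, a standard example of data summarization).

def coverage(selected, sets):
    """Submodular objective: number of distinct elements covered."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_max(sets, k):
    """Pick k sets, each time adding the one with the largest marginal gain.

    Submodularity (diminishing returns) is what makes this greedy rule
    provably near-optimal: each marginal gain can only shrink as the
    solution grows.
    """
    selected = []
    for _ in range(k):
        gains = {i: coverage(selected + [i], sets) - coverage(selected, sets)
                 for i in range(len(sets)) if i not in selected}
        best = max(gains, key=gains.get)  # largest marginal gain
        selected.append(best)
    return selected

# Example: summarize by choosing 2 of 4 sets covering the most elements.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(greedy_max(sets, 2))  # picks set 0 (gain 3), then set 2 (gain 3)
```

The marginal-gain loop scans all remaining candidates at every step, which is exactly the kind of cost that becomes prohibitive at massive scale and motivates the faster, more robust algorithms this project develops.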
Such advances are expected to play crucial roles in enabling data-driven scientific discoveries, promoting fairness in machine learning, and supporting STEM education by helping these communities handle the computational challenges associated with big data. The results of this project will be broadly disseminated to the greater scientific community through tutorials, workshops, and open-source software.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.