Title: Modern Challenges for Machine Learning Applications of Submodularity: Privacy, Scalability, and Sequences

Authors: Marko Mitrovic and Amin Karbasi

Marko 1 Marko 2 Marko3 Marko 4

Student: Marko Mitrovic

Dissertation Title: Modern Challenges for Machine Learning Applications of Submodularity: Privacy, Scalability, and Sequences

Date: Thursday, March 5, 2020 Time: 4:00 PM Location: Room 335, 3rd floor, 17 Hillhouse Avenue

Advisor: Amin Karbasi

Other committee members:

Dan Spielman
Dragomir Radev
Yaron Singer (Harvard)

Abstract :In a nutshell, submodularity covers the class of all problems that exhibit some form of diminishing returns. From a theoretical perspective, this notion of diminishing returns is extremely useful as the resulting mathematical properties allow for provably efficient optimization. From a practical perspective, diminishing returns appear in a wide variety of important real-world machine learning problems including data summarization, recommender systems, neural network interpretability, and influence maximization in social networks.

In this thesis, we will focus on three major challenges for modern machine learning applications of submodularity: privacy, scalability, and sequences:

With the emergence of data privacy as one of the foremost controversies in today’s society, it is imperative that each individual’s personal information be protected. However, machine learning is a field that particularly relies on data. Without data, there is no learning. To tackle this challenge, we adapt the foundational algorithms of submodularity to the framework of differential privacy, which provides mathematically provable protection against information leaks.

On the opposite end of the spectrum of data challenges, we have the problem of too much data. As we will see, many submodular algorithms run in linear time, but with the immense size of modern datasets, even linear time may be too slow. We present novel approaches to scalable submodularity with a specific focus on streaming and distributed frameworks.

Lastly, the vast majority of work and research on submodularity has been focused on set functions where the order of the input and output is not important. While this is perfectly reasonable for many problems, there are also many other machine learning problems (such as recommender systems), where explicitly considering the order of data and solutions can lead to large gains. In this dissertation, we present vital advancements in the field of sequence submodularity.

Full Text: [PDF]