ICML 2020 Tutorial on Submodular Optimization: From Discrete to Continuous and Back
School of Engineering and Applied Sciences
University of Pennsylvania
Philadelphia, PA 19104
Yale Institute for Network Science
Yale University
New Haven, CT 06520
Slides
Videos
Brief Description and Outline
This tutorial will cover recent advancements in discrete optimization methods for largescale machine learning. Traditionally, machine learning has been harnessing convex optimization to design fast algorithms with provable guarantees for a broad range of applications. In recent years, however, there has been a surge of interest in applications that involve discrete optimization. For discrete domains, the analog of convexity is considered to be submodularity, and the evolving theory of submodular optimization has been a catalyst for progress in extraordinarily varied application areas including active learning and experimental design, vision, sparse reconstruction, graph inference, video analysis, clustering, document summarization, object detection, information retrieval, network inference, interpreting neural network, and discrete adversarial attacks.
As applications and techniques of submodular optimization mature, a fundamental gap between theory and application emerges. In the past decade, paradigms such as largescale learning, distributed systems, and sequential decision making have enabled a quantum leap in the performance of learning methodologies. Incorporating these paradigms in discrete problems has led to fundamentally new frameworks for submodular optimization. The goal of this tutorial is to cover rigorous and scalable foundations for discrete optimization in complex, dynamic environments, addressing challenges of scalability and uncertainty, and facilitating distributed and sequential learning in broader discrete settings. More specifically, we will cover advancements in four areas:

Submodular Optimization with Perfect Information.
Stateoftheart submodular optimization techniques have mostly assumed perfect knowledge about the optimization problem and the environment. More precisely, the existing techniques have been designed based on the availability of an oracle with full information about the objective function at any queried point. In this context, recent work in submodular optimization has widely focused on developing fast, efficient, and provable methodologies in the presence of massive, highdimensional data:

Fast Greedy Methods. The first class of methods consider cases where the size and dimension of the data is large, but data can be fully accessible and stored in memory. Accordingly, fast greedy algorithms have been proposed recently with almost optimal runtime efficiency in terms of the query complexity and dimension of data while maintaining optimal quality guarantees in terms of objective value 1, 2, 3, 4,49, 5061.

Streaming Algorithms: In cases where the data sets are too large to be stored in memory, we consider streaming techniques, where computation is performed on the y. A recent line of work has been focused on developing algorithms that obtain desirable guarantees for important tasks in machine learning while using modest memory resources 5, 6, 7, 8, 9,6268.

Distributed Optimization: The rise of distributed computation frameworks such as MapReduce, Hadoop, and Spark, enabled unprecedented advancement in machine 1 learning. Although submodular optimization is a canonical example of computation that cannot be parallelized, there is a great deal of work on new algorithm design approaches that can utilize distributed computational resources 10, 11, 12, 13, 14, 15, 16,6977.

Submodular Optimization with Imperfect Information
In many modern applications, perfectinformation oracles can not assumed, and instead, we resort to approximate or imperfect ones in order to address challenges such as scalability, uncertainty in data and models, as well as dynamic and adversarial environments. We will formalize oracles with imperfect information through three main classes{Stochastic, Online, and Bandit} each of which opens a new avenue in the field of discrete (submodular) optimization and requires fundamentally novel techniques:

The Stochastic Oracle. The stochastic oracle returns a stochastic, but unbiased estimate of the function value at any query point. Given this type of oracle access to the function values, the primary question is to develop stochastic submodular optimization techniques with minimal sample complexity (i.e. the total number of queries from the stochastic oracle) and computation complexity 17, 18. The stochastic oracle is motivated form practical and largescale applications and covers popular instances of discrete optimization: (i) Oftentimes in practice the objective is defined in terms of an empirical risk, i.e., through a finite sum of loss functions associated to the data points. This formulation appears in submodular applications such as data summarization 19, 20, 21, recommender systems 22, 18, and sparse regression 23, 24, 25 etc. (ii) In some other cases, the objective is given as an expectation over an explicit stochastic model which is hard/intractable to compute. For example, in influence maximization in social networks 26,the objective value is defned as the expectation of a stochastic difiusion process, quantifying the expected fraction of the network influenced from a selected seed set.

Online/Bandit Oracle. This type of oracle is the model of choice when the optimizer/learner has to interact with an unknown, evolving, or even adversarially changing environment. The oracle’s outcome in this setting varies with time, and depending on how limited the oracle’s outcome is, we can define two different scenarios: online, and bandit settings. The primary optimization goal in this setting is to provide noregret mechanisms to maximize the cumulative objective over time. Online and Bandit submodular optimization, introduced in 27, 28, 29, arises naturally in discrete applications involving dynamic environments, e.g. online/dynamic resource allocation 30, 27, online ranking 31, 32, sensor placement in unknown or adversarial environments 33, in uencing dynamic social networks 34, online recommender systems 35, 36, and dynamic multirobot coverage problems 37.

A NonConvex Bridge Between Discrete and Continuous Optimization
To address discrete otimization with imperfect oracles, we will describe a recently developed framework which bridges discrete and continuous optimization. Through appropriate continuous extensions, this framework devises algorithms that succeed in finding good solutions by making a large number of small, exploratory steps using noisy estimates of the objective value, and moving toward the optimal solution. As we will discuss, solving the resulting continuous problems requires novel methods beyond the stateoftheart, because they are highly nonconvex and possess undesirable stationary points. We show how tools from discrete and continuous optimization as well as techniques that exploit the special structure (submodularity) admitted by these problems and avoid those bad stationary points 18 38, 39, 40, 41, 42, 43. Indeed, submodular optimization is inherently related to continuous optimization as many submodular optimization methods rely on continuous relaxations. This connection has recently been strengthen by introducing continuous submodular functions 44, 45, 78, 18. Such functions are not convex (nor concave) but still allow finding nearoptimal solutions, thus providing a rich framework for studying nonconvex optimization problems.

Beyond Submodularity
Finally, one can generalize the notion of submodularity and still provide approximation guarantees. Several of such generalizations, e.g., adaptive submodularity 46, 47, weak submodularity 25, 9, twostage submodularity 48, etc, have been recently proposed. These generalizations extend the applications of submodularity to interactive settings, dictionary learning, and sparse recovery.