Authors: E. R. Elenberg, R. Khanna, A. G. Dimakis, and S. Negahban


We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe [In ICML (2011) 1057–1064] from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor of the best possible subset-selection solution for a broad class of general objective functions. Our methods allow direct control over the number of selected features, as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.
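To make the greedy feature selection discussed in the abstract concrete, the following is a minimal sketch of forward greedy selection under a cardinality budget k. It is an illustration under stated assumptions, not the authors' implementation: the names `greedy_select` and `make_least_squares_objective`, the synthetic data, and the least-squares objective are all hypothetical choices made here, and any other set function to be maximized could be passed in its place.

```python
import numpy as np


def greedy_select(objective, n_features, k):
    """Forward greedy selection: k times, add the feature with the largest marginal gain."""
    selected = []
    for _ in range(k):
        base = objective(selected)
        best_j, best_gain = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Marginal gain of adding feature j to the current set.
            gain = objective(selected + [j]) - base
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
    return selected


def make_least_squares_objective(X, y):
    """Illustrative set function (an assumption of this sketch):
    f(S) = ||y||^2 - min_w ||y - X_S w||^2, the squared loss removed by features S."""
    def f(S):
        if not S:
            return 0.0
        XS = X[:, S]
        w, *_ = np.linalg.lstsq(XS, y, rcond=None)
        r = y - XS @ w
        return float(y @ y - r @ r)
    return f


# Synthetic example: only features 2 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.standard_normal(200)

f = make_least_squares_objective(X, y)
support = greedy_select(f, n_features=X.shape[1], k=2)
print(sorted(support))
```

Here the budget k fixes the number of selected features directly, in contrast to tuning a regularization parameter and observing the resulting sparsity.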

