We present distributed algorithms for several classes of submodular optimization problems such as k-cover, set cover, facility location, and probabilistic coverage. The new algorithms enjoy almost optimal space complexity, optimal approximation guarantees, optimal communication complexity (and run in only four rounds of computation), addressing major shortcomings of prior work. We first present a distributed algorithm for k-cover using only Õ(n) space per machine, and then extend it to several submodular optimization problems, improving previous results for all the above problems-e.g., our algorithm for facility location problem improves the space of the best-known algorithm (Lindgren et al.). Our algorithms are implementable in various distributed frameworks such as MapReduce and RAM models. On the hardness side, we demonstrate the limitations of uniform sampling via an information theoretic argument. Furthermore, we perform an extensive empirical study of our algorithms (implemented in MapReduce) on a variety of datasets. We observe that using sketches 30-600 times smaller than the input, one can solve the coverage maximization problem with quality very close to that of the state-of-the-art single machine algorithm. Finally, we demonstrate an application of our algorithm in large-scale feature selection.