Many applied machine learning tasks involve structured representations. This is particularly the case in natural language processing (NLP), where the discrete, compositional nature of words and sentences leads to natural combinatorial representations such as trees, sequences, segments, or alignments, among others. It is no surprise that structured output models have been successful and popular in NLP applications since their inception. At the same time, deep, hierarchical neural networks with latent representations are increasingly widely and successfully applied to language tasks. As compositions of differentiable building blocks, deep models conventionally perform smooth, soft computations, resulting in dense hidden representations. In this work, we focus on models with structure and sparsity in both their outputs as well as their latent representations, without sacrificing differentiability for end-to-end gradient-based training. We develop methods for sparse and structured attention mechanisms, for differentiable sparse structure inference, for latent neural network structure, and for sparse structured output prediction. We find our methods to be empirically useful on a wide range of applications including sentiment analysis, natural language inference, neural machine translation, sentence compression, and argument mining.