A wide variety of decision problems in engineering, science and economics involve uncertain parameters whose values are unknown to the decision maker when the decisions are made. Ignoring this uncertainty can lead to inferior solutions that perform poorly in practice. Many decision problems under uncertainty also span across multiple time stages and thus involve adaptive decisions. Such problems are naturally formulated as multi-stage stochastic programs. Despite its wide applicability, stochastic programming suffers from three major shortcomings. Firstly, stochastic programming models assume precise knowledge of the probability distribution of the uncertain parameters, an assumption which may be difficult to justify in practice when only a finite number of historical observations is available. Secondly, stochastic programming models typically minimize expected cost or disutility. This necessitates the evaluation of a multivariate integral, which can be computationally burdensome, particularly in the presence of high-dimensional uncertainty. Thirdly, multi-stage stochastic programs are inherently intractable as they involve infinitely many constraints as well as functional decision variables ranging over an infinite-dimensional space. In addition, many practical problems also involve integer decision variables, which further aggravates the complexity of even finding a feasible solution. The first main objective of this thesis is to formulate and analyze distributionally robust optimization models that overcome the aforementioned shortcomings. In distributionally robust optimization we seek decisions that perform best in view of the worst-case distribution from within some family of distributions of the uncertain parameters. Distributionally robust models thus relax the unrealistic assumption of a known probability distribution. Instead, they rely on the availability of an ambiguity set, that is, a family of distributions consistent with the decision maker's prior information. By employing the classical duality theory for moment problems, distributionally robust optimization problems can be reduced to finite dimensional conic programs, which are often computationally tractable. In this thesis, we study a distributionally robust variant of the multi-product newsvendor problem. While the arising formulation is proven to be NP-hard due to the presence of a sum of maxima term, we propose a tractable conservative approximation in the form of quadratic decision rules. We further study distributionally robust uncertainty quantification and chance constrained programming problems. We introduce a standardized ambiguity set that captures a wide variety of popular ambiguity sets as special cases, and we present for the first time a thorough complexity analysis of distributionally robust uncertainty quantification and chance constrained programming. The second main objective of this thesis is to develop scalable approximations for dynamic optimization problems under uncertainty. For two-stage problems with continuous recourse decisions there exist already some tractable approximation schemes. However, to date hardly any attempts have been undertaken to solve (distributionally) robust dynamic problems with integer recourse decisions. In this thesis, we present a new finite adaptability approximation for two-stage robust binary programs. We show that these problems admit mixed-integer linear programming approximations that are not much harder to solve than their deterministic counterparts. For multi-stage decision problems with continuous recourse decisions, we further propose a data-driven dynamic programming algorithm that allows the decision maker to incorporate the historical observations directly into the solution procedure in an asymptotically consistent manner. We then combine the data-driven method with robust optimization techniques to alleviate the overfitting effects inherent in problems with sparse historical observations.