This paper extends learning theory to a setting where the i.i.d. assumption on the data is weakened: independence is retained, but the requirement that the examples be identically distributed is dropped. We suppose that a sequence of examples (x_t, y_t) in X × Y, for t = 1, 2, 3, ..., is drawn from probability distributions ρ_t on X × Y whose marginals on X converge to a limit probability measure on X. Two main examples of this time process are discussed. The first is stochastic: in the special case of a finite space X it is defined by a stochastic matrix, and more generally by a stochastic kernel. The second is determined by an underlying discrete dynamical system on the space X. Our theoretical treatment requires this dynamics to be hyperbolic (or "Axiom A"), which still permits a class of chaotic systems (those with Sinai–Ruelle–Bowen attractors). Even in the case of a limit Dirac point probability, the measure theory needs to be formulated using Hölder spaces. Many implications of our work remain unexplored; these include, for example, the relation to Hidden Markov Models, as well as to Markov Chain Monte Carlo methods. It seems reasonable that further work should consider the push-forward of the process from X × Y, by some kind of observable function, to a data space.
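The finite-state stochastic-matrix example admits a minimal numerical sketch: under a fixed row-stochastic transition matrix, the marginal distribution of x_t evolves by mu_{t+1} = mu_t P and, for an irreducible aperiodic chain, converges to a limit (stationary) distribution, which is the convergence-of-marginals hypothesis above in its simplest form. The particular 3-state matrix below is hypothetical, chosen only for illustration, and is not taken from the paper.

```python
# Hypothetical 3-state row-stochastic matrix P on X = {0, 1, 2}:
# P[i][j] = Prob(x_{t+1} = j | x_t = i); each row sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.7, 0.2],
     [0.2, 0.3, 0.5]]

def step(mu, P):
    """One step of the marginal evolution: mu_{t+1}[j] = sum_i mu_t[i] * P[i][j]."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

# Start from a Dirac point mass at state 0 and iterate the marginals.
mu = [1.0, 0.0, 0.0]
for _ in range(500):
    mu = step(mu, P)

# After many steps mu is numerically stationary: mu ≈ mu P,
# i.e. the marginals have converged to the limit probability on X.
print(mu)
print(step(mu, P))
```

Replacing the matrix by a stochastic kernel K(x, dy) on a general space X gives the same picture, with the sum over states replaced by an integral.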