1. Computing the throughput of probabilistic and replicated streaming applications
- Author
-
Anne Benoit, Matthieu Gallet, Bruno Gaujal, Yves Robert, Laboratoire de l'Informatique du Parallélisme (LIP), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Middleware efficiently scalable (MESCAL), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), and Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL)
- Subjects
0209 industrial biotechnology ,Theoretical computer science ,General Computer Science ,Computer science ,Applied Mathematics ,Computation ,Probabilistic logic ,02 engineering and technology ,01 natural sciences ,Computer Science Applications ,Set (abstract data type) ,010104 statistics & probability ,[INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF] ,020901 industrial engineering & automation ,Bounded function ,Theory of computation ,Graph (abstract data type) ,0101 mathematics ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Random variable ,Algorithm ,Throughput (business) - Abstract
In this paper, we investigate how to compute the throughput of probabilistic and replicated streaming applications. We are given (i) a streaming application whose dependence graph is a linear chain; (ii) a one-to-many mapping of the application onto a fully heterogeneous target platform, where a processor is assigned at most one application stage, but where a stage can be replicated onto a set of processors; and (iii) a set of random variables modeling the computation and communication times in the mapping. We show how to compute the throughput of the application, i.e., the rate at which data sets can be processed. The problem is easy when application stages are not replicated, i.e., each application stage is assigned to a single processor: in that case the throughput is dictated by the critical hardware resource. However, when stages are replicated, i.e., each application stage may be assigned to several processors, the problem becomes surprisingly complicated: even in the deterministic case, the optimal throughput may be lower than the smallest internal resource throughput. The first contribution of the paper is to provide a general method to compute the throughput when computation and communication times, also called stage parameters, are constant or follow I.I.D. exponential laws. The second contribution is to provide bounds for the throughput when stage parameters form associated random sequences (correlation between communication and processing times of a given data set on the different application stages, i.e., a data set that takes a long time on the first stage is likely to be large, and to take a long time on the next stages), and are N.B.U.E. (New Better than Used in Expectation) variables (if an operation has already been processed for some duration, the remaining time is smaller than the processing time of a fresh operation): the throughput is bounded from below by the exponential case and bounded from above by the deterministic case. An extensive set of simulation allows us to assess the quality of the model, and to observe the actual behavior of several distributions.
- Published
- 2013