Oyebamiji, O. K., Wilkinson, D. J., Jayathilake, P. G., Curtis, T. P., Rushton, S. P., Li, B., and Gupta, P.
The ability to make credible simulations of open engineered biological systems is an important step towards applying scientific knowledge to real-world problems in this challenging, complex engineering domain. An important application of this type of knowledge is the design and management of wastewater treatment systems. A crucial aspect of an engineering biology approach to wastewater treatment is the ability to simulate complex biological communities. However, simulating open biological systems is challenging because they are physically complex and often involve a very large number of individual bacteria, ranging from the order of 10^12 (a baby's microbiome) to 10^18 (a wastewater treatment plant). Because the models are computationally expensive and computing resources are limited, often only a limited set of scenarios can be considered. A simplified approach to this problem is to use a statistical approximation of the simulation ensembles derived from the complex models at a fine scale, which helps to reduce the computational burden. Our aim in this paper is to build a cheaper surrogate of an individual-based (IB) model simulation of microbial communities. The paper focuses on how to use an emulator as an effective tool for studying microscale processes and incorporating them into macroscale models in a computationally efficient way. The main issue we address is a strategy for emulating high-level summaries from the IB model simulation data. We use a Gaussian process regression model for the emulation. Under cross-validation, the percentage of variance explained ranges from 83–99% for the univariate emulator and from 87–99% for the multivariate emulators, for both biofilms and floc. Our emulators show an approximately 220-fold increase in computational efficiency. The sensitivity analyses indicated that substrate nutrient concentration for nitrate, carbon, nitrite and oxygen as well as the maximum gr