1. Identification of nonlinear state-space systems from heterogeneous datasets
- Author
Pan, W [0000-0003-1121-9879], Yuan, Y [0000-0001-7858-0437], Ljung, L [0000-0003-4881-8955], Goncalves, J [0000-0002-5228-6165], Stan, GB [0000-0002-5560-902X], Engineering & Physical Science Research Council (EPSRC), and Apollo - University of Cambridge Repository
- Subjects
0301 basic medicine, 0209 industrial biotechnology, Technology, Control and Optimization, Optimization problem, Computer Networks and Communications, Computer science, MODELS, 02 engineering and technology, 0805 Distributed Computing, Bayesian inference, computer.software_genre, Electronic mail, Data modeling, 03 medical and health sciences, 020901 industrial engineering & automation, Automation & Control Systems, Robustness (computer science), 0102 Applied Mathematics, SPARSE, CONVEX, system identification, Science & Technology, Computer Science, Information Systems, System identification, Biological system modeling, NETWORKS, Parameter identification problem, 0906 Electrical and Electronic Engineering, 030104 developmental biology, Control and Systems Engineering, Signal Processing, Computer Science, INFERENCE, Data mining, computer, Repressilator
- Abstract
This paper proposes a new method to identify nonlinear state-space systems from heterogeneous datasets. The method is described in the context of identifying biochemical/gene networks (i.e., identifying both reaction dynamics and kinetic parameters) from experimental data. Simultaneous integration of various datasets has the potential to yield better performance for system identification. Data collected experimentally typically vary depending on the specific experimental setup and conditions. Heterogeneous data are usually obtained through (a) replicate measurements from the same biological system or (b) application of different experimental conditions, such as changes/perturbations in biological inductions, temperature, gene knock-out, or gene over-expression. We formulate the identification problem using a Bayesian learning framework that makes use of “sparse group” priors to allow inference of the sparsest model that can explain the whole set of observed, heterogeneous data. To enable scaling to a large number of features, the resulting non-convex optimisation problem is relaxed to a re-weighted Group Lasso problem using a convex-concave procedure. As an illustrative example of the effectiveness of our method, we use it to identify a genetic oscillator (a generalised eight-species repressilator). Through this example we show that our algorithm outperforms Group Lasso when the number of experiments is increased, even when each single time-series dataset is short. We additionally assess the robustness of our algorithm against noise by varying the intensity of process noise and measurement noise.
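The idea of a re-weighted Group Lasso mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes an identity design matrix (so each weighted Group Lasso pass has a closed-form solution), a generic `1 / (norm + eps)` re-weighting rule rather than the paper's Bayesian update, and made-up data in which each group collects the coefficients of one candidate term across three hypothetical experiments. All function names and parameter values are illustrative.

```python
import math


def group_norm(g):
    """Euclidean norm of one coefficient group."""
    return math.sqrt(sum(x * x for x in g))


def group_soft_threshold(g, t):
    """Proximal operator of t * ||g||_2: shrink the whole group, or zero it."""
    n = group_norm(g)
    if n <= t:
        return [0.0] * len(g)
    return [(1.0 - t / n) * x for x in g]


def reweighted_group_denoise(groups, lam=0.5, eps=1e-2, n_iter=5):
    """Re-weighted Group Lasso for an identity design (illustrative only).

    Each pass solves
        min_w 0.5 * ||y - w||^2 + lam * sum_g u_g * ||w_g||_2
    in closed form, then sets u_g = 1 / (||w_g|| + eps) so that groups
    with small estimates are penalised more heavily on the next pass.
    """
    weights = [1.0] * len(groups)
    estimate = [list(g) for g in groups]
    for _ in range(n_iter):
        estimate = [
            group_soft_threshold(g, lam * u) for g, u in zip(groups, weights)
        ]
        weights = [1.0 / (group_norm(w) + eps) for w in estimate]
    return estimate


# Hypothetical data: one row per candidate term, one column per experiment.
noisy = [
    [2.0, 1.8, 2.2],     # term active in every experiment
    [0.1, -0.2, 0.1],    # noise only
    [-1.5, -1.4, -1.6],  # term active in every experiment
    [0.05, 0.1, -0.1],   # noise only
]

estimate = reweighted_group_denoise(noisy)
support = [i for i, g in enumerate(estimate) if group_norm(g) > 0]
print(support)  # → [0, 2]
```

Grouping coefficients across experiments means a term is kept or discarded for all datasets at once, which is one way to read the abstract's "sparsest model that can explain the whole set" of data. With a general dictionary matrix, the closed-form pass would be replaced by an iterative Group Lasso solver.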
- Published
- 2017