Keywords
Nonlinear programming; Mixed integer linear programming; Distributed systems; Failure detection; Heartbeats

Highlights
* A nonlinear optimisation model for high-performing probabilistic failure detectors.
* Transformation of the nonlinear optimisation model into a Mixed-Integer Linear Problem.
* New self-optimised failure detector using network data and the model's optimal solutions.
* Design of a heuristic algorithm to efficiently tackle the scalability challenge.
* Testbed development on both the Amazon Cloud and a Cloud simulator for robust results.

Abstract
Failure detectors (FDs) are fundamental building blocks for distributed systems. An FD detects whether a process has crashed based on the reception of heartbeat messages sent by this process over a communication channel. A key challenge for FDs is tuning their parameters to achieve optimal performance that satisfies the desired system requirements. This is challenging due to the complexities of large-scale networks. Existing FDs ignore such optimisation and adopt ad-hoc parameters. In this paper, we propose a new Mixed Integer Linear Programming (MILP) optimisation-based FD algorithm. We obtain the MILP formulation via piecewise linearisation relaxations. Solving the MILP yields FD parameters that achieve the best trade-off between the FD's performance metric requirements, network conditions and system parameters. The MILP maximises our FD's accuracy under bounded failure detection time while treating network and system conditions as constraints. The MILP's solution represents optimised FD parameters that maximise the FD's expected performance. To adapt to real-time network changes, our proposed MILP-based FD fits the probability distribution of heartbeat inter-arrival times. To address the scalability challenge of our FD in large-scale systems, where approximate optimal solutions of the MILP model must be computed quickly, we also propose a heuristic algorithm.
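As a rough illustration of the mechanism described above, and not of the MILP formulation itself, the following Python sketch fits a distribution to observed heartbeat inter-arrival times and derives a timeout capped by a bound on detection time. The normal-distribution fit, the quantile rule, the sample data and all names (`fit_interarrivals`, `choose_timeout`, `is_suspected`, `mistake_prob`) are assumptions made for illustration only.

```python
from statistics import NormalDist


def fit_interarrivals(arrival_times):
    """Fit a normal distribution to the gaps between consecutive heartbeats.

    Assumption: a normal fit is used here only for simplicity.
    """
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return NormalDist.from_samples(gaps)


def choose_timeout(dist, max_detection_time, mistake_prob=0.01):
    """Pick a timeout that a live process exceeds with probability
    mistake_prob (under the fitted model), capped by the detection bound."""
    timeout = dist.inv_cdf(1.0 - mistake_prob)
    return min(timeout, max_detection_time)


def is_suspected(last_arrival, now, timeout):
    """Suspect the process once no heartbeat has arrived for longer than timeout."""
    return (now - last_arrival) > timeout


# Hypothetical heartbeat arrival timestamps, in seconds.
arrivals = [0.0, 1.0, 2.1, 3.0, 4.2, 5.1]
dist = fit_interarrivals(arrivals)
timeout = choose_timeout(dist, max_detection_time=3.0)
print(f"timeout = {timeout:.3f} s")
print("suspected at t=7.0:", is_suspected(arrivals[-1], 7.0, timeout))
```

A larger `mistake_prob` shortens the timeout and speeds up detection at the cost of more false suspicions; the paper's approach instead selects the parameters by solving an optimisation model over this trade-off.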
To test our proposed approach, we adopt the Amazon Cloud as a realistic testing environment and develop a simulator for robustness tests. Our results show consistent improvement in overall FD performance and scalability. To the best of our knowledge, this is the first attempt to combine MILP-based optimisation modelling with FDs to achieve performance guarantees.

Author Affiliation:
(a) Centre for Risk Research, Department of Decision Analytic and Risk, Southampton Business School, University of Southampton, Building 2, University Road, SO17 1BJ, UK
(b) The Artificial Intelligence Applications Institute, School of Informatics, Informatics Forum, The University of Edinburgh, Crichton Street, EH8 9AB, UK
(c) Edinburgh Research Centre, Huawei Technologies R&D, 2, Semple Street, EH3 8BL, UK

* Corresponding author at: Centre for Risk Research, Department of Decision Analytic and Risk, Southampton Business School, University of Southampton, Building 2, University Road, SO17 1BJ; School of Informatics, The University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB.

Article History: Received 18 February 2020; Accepted 2 February 2022

(footnote)[white star] This piece of research was partially sponsored by Huawei Ltd.
(footnote)1 This work was carried out while B. Er-Rahmadi was a Research Fellow at the University of Southampton, UK; she is currently affiliated with Edinburgh Research Centre, Huawei Technologies R&D, 2, Semple Street, EH3 8BL, UK.

Byline: Btissam Er-Rahmadi [btissam.errahmadi@gmail.com] (1,a,c), Tiejun Ma [tiejun.ma@soton.ac.uk] (*,a,b)