1. Adaptive estimation for Hawkes processes; application to genome analysis
- Author
Sophie Schbath, Patricia Reynaud-Bouret, Laboratoire Jean Alexandre Dieudonné (JAD), Université Nice Sophia Antipolis (... - 2019) (UNS), Université Côte d'Azur (UCA), Centre National de la Recherche Scientifique (CNRS), Unité Mathématique Informatique et Génome (MIG), Institut National de la Recherche Agronomique (INRA), ANR-06-JCJC-0015, ATLAS, From Applications to Theory in Learning and Adaptive Statistics (2006), and COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)
- Subjects
Statistics and Probability; model selection; Process (engineering); Decision theory; adaptive estimation; Primary 62G05, 62G20; secondary 46N60, 65C60; Mathematics - Statistics Theory; Statistics Theory (math.ST); 01 natural sciences; Genome; Oracle; 010104 statistics & probability; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; 0502 economics and business; minimax risk; data-driven penalty; FOS: Mathematics; oracle inequalities; Penalty method; 0101 mathematics; Hawkes process; Mathematics; genome analysis; Estimation; 050208 finance; 05 social sciences; Estimator; [STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]; unknown support; Statistics, Probability and Uncertainty; Algorithm
- Abstract
The aim of this paper is to provide a new method for detecting favored or avoided distances between genomic events along DNA sequences. These events are modeled by a Hawkes process. The biological problem is complex enough to require a nonasymptotic penalized model selection approach. We provide a theoretical penalty that satisfies an oracle inequality even for quite complex families of models. The resulting theoretical estimator is shown to be adaptive minimax for Hölderian functions with regularity in $(1/2,1]$; these aspects have not yet been studied for the Hawkes process. Moreover, we introduce an efficient strategy, named Islands, which is not classically used in model selection but happens to be particularly relevant to the biological question we want to answer. Since a multiplicative constant in the theoretical penalty is not computable in practice, we provide extensive simulations to find a data-driven calibration of this constant. The results obtained on real genomic data are consistent with biological knowledge and may refine it.
Comment: Published at http://dx.doi.org/10.1214/10-AOS806 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
- Published
2009