Author: "Avalos, Marta" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Avalos, Marta"' showing total 183 results

1. Pre-training A Neural Language Model Improves The Sample Efficiency of an Emergency Room Classification Model

Author: Xu, Binbin, Gil-Jardiné, Cédric, Thiessard, Frantz, Tellier, Eric, Avalos, Marta, and Lagarde, Emmanuel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: To build a French national electronic injury surveillance system based on emergency room visits, we aim to develop a coding system to classify their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require a large expert-annotated dataset, which is time-consuming and costly to obtain. We hypothesize that the Natural Language Processing Transformer model incorporating a generative self-supervised pre-training step can significantly reduce the required number of annotated samples for supervised fine-tuning. In this preliminary study, we test our hypothesis in the simplified problem of predicting whether a visit is the consequence of a traumatic event or not from free-text clinical notes. Using fully re-trained GPT-2 models (without OpenAI pre-trained weights), we assess the gain of applying a self-supervised pre-training phase with unlabeled notes prior to the supervised learning task. Results show that the amount of data required to achieve a given level of performance (AUC>0.95) was reduced by a factor of 10 when applying pre-training. Namely, for 16 times more data, the fully-supervised model achieved an improvement <1% in AUC. To conclude, it is possible to adapt a multi-purpose neural language model such as the GPT-2 to create a powerful tool for classification of free-text notes with only a small number of labeled samples. Comment: Version of the published manuscript
Published: 2019

2. Optimising Criteria for Manual Smear Review Following Automated Blood Count Analysis: A Machine Learning Approach

Author: Avalos, Marta, Touchais, Hélène, Henríquez-Henríquez, Marcela, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Sasaki, Hideyasu, editor, Rios, Ricardo, editor, Gandhi, Niketa, editor, Singh, Umang, editor, and Ma, Kun, editor
Published: 2021
Full Text: View/download PDF

3. A web-based prospective cohort study of home, leisure, school and sports injuries in France: a descriptive analysis

Author: Rojas Castro, Madelyn Yiseth, Orriols, Ludivine, Basha Sakr, Dunia, Contrand, Benjamin, Dupuy, Marion, Travanca, Marina, Sztal-Kutas, Catherine, Avalos, Marta, and Lagarde, Emmanuel
Published: 2021
Full Text: View/download PDF

4. Serious Games for Training in Patient Flow Management in Emergency Departments: State of the Art and Perspectives

Author: Blot, Charlotte, Weickert, Katia, Payros, Arthur, Avalos, Marta, and Gil-Jardiné, Cédric
Published: 2023
Full Text: View/download PDF

5. Serious Games for Training in Patient Flow Management in Emergency Departments: State of the Art and Perspectives

Author: Blot, Charlotte, Weickert, Katia, Payros, Arthur, Avalos, Marta, Gil-Jardiné, Cédric, Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), CHU de Bordeaux Pellegrin [Bordeaux], and The Florida Artificial Intelligence Research Society
Subjects: Serious games, Healthcare Informatics, Emergency department, [SDV]Life Sciences [q-bio], Healthcare Professional Training, [INFO]Computer Science [cs], Education, [SHS]Humanities and Social Sciences
Abstract: International audience; Emergency departments (EDs) face significant challenges in providing timely care due to the increase in patient volume and limited resources. To improve patient flow management, new strategies based on artificial intelligence, machine learning, computer modeling, and simulation have been developed, including serious computer games and virtual reality. We performed a systematic review of the use of serious games and virtual reality to train healthcare professionals in the ED.
Published: 2023
Full Text: View/download PDF

6. High–Dimensional Sparse Matched Case–Control and Case–Crossover Data: A Review of Recent Works, Description of an R Tool and an Illustration of the Use in Epidemiological Studies

Author: Avalos, Marta, Grandvalet, Yves, Pouyes, Hélène, Orriols, Ludivine, Lagarde, Emmanuel, Istrail, Sorin, Series editor, Pevzner, Pavel, Series editor, Waterman, Michael, Series editor, Formenti, Enrico, editor, Tagliaferri, Roberto, editor, and Wit, Ernst, editor
Published: 2014
Full Text: View/download PDF

7. Public Health surveillance from emergency call center data: visualization dashboard and NLP of call reports

Author: Naprous, Alexandre, Avalos, Marta, Pradeau, Catherine, Lagarde, Emmanuel, Gil-Jardine, Cédric, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), CHU de Bordeaux Pellegrin [Bordeaux], Institut de Santé Publique, d'Epidémiologie et de Développement (ISPED), Université Bordeaux Segalen - Bordeaux 2, This work was supported by the French National Research Agency (ANR) under the grant COSAM 'Epidemiological surveillance of the COVID-19 pandemic period by real-time automatic classification of clinical notes from 15 emergency call centers using Transformer-based artificial neural networks' (project number ANR-20-COVl-0004-01). The authors’ research teams had annual grants from the University of Bordeaux, INSERM U1219 and INRIA., and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME]Statistics [stat]/Methodology [stat.ME], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO]
Abstract: International audience; By focusing on symptoms and not diagnoses, the so-called syndromic surveillance system gains in immediacy what it loses in specificity with respect to other more traditional options for public health surveillance. Reports of calls to emergency medical communication centers (EMCC) supplemented by the data collected by the rescue workers who arrived on the scene constitute a cost-effective and rich source of information. Unfortunately, EMCC data are infrequently used and their utility has not been demonstrated. The aim of this study was to explore the usefulness for public health surveillance of EMCC data when analyzed using text mining and visualization tools. Transformer-based deep learning architectures were used to classify call reports according to the reason for the call. We also extracted indicators that could serve as proxy measures using a keyword-search algorithm. We then developed a dashboard visualization tool to enable dynamic and spatial exploratory analyses. Finally, we explored the potential of this tool for two applications. While the tool proved unable to detect Covid-19 outbreaks, it appeared to be promising for a better understanding of territorial inequalities in healthcare access.
Published: 2022
Full Text: View/download PDF

8. Les données du SAMU comme moyen de surveillance de la santé de la population

Author: Naprous, Alexandre, Avalos, Marta, Pradeau, Catherine, Lagarde, Emmanuel, Gil-Jardine, Cédric, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), CHU Bordeaux [Bordeaux], and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME]Statistics [stat]/Methodology [stat.ME], analyse exploratoire de données volumineuses, réseaux de neurones transformers, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], données de Gironde, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, dashboard, datamining, analyse spatiale
Abstract: National audience; Data recorded during calls to the SAMU (emergency medical services) in Gironde constitute a rich source of information (temporal, spatial, clinical, and textual data). Applying machine learning methods to the reports available as free text makes it possible to classify these calls according to the pathologies or symptoms involved. We also developed a dashboard visualization tool to carry out a dynamic and spatial exploratory analysis of these data.
Published: 2022

9. Organisational learning strategies of introductory statistics courses in an online MPH: slow and steady wins the race

Author: Avalos, Marta, Mehouachi, Samia, Avalos, Marta, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), and IASE (International Association for Statistical Education) and ISI (International Statistical Institute)
Subjects: [STAT]Statistics [stat], [STAT.AP]Statistics [stat]/Applications [stat.AP], [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], [SHS.EDU]Humanities and Social Sciences/Education
Abstract: International audience
Published: 2022

10. Stratégies organisationnelles d'apprentissage des cours d'introduction à la statistique dans une formation de master de sante publique en ligne : rien ne sert de courir, il faut partir à point

Author: Avalos, Marta, Mehouachi, Samia, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), and IASE (International Association for Statistical Education) and ISI (International Statistical Institute)
Subjects: [STAT]Statistics [stat], [STAT.AP]Statistics [stat]/Applications [stat.AP], [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], [SHS.EDU]Humanities and Social Sciences/Education, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie
Abstract: International audience
Published: 2022

11. Effects of home environmental, behavioural and domestic activities on the risk of home injuries in French adults: Results from a prospective study

Author: Rojas Castro, Madelyn Yiseth, Avalos, Marta, Contrand, Benjamin, Dupuy, Marion, Sztal-Kutas, Catherine, Orriols, Ludivine, and Lagarde, Emmanuel
Published: 2022
Full Text: View/download PDF

12. Regularization Methods for Additive Models

Author: Avalos, Marta, Grandvalet, Yves, Ambroise, Christophe, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, R. Berthold, Michael, editor, Lenz, Hans-Joachim, editor, Bradley, Elizabeth, editor, Kruse, Rudolf, editor, and Borgelt, Christian, editor
Published: 2003
Full Text: View/download PDF

13. The respiratory microbiota alpha-diversity in chronic lung diseases: a systematic review and meta-analysis

Author: Avalos, Marta, Alin, Thibaud, Métayer, Clémence, Thiébaut, Rodolphe, Enaud, Raphael, Delhaes, Laurence, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche en Informatique et en Automatique (Inria), Université de Bordeaux (UB), CHU Bordeaux [Bordeaux], CHU de Bordeaux Pellegrin [Bordeaux], Centre de recherche Cardio-Thoracique de Bordeaux [Bordeaux] (CRCTB), and Université Bordeaux Segalen - Bordeaux 2-CHU Bordeaux [Bordeaux]-Institut National de la Santé et de la Recherche Médicale (INSERM)
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: International audience; Imbalance in microbial composition (i.e. dysbiosis) in the gut microbiome is consensually considered an indicator of deteriorated health and has been associated with different chronic health conditions. However, there is no clear evidence of how this generalizes to the other human microbiomes. In particular, research on the relationship between respiratory microbiota imbalance and chronic lung diseases is recent, whereas microbial colonization of the respiratory tract has characterized chronic lung diseases. Imbalance is mainly measured through the relative abundance of microbial species in space and time within a given community (i.e. alpha-diversity). Identifying a range of values in alpha-diversity when comparing exacerbated patients, stable patients, and healthy subjects may help identify new biomarkers in chronic respiratory diseases. In the present work, we propose a systematic review of studies investigating the lung microbiota alpha-diversity in patients with chronic respiratory diseases in which a control group based on disease status or healthy subjects is provided for comparison. We focused on the most common measures of alpha-diversity (the Chao1, Shannon, and Simpson indexes) and the most common chronic diseases (asthma, chronic obstructive pulmonary disease (COPD), cystic fibrosis (CF), bronchiectasis, and pulmonary hypertension). Subsequently, we conducted a meta-analysis based on random-effects models using the R package metafor to characterize the difference in alpha-diversity indexes when comparing cases to controls. We also explored sources of heterogeneity and risk of bias through Factor Analysis of Mixed Data (FAMD) using the FactoMineR R package. After removing duplicate records, we screened 351 articles on title and abstract, of which 27 met our inclusion criteria for the systematic review. Finally, data from 25 studies were used in the meta-analysis. Eight studies deal with CF, 8 with COPD, 10 with asthma and 1 with bronchiectasis. All of the studies were dedicated to the respiratory tract microbiota and were mainly based on sputum sample analysis, and the majority of the studies used metataxonomy approaches. As highlighted by the meta-analysis, these metataxonomy methods exhibited numerous heterogeneities. Differences in alpha-diversity indexes between healthy and diseased people were observed only in some of the diseases studied. However, prudence is required in its interpretation because of substantial heterogeneity.
Published: 2022

14. Brief Report: Prescription-Drug-Related Risk in Driving: Comparing Conventional and Lasso Shrinkage Logistic Regressions

Author: Avalos, Marta, Adroher, Nuria Duran, Lagarde, Emmanuel, Thiessard, Frantz, Grandvalet, Yves, Contrand, Benjamin, and Orriols, Ludivine
Published: 2012
Full Text: View/download PDF

15. Performance en classification de données textuelles des passages aux urgences des modèles BERT pour le français

Author: Chenais, Gabrielle, Touchais, Hélène, Avalos, Marta, Bourdois, Loïck, Revel, Philippe, Gil-Jardiné, Cédric, Lagarde, Emmanuel, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche en Informatique et en Automatique (Inria), CHU de Bordeaux Pellegrin [Bordeaux], Journée organisée avec le soutien de l’Association française d’Informatique Médicale (AIM) et le Collège Science de l’Ingénierie des Connaissances de l’AFIA dans le cadre de la Plate-Forme Intelligence Artificielle (PFIA), and Avalos, Marta
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], SVM classification supervisée multi-classe, SVM, Natural Language Processing, Urgences, [STAT.CO]Statistics [stat]/Computation [stat.CO], FlauBERT, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Artificial Intelligence, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, Traitement automatique du langage, CamemBERT, [STAT.ME]Statistics [stat]/Methodology [stat.ME], TF-IDF, multi-class classification, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Emergency
Abstract: Contextualized language models based on the Transformer architecture such as BERT (Bidirectional Encoder Representations from Transformers) have achieved remarkable performance in various language processing tasks. CamemBERT and FlauBERT are pre-trained versions for French. We used these two models to automatically classify free-text clinical notes from emergency department visits following a trauma. Their performance was compared to the TF-IDF (Term Frequency-Inverse Document Frequency) method associated with the SVM (Support Vector Machine) classifier on 22481 clinical notes from the emergency department of the Bordeaux University Hospital. CamemBERT and FlauBERT obtained slightly better results than the TF-IDF/SVM pair for the micro F1-score. These encouraging results allow us to consider further developments in the use of transformers to automate emergency department data processing, with a view to implementing a national observatory of trauma in France.
Published: 2021

16. Traitement automatique des résumés de passages aux urgences : focus sur la désidentification

Author: Bourdois, Loïck, Avalos, Marta, Chenais, Gabrielle, Contrand, Benjamin, Gil-Jardiné, Cédric, Guennec-Jacques, Antoine, Revel, Philippe, Thiessard, Frantz, Touchais, Hélène, Lagarde, Emmanuel, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche en Informatique et en Automatique (Inria), CHU de Bordeaux Pellegrin [Bordeaux], Journée organisée avec le soutien de l’Association française d’Informatique Médicale (AIM) et le Collège Science de l’Ingénierie des Connaissances de l’AFIA dans le cadre de la Plate-Forme Intelligence Artificielle (PFIA), and Avalos, Marta
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME]Statistics [stat]/Methodology [stat.ME], French, Natural Language Processing, Urgences, Emergency room, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO]Statistics [stat]/Computation [stat.CO], Pré-entraînement, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Français, Pre-training, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Transformers, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, Traitement automatique du langage
Abstract: In France, structured data on emergency room visits are aggregated at the national level to build a syndromic surveillance system for different health events. For visits motivated by a traumatic event, information on the circumstances is stored in free-text clinical notes. Automating the processing of these notes should allow the enrichment of surveillance tools. In development at Inserm and the Emergency Department of the Bordeaux University Hospital, the TARPON project (Traitement Automatique des Résumés de Passages aux urgences pour un Observatoire National, i.e. Automatic Processing of Emergency Room Notes for a National Observatory) aims to meet this objective by using the latest deep learning tools applied to automatic language analysis. To exploit these data, an automatic de-identification system, guaranteeing the protection of personal data, is necessary. We present here a comparison study of models allowing the de-identification of clinical texts in French.
Published: 2021

17. Cohort profile: MAVIE a web-based prospective cohort study of home, leisure, and sports injuries in France

Author: Castro, Madelyn Yiseth Rojas, Orriols, Ludivine, Contrand, Benjamin, Dupuy, Marion, Sztal-Kutas, Catherine, Avalos, Marta, Lagarde, Emmanuel, Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Calyxis Pôle d'Expertise des Risques, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), and Avalos, Marta
Subjects: Male, Epidemiology, Health Status, Geographical locations, [STAT.AP]Statistics [stat]/Applications [stat.AP], Risk Factors, Surveys and Questionnaires, Medicine and Health Sciences, Public and Occupational Health, Longitudinal Studies, Prospective Studies, Computer Networks, Child, Data Management, [STAT.ME]Statistics [stat]/Methodology [stat.ME], Traumatic Injury Risk Factors, Middle Aged, Mobile Applications, Sports Science, Socioeconomic Aspects of Health, Europe, Child, Preschool, Athletic Injuries, Medicine, Female, France, Research Article, Adult, Computer and Information Sciences, Adolescent, Science, Young Adult, Leisure Activities, Mental Health and Psychiatry, Humans, European Union, Sports and Exercise Medicine, Aged, Internet, Infant, Newborn, Biology and Life Sciences, Infant, Health Care, Accidents, Home, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Medical Risk Factors, Wounds and Injuries, People and places
Abstract: MAVIE is a web-based prospective cohort study of Home, Leisure, and Sports Injuries (HLIs) with a longitudinal follow-up of French general population volunteers. MAVIE participants are voluntary members of French households, including overseas territories. Participation in the cohort involves answering individual and household questionnaires on relevant exposures and prospectively reporting injury events during the follow-up. Recruitment and data collection have been in progress since 2014. The number of participants as of the end of the year 2019 was 12,419 from 9,483 households. A total of 8,640 participants provided data during follow-up. Respondents to follow-up were composed of 763 children aged 0–14, 655 teenagers and young adults aged 15–29, 6,845 adults, and 377 people aged 75 or more. At the end of the year 2019, 1,698 participants had reported 2,483 injury events. Children, people aged 50 and more, people with poor self-perceived physical and mental health, people who engage in sports activities, and people with a history of injury during the year before recruitment were more likely to report new injuries. An interactive mobile/web application (MAVIE-Lab) was developed to help volunteers decide on personalized measures to reduce their risk of HLIs. The available data provide an opportunity to analyse multiple exposures at both the individual and household levels that may be associated with an increased risk of trauma. The ongoing analysis includes HLI incidence estimates, the determination of health-related risk factors, a specific study on the risk of home injury, another on sports injuries, and an analysis of the role of cognitive skills and mind wandering. Volunteers form a community that constitutes a population laboratory for preventative initiatives.
Published: 2021

18. Performance of BERT models for French in the classification of textual data from emergency room visits

Author: Chenais, Gabrielle, Touchais, Hélène, Avalos, Marta, Bourdois, Loïck, Revel, Philippe, Gil-Jardiné, Cédric, Lagarde, Emmanuel, Avalos, Marta, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche en Informatique et en Automatique (Inria), CHU de Bordeaux Pellegrin [Bordeaux], and Journée organisée avec le soutien de l’Association française d’Informatique Médicale (AIM) et le Collège Science de l’Ingénierie des Connaissances de l’AFIA dans le cadre de la Plate-Forme Intelligence Artificielle (PFIA)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], SVM classification supervisée multi-classe, Natural Language Processing, SVM, Urgences, [STAT.CO] Statistics [stat]/Computation [stat.CO], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], FlauBERT, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Artificial Intelligence, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, Traitement automatique du langage, [STAT.CO]Statistics [stat]/Computation [stat.CO], CamemBERT, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], TF-IDF, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], multi-class classification, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Emergency, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-AU] Computer Science [cs]/Automatic Control Engineering, [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: Contextualized language models based on the Transformer architecture, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved remarkable performance in various language processing tasks. CamemBERT and FlauBERT are pre-trained versions for French. We used these two models to automatically classify free-text clinical notes from emergency department visits following a trauma. Their performance was compared with that of the TF-IDF (Term Frequency - Inverse Document Frequency) method combined with an SVM (Support Vector Machine) classifier on 22,481 clinical notes from the emergency department of the Bordeaux University Hospital. CamemBERT and FlauBERT obtained slightly better results than the TF-IDF/SVM pair on the micro F1-score. These encouraging results support further development of transformers for automating emergency department data processing, with a view to implementing a national trauma observatory in France.
Published: 2021

19. Automatic processing of emergency room notes: focus on de-identification

Author: Bourdois, Loïck, Avalos, Marta, Chenais, Gabrielle, Contrand, Benjamin, Gil-Jardiné, Cédric, Guennec-Jacques, Antoine, Revel, Philippe, Thiessard, Frantz, Touchais, Hélène, Lagarde, Emmanuel, Avalos, Marta, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National de Recherche en Informatique et en Automatique (Inria), CHU de Bordeaux Pellegrin [Bordeaux], and Journée organisée avec le soutien de l’Association française d’Informatique Médicale (AIM) et le Collège Science de l’Ingénierie des Connaissances de l’AFIA dans le cadre de la Plate-Forme Intelligence Artificielle (PFIA)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], French, Natural Language Processing, Urgences, Emergency room, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Pré-entraînement, [STAT.CO] Statistics [stat]/Computation [stat.CO], Français, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Pre-training, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Transformers, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Traitement automatique du langage, [STAT.CO]Statistics [stat]/Computation [stat.CO], [INFO.INFO-AU] Computer Science [cs]/Automatic Control Engineering, [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: In France, structured data on emergency room visits are aggregated at the national level to build a syndromic surveillance system for different health events. For visits motivated by a traumatic event, information on the circumstances is stored in free-text clinical notes. Automating the processing of these notes should allow the enrichment of surveillance tools. In development at Inserm and the Emergency Department of the Bordeaux University Hospital, the TARPON project (Traitement Automatique des Résumés de Passages aux urgences pour un Observatoire National; Automatic Processing of Emergency Room Notes for a National Observatory) aims to meet this objective by using the latest deep learning tools applied to automatic language analysis. To exploit these data, an automatic de-identification system that guarantees the protection of personal data is necessary. We present here a comparison study of models for the de-identification of clinical texts in French.
Published: 2021

20. Agence pour la protection des programmes (APP) : CBCTool - Package R CBCtool - version 0.0.0.9000, 31-12-2021

Author: Avalos, Marta, Touchais, Hélène, Université de Bordeaux (UB), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistics In System biology and Translational Medicine (SISTM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), and Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Published: 2021

21. Una herramienta de toma de decisiones para ajustar los niveles anormales en las pruebas de conteo sanguíneo completo

Author: Avalos, Marta, Touchais, Hélène, Henríquez-Henríquez, Marcela, Université de Bordeaux (UB), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistics In System biology and Translational Medicine (SISTM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), IntegraMedica, and British United Provident Association Chile (BUPA Chile)
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: The complete blood count (CBC) performed with automated hematology analyzers is one of the most frequently ordered laboratory tests. Used as a first-line tool for health screening, diagnosis, and patient follow-up, the CBC influences most medical decisions. If the analysis does not match what is expected, laboratory staff manually review a blood smear, which takes time. CBC review criteria are based on international consensus guidelines and are adapted locally to account for laboratory resources and population characteristics. In this work, our objective is to provide a decision-support tool for the clinical laboratory that identifies which CBC variables are associated with a higher risk of an abnormal manual smear, and at which threshold values. We thus treat the adjustment of criteria as a feature selection problem. We propose a cost-sensitive, Lasso-penalized additive logistic regression combined with a stability selection criterion, in order to account for the particularities of the data and the context: strong class imbalance, categorization of continuous predictors, and the need for stable and interpretable results. Our proposal is competitive in terms of prediction (compared with deep neural networks) and in terms of model selection (provided there are enough data in the neighborhood of the true threshold values). The R package CBCtools is publicly available. This work was carried out in collaboration with Hélène Touchais, Inria Bordeaux, and Marcela Henríquez Henríquez, BUPA Chile.
Published: 2021

22. A decision-making tool to fine-tune abnormal levels in the complete blood count tests

Author: Avalos, Marta, Touchais, Hélène, Henríquez-Henríquez, Marcela, Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT), IntegraMedica, British United Provident Association Chile (BUPA Chile), Avalos, Marta, and Université Fédérale Toulouse Midi-Pyrénées
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], Population Health, Machine Learning (stat.ML), [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], Statistics - Applications, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], GAM, Machine Learning (cs.LG), Imbalance, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Statistics - Machine Learning, Applications (stat.AP), Interpretability, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Lasso, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: The complete blood count (CBC) performed by automated hematology analyzers is one of the most frequently ordered laboratory tests. It is a first-line tool for assessing a patient's general health status and for diagnosing and monitoring disease progression. When the analysis does not fit an expected setting, technologists manually review a blood smear using a microscope. In 2005, the International Consensus Group for Hematology Review published a set of criteria for reviewing CBCs. Local adjustments are commonly needed to account for laboratory resources and population characteristics. Our objective is to provide a decision support tool to identify which CBC variables are associated with higher risks of abnormal smear and at which cutoff values. We propose a cost-sensitive Lasso-penalized additive logistic regression combined with stability selection. Using simulated and real CBC data, we demonstrate that our tool correctly identifies the true cutoff values, provided that there is enough available data in their neighbourhood., Comment: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract
Published: 2020

23. Pre-Training a Neural Language Model Improves the Sample Efficiency of an Emergency Room Classification Model

Author: XU, Binbin, GIL-JARDINE, Cedric, THIESSARD, Frantz, TELLIER, Éric, AVALOS, Marta, LAGARDE, Emmanuel, Avalos, Marta, Roman Barták, Eric Bell, Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Hôpital Pellegrin, CHU Bordeaux [Bordeaux]-Groupe hospitalier Pellegrin, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), and Association for the Advancement of Artificial Intelligence
Subjects: FOS: Computer and information sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Computer Science - Machine Learning, [STAT.AP]Statistics [stat]/Applications [stat.AP], Computer Science - Computation and Language, [STAT.ME] Statistics [stat]/Methodology [stat.ME], Computer Science - Artificial Intelligence, ERIAS, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], SISTM, Machine Learning (cs.LG), [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Artificial Intelligence (cs.AI), ComputingMethodologies_PATTERNRECOGNITION, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [STAT.AP] Statistics [stat]/Applications [stat.AP], IETO, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-AU]Computer Science [cs]/Automatic Control Engineering, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Computation and Language (cs.CL), [STAT.ME]Statistics [stat]/Methodology [stat.ME], [INFO.INFO-AU] Computer Science [cs]/Automatic Control Engineering
Abstract: To build a French national electronic injury surveillance system based on emergency room visits, we aim to develop a coding system to classify their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require large amounts of expert-annotated data, which are time-consuming and costly to obtain. We hypothesize that the Natural Language Processing Transformer model incorporating a generative self-supervised pre-training step can significantly reduce the required number of annotated samples for supervised fine-tuning. In this preliminary study, we test our hypothesis in the simplified problem of predicting whether a visit is the consequence of a traumatic event or not from free-text clinical notes. Using fully re-trained GPT-2 models (without OpenAI pre-trained weights), we assess the gain of applying a self-supervised pre-training phase with unlabeled notes prior to the supervised learning task. Results show that the number of data required to achieve a given level of performance (AUC>0.95) was reduced by a factor of 10 when applying pre-training. Namely, for 16 times more data, the fully-supervised model achieved an improvement <1% in AUC. To conclude, it is possible to adapt a multi-purpose neural language model such as the GPT-2 to create a powerful tool for classification of free-text notes with only a small number of labeled samples., Comment: Version of the published manuscript
Published: 2020

24. Training-Related Risk of Common Illnesses in Elite Swimmers over a 4-yr Period

Author: HELLARD, PHILIPPE, AVALOS, MARTA, GUIMARAES, FANNY, TOUSSAINT, JEAN-FRANÇOIS, and PYNE, DAVID B.
Published: 2015

25. High–Dimensional Sparse Matched Case–Control and Case–Crossover Data: A Review of Recent Works, Description of an R Tool and an Illustration of the Use in Epidemiological Studies

Author: Avalos, Marta, primary, Grandvalet, Yves, additional, Pouyes, Hélène, additional, Orriols, Ludivine, additional, and Lagarde, Emmanuel, additional
Published: 2014

26. Health conditions and the risk of home injury in French adults: results from a prospective study of the MAVIE cohort

Author: Rojas Castro, Madelyn Yiseth, primary, Avalos, Marta, additional, Contrand, Benjamin, additional, Dupuy, Marion, additional, Sztal-Kutas, Catherine, additional, Orriols, Ludivine, additional, and Lagarde, Emmanuel, additional
Published: 2021

27. Variable selection on large case-crossover data: application to a registry-based study of prescription drugs and road traffic crashes

Author: Avalos, Marta, Orriols, Ludivine, Pouyes, Hélène, Grandvalet, Yves, Thiessard, Frantz, and Lagarde, Emmanuel
Published: 2014

28. Neural Language Model for Automated Classification of Electronic Medical Records at the Emergency Room. The Significant Benefit of Unsupervised Generative Pre-training

Author: Xu, Binbin, Gil-Jardiné, Cédric, Thiessard, Frantz, Tellier, Éric, Avalos, Marta, Lagarde, Emmanuel, Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), CHU de Bordeaux Pellegrin [Bordeaux], Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), and Avalos, Marta
Subjects: Transformer, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Neural Language Model, [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [STAT.AP] Statistics [stat]/Applications [stat.AP], Pre-training, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, GPT-2, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: In order to build a national injury surveillance system based on emergency room (ER) visits, we are developing a coding system to classify their causes from free-text clinical notes. Supervised learning techniques have shown good results in this area but require large annotated datasets. New levels of performance have recently been achieved in neural language models (NLM) with models based on the Transformer architecture incorporating an unsupervised generative pre-training step. Our hypothesis is that methods involving a generative self-supervised pre-training step can significantly reduce the required number of annotated samples for supervised fine-tuning. In this case study, we assessed whether we could predict from free-text clinical notes whether a visit was the consequence of a traumatic or non-traumatic event. Using fully re-trained GPT-2 models (without OpenAI pre-trained weights), we compared two scenarios. Scenario A (26 study cases of different training data sizes) consisted of training the GPT-2 on the trauma/non-trauma labeled clinical notes (up to 161 930). Scenario B (19 study cases) consisted of a first step of self-supervised pre-training with unlabeled notes (up to 151 930), followed by a second step of supervised fine-tuning with labeled notes (up to 10 000). Results showed that Scenario A needed to process >6 000 notes to achieve good performance (AUC>0.95), whereas Scenario B needed only 600 notes, a gain of a factor of 10. At the end case of both scenarios, for 16 times more data (161 930 vs. 10 000), the gain of Scenario A over Scenario B was only an improvement of 0.89% in AUC and 2.12% in F1 score. To conclude, it is possible to adapt a multi-purpose NLM such as the GPT-2 to create a powerful tool for classification of free-text notes with only a very small number of labeled samples.
Published: 2019

29. Health administrative data enrichment using cohort information: Comparative evaluation of methods by simulation and application to real data

Author: Silenou, Bernard, Avalos, Marta, Helmer, Catherine, Berr, Claudine, Pariente, Antoine, Jacqmin-Gadda, Hélène, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Montpellier (UM), Institut National de la Santé et de la Recherche Médicale (INSERM), CHU Bordeaux [Bordeaux], and Avalos, Marta
Subjects: Male, Databases, Factual, Economics, Science, Social Sciences, Research and Analysis Methods, Cohort Studies, Fractures, Bone, Database and Informatics Methods, Benzodiazepines, Databases, Insurance Claim Review, Health Economics, Mathematical and Statistical Techniques, [STAT.AP] Statistics [stat]/Applications [stat.AP], Mental Health and Psychiatry, Medicine and Health Sciences, Humans, Bone, Factual, Animal Management, Aged, Pharmacology, Animal Performance, [STAT.AP]Statistics [stat]/Applications [stat.AP], Mood Disorders, Depression, Simulation and Modeling, Drugs, Biology and Life Sciences, Agriculture, Health Care, Antihypertensive Drugs, Research Design, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Medicine, Female, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, France, Mathematical Functions, Fractures, Research Article, Health Insurance
Abstract: Background: Studies using health administrative databases (HAD) may lead to biased results since information on potential confounders is often missing. Methods that integrate confounder data from cohort studies, such as multivariate imputation by chained equations (MICE) and two-stage calibration (TSC), aim to reduce confounding bias. We provide new insights into their behavior under different deviations from representativeness of the cohort. Methods: We conducted an extensive simulation study to assess the performance of these two methods under different deviations from representativeness of the cohort. We illustrate these approaches by studying the association between benzodiazepine use and fractures in the elderly using the general sample of French health insurance beneficiaries (EGB) as main database and two French cohorts (Paquid and 3C) as validation samples. Results: When the cohort was representative of the same population as the HAD, the two methods were unbiased. TSC was more efficient and faster but its variance could be slightly underestimated when confounders were non-Gaussian. If the cohort was a subsample of the HAD (internal validation) with the probability of the subject being included in the cohort depending on both exposure and outcome, MICE was unbiased while TSC was biased. The two methods appeared biased when the inclusion probability in the cohort depended on unobserved confounders. Conclusion: When choosing the most appropriate method, epidemiologists should consider the origin of the cohort (internal or external validation) as well as the (anticipated or observed) selection biases of the validation sample.
Published: 2019
Full Text: View/download PDF

30. De-identification of Emergency Medical Records in French: Survey and Comparison of State-of-the-Art Automated Systems

Author: Bourdois, Loick, primary, Avalos, Marta, primary, Chenais, Gabrielle, primary, Thiessard, Frantz, primary, Revel, Philippe, primary, Gil-Jardine, Cedric, primary, and Lagarde, Emmanuel, primary
Published: 2021
Full Text: View/download PDF

31. Prescription medicine use by pedestrians and the risk of injurious road traffic crashes: A case-crossover study

Author: Née, Mélanie, Avalos, Marta, Luxcey, Audrey, Contrand, Benjamin, Salmi, Louis-Rachid, Fourrier-Réglat, Annie, Gadegbeku, Blandine, Lagarde, Emmanuel, and Orriols, Ludivine
Subjects: Crash injuries -- Risk factors, Pedestrians -- Health aspects, Prescriptions (Drugs) -- Dosage and administration -- Complications and side effects, Biological sciences
Abstract: Background: While some medicinal drugs have been found to affect driving ability, no study has investigated whether a relationship exists between these medicines and crashes involving pedestrians. The aim of this study was to explore the association between the use of medicinal drugs and the risk of being involved in a road traffic crash as a pedestrian. Methods and findings: Data from 3 French nationwide databases were matched. We used the case-crossover design to control for time-invariant factors by using each case as its own control. To perform multivariable analysis and limit false-positive results, we implemented a bootstrap version of Lasso. To avoid the effect of unmeasured time-varying factors, we varied the length of the washout period from 30 to 119 days before the crash. The matching procedure led to the inclusion of 16,458 pedestrians involved in an injurious road traffic crash from 1 July 2005 to 31 December 2011. We found 48 medicine classes with a positive association with the risk of crash, with median odds ratios ranging from 1.12 to 2.98. Among these, benzodiazepines and benzodiazepine-related drugs, antihistamines, and anti-inflammatory and antirheumatic drugs were among the 10 medicines most consumed by the 16,458 pedestrians. Study limitations included slight overrepresentation of pedestrians injured in more severe crashes, lack of information about self-medication and the use of over-the-counter drugs, and lack of data on amount of walking. Conclusions: Therapeutic classes already identified as impacting the ability to drive, such as benzodiazepines and antihistamines, are also associated with an increased risk of pedestrians being involved in a road traffic crash. This study on pedestrians highlights the necessity of improving awareness of the effect of these medicines on this category of road user.
Published: 2017
Full Text: View/download PDF

32. Parsimonious additive models

Author: Avalos, Marta, Grandvalet, Yves, and Ambroise, Christophe
Published: 2007
Full Text: View/download PDF

33. Modeling the residual effects and threshold saturation of training: a case study of Olympic swimmers

Author: Hellard, Philippe, Avalos, Marta, Millet, Gregoire, Lacoste, Lucien, Barale, Frederic, and Chatard, Jean-Claude
Subjects: Strengthening exercises -- Research, Swimming -- Research, Health, Sports and fitness
Abstract: Hellard, P., M. Avalos, G. Millet, L. Lacoste, and J.C. Chatard. Modeling the residual effects and threshold saturation of training: A case study of Olympic swimmers. J. Strength Cond. Res. 19(1):67-75. 2005. The aim of this study was to model the residual effects of training on swimming performance and to compare a model that includes threshold saturation (MM) with the Banister model (BM). Seven Olympic swimmers were studied over a period of 4 ± 2 years. For 3 training loads (low-intensity w^LIT, high-intensity w^HIT, and strength training w^ST), 3 residual training effects were determined: short-term (STE) during the taper phase (i.e., 3 weeks before the performance [weeks 0, 1, and 2]), intermediate-term (ITE) during the intensity phase (weeks 3, 4, and 5), and long-term (LTE) during the volume phase (weeks 6, 7, and 8). ITE and LTE were positive for w^HIT and w^LIT, respectively (p < 0.05). Low-intensity training load during taper was related to performances by a parabolic relationship (p < 0.05). Different quality measures indicated that MM compares favorably with BM. Identifying individual training thresholds may help individualize the distribution of training loads. Key words: mathematical model, performance, swimming
Published: 2005

34. Health conditions and the risk of home injury in French adults: results from a prospective study of the MAVIE cohort.

Author: Rojas Castro, Madelyn Yiseth, Avalos, Marta, Contrand, Benjamin, Dupuy, Marion, Sztal-Kutas, Catherine, Orriols, Ludivine, and Lagarde, Emmanuel
Subjects: INJURY risk factors, PATIENT aftercare, CONFIDENCE intervals, HOME accidents, DIZZINESS, HEALTH status indicators, BACKACHE, RISK assessment, SCIATICA, DESCRIPTIVE statistics, ODDS ratio, LONGITUDINAL method, VERTIGO, DISEASE complications, ADULTS
Published: 2022
Full Text: View/download PDF

35. Modeling the Association between HR Variability and Illness in Elite Swimmers

Author: HELLARD, PHILIPPE, GUIMARAES, FANNY, AVALOS, MARTA, HOUEL, NICOLAS, HAUSSWIRTH, CHRISTOPHE, and TOUSSAINT, JEAN FRANÇOIS
Published: 2011
Full Text: View/download PDF

36. Health conditions and the risk of home injury in French adults: Results from a prospective study of the MAVIE cohort

Author: Rojas Castro, Madelyn Yiseth, primary, Avalos, Marta, additional, Contrand, Benjamin, additional, Dupuy, Marion, additional, Sztal-Kutas, Catherine, additional, Orriols, Ludivine, additional, and Lagarde, Emmanuel, additional
Published: 2020
Full Text: View/download PDF

37. Penalized logistic regression with low prevalence exposures beyond high dimensional settings

Author: Doerken, Sam, Avalos, Marta, Lagarde, Emmanuel, Schumacher, Martin, Institute of Medical Biometry and Statistics [Fribourg] (IMBI), Faculty of Medicine and Medical Center, Université de Fribourg = University of Fribourg (UNIFR)-Université de Fribourg = University of Fribourg (UNIFR), University of Freiburg [Freiburg], Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), University of Fribourg-University of Fribourg, and Avalos, Marta
Subjects: Male, Epidemiology, Cervical Cancer, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Risk Factors, Prevalence, Medicine and Health Sciences, Public and Occupational Health, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.AP]Statistics [stat]/Applications [stat.AP], Likelihood Functions, [STAT.ME] Statistics [stat]/Methodology [stat.ME], Cancer Risk Factors, Simulation and Modeling, Traumatic Injury Risk Factors, Accidents, Traffic, Middle Aged, Pharmaceutical Preparations, Oncology, Research Design, Data Interpretation, Statistical, Road Traffic Collisions, Medicine, Regression Analysis, Female, France, [STAT.ME]Statistics [stat]/Methodology [stat.ME], Cancer Epidemiology, Research Article, Adult, Adolescent, Science, Research and Analysis Methods, [STAT.CO] Statistics [stat]/Computation [stat.CO], Young Adult, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Humans, Computer Simulation, Cancers and Neoplasms, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Logistic Models, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Case-Control Studies, Medical Risk Factors, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Gynecological Tumors
Abstract: International audience; Estimating and selecting risk factors with extremely low prevalences of exposure for a binary outcome is a challenge because classical techniques, notably logistic regression, often fail to provide meaningful results in such settings. While penalized regression methods are widely used in high-dimensional settings, we were able to show their usefulness in low-dimensional settings as well. Specifically, we demonstrate that Firth correction, ridge, the lasso and boosting all improve the estimation for low-prevalence risk factors. While the methods themselves are well-established, comparison studies are needed to assess their potential benefits in this context. This is done here using the dataset of a large unmatched case-control study from France (2005-2008) about the relationship between prescription medicines and road traffic accidents and an accompanying simulation study. Results show that the estimation of risk factors with prevalences below 0.1% can be drastically improved by using Firth correction and boosting in particular, especially for ultra-low prevalences. When a moderate number of low-prevalence exposures is available, we recommend the use of penalized techniques.
Published: 2018
Full Text: View/download PDF

38. A simulation framework of high-dimensional phylogenetic microbiota data

Author: Soret, Perrine, Avalos, Marta, Delhaes, Laurence, Thiébaut, Rodolphe, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), CHU de Bordeaux Pellegrin [Bordeaux], Centre de recherche Cardio-Thoracique de Bordeaux [Bordeaux] (CRCTB), Université Bordeaux Segalen - Bordeaux 2-CHU Bordeaux [Bordeaux]-Institut National de la Santé et de la Recherche Médicale (INSERM), and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2018

39. Usefulness of Bayesian modeling in risk analysis and prevention of Home Leisure and Sport Injuries (HLIs)

Author: Rojas Castro, Madelyn, Travanca, Marina, Avalos, Marta, Conesa, David Valentin, Lagarde, Emmanuel, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Universitat de València (UV), and Avalos, Marta
Subjects: Injury epidemiology, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Bayesian modeling, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME], ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2018

40. Lasso regularization for left-censored Gaussian outcome and high-dimensional predictors

Author: Soret, Perrine, Avalos, Marta, Wittkop, Linda, Commenges, Daniel, Thiébaut, Rodolphe, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Université de Bordeaux (UB), and Avalos, Marta
Subjects: Genotype, Limit of detection, Normal Distribution, HIV Infections, Biostatistics, [STAT.CO] Statistics [stat]/Computation [stat.CO], HIV viral load, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Outcome Assessment, Health Care, Humans, MORPH3Eus, Computer Simulation, Least-Squares Analysis, [STAT.CO]Statistics [stat]/Computation [stat.CO], Buckley-James least squares procedure, [STAT.AP]Statistics [stat]/Applications [stat.AP], lcsh:R5-920, [STAT.ME] Statistics [stat]/Methodology [stat.ME], HIV genotypic mutations, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Models, Theoretical, Prognosis, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], SISTM, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Drug resistance, Mutation, Cross-sectional studies, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, lcsh:Medicine (General), [STAT.ME]Statistics [stat]/Methodology [stat.ME], Algorithms, Research Article
Abstract: International audience; BACKGROUND: Biological assays for the quantification of markers may suffer from a lack of sensitivity and thus from an analytical detection limit. This is the case of human immunodeficiency virus (HIV) viral load. Below this threshold the exact value is unknown and values are consequently left-censored. Statistical methods have been proposed to deal with left-censoring but few are adapted to the context of high-dimensional data. METHODS: We propose to reverse the Buckley-James least squares algorithm to handle left-censored data, enhanced with a Lasso regularization to accommodate high-dimensional predictors. We present a Lasso-regularized Buckley-James least squares method with both non-parametric imputation using Kaplan-Meier and parametric imputation based on the Gaussian distribution, which is typically assumed for HIV viral load data after logarithmic transformation. Cross-validation for parameter-tuning is based on an appropriate loss function that takes into account the different contributions of censored and uncensored observations. We specify how these techniques can be easily implemented using available R packages. The Lasso-regularized Buckley-James least squares method was compared to simple imputation strategies to predict the response to antiretroviral therapy measured by HIV viral load according to the HIV genotypic mutations. We used a dataset composed of several clinical trials and cohorts from the Forum for Collaborative HIV Research (HIV Med. 2008;7:27-40). The proposed methods were also assessed on simulated data mimicking the observed data. RESULTS: Approaches accounting for left-censoring outperformed simple imputation methods in a high-dimensional setting. The Gaussian Buckley-James method with cross-validation based on the appropriate loss function showed the lowest prediction error on simulated data and, using real data, the most valid results according to the current literature on HIV mutations. CONCLUSIONS: The proposed approach deals with high-dimensional predictors and left-censored outcomes and has proven useful for predicting HIV viral load according to HIV mutations.
Published: 2018
Full Text: View/download PDF

41. Regularization Methods for Additive Models

Author: Avalos, Marta, primary, Grandvalet, Yves, additional, and Ambroise, Christophe, additional
Published: 2003
Full Text: View/download PDF

42. Assessing the limitations of the Banister model in monitoring training

Author: Hellard, Philippe, Avalos, Marta, LaCoste, Lucien, Barale, Frederic, Chatard, Jean-Claude, and Millet, Gregoire P.
Subjects: Exercise -- Research, Exercise -- Physiological aspects, Swimming -- Physiological aspects, Swimming -- Analysis
Published: 2006

43. Modeling the Training-Performance Relationship Using a Mixed Model in Elite Swimmers

Author: AVALOS, MARTA, HELLARD, PHILIPPE, and CHATARD, JEAN-CLAUDE
Published: 2003

44. High-dimensional compositional microbiota data: state-of-the-art of methods and software implementations

Author: Soret, Perrine, Avalos, Marta, Cheng, Soon, Thiebaut, Rodolphe, Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Data61 [Canberra] (CSIRO), Australian National University (ANU)-Commonwealth Scientific and Industrial Research Organisation [Canberra] (CSIRO), CHU de Bordeaux Pellegrin [Bordeaux], and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.ME]Statistics [stat]/Methodology [stat.ME], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
Abstract: National audience; Compositional data (CoDa) consist of a collection of nonnegative measurements that sum to a constant value, typically proportions that sum to 1. Because, knowing the sum, one component can be determined from the sum of the remainder, the parts that make up the composition are mathematically and statistically dependent. This distinct structure complicates analysis and does not allow standard statistical analyses. Aitchison (JRSS-B, 1982) and Egozcue and colleagues (Math. Geol., 2003), among others, provided a framework to analyze CoDa by mapping data from the constrained simplex space to the Euclidean space using nonlinear transforms such as the log-odds or the isometric log-ratio transforms. The increasing quality and decreasing cost of high-throughput sequencing technology, in particular 16S rRNA gene sequencing of the bacterial component of the human microbial community (microbiota), has enabled researchers to investigate human diseases. Subsequently, microbiota has been associated with numerous diseases, including inflammatory bowel disease, diabetes, cancer and cystic fibrosis. Because of the compositional structure and the high-dimensional data generated by microbiota sequencing, there is also a parallel development of specific statistical analysis methods and computational tools. Microbiota are usually measured as relative abundance of species and analyzed as CoDa. The objectives of this work are the following. First, to review theory and usage of CoDa analysis in the microbiota setting, with particular emphasis on recent proposals adapted to high-dimensional problems (e.g. supervised: constrained Lasso, hierarchical Lasso, kernel methods, sPLS; or unsupervised: PCoA, PCA, sparse inverse covariance estimation). Second, to investigate the current state-of-the-art software implementations (basically, R packages: compositions, vegan, ALDEx2, PERMANOVA, MiRKAT, MixMC ...). Third, using toy examples and publicly available data (the 16S data from the Koren and colleagues' study in March 2011's PNAS, available in the MixMC R package), to implement and evaluate those methods with publicly available codes. Evaluation criteria are mainly based on computational and practical aspects.
Published: 2017

45. Penalized logistic regression with low prevalence exposures beyond high dimensional settings

Author: Doerken, Sam, primary, Avalos, Marta, additional, Lagarde, Emmanuel, additional, and Schumacher, Martin, additional
Published: 2019
Full Text: View/download PDF

46. Evolution of Teaching Strategies in a French ODL University Course

Author: Avalos, Marta, Le Goff, Mélanie, Joly, Pierre, Jutand, Marthe-Aline, Alioum, Ahmadou, Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), The International Association for Statistical Education (IASE), H. MacGillivray, M. Martin, B. Phillips, and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], teaching introductory statistics, Learning strategies, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME], e-learning
Abstract: ISBN: 978-0-9805950-2-4; International audience; The university course on statistical methods in health at the Bordeaux School of Public Health, University of Bordeaux, has been run as an Open and Distance Learning (ODL) program since 2004 on the basics of statistical reasoning in the health field. The course is mainly for professionals. In more than ten years, about 1,000 people have been trained with over a third coming from sub-Saharan Africa. The program aims to meet a growing demand for statistical training from professionals from the south whose mobility is limited. Each year a satisfaction survey is sent to students with a view to improving the program. Even though participation in the survey is anonymous and not compulsory, it is a valuable source of comments and ideas. These have led to innovative pedagogical practices such as “tutored exercises” with individual correction, the use of new statistical software, summary sheets and flipped classrooms. However, benchmarking of the program has shown that more could be done. Teaching strategies should evolve within the framework of distance learning in terms of content, form and interactivity. This article discusses the development of these new educational strategies from their inception as well as future projects.
Published: 2016

47. Exploring variable selection in additive mixed effects models using group lasso

Author: Avalos, Marta, Soret, Perrine, Meza, Cristian, Bertin, Karine, Ren, Hao, Hellard, Philippe, Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Centro de Investigación y Modelamiento de Fenómenos Aleatorios – Valparaíso (CIMFAV), Universidad de Valparaiso [Chile], Université de Technologie de Compiègne (UTC), Fédération Française de Natation (FFN), Institut de recherche biomédicale et d’épidémiologie du sport (IRMES - EA 7329), Université Paris Descartes - Paris 5 (UPD5)-Institut national du sport, de l'expertise et de la performance (INSEP), This research was partially funded by the French Institute of Sport, Expertise and Performance (INSEP) under grant no14r21, Statistical Society of Australia (SSA), Avalos, Marta, Université de Bordeaux ( UB ), Statistics In System biology and Translational Medicine ( SISTM ), Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale ( INSERM ) -Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale ( INSERM ) -Inria Bordeaux - Sud-Ouest, Institut National de Recherche en 
Informatique et en Automatique ( Inria ) -Institut National de Recherche en Informatique et en Automatique ( Inria ), Centro de Investigación y Modelamiento de Fenómenos Aleatorios – Valparaíso ( CIMFAV ), Université de Technologie de Compiègne ( UTC ), Fédération Française de Natation ( FFN ), Institut de recherche biomédicale et d’épidémiologie du sport ( IRMES - EA 7329 ), Institut national du sport, de l'expertise et de la performance ( INSEP ) -Université Paris Descartes - Paris 5 ( UPD5 ), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Epidémiologie et Biostatistique [Bordeaux], and Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], Longitudinal data --- algorithme EM, [ STAT.AP ] Statistics [stat]/Applications [stat.AP], [ SDV.SPEE ] Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], L1-penalty, [STAT.CO] Statistics [stat]/Computation [stat.CO], [ INFO.INFO-LG ] Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [ STAT.ME ] Statistics [stat]/Methodology [stat.ME], [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Sport science data, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME], [ STAT.ML ] Statistics [stat]/Machine Learning [stat.ML], [ STAT.CO ] Statistics [stat]/Computation [stat.CO]
Abstract: International audience; We consider the problem of estimating a high-dimensional additive mixed model for longitudinal data using sparse methods. In this problem, multiple measurements are made on the same subject across time, and then the different sources of variability (intra- and inter-subject variability) and correlation within subjects have to be considered. Also, the relationships between explanatory variables and the outcome are possibly nonlinear. In addition, the number of explanatory variables could be larger than the sample size but only a small set of explanatory variables contribute to the response. Several computational approaches for high-dimensional additive modelling for independent data have been developed in the literature. Recently, Amato and colleagues (Stat Methods Appl 2016; s10260-016-0357-8) conducted a comprehensive review of these methods. Efficient regularized estimation procedures for variable selection in nonparametric additive models use basis function approximations. The authors also proposed a reformulation of the estimation problem in terms of group Lasso that allows deducing convergence and asymptotic optimality properties. Only a few works have developed suggestions to analyse high-dimensional longitudinal data using Lasso-type methods in additive mixed models. The resulting estimator depends only on a relatively small number of basis functions; however, variable selection is not directly encouraged. In this study we explore the extension of the group Lasso penalty to additive mixed effects models. We discuss computational aspects, including a comparison of group Lasso algorithms implemented through publicly available R codes, the estimation of the optimal regularization parameter and linkages between mean and covariance parameter estimation algorithms. We illustrate the interest of such approaches in the analysis of a twenty-year longitudinal study of training practices of elite athletes.
Published: 2016

48. Adjustment for Unobserved Confounders in Health Administrative Databases

Author: Silenou Chawo, Bernard, Avalos, Marta, Pariente, Antoine, Jacqmin-Gadda, Hélène, Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), CHU Bordeaux [Bordeaux], CIC - Bordeaux, Université Bordeaux Segalen - Bordeaux 2-CHU Bordeaux [Bordeaux]-Institut National de la Santé et de la Recherche Médicale (INSERM), ANSM, INTERNATIONAL SOCIETY FOR PHARMACOEPIDEMIOLOGY (ISPE), DRUGS-SAFE, and Avalos, Marta
Subjects: congenital, hereditary, and neonatal diseases and abnormalities, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], Pharmacoepidemiology, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], Multivariate imputation by chained equations, [STAT.AP] Statistics [stat]/Applications [stat.AP], EGB, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Two stage calibration, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Unmeasured confounding, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: International audience; Background: In health administrative databases (HAD), information on potential confounders such as tobacco and alcohol consumption is missing. Often, this information is readily available in cohort data. Multivariate imputation by chained equations (MICE) and two-stage calibration (TSC) may be used to adjust for unobserved confounders (UC) in HAD using cohort data. Objectives: We aim to compare the performance of MICE and TSC in adjusting for UC in HAD using cohort data in a simulation study. Methods: We generated a HAD with 10000 observations, a binary exposure, a binary response and two observed confounders (OC), and likewise a cohort dataset with 1000 observations and two additional UC. The design explored various distributions of OC and UC, strengths of the confounding effect, misspecification of the propensity score model and lack of representativeness of the cohort data with respect to the HAD. MICE was applied by imputing either the UC or the propensity scores, while TSC was applied with or without splines. Comparison was based on bias, coverage rate of the confidence interval and mean squared error (MSE). Results: When the cohort data are a representative sample with Gaussian confounders and a well-defined propensity score model is assumed, both methods give no bias and nominal coverage rates, with the smallest variance from TSC. Similar results were obtained in a misspecified propensity score (MPS) setting, with a smaller coverage rate for TSC. In addition, with a strong confounding effect of UC and nonstandard distributions assumed, the coverage rate of TSC may slightly decrease in an MPS setting, but this is ameliorated by TSC with splines. Moreover, under lack of representativeness of the cohort sample, both methods are biased, with low coverage rates. Conclusions: Our results indicate that when a well-specified propensity score model is assumed, TSC and MICE give better and equivalent results, but in a misspecified setting the coverage of TSC is poorer than that of MICE, although the bias and standard errors might still be small. These methods will subsequently be used to study the association between benzodiazepine consumption and fracture in the French HAD by utilising information on UC from a cohort study.
Published: 2016

49. A comparison of unsupervised curve classification methods for sport training data

Author: Lefort, Gaëlle, Avalos, Marta, Soret, Perrine, David, Pyne, Toussaint, Jean-François, Hellard, Philippe, Ecole Nationale de la Statistique et de l'Analyse de l'Information [Bruz] (ENSAI), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Université de Bordeaux (UB), Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Australian Institute of Sport, Institut national du sport, de l'expertise et de la performance (INSEP), Institut de recherche biomédicale et d’épidémiologie du sport (IRMES - EA 7329), Université Paris Descartes - Paris 5 (UPD5)-Institut national du sport, de l'expertise et de la performance (INSEP), Université Paris Descartes - Paris 5 (UPD5), Fédération Française de Natation (FFN), This research was partially funded by the French Institute of Sport, Expertise and Performance (INSEP) under grant no14r21., The International Society for NonParametric Statistics (ISNPS), Ecole Nationale de la Statistique et de l'Analyse de l'Information ( ENSAI ), Ensai, Ecole Nationale de la Statistique et de l'Analyse de l'Information, Statistics In System biology and Translational Medicine ( SISTM ), Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale ( INSERM ) -Université Bordeaux Segalen - Bordeaux 
2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale ( INSERM ) -Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique ( Inria ) -Institut National de Recherche en Informatique et en Automatique ( Inria ), Université de Bordeaux ( UB ), Institut national du sport, de l'expertise et de la performance ( INSEP ), Institut de recherche biomédicale et d’épidémiologie du sport ( IRMES - EA 7329 ), Université Paris Descartes - Paris 5 ( UPD5 ) -Institut national du sport, de l'expertise et de la performance ( INSEP ), Université Paris Descartes - Paris 5 ( UPD5 ), Fédération Française de Natation ( FFN ), Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Bordeaux Segalen - Bordeaux 2-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), and Avalos, Marta
Subjects: [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [ STAT.AP ] Statistics [stat]/Applications [stat.AP], [ SDV.SPEE ] Life Sciences [q-bio]/Santé publique et épidémiologie, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [ INFO.INFO-LG ] Computer Science [cs]/Machine Learning [cs.LG], Functional data analysis, [ STAT.ME ] Statistics [stat]/Methodology [stat.ME], [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, Longitudinal data analysis, Sport science data, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME], [ STAT.ML ] Statistics [stat]/Machine Learning [stat.ML], [ STAT.CO ] Statistics [stat]/Computation [stat.CO]
Abstract: International audience; Achieving peak performance at a specified time is the primary goal of athletes’ training programs. To optimize performance and reduce the risk of injury, a comprehensive list of training program parameters (e.g. intensity, volume, frequency, distribution, duration and type) requires careful management. This work focuses on clustering of time evolution curves of training measurements. Training data are recorded densely over time. However, duration of follow-up and duration of the seasons vary among subjects. Also, subject-specific variation can induce substantial error. Functional data analysis (FDA) and longitudinal data analysis (LDA) are the main approaches to analyze repeated measures data (in which multiple measurements are made on the same subject across time). Typically, FDA is applied when the data are dense, assumed to be observed in the continuum, and a function of time. LDA is usually applied when data are sparse, possibly with different numbers of measurements across individuals, and subject to error. We compared several FDA and LDA methods implemented through publicly available R code: k-means based on the standard Euclidean distance, a discrete Fréchet distance [2], and a functional distance [1]; Gaussian mixture model–based clustering for standard [4], longitudinal [5] and functional [3] data; and latent class mixed models [6]. We discuss advantages and limitations including computational and practical aspects. References: [1] Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical computing in functional data analysis: the R package fda.usc. Journal of Statistical Software, 51, 1–28. [2] Genolini, C. and Falissard, B. (2011). Kml: A package to cluster longitudinal data. Computer Methods and Programs in Biomedicine. [3] Jacques, J. and Preda, C. (2013). Funclust: A curves clustering method using functional random variables density approximation. Neurocomputing, 112, 164–171. [4] Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., and Govaert, G. (2014). Rmixmod: The R package of the model–based unsupervised, supervised and semi–supervised classification mixmod library. Journal of Statistical Software. [5] McNicholas, P. D. and Murphy, T. B. (2010). Model–based clustering of longitudinal data. Canadian Journal of Statistics, 38, 153–168. [6] Proust-Lima, C., Philipps, V., and Liquet, B. (2015). Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. Technical report, University of Bordeaux. arXiv:1503.00890v2.
Published: 2016

50. Lasso-type estimators for non-parametric mixed-effects models: application to high-dimensional data from a vaccine clinical trial for HIV

Author: Soret, Perrine, Meza, Cristian, Avalos, Marta, Bertin, Karine, Thiébaut, Rodolphe, Vaccine Research Institute (VRI), Université Paris-Est Créteil Val-de-Marne - Paris 12 (UPEC UP12), Université de Bordeaux (UB), Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)- Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM), Centro de Investigación y Modelamiento de Fenómenos Aleatorios – Valparaíso (CIMFAV), Universidad de Valparaiso [Chile], CHU Bordeaux [Bordeaux], VRI, and Avalos, Marta
Subjects: longitudinal data, [STAT.AP]Statistics [stat]/Applications [stat.AP], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], machine learning, [STAT.AP] Statistics [stat]/Applications [stat.AP], complex data, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, genomics, [SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.CO]Statistics [stat]/Computation [stat.CO], [STAT.ME]Statistics [stat]/Methodology [stat.ME]
Abstract: International audience; The penalization of likelihoods by L1–norms has become a relatively standard technique for high-dimensional data when the assumed models are based on n independent and identically distributed observations. These techniques should improve prediction accuracy (since regularization leads to variance reduction) together with interpretability (since sparsity identifies a subset of variables with strong effects). Computationally, these penalties are attractive and their theoretical properties have been intensively studied in recent years. Several authors have recently suggested analyzing high-dimensional clustered or longitudinal data using L1–penalization methods in mixed effects models. These approaches are mostly developed for variable selection purposes in linear and generalized linear mixed effects models and also, but less extensively, in parametric nonlinear mixed effects models. Only a few works have considered the problem of selecting nonlinear functions using L1–penalization methods in nonparametric mixed effects models, with additive or nonadditive predictors. Nonlinear functions are approximated by a linear combination of smooth functions (spline, wavelet or Fourier basis functions) possibly combined with more irregular functions (spiky basis functions). The resulting estimator depends only on a relatively small number of variables and/or a relatively small number of basis functions [1]. In this study we illustrate the interest of such approaches in the analysis of the DALIA-1 longitudinal trial [2]. Eighteen HIV-infected patients received vaccine injections at weeks 0, 4, 8 and 12. Antiretroviral treatment was interrupted at week 24. The patients were followed up to week 48, leading to 14 repeated measures per subject. Our aim was to predict the evolution of viral loads (continuous response) from about 260 gene sets (predictors). The incorporation of the temporal effect is a key point to reach accurate predictions. References: [1] Arribas-Gil, A. and Bertin, K. and Meza, C. and Rivoirard, V. (2012). LASSO-type estimators for Semiparametric Nonlinear Mixed-Effects Models Estimation, Statistics and Computing, 24 (3), 443-460. [2] Lévy, Y. and Thiébaut, R. and Montes, M. and Lacabaratz, C. and Sloan, L. and King, B. et al. (2014). Dendritic cell-based therapeutic vaccine elicits polyfunctional HIV-specific T-cell immunity associated with control of viral load. European journal of immunology, 44 (9), 2802-2810.
Published: 2016
