Author: "Thierry Lecroq" / Journal: bmc bioinformatics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Thierry Lecroq"' showing total 3 results

Start Over Author "Thierry Lecroq" Journal bmc bioinformatics

3 results on '"Thierry Lecroq"'

1. Improving high-resolution copy number variation analysis from next generation sequencing using unique molecular identifiers

Author: Pierre-Julien Viailly, Vincent Sater, Mathieu Viennot, Elodie Bohers, Nicolas Vergne, Caroline Berard, Hélène Dauchel, Thierry Lecroq, Alison Celebi, Philippe Ruminy, Vinciane Marchand, Marie-Delphine Lanic, Sydney Dubois, Dominique Penther, Hervé Tilly, Sylvain Mareschal, and Fabrice Jardin
Subjects: UMI, CNV calling, Next generation sequencing, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Abstract Background Recently, copy number variations (CNV) impacting genes involved in oncogenic pathways have attracted an increasing attention to manage disease susceptibility. CNV is one of the most important somatic aberrations in the genome of tumor cells. Oncogene activation and tumor suppressor gene inactivation are often attributed to copy number gain/amplification or deletion, respectively, in many cancer types and stages. Recent advances in next generation sequencing protocols allow for the addition of unique molecular identifiers (UMI) to each read. Each targeted DNA fragment is labeled with a unique random nucleotide sequence added to sequencing primers. UMI are especially useful for CNV detection by making each DNA molecule in a population of reads distinct. Results Here, we present molecular Copy Number Alteration (mCNA), a new methodology allowing the detection of copy number changes using UMI. The algorithm is composed of four main steps: the construction of UMI count matrices, the use of control samples to construct a pseudo-reference, the computation of log-ratios, the segmentation and finally the statistical inference of abnormal segmented breaks. We demonstrate the success of mCNA on a dataset of patients suffering from Diffuse Large B-cell Lymphoma and we highlight that mCNA results have a strong correlation with comparative genomic hybridization. Conclusion We provide mCNA, a new approach for CNV detection, freely available at https://gitlab.com/pierrejulien.viailly/mcna/ under MIT license. mCNA can significantly improve detection accuracy of CNV changes by using UMI.
Published: 2021
Full Text: View/download PDF

2. Matching health information seekers' queries to medical terms

Author: Thierry Lecroq, Zied Moalla, Stéfan Jacques Darmoni, Lina Fatima Soualmia, Élise Prieur-Gaston, Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), and Normandie Université (NU)
Subjects: Matching (statistics), 020205 medical informatics, Computer science, Score, Information Storage and Retrieval, 02 engineering and technology, Query language, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Query expansion, Structural Biology, Online search, Controlled vocabulary, 0202 electrical engineering, electronic engineering, information engineering, Humans, lcsh:QH301-705.5, Molecular Biology, Language, Internet, Information retrieval, Applied Mathematics, Research, String (computer science), 3. Good health, Computer Science Applications, lcsh:Biology (General), Vocabulary, Controlled, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], lcsh:R858-859.7, 020201 artificial intelligence & image processing, Edit distance, Algorithms, Medical Informatics, Coding (social sciences)
Abstract: Background The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. Results According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed by several terms that may be combination of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is realized before spelling-correction. Conclusions Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records.
Published: 2012
Full Text: View/download PDF

3. Querying large read collections in main memory: a versatile data structure

Author: Eric Rivals, Mikaël Salson, Nicolas Philippe, Thérèse Commes, Thierry Lecroq, Martine Léonard, Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Bioinformatics and Sequence Analysis (BONSAI), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), Centre de recherche en Biologie Cellulaire (CRBM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Université Montpellier 1 (UM1)-Université Montpellier 2 - Sciences et Techniques (UM2)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), and Normandie Université (NU)
Subjects: Computer science, 0206 medical engineering, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Genomics, 02 engineering and technology, computer.software_genre, lcsh:Computer applications to medicine. Medical informatics, indexation, Genome, Biochemistry, DNA sequencing, Transcriptome, reads, genomic, 03 medical and health sciences, Structural Biology, Humans, structure, Throughput (business), lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Epigenomics, 0303 health sciences, Information retrieval, algorithm, bioinformatic, Computers, Applied Mathematics, Methodology Article, Computational Biology, High-Throughput Nucleotide Sequencing, transcriptomic, Data structure, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, Index (publishing), lcsh:Biology (General), Metagenomics, NGS, lcsh:R858-859.7, Data mining, DNA microarray, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], computer, 020602 bioinformatics, Algorithms, Software, Reference genome
Abstract: Background High Throughput Sequencing (HTS) is now heavily exploited for genome (re-) sequencing, metagenomics, epigenomics, and transcriptomics and requires different, but computer intensive bioinformatic analyses. When a reference genome is available, mapping reads on it is the first step of this analysis. Read mapping programs owe their efficiency to the use of involved genome indexing data structures, like the Burrows-Wheeler transform. Recent solutions index both the genome, and the k-mers of the reads using hash-tables to further increase efficiency and accuracy. In various contexts (e.g. assembly or transcriptome analysis), read processing requires to determine the sub-collection of reads that are related to a given sequence, which is done by searching for some k-mers in the reads. Currently, many developments have focused on genome indexing structures for read mapping, but the question of read indexing remains broadly unexplored. However, the increase in sequence throughput urges for new algorithmic solutions to query large read collections efficiently. Results Here, we present a solution, named Gk arrays, to index large collections of reads, an algorithm to build the structure, and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer queries like "given a k-mer, get the reads containing this k-mer (once/at least once)". We compared our structure to other solutions that adapt uncompressed indexing structures designed for long texts and show that it processes queries fast, while requiring much less memory. Our structure can thus handle larger read collections. We provide examples where such queries are adapted to different types of read analysis (SNP detection, assembly, RNA-Seq). Conclusions Gk arrays constitute a versatile data structure that enables fast and more accurate read analysis in various contexts. The Gk arrays provide a flexible brick to design innovative programs that mine efficiently genomics, epigenomics, metagenomics, or transcriptomics reads. The Gk arrays library is available under Cecill (GPL compliant) license from http://www.atgc-montpellier.fr/ngs/.
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

3 results on '"Thierry Lecroq"'

1. Improving high-resolution copy number variation analysis from next generation sequencing using unique molecular identifiers

2. Matching health information seekers' queries to medical terms

3. Querying large read collections in main memory: a versatile data structure

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

Publisher

3 results on '"Thierry Lecroq"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources