5 results on '"Vandin, Fabio"'
Search Results
2. On the Sample Complexity of Cancer Pathways Identification.
- Author
- Vandin, Fabio, Raphael, Benjamin J., and Upfal, Eli
- Subjects
- *NUCLEOTIDE sequencing, *CANCER genetics, *SOMATIC mutation, *MACHINE learning, *GENOMICS
- Abstract
- Advances in DNA sequencing technologies have enabled large cancer sequencing studies, collecting somatic mutation data from a large number of cancer patients. One of the main goals of these studies is the identification of all cancer genes, that is, genes associated with cancer. Its achievement is complicated by the extensive mutational heterogeneity of cancer, due to the fact that important mutations in cancer target combinations of genes (i.e., pathways). Recently, the pattern of mutual exclusivity among mutations in a cancer pathway has been observed, and methods that find significant combinations of cancer genes by detecting mutual exclusivity have been proposed. A key question in the analysis of mutual exclusivity is the computation of the minimum number of samples required to reliably find a meaningful set of mutually exclusive mutations in the data, or conclude that there is no such set. In general, the problem of determining the sample complexity, or the number of samples required to identify significant combinations of features, of genomic problems is largely unexplored. In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
3. Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data.
- Author
- Raphael, Benjamin J. and Vandin, Fabio
- Subjects
- *SOMATIC mutation, *CANCER cells, *TUMORS, *NUCLEOTIDE sequence, *CANCER patients
- Abstract
- Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways. In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the pathway linear progression model, and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that with enough samples the optimal solution to this problem uniquely identifies the correct model with high probability even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) on a large number of samples of colorectal cancer and glioblastoma. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights on the tumor progression at the pathway level. [ABSTRACT FROM AUTHOR]
- Published
- 2015
4. Ballast: A Ball-based Algorithm for Structural Motifs.
- Author
- He, Lu, Vandin, Fabio, Pandurangan, Gopal, and Bailey-Kellogg, Chris
- Subjects
- *AMINO acid sequence, *PROTEIN structure, *MOLECULAR biology, *DATABASE evaluation, *COMPUTATIONAL complexity, *CHEMICAL warfare agents
- Abstract
- Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif-matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif-matching algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2013
5. MADMX: A Strategy for Maximal Dense Motif Extraction.
- Author
- Grossi, Roberto, Pietracaprina, Andrea, Pisanti, Nadia, Pucci, Geppino, Upfal, Eli, and Vandin, Fabio
- Published
- 2011