150 results on '"Vandin, Fabio"'
Search Results
2. caSPiTa: mining statistically significant paths in time series data from an unknown network
- Author: Tonon, Andrea and Vandin, Fabio
- Published: 2023
3. gRosSo: mining statistically robust patterns from a sequence of datasets
- Author: Tonon, Andrea and Vandin, Fabio
- Published: 2022
4. Identifying Drug Sensitivity Subnetworks with NETPHIX
- Author: Kim, Yoo-Ah, Sarto Basso, Rebecca, Wojtowicz, Damian, Liu, Amanda S., Hochbaum, Dorit S., Vandin, Fabio, and Przytycka, Teresa M.
- Published: 2020
5. Efficient mining of the most significant patterns with permutation testing
- Author: Pellegrina, Leonardo and Vandin, Fabio
- Published: 2020
6. ALLSTAR: inference of reliAble causaL ruLes between Somatic muTAtions and canceR phenotypes.
- Author: Simionato, Dario, Collesei, Antonio, Miglietta, Federica, and Vandin, Fabio
- Subjects: SOMATIC mutation; NP-hard problems; PHENOTYPES; DNA sequencing; CAUSAL inference
- Abstract: Motivation: Recent advances in DNA sequencing technologies have allowed the detailed characterization of genomes in large cohorts of tumors, highlighting their extreme heterogeneity, with no two tumors sharing the same complement of somatic mutations. Such heterogeneity hinders our ability to identify somatic mutations important for the disease, including mutations that determine clinically relevant phenotypes (e.g. cancer subtypes). Several tools have been developed to identify somatic mutations related to cancer phenotypes. However, such tools identify correlations between somatic mutations and cancer phenotypes, with no guarantee of highlighting causal relations. Results: We describe ALLSTAR, a novel tool to infer reliable causal relations between somatic mutations and cancer phenotypes. ALLSTAR identifies reliable causal rules highlighting combinations of somatic mutations with the highest impact in terms of average effect on the phenotype. While we prove that the underlying computational problem is NP-hard, we develop a branch-and-bound approach that employs protein–protein interaction networks and novel bounds for pruning the search space, while properly correcting for multiple hypothesis testing. Our extensive experimental evaluation on synthetic data shows that our tool is able to identify reliable causal relations in large cancer cohorts. Moreover, the reliable causal rules identified by our tool in cancer data show that our approach identifies several somatic mutations known to be relevant for cancer phenotypes as well as novel biologically meaningful relations. Availability and implementation: Code, data, and scripts to reproduce the experiments are available at https://github.com/VandinLab/ALLSTAR. [ABSTRACT FROM AUTHOR]
- Published: 2024
7. Enriched power of disease-concordant twin-case-only design in detecting interactions in genome-wide association studies
- Author: Li, Weilong, Baumbach, Jan, Mohammadnejad, Afsaneh, Brasch-Andersen, Charlotte, Vandin, Fabio, Korbel, Jan O., and Tan, Qihua
- Published: 2019
8. SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds.
- Author: Pellegrina, Leonardo and Vandin, Fabio
- Subjects: STATISTICAL learning; APPROXIMATION algorithms
- Abstract: "Sim Sala Bim!" (Silvan, https://en.wikipedia.org/wiki/Silvan%5f(illusionist)). Betweenness centrality is a popular centrality measure with applications in several domains and whose exact computation is impractical for modern-sized networks. We present SILVAN, a novel, efficient algorithm to compute, with high probability, accurate estimates of the betweenness centrality of all nodes of a graph and a high-quality approximation of the top-k betweenness centralities. SILVAN follows a progressive sampling approach and builds on novel bounds based on Monte Carlo Empirical Rademacher Averages, a powerful and flexible tool from statistical learning theory. SILVAN relies on a novel estimation scheme providing non-uniform bounds on the deviation of the estimates of the betweenness centrality of all the nodes from their true values and a refined characterisation of the number of samples required to obtain a high-quality approximation. Our extensive experimental evaluation shows that SILVAN extracts high-quality approximations while outperforming, in terms of number of samples and accuracy, the state-of-the-art approximation algorithm with comparable quality guarantees. [ABSTRACT FROM AUTHOR]
- Published: 2024
9. CoExpresso: assess the quantitative behavior of protein complexes in human cells
- Author: Chalabi, Morteza H., Tsiamis, Vasileios, Käll, Lukas, Vandin, Fabio, and Schwämmle, Veit
- Published: 2019
10. Differentially mutated subnetworks discovery
- Author: Hajkarim, Morteza Chalabi, Upfal, Eli, and Vandin, Fabio
- Published: 2019
11. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data processing
- Author: Almeida Diogo, Skov Ida, Lund Jesper, Mohammadnejad Afsaneh, Silva Artur, Vandin Fabio, Tan Qihua, Baumbach Jan, and Röttger Richard
- Subjects: Biotechnology; TP248.13-248.65
- Abstract: Measuring differential methylation of the DNA is nowadays the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). For their low cost, efficiency and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or EWAS data analysis is an integrative part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html
- Published: 2016
12. Integrated Genomic Characterization of Papillary Thyroid Carcinoma
- Author:
Agrawal, Nishant, Akbani, Rehan, Aksoy, B. Arman, Ally, Adrian, Arachchi, Harindra, Asa, Sylvia L., Auman, J. Todd, Balasundaram, Miruna, Balu, Saianand, Baylin, Stephen B., Behera, Madhusmita, Bernard, Brady, Beroukhim, Rameen, Bishop, Justin A., Black, Aaron D., Bodenheimer, Tom, Boice, Lori, Bootwalla, Moiz S., Bowen, Jay, Bowlby, Reanne, Bristow, Christopher A., Brookens, Robin, Brooks, Denise, Bryant, Robert, Buda, Elizabeth, Butterfield, Yaron S.N., Carling, Tobias, Carlsen, Rebecca, Carter, Scott L., Carty, Sally E., Chan, Timothy A., Chen, Amy Y., Cherniack, Andrew D., Cheung, Dorothy, Chin, Lynda, Cho, Juok, Chu, Andy, Chuah, Eric, Cibulskis, Kristian, Ciriello, Giovanni, Clarke, Amanda, Clayman, Gary L., Cope, Leslie, Copland, John A., Covington, Kyle, Danilova, Ludmila, Davidsen, Tanja, Demchok, John A., DiCara, Daniel, Dhalla, Noreen, Dhir, Rajiv, Dookran, Sheliann S., Dresdner, Gideon, Eldridge, Jonathan, Eley, Greg, El-Naggar, Adel K., Eng, Stephanie, Fagin, James A., Fennell, Timothy, Ferris, Robert L., Fisher, Sheila, Frazer, Scott, Frick, Jessica, Gabriel, Stacey B., Ganly, Ian, Gao, Jianjiong, Garraway, Levi A., Gastier-Foster, Julie M., Getz, Gad, Gehlenborg, Nils, Ghossein, Ronald, Gibbs, Richard A., Giordano, Thomas J., Gomez-Hernandez, Karen, Grimsby, Jonna, Gross, Benjamin, Guin, Ranabir, Hadjipanayis, Angela, Harper, Hollie A., Hayes, D. 
Neil, Heiman, David I., Herman, James G., Hoadley, Katherine A., Hofree, Matan, Holt, Robert A., Hoyle, Alan P., Huang, Franklin W., Huang, Mei, Hutter, Carolyn M., Ideker, Trey, Iype, Lisa, Jacobsen, Anders, Jefferys, Stuart R., Jones, Corbin D., Jones, Steven J.M., Kasaian, Katayoon, Kebebew, Electron, Khuri, Fadlo R., Kim, Jaegil, Kramer, Roger, Kreisberg, Richard, Kucherlapati, Raju, Kwiatkowski, David J., Ladanyi, Marc, Lai, Phillip H., Laird, Peter W., Lander, Eric, Lawrence, Michael S., Lee, Darlene, Lee, Eunjung, Lee, Semin, Lee, William, Leraas, Kristen M., Lichtenberg, Tara M., Lichtenstein, Lee, Lin, Pei, Ling, Shiyun, Liu, Jinze, Liu, Wenbin, Liu, Yingchun, LiVolsi, Virginia A., Lu, Yiling, Ma, Yussanne, Mahadeshwar, Harshad S., Marra, Marco A., Mayo, Michael, McFadden, David G., Meng, Shaowu, Meyerson, Matthew, Mieczkowski, Piotr A., Miller, Michael, Mills, Gordon, Moore, Richard A., Mose, Lisle E., Mungall, Andrew J., Murray, Bradley A., Nikiforov, Yuri E., Noble, Michael S., Ojesina, Akinyemi I., Owonikoko, Taofeek K., Ozenberger, Bradley A., Pantazi, Angeliki, Parfenov, Michael, Park, Peter J., Parker, Joel S., Paull, Evan O., Pedamallu, Chandra Sekhar, Perou, Charles M., Prins, Jan F., Protopopov, Alexei, Ramalingam, Suresh S., Ramirez, Nilsa C., Ramirez, Ricardo, Raphael, Benjamin J., Rathmell, W. Kimryn, Ren, Xiaojia, Reynolds, Sheila M., Rheinbay, Esther, Ringel, Matthew D., Rivera, Michael, Roach, Jeffrey, Robertson, A. Gordon, Rosenberg, Mara W., Rosenthal, Matthew, Sadeghi, Sara, Saksena, Gordon, Sander, Chris, Santoso, Netty, Schein, Jacqueline E., Schultz, Nikolaus, Schumacher, Steven E., Seethala, Raja R., Seidman, Jonathan, Senbabaoglu, Yasin, Seth, Sahil, Sharpe, Samantha, Shaw, Kenna R. 
Mills, Shen, John P., Shen, Ronglai, Sherman, Steven, Sheth, Margi, Shi, Yan, Shmulevich, Ilya, Sica, Gabriel L., Simons, Janae V., Sinha, Rileen, Sipahimalani, Payal, Smallridge, Robert C., Sofia, Heidi J., Soloway, Matthew G., Song, Xingzhi, Sougnez, Carrie, Stewart, Chip, Stojanov, Petar, Stuart, Joshua M., Sumer, S. Onur, Sun, Yichao, Tabak, Barbara, Tam, Angela, Tan, Donghui, Tang, Jiabin, Tarnuzzer, Roy, Taylor, Barry S., Thiessen, Nina, Thorne, Leigh, Thorsson, Vésteinn, Tuttle, R. Michael, Umbricht, Christopher B., Van Den Berg, David J., Vandin, Fabio, Veluvolu, Umadevi, Verhaak, Roel G.W., Vinco, Michelle, Voet, Doug, Walter, Vonn, Wang, Zhining, Waring, Scot, Weinberger, Paul M., Weinhold, Nils, Weinstein, John N., Weisenberger, Daniel J., Wheeler, David, Wilkerson, Matthew D., Wilson, Jocelyn, Williams, Michelle, Winer, Daniel A., Wise, Lisa, Wu, Junyuan, Xi, Liu, Xu, Andrew W., Yang, Liming, Yang, Lixing, Zack, Travis I., Zeiger, Martha A., Zeng, Dong, Zenklusen, Jean Claude, Zhao, Ni, Zhang, Hailei, Zhang, Jianhua, Zhang, Jiashan (Julia), Zhang, Wei, Zmuda, Erik, and Zou, Lihua more...
- Published: 2014
13. Computational pan-genomics: status, promises and challenges
- Author: Marschall, Tobias, Marz, Manja, Abeel, Thomas, Dijkstra, Louis, Dutilh, Bas E, Ghaffaari, Ali, Kersey, Paul, Kloosterman, Wigard P, Mäkinen, Veli, Novak, Adam M, Paten, Benedict, Porubsky, David, Rivals, Eric, Alkan, Can, Baaijens, Jasmijn A, De Bakker, Paul I W, Boeva, Valentina, Bonnal, Raoul J P, Chiaromonte, Francesca, Chikhi, Rayan, Ciccarelli, Francesca D, Cijvat, Robin, Datema, Erwin, Van Duijn, Cornelia M, Eichler, Evan E, Ernst, Corinna, Eskin, Eleazar, Garrison, Erik, El-Kebir, Mohammed, Klau, Gunnar W, Korbel, Jan O, Lameijer, Eric-Wubbo, Langmead, Benjamin, Martin, Marcel, Medvedev, Paul, Mu, John C, Neerincx, Pieter, Ouwens, Klaasjan, Peterlongo, Pierre, Pisanti, Nadia, Rahmann, Sven, Raphael, Ben, Reinert, Knut, de Ridder, Dick, de Ridder, Jeroen, Schlesner, Matthias, Schulz-Trieglaff, Ole, Sanders, Ashley D, Sheikhizadeh, Siavash, Shneider, Carl, Smit, Sandra, Valenzuela, Daniel, Wang, Jiayin, Wessels, Lodewyk, Zhang, Ying, Guryev, Victor, Vandin, Fabio, Ye, Kai, and Schönhuth, Alexander
- Published: 2018
14. Bounding the Family-Wise Error Rate in Local Causal Discovery using Rademacher Averages
- Author: Simionato, Dario and Vandin, Fabio
- Published: 2022
15. SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks
- Author: Buffelli, Davide, Liò, Pietro, and Vandin, Fabio
- Subjects: FOS: Computer and information sciences; Computer Science - Machine Learning; Machine Learning (cs.LG)
- Abstract: In the past few years, graph neural networks (GNNs) have become the de facto model of choice for graph classification. While, from the theoretical viewpoint, most GNNs can operate on graphs of any size, it is empirically observed that their classification performance degrades when they are applied on graphs with sizes that differ from those in the training data. Previous works have tried to tackle this issue in graph classification by providing the model with inductive biases derived from assumptions on the generative process of the graphs, or by requiring access to graphs from the test domain. The first strategy is tied to the quality of the assumptions made for the generative process, and requires the use of specific models designed after the explicit definition of the generative process of the data, leaving open the question of how to improve the performance of generic GNN models in general settings. On the other hand, the second strategy can be applied to any GNN, but requires access to information that is not always easy to obtain. In this work we consider the scenario in which we only have access to the training data, and we propose a regularization strategy that can be applied to any GNN to improve its generalization capabilities from smaller to larger graphs without requiring access to the test data. Our regularization is based on the idea of simulating a shift in the size of the training graphs using coarsening techniques, and enforcing the model to be robust to such a shift. Experimental results on standard datasets show that popular GNN models, trained on the 50% smallest graphs in the dataset and tested on the 10% largest graphs, obtain performance improvements of up to 30% when trained with our regularization strategy. Accepted at NeurIPS 2022.
- Published: 2022
16. Disease‐Concordant Twins Empower Genetic Association Studies
- Author: Tan, Qihua, Li, Weilong, and Vandin, Fabio
- Published: 2017
17. Differentially Methylated Genomic Regions in Birth-Weight Discordant Twin Pairs
- Author: Chen, Mubo, Baumbach, Jan, Vandin, Fabio, Röttger, Richard, Barbosa, Eudes, Dong, Mingchui, Frost, Morten, Christiansen, Lene, and Tan, Qihua
- Published: 2016
18. MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining.
- Author: Pellegrina, Leonardo, Cousins, Cyrus, Vandin, Fabio, and Riondato, Matteo
- Subjects: PARTIALLY ordered sets; STATISTICAL learning; STATISTICAL power analysis
- Abstract: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both (1) statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and (2) approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This flexibility offered by MCRapper is a big advantage over previously proposed solutions, which could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining, by appropriately computing approximations of the negative and positive borders of the collection of patterns of interest, which allow an effective pruning of the pattern space and the computation of strong bounds to the supremum deviation. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks. [ABSTRACT FROM AUTHOR]
- Published: 2022
19. Mutational landscape and significance across 12 major cancer types
- Author: Kandoth, Cyriac, McLellan, Michael D., Vandin, Fabio, Ye, Kai, Niu, Beifang, Lu, Charles, Xie, Mingchao, Zhang, Qunyuan, McMichael, Joshua F., Wyczalkowski, Matthew A., Leiserson, Mark D.M., Miller, Christopher A., Welch, John S., Walter, Matthew J., Wendl, Michael C., Ley, Timothy J., Wilson, Richard K., Raphael, Benjamin J., and Ding, Li
- Subjects: Gene mutations -- Research; Genetic research; Cancer -- Genetic aspects; Environmental issues; Science and technology; Zoology and wildlife conservation
- Abstract: The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment. The advancement of DNA sequencing technologies now enables the processing of thousands of tumours of many types for systematic mutation discovery. This expansion of scope, coupled with appreciable progress in [...]
- Published: 2013
20. The mutational landscape of lethal castration-resistant prostate cancer
- Author: Grasso, Catherine S., Wu, Yi-Mi, Robinson, Dan R., Cao, Xuhong, Dhanasekaran, Saravana M., Khan, Amjad P., Quist, Michael J., Jing, Xiaojun, Lonigro, Robert J., Brenner, J. Chad, Asangani, Irfan A., Ateeq, Bushra, Chun, Sang Y., Siddiqui, Javed, Sam, Lee, Anstett, Matt, Mehra, Rohit, Prensner, John R., Palanisamy, Nallasivam, Ryslik, Gregory A., Vandin, Fabio, Raphael, Benjamin J., Kunju, Lakshmi P., Rhodes, Daniel R., Pienta, Kenneth J., Chinnaiyan, Arul M., and Tomlins, Scott A.
- Subjects: Gene mutations -- Health aspects -- Research; Prostate cancer -- Development and progression -- Genetic aspects -- Research; Environmental issues; Science and technology; Zoology and wildlife conservation
- Abstract: Characterization of the prostate cancer transcriptome and genome has identified chromosomal rearrangements and copy number gains and losses, including ETS gene family fusions, PTEN loss and androgen receptor (AR) amplification, [...]
- Published: 2012
21. Mining top-K frequent itemsets through progressive sampling
- Author: Pietracaprina, Andrea, Riondato, Matteo, Upfal, Eli, and Vandin, Fabio
- Published: 2010
22. SPRISS: Approximating Frequent $k$-mers by Sampling Reads, and Applications
- Author: Santoro, Diego, Pellegrina, Leonardo, and Vandin, Fabio
- Subjects: FOS: Biological sciences; Quantitative Biology - Quantitative Methods; Quantitative Methods (q-bio.QM)
- Abstract: The extraction of $k$-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all $k$-mers and their frequencies is extremely demanding in terms of running time and memory, owing to the size of the data and to the exponential number of $k$-mers to be considered. However, in several applications, only frequent $k$-mers, which are $k$-mers appearing in a relatively high proportion of the data, are required by the analysis. In this work we present SPRISS, a new efficient algorithm to approximate frequent $k$-mers and their frequencies in next-generation sequencing data. SPRISS employs a simple yet powerful reads sampling scheme, which allows the extraction of a representative subset of the dataset that can be used, in combination with any $k$-mer counting algorithm, to perform downstream analyses in a fraction of the time required by the analysis of the whole data, while obtaining comparable answers. Our extensive experimental evaluation demonstrates the efficiency and accuracy of SPRISS in approximating frequent $k$-mers, and shows that it can be used in various scenarios, such as the comparison of metagenomic datasets and the identification of discriminative $k$-mers, to extract insights in a fraction of the time required by the analysis of the whole dataset. Accepted to RECOMB 2021.
- Published: 2021
23. Discovering significant evolutionary trajectories in cancer phylogenies.
- Author: Pellegrina, Leonardo and Vandin, Fabio
- Subjects: *INTERNET servers; *GENE regulatory networks; *ACUTE myeloid leukemia; *ARBORETUMS
- Abstract: Motivation: Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. Results: We present a new algorithm, MAximal tumor treeS TRajectOries (MASTRO), to discover significantly conserved evolutionary trajectories in cancer. MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations. MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors. We apply MASTRO to data from non-small-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies. Availability and implementation: MASTRO is available at https://github.com/VandinLab/MASTRO. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published: 2022
24. Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data
- Author:
Jovan, Tanevski, Thin, Nguyen, Buu, Truong, Nikos, Karaiskos, Mehmet Eren Ahsen, Xinyu, Zhang, Chang, Shu, Ke, Xu, Xiaoyu, Liang, Ying, Hu, Hoang VV Pham, Xiaomei, Li, Thuc, D Le, Adi, L Tarca, Gaurav, Bhatti, Roberto, Romero, Nestoras, Karathanasis, Phillipe, Loher, Yang, Chen, Zhengqing, Ouyang, Disheng, Mao, Yuping, Zhang, Maryam, Zand, Jianhua, Ruan, Christoph, Hafemeister, Peng, Qiu, Duc, Tran, Tin, Nguyen, Attila, Gabor, Thomas, Yu, Justin, Guinney, Enrico, Glaab, Roland, Krause, Peter, Banda, DREAM SCTC Consortium, Baruzzo, Giacomo, Cappellato, Marco, Zorzan, Irene, DEL FAVERO, Simone, Schenato, Luca, Vandin, Fabio, DI CAMILLO, Barbara, Shruti, Gupta, Ajay Kumar Verma, Shandar, Ahmad, Ronesh, Sharma, Edwin, Vans, Alok, Sharma, Ashwini, Patil, Alejandra, Carrea, Alonso, Andres M., Luis, Diambra, Vijay, Narsapuram, Vinay, Kaikala, Chaitanyam, Potnuru, Sunil, Kumar, Jiajie, Peng, Xiaoyu, Wang, Xuequn, Shang, Dani, Livne, Tom, Snir, Hagit, Philip, Alona, Zilberberg, Sol, Efroni, Hamid Reza Hassanzadeh, Reihaneh, Hassanzadeh, Ghazal, Jahanshahi, M-Mahdi, Naddaf-Sh, Drayer, Phillip M., Sadra, Naddaf-Sh, Marouen Ben Guebila, Changlin, Wan, Yuchen, Cao, Saber, Meamardoost, Nan Papili Gao, Rudiyanto, Gunawan, Gustavo, Stolovitzky, Nikolaus, Rajewsky, Julio, Saez-Rodriguez, Pablo, Meyer, Tanevski, Jovan, Nguyen, Thin, Truong, Buu, Karaiskos, Nikos, Pham, Hoang Vv, Xiaomei, Li, Le, Thuc D, Meyer, Pablo, and Dream SCTC consortuim more...
- Subjects: Health, Toxicology and Mutagenesis; Cell; Plant Science; In situ hybridization; Computational biology; Biology; Biochemistry, Genetics and Molecular Biology (miscellaneous); Transcriptome; 03 medical and health sciences; Spatial reconstruction; 0302 clinical medicine; mental disorders; Databases, Genetic; medicine; Animals; Gene Regulatory Networks; Gene; Spatial analysis; Spatial organization; Research Articles; Zebrafish; 030304 developmental biology; 0303 health sciences; Spatial Analysis; Ecology; Sequence Analysis, RNA; Gene Expression Profiling; RNA-seq technologies; Computational Biology; Gene Expression Regulation, Developmental; Gene selection; medicine.anatomical_structure; Cardiovascular and Metabolic Diseases; Single-cell RNA-sequencing (scRNAseq); Drosophila; Single-Cell Analysis; 030217 neurology & neurosurgery; psychological phenomena and processes; Algorithms; gene selection; Research Article; Forecasting
- Abstract: We describe and provide an array of diverse methods to predict cellular positions in tissue from RNA-seq data, selecting mapping genes according to their spatial/statistical properties and their effect on improving the cell positioning. Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy, high spatial clustering and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
- Published: 2020
25. Comprehensive molecular characterization of clear cell renal cell carcinoma
- Author:
Creighton, Chad J., Morgan, Margaret, Gunaratne, Preethi H., Wheeler, David A., Gibbs, Richard A., Robertson, Gordon A., Chu, Andy, Beroukhim, Rameen, Cibulskis, Kristian, Signoretti, Sabina, Hsin-Ta Wu, Fabio Vandin, Raphael, Benjamin J., Verhaak, Roel G. W., Tamboli, Pheroze, Torres-Garcia, Wandaliz, Akbani, Rehan, Weinstein, John N., Reuter, Victor, Hsieh, James J., Brannon, Rose A., Ari Hakimi, A., Jacobsen, Anders, Ciriello, Giovanni, Reva, Boris, Ricketts, Christopher J., Linehan, Marston W., Stuart, Joshua M., Rathmell, Kimryn W., Shen, Hui, Laird, Peter W., Muzny, Donna, Davis, Caleb, Xi, Liu, Chang, Kyle, Kakkar, Nipun, Treviño, Lisa R., Benton, Susan, Reid, Jeffrey G., Morton, Donna, Doddapaneni, Harsha, Han, Yi, Lewis, Lora, Dinh, Huyen, Kovar, Christie, Zhu, Yiming, Santibanez, Jireh, Wang, Min, Hale, Walker, Kalra, Divya, Getz, Gad, Lawrence, Michael S., Sougnez, Carrie, Carter, Scott L., Sivachenko, Andrey, Lichtenstein, Lee, Stewart, Chip, Voet, Doug, Fisher, Sheila, Gabriel, Stacey B., Lander, Eric, Schumacher, Steve E., Tabak, Barbara, Saksena, Gordon, Onofrio, Robert C., Cherniack, Andrew D., Gentry, Jeff, Ardlie, Kristin, Meyerson, Matthew, Chun, Hye-Jung E., Mungall, Andrew J., Sipahimalani, Payal, Stoll, Dominik, Ally, Adrian, Balasundaram, Miruna, Butterfield, Yaron S. N., Carlsen, Rebecca, Carter, Candace, Chuah, Eric, Coope, Robin J. N., Dhalla, Noreen, Gorski, Sharon, Guin, Ranabir, Hirst, Carrie, Hirst, Martin, Holt, Robert A., Lebovitz, Chandra, Lee, Darlene, Li, Haiyan I., Mayo, Michael, Moore, Richard A., Pleasance, Erin, Plettner, Patrick, Schein, Jacqueline E., Shafiei, Arash, Slobodan, Jared R., Tam, Angela, Thiessen, Nina, Varhol, Richard J., Wye, Natasja, Zhao, Yongjun, Birol, Inanc, Jones, Steven J. 
M., Marra, Marco A., Auman, Todd J., Tan, Donghui, Jones, Corbin D., Hoadley, Katherine A., Mieczkowski, Piotr A., Mose, Lisle E., Jefferys, Stuart R., Topal, Michael D., Liquori, Christina, Turman, Yidi J., Shi, Yan, Waring, Scot, Buda, Elizabeth, Walsh, Jesse, Wu, Junyuan, Bodenheimer, Tom, Hoyle, Alan P., Simons, Janae V., Soloway, Mathew G., Balu, Saianand, Parker, Joel S., Hayes, Neil D., Perou, Charles M., Kucherlapati, Raju, Park, Peter, Triche, Timothy, Jr, Weisenberger, Daniel J., Lai, Phillip H., Bootwalla, Moiz S., Maglinte, Dennis T., Mahurkar, Swapna, Berman, Benjamin P., Van Den Berg, David J., Cope, Leslie, Baylin, Stephen B., Noble, Michael S., DiCara, Daniel, Zhang, Hailei, Cho, Juok, Heiman, David I., Gehlenborg, Nils, Mallard, William, Lin, Pei, Frazer, Scott, Stojanov, Petar, Liu, Yingchun, Zhou, Lihua, Kim, Jaegil, Chin, Lynda, Vandin, Fabio, Wu, Hsin-Ta, Benz, Christopher, Yau, Christina, Reynolds, Sheila M., Shmulevich, Ilya, Verhaak, Roel G.W., Vegesna, Rahul, Kim, Hoon, Zhang, Wei, Cogdell, David, Jonasch, Eric, Ding, Zhiyong, Lu, Yiling, Zhang, Nianxiang, Unruh, Anna K., Casasent, Tod D., Wakefield, Chris, Tsavachidou, Dimitra, Mills, Gordon B., Schultz, Nikolaus, Antipin, Yevgeniy, Gao, Jianjiong, Cerami, Ethan, Gross, Benjamin, Aksoy, Arman B., Sinha, Rileen, Weinhold, Nils, Sumer, Onur S., Taylor, Barry S., Shen, Ronglai, Ostrovnaya, Irina, Berger, Michael F., Ladanyi, Marc, Sander, Chris, Fei, Suzanne S., Stout, Andrew, Spellman, Paul T., Rubin, Daniel L., Liu, Tiffany T., Ng, Sam, Paull, Evan O., Carlin, Daniel, Goldstein, Theodore, Waltman, Peter, Ellrott, Kyle, Zhu, Jing, Haussler, David, Xiao, Weimin, Shelton, Candace, Gardner, Johanna, Penny, Robert, Sherman, Mark, Mallery, David, Morris, Scott, Paulauskis, Joseph, Burnett, Ken, Shelton, Troy, Kaelin, William G., Choueiri, Toni, Atkins, Michael B., Curley, Erin, Tickoo, Satish, Thorne, Leigh, Boice, Lori, Huang, Mei, Fisher, Jennifer C., Vocke, Cathy D., Peterson, James, Worrell, 
Robert, Merino, Maria J., Schmidt, Laura S., Czerniak, Bogdan A., Aldape, Kenneth D., Wood, Christopher G., Boyd, Jeff, Weaver, JoEllen, Iacocca, Mary V., Petrelli, Nicholas, Witkin, Gary, Brown, Jennifer, Czerwinski, Christine, Huelsenbeck-Dill, Lori, Rabeno, Brenda, Myers, Jerome, Morrison, Carl, Bergsten, Julie, Eckman, John, Harr, Jodi, Smith, Christine, Tucker, Kelinda, Zach, Leigh Anne, Bshara, Wiam, Gaudioso, Carmelo, Dhir, Rajiv, Maranchie, Jodi, Nelson, Joel, Parwani, Anil, Potapova, Olga, Fedosenko, Konstantin, Cheville, John C., Thompson, Houston R., Mosquera, Juan M., Rubin, Mark A., Blute, Michael L., Pihl, Todd, Jensen, Mark, Sfeir, Robert, Kahn, Ari, Chu, Anna, Kothiyal, Prachi, Snyder, Eric, Pontius, Joan, Ayala, Brenda, Backus, Mark, Walton, Jessica, Baboud, Julien, Berton, Dominique, Nicholls, Matthew, Srinivasan, Deepak, Raman, Rohini, Girshik, Stanley, Kigonya, Peter, Alonso, Shelley, Sanbhadti, Rashmi, Barletta, Sean, Pot, David, Sheth, Margi, Demchok, John A., Davidsen, Tanja, Wang, Zhining, Yang, Liming, Tarnuzzer, Roy W., Zhang, Jiashan, Eley, Greg, Ferguson, Martin L., Mills Shaw, Kenna R., Guyer, Mark S., Ozenberger, Bradley A., and Sofia, Heidi J. more...
- Published
- 2013
26. SPRISS: approximating frequent k-mers by sampling reads, and applications.
- Author
-
Santoro, Diego, Pellegrina, Leonardo, Comin, Matteo, and Vandin, Fabio
- Subjects
SINGLE nucleotide polymorphisms ,NUCLEOTIDE sequencing ,FRACTIONS
- Abstract
Motivation The extraction of k-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all k-mers and their frequencies is extremely demanding in terms of running time and memory, owing to the size of the data and to the exponential number of k-mers to be considered. However, in several applications, only frequent k-mers, which are k-mers appearing in a relatively high proportion of the data, are required by the analysis. Results In this work, we present SPRISS, a new efficient algorithm to approximate frequent k-mers and their frequencies in next-generation sequencing data. SPRISS uses a simple yet powerful reads sampling scheme, which makes it possible to extract a representative subset of the dataset that can be used, in combination with any k-mer counting algorithm, to perform downstream analyses in a fraction of the time required by the analysis of the whole data, while obtaining comparable answers. Our extensive experimental evaluation demonstrates the efficiency and accuracy of SPRISS in approximating frequent k-mers, and shows that it can be used in various scenarios, such as the comparison of metagenomic datasets, the identification of discriminative k-mers, and SNP (single nucleotide polymorphism) genotyping, to extract insights in a fraction of the time required by the analysis of the whole dataset. Availability and implementation SPRISS [a preliminary version (Santoro et al., 2021) of this work was presented at RECOMB 2021] is available at https://github.com/VandinLab/SPRISS. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
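The general idea in the abstract above (sample reads, then run any k-mer counter on the sample) can be illustrated with a minimal sketch. This is not the SPRISS algorithm and carries none of its guarantees: it assumes plain independent sampling of reads, and the function names, the `sample_fraction` and `min_freq` parameters, and the frequency definition (count divided by total k-mer occurrences in the sample) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence in a collection of reads (exact counting)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def approx_frequent_kmers(reads, k, sample_fraction, min_freq, seed=0):
    """Estimate frequent k-mers from a random sample of reads.

    Assumption (not from the paper): each read is kept independently with
    probability `sample_fraction`, and a k-mer's frequency is its count
    divided by the total number of k-mer occurrences in the sample.
    """
    rng = random.Random(seed)
    sample = [r for r in reads if rng.random() < sample_fraction]
    counts = count_kmers(sample, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items() if c / total >= min_freq}
```

With `sample_fraction=1.0` every read is kept, so the function reduces to exact counting; smaller fractions trade accuracy for speed.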
- Published
- 2022
27. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine
- Author
-
Raphael, Benjamin J, Dobson, Jason R, Oesper, Layla, and Vandin, Fabio
- Published
- 2014
28. Principles of Systems Biology, No. 31
- Author
-
Cho, Hyunghoon, Berger, Bonnie, Peng, Jian, Galitzine, Cyril, Vitek, Olga, Beltran, Pierre M. Jean, Cristea, Ileana M., Görtler, Franziska, Solbrig, Stefan, Wettig, Tilo, Oefner, Peter J., Spang, Rainer, Altenbuchinger, Michael, Basso, Rebecca Sarto, Hochbaum, Dorit, Vandin, Fabio, Silverbush, Dana, Cristea, Simona, Yanovich, Gali, Geiger, Tamar, Beerenwinkel, Niko, Sharan, Roded, Zhou, Zhemin, Luhmann, Nina, Alikhan, Nabil-Fareed, and Achtman, Mark
- Published
- 2018
29. Finding driver pathways in cancer: models and algorithms
- Author
-
Vandin, Fabio, Upfal, Eli, and Raphael, Benjamin J
- Subjects
Cancer ,Somatic Mutations ,Driver mutations ,Pathways ,Background mutation rate ,Generative models ,Biology (General) ,QH301-705.5 ,Genetics ,QH426-470
- Abstract
Background Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers, meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
- Published
- 2012
30. Attention-Based Deep Learning Framework for Human Activity Recognition With User Adaptation.
- Author
-
Buffelli, Davide and Vandin, Fabio
- Abstract
Sensor-based human activity recognition (HAR) requires predicting the action of a person based on sensor-generated time series data. HAR has attracted major interest in the past few years, thanks to the large number of applications enabled by modern ubiquitous computing devices. While several techniques based on hand-crafted feature engineering have been proposed, the current state-of-the-art is represented by deep learning architectures that automatically obtain high level representations and that use recurrent neural networks (RNNs) to extract temporal dependencies in the input. RNNs have several limitations, in particular in dealing with long-term dependencies. We propose a novel deep learning framework, TrASenD, based on a purely attention-based mechanism, that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based architecture is considerably more powerful than previous approaches, with an average increment of more than 7% on the F1 score over the previous best performing model. Furthermore, we consider the problem of personalizing HAR deep learning models, which is of great importance in several applications. We propose a simple and effective transfer-learning based strategy to adapt a model to a specific user, providing an average increment of 6% on the F1 score on the predictions for that user. Our extensive experimental evaluation proves the significantly superior capabilities of our proposed framework over the current state-of-the-art and the effectiveness of our user adaptation technique. [ABSTRACT FROM AUTHOR]
- Published
- 2021
31. MiSoSouP
- Author
-
Riondato, Matteo and Vandin, Fabio
- Published
- 2018
32. Comparison of microbiome samples: methods and computational challenges.
- Author
-
Comin, Matteo, Camillo, Barbara Di, Pizzi, Cinzia, and Vandin, Fabio
- Subjects
METAGENOMICS ,NUCLEOTIDE sequencing ,SAMPLING methods ,PHENOTYPES ,GENOMES ,MICROBIAL communities
- Abstract
The study of microbial communities crucially relies on the comparison of metagenomic next-generation sequencing data sets, for which several methods have been designed in recent years. Here, we review three key challenges in the comparison of such data sets: species identification and quantification, the efficient computation of distances between metagenomic samples and the identification of metagenomic features associated with a phenotype such as disease status. We present current solutions for such challenges, considering both reference-based methods relying on a database of reference genomes and reference-free methods working directly on all sequencing reads from the samples. [ABSTRACT FROM AUTHOR]
- Published
- 2021
33. MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension.
- Author
-
Riondato, Matteo and Vandin, Fabio
- Subjects
STATISTICAL learning ,LINGUISTICS ,STATISTICAL sampling
- Abstract
We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different popular interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures as functions of averages, which makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which depends on characteristic quantities of the dataset and of the language of patterns of interest. This upper bound then leads to small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and it vastly speeds up the discovery of subgroups w.r.t. analyzing the whole dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2020
34. Fast Approximation of Frequent k-Mers and Applications to Metagenomics.
- Author
-
Pellegrina, Leonardo, Pizzi, Cinzia, and Vandin, Fabio
- Published
- 2020
35. An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets.
- Author
-
Kirsch, Adam, Mitzenmacher, Michael, Pietracaprina, Andrea, Pucci, Geppino, Upfal, Eli, and Vandin, Fabio
- Subjects
ALGORITHMS ,STATISTICAL significance ,PATTERN recognition systems ,DATA mining ,FALSE discovery rate ,DATABASE searching
- Abstract
As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2012
36. Technical Perspective Evaluating Sampled Metrics Is Challenging.
- Author
-
Vandin, Fabio
- Subjects
*SOFTWARE measurement , *RECOMMENDER systems
- Abstract
An introduction is offered to an article on sampling metrics in recommender systems.
- Published
- 2022
37. Finding mutated subnetworks associated with survival in cancer
- Author
-
Hansen, Tommy and Vandin, Fabio
- Subjects
FOS: Biological sciences ,Quantitative Biology - Quantitative Methods ,Quantitative Methods (q-bio.QM)
- Abstract
Next-generation sequencing technologies allow the measurement of somatic mutations in a large number of patients from the same cancer type. One of the main goals in analyzing these mutations is the identification of mutations associated with clinical parameters, such as survival time. This goal is hindered by the genetic heterogeneity of mutations in cancer, due to the fact that genes and mutations act in the context of pathways. To identify mutations associated with survival time it is therefore crucial to study mutations in the context of interaction networks. In this work we study the problem of identifying subnetworks of a large gene-gene interaction network that have mutations associated with survival. We formally define the associated computational problem by using a score for subnetworks based on the test statistic of the log-rank test, a widely used statistical test for comparing the survival of two populations. We show that the computational problem is NP-hard and we propose a novel algorithm, called Network of Mutations Associated with Survival (NoMAS), to solve it. NoMAS is based on the color-coding technique, that has been previously used in other applications to find the highest scoring subnetwork with high probability when the subnetwork score is additive. In our case the score is not additive; nonetheless, we prove that under a reasonable model for mutations in cancer NoMAS does identify the optimal solution with high probability. We test NoMAS on simulated and cancer data, comparing it to approaches based on single gene tests and to various greedy approaches. We show that our method does indeed find the optimal solution and performs better than the other approaches. 
Moreover, on two cancer datasets our method identifies subnetworks with significant association to survival when none of the genes has significant association with survival when considered in isolation. Comment: This paper was selected for oral presentation at RECOMB 2016 and an abstract is published in the conference proceedings
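The log-rank test mentioned in the abstract above compares the survival of two populations. Below is a minimal sketch of the standard two-group log-rank statistic in its normalized (Z-score) form; the abstract does not say which exact form or normalization NoMAS uses as its subnetwork score, so this should be read as the textbook statistic only. The function name and argument layout are hypothetical, with `in_group` marking, for example, patients who carry a mutation in a candidate subnetwork.

```python
import math


def logrank_z(times, events, in_group):
    """Normalized two-group log-rank statistic (standard textbook form).

    times:    survival or censoring time for each patient
    events:   1 if the event (death) was observed, 0 if censored
    in_group: True if the patient belongs to the group of interest
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, overall and in the group.
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, in_group) if x >= t and g)
        # Events occurring exactly at time t, overall and in the group.
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, in_group) if x == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group's event count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected / math.sqrt(variance) if variance > 0 else 0.0
```

A positive value means the group of interest had more events than expected under the null hypothesis of equal survival; swapping the two groups flips the sign.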
- Published
- 2016
38. Erratum to: CoMEt: A statistical approach to identify combinations of mutually exclusive alterations in cancer [Genome Biol., 16, (2015), (160)]
- Author
-
Leiserson, Mark D.M., Wu, Hsin Ta, Vandin, Fabio, and Raphael, Benjamin J.
- Abstract
After the publication of this work [1] it has been brought to our attention that the descriptions of the generation of the two simulated datasets were confusing. In the section 'Benchmarking of methods for individual gene sets', the first sentence of the second paragraph should specify the number of genes included in the gene set as 100, and it should read: "We compared CoMEt to the other methods on datasets with m= 100 genes and n = 500 samples and with implanted pathways with coverages ? ranging from 0.1 to 1.0." In the section 'Benchmarking identification of collections of gene sets', the first paragraph should specify that genes mutated in fewer than 1% of total samples (that is in fewer than 5 out of 500 samples) were removed from the simulation, and the sixth sentence in this paragraph should read: "Third, we include m= 20,000 genes and remove those genes that are mutated in fewer than 1% of total samples (that is in fewer than 5 out of 500 samples) (Additional file 1: Figure S2)." Additionally, these descriptions are also included in the Additional file 1, section S3, and should read: "We generated two different versions of the simulated datasets, depending on whether we implanted a single or multiple gene sets. We describe the method for generating datasets with multiple implanted gene sets in the main text. For all simulated datasets, we used n = 500, |C| = 5, ?C = (0.67, 0.49, 0.29, 0.29, 0.2), and q = 0.0027538462.1 We used μP = (0.5, 0.35, 0.15) for the single pathway simulations." We also updated the description of our procedure for assessing the convergence of the MCMC algorithm in Additional file 1, section S2, which should read: "To assess the convergence of the MCMC algorithm, we ran multiple chains with different initializations. For one of these initializations, we used the collection output by Multi-Dendrix [21] (using the same values of the parameters t and k as in CoMEt). The remaining initializations were random collections. 
We ran the MCMC algorithm with these initializations, running each chain for a given number of iterations. We consider the chains converged if the mean total variation distance between the chains is smaller than 0.005. Otherwise, we increase the number of iterations by a factor of 1.5. We repeat this process until the chains converge or the total number of iterations per chain reaches a maximum number of iterations, which we set as 1 billion. The output of the MCMC algorithm is the union of the sampling distributions from the different initializations." The corrected Additional file 1 is included in this Erratum.
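The convergence procedure quoted in the erratum above can be sketched as a short loop. This is an illustrative reading, not the CoMEt implementation: `run_chain` is a hypothetical callback that runs one chain from a given initialization and returns its empirical sampling distribution as a dict, "mean total variation distance between the chains" is assumed to mean the average over all pairs of chains, and each round is assumed to rerun the chains rather than extend them.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def mean_pairwise_tv(dists):
    """Average total variation distance over all pairs of distributions."""
    pairs = list(combinations(dists, 2))
    return sum(total_variation(p, q) for p, q in pairs) / len(pairs)


def run_until_converged(run_chain, inits, n_iter, tol=0.005, growth=1.5,
                        max_iter=10**9):
    """Rerun all chains with more iterations until they agree or hit max_iter.

    run_chain(init, n_iter) is a hypothetical callback; at least two
    initializations are required so that a pairwise distance exists.
    """
    while True:
        dists = [run_chain(init, n_iter) for init in inits]
        if mean_pairwise_tv(dists) < tol or n_iter >= max_iter:
            return dists, n_iter
        # Grow the iteration budget by the given factor, capped at max_iter.
        n_iter = min(max(n_iter + 1, int(n_iter * growth)), max_iter)
```

The default values of `tol`, `growth` and `max_iter` mirror the 0.005 threshold, the factor of 1.5 and the 1 billion cap stated in the erratum; combining the returned distributions into the final output is left to the caller.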
- Published
- 2016
39. Efficient algorithms to discover alterations with complementary functional association in cancer.
- Author
-
Sarto Basso, Rebecca, Hochbaum, Dorit S., and Vandin, Fabio
- Subjects
CANCER genetics ,PERTURBATION theory ,PHENOTYPES ,ALGORITHMS ,COMPUTATIONAL biology
- Abstract
Recent large cancer studies have measured somatic alterations in an unprecedented number of tumours. These large datasets allow the identification of cancer-related sets of genetic alterations by identifying relevant combinatorial patterns. Among such patterns, mutual exclusivity has been employed by several recent methods that have shown its effectiveness in characterizing gene sets associated to cancer. Mutual exclusivity arises because of the complementarity, at the functional level, of alterations in genes which are part of a group (e.g., a pathway) performing a given function. The availability of quantitative target profiles, from genetic perturbations or from clinical phenotypes, provides additional information that can be leveraged to improve the identification of cancer related gene sets by discovering groups with complementary functional associations with such targets. In this work we study the problem of finding groups of mutually exclusive alterations associated with a quantitative (functional) target. We propose a combinatorial formulation for the problem, and prove that the associated computational problem is computationally hard. We design two algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find sets which are better than the ones obtained by the state-of-the-art method, even when sets are evaluated using the statistical score employed by the latter. In addition, our algorithms are much faster than the state-of-the-art, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines. 
We show that on two such datasets, one from project Achilles and one from the Genomics of Drug Sensitivity in Cancer project, UNCOVER identifies several significant gene sets with complementary functional associations with targets. Software available at: . [ABSTRACT FROM AUTHOR]
- Published
- 2019
40. NoMAS: A Computational Approach to Find Mutated Subnetworks Associated With Survival in Genome-Wide Cancer Studies.
- Author
-
Altieri, Federico, Hansen, Tommy V., and Vandin, Fabio
- Subjects
SOMATIC mutation ,LOG-rank test ,CANCER
- Abstract
Next-generation sequencing technologies make it possible to measure somatic mutations in a large number of patients from the same cancer type: one of the main goals in their analysis is the identification of mutations associated with clinical parameters. The identification of such relationships is hindered by extensive genetic heterogeneity in tumors, with different genes mutated in different patients, due, in part, to the fact that genes and mutations act in the context of pathways: it is therefore crucial to study mutations in the context of interactions among genes. In this work we study the problem of identifying subnetworks of a large gene-gene interaction network with mutations associated with survival time. We formally define the associated computational problem by using a score for subnetworks based on the log-rank statistical test to compare the survival of two given populations. We propose a novel approach, based on a new algorithm, called Network of Mutations Associated with Survival (NoMAS) to find subnetworks of a large interaction network whose mutations are associated with survival time. NoMAS is based on the color-coding technique, which has been previously employed in other applications to find the highest scoring subnetwork with high probability when the subnetwork score is additive. In our case the score is not additive, so our algorithm cannot identify the optimal solution with the same guarantees associated to additive scores. Nonetheless, we prove that, under a reasonable model for mutations in cancer, NoMAS identifies the optimal solution with high probability. We also design a holdout approach to identify subnetworks significantly associated with survival time. We test NoMAS on simulated and cancer data, comparing it to approaches based on single gene tests and to various greedy approaches. We show that our method does indeed find the optimal solution and performs better than the other approaches. 
Moreover, on three cancer datasets our method identifies subnetworks with significant association to survival when none of the genes has significant association with survival when considered in isolation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
41. Research in Computational Molecular Biology - 19th Annual International Conference, RECOMB 2015, Proceedings
- Author
-
Leiserson, Mark D. M., Wu, Hsin Ta, Vandin, Fabio, and Raphael, Benjamin J.
- Subjects
Computer Science (all) ,Theoretical Computer Science
- Published
- 2015
42. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin
- Author
-
Hoadley, Katherine A., Yau, Christina, Wolf, Denise M., Cherniack, Andrew D., Tamborero, David, Sam, Ng, Leiserson, Max D. M., Niu, Beifang, Mclellan, Michael D., Uzunangelov, Vladislav, Zhang, Jiashan, Kandoth, Cyriac, Akbani, Rehan, Shen, Hui, Omberg, Larsson, Chu, Andy, Margolin, Adam A., Van'T Veer, Laura J., Lopez Bigas, Nuria, Laird, Peter W., Raphael, Benjamin J., Ding, Li, Robertson, A. Gordon, Byers, Lauren A., Mills, Gordon B., Weinstein, John N., Van Waes, Carter, Chen, Zhong, Collisson, Eric A., Benz, Christopher C, Perou, Charles M., Stuart, Joshua M., Rachel, Abbott, Scott, Abbott, Arman Aksoy, B., Kenneth, Aldape, Adrian, Ally, Samirku mar Amin, Dimitris, Anastassiou, Todd Auman, J., Baggerly, Keith A., Miruna, Balasundaram, Saianand, Balu, Baylin, Stephen B., Benz, Stephen C., Berman, Benjamin P., Brady, Bernard, Bhatt, Ami S., Inanc, Birol, Black, Aaron D., Tom, Bodenheimer, Bootwalla, Moiz S., Jay, Bowen, Ryan, Bressler, Bristow, Christopher A., Brooks, Angela N., Bradley, Broom, Elizabeth, Buda, Robert, Burton, Butterfield, Yaron S. 
N., Daniel, Carlin, Carter, Scott L., Casasent, Tod D., Kyle, Chang, Stephen, Chanock, Lynda, Chin, Dong Yeon Cho, Juok, Cho, Eric, Chuah, Chun, Hye Jung E., Kristian, Cibulskis, Giovanni, Ciriello, James Cle land, Melisssa, Cline, Brian, Craft, Creighton, Chad J., Ludmila, Danilova, Tanja, Davidsen, Caleb, Davis, Dees, Nathan D., Kim, Delehaunty, Demchok, John A., Noreen, Dhalla, Daniel, Dicara, Huyen, Dinh, Dobson, Jason R., Deepti, Dodda, Harshavardhan, Doddapaneni, Lawrence, Donehower, Dooling, David J., Gideon, Dresdner, Jennifer, Drummond, Andrea, Eakin, Mary, Edgerton, Eldred, Jim M., Greg, Eley, Kyle, Ellrott, Cheng, Fan, Suzanne, Fei, Ina, Felau, Scott, Frazer, Freeman, Samuel S., Jessica, Frick, Fronick, Catrina C., Ful ton, Lucinda L., Robert, Fulton, Gabriel, Stacey B., Jianjiong, Gao, Gastier Foster, Julie M., Nils, Gehlenborg, Myra, George, Gad, Getz, Richard, Gibbs, Mary, Goldman, Abel Gonzalez Perez, Benjamin, Gross, Ranabir, Guin, Preethi, Gunaratne, Angela, Hadjipanayis, Hamilton, Mark P., Hamilton, Stanley R., Leng, Han, Han, Yi, Harper, Hollie A., Psalm, Haseley, David, Haussler, Neil Hayes, D., Heiman, David I., Elena, Helman, Carmen, Helsel, Herbrich, Shelley M., Her man, James G., Toshinori, Hinoue, Carrie, Hirst, Martin, Hirst, Holt, Robert A., Hoyle, Alan P., Lisa, Iype, Anders, Jacobsen, Jeffreys, Stuart R., Jensen, Mark A., Jones, Corbin D., Jones, Steven J. 
M., Zhenlin, Ju, Joonil, Jung, Andre, Kahles, Ari, Kahn, Joelle Kalicki Veizer, Divya, Kalra, Krishna Latha Kanchi, Kane, David W., Hoon, Kim, Jaegil, Kim, Theo, Knijnenburg, Koboldt, Daniel C., Christie, Kovar, Roger, Kramer, Richard, Kreisberg, Raju, Kucherlapati, Marc, Ladanyi, Lander, Eric S., Larson, David E., Lawrence, Michael S., Darlene, Lee, Eunjung, Lee, Semin, Lee, William, Lee, Kjong Van Lehmann, Kalle, Leinonen, Ler aas, Kristen M., Seth, Lerner, Levine, Douglas A., Lora, Lewis, Ley, Timothy J., Haiyan I., Li, Jun, Li, Wei, Li, Han, Liang, Lichtenberg, Tara M., Jake, Lin, Ling, Lin, Pei, Lin, Wen bin Liu, Yingchun, Liu, Yuexin, Liu, Lorenzi, Philip L., Charles, Lu, Yiling, Lu, Luquette, Love lace J., Singer, Ma, Magrini, Vincent J., Mahadeshwar, Harshad S., Mardis, Elaine R., Adam, Margolin, Marra, Marco A., Michael, Mayo, Cynthia, Mcallister, Mcguire, Sean E., Mcmichael, Joshua F., James, Melott, Shaowu, Meng, Matthew, Meyerson, Mieczkowski, Piotr A., Miller, Christopher A., Miller, Martin L., Michael, Miller, Moore, Richard A., Margaret, Morgan, Donna, Morton, Mose, Lisle E., Mungall, Andrew J., Donna, Muzny, Lam, Nguyen, Noble, Michael S., Houtan, Noushmehr, Michelle, O’Laughlin, Ojesina, Akinyemi I., Tai Hsien Ou Yang, Brad, Ozenberger, Angeliki, Pantazi, Michael, Parfenov, Park, Peter J., Parker, Joel S., Evan, Paull, Chandra Sekhar Pedamallu, Todd, Pihl, Craig, Pohl, David, Pot, Alexei, Protopopov, Teresa, Przytycka, Amie Raden baugh, Ramirez, Nilsa C., Ricardo, Ramirez, Gunnar Ra, ̈ tsch, Jeffrey, Reid, Xiao jia Ren, Boris, Reva, Reynolds, Sheila M., Rhie, Suhn K., Jeffrey, Roach, Hector, Rovira, Michael, Ryan, Gordon, Saksena, Sofie, Salama, Chris, Sander, Netty, Santoso, Schein, Jacqueline E., Heather, Schmidt, Nikolaus, Schultz, Schumacher, Steven E., Jonathan, Seidman, Yasin, Senbabaoglu, Sahil, Seth, Saman tha Sharpe, Ronglai, Shen, Margi, Sheth, Yan, Shi, Ilya, Shmulevich, Silva, Grace O., Simons, Janae V., Rileen, Sinha, Payal, 
Sipahimalani, Smith, Scott M., Sofia, Heidi J., Artem, Sokolov, Soloway, Mathew G., Xingzhi, Song, Carrie Soug nez, Paul, Spellman, Louis, Staudt, Chip, Stewart, Petar, Stojanov, Xiaoping, Su, Onur Sumer, S., Yichao, Sun, Teresa, Swatloski, Barbara, Tabak, Angela, Tam, Donghui, Tan, Jiabin, Tang, Roy, Tarnuzzer, Taylor, Barry S., Nina, Thiessen, Ves teinn Thorsson, Timothy Triche, J. r., Van Den Berg, David J., Vandin, Fabio, Varhol, Richard J., Vaske, Charles J., Umadevi, Veluvolu, Roeland, Verhaak, Doug, Voet, Jason, Walker, Wallis, John W., Peter, Waltman, Yunhu, Wan, Min, Wang, Wenyi, Wang, Zhining, Wang, Scot, Waring, Nils, Weinhold, Weisenberger, Daniel J., Wendl, Michael C., David, Wheeler, Wilkerson, Matthew D., Wilson, Richard K., Lisa, Wise, Andrew, Wong, Chang Jiun Wu, Chia Chin Wu, Hsin Ta Wu, Junyuan, Wu, Todd, Wylie, Liu, Xi, Ruibin, Xi, Zheng, Xia, Andrew W., Xu, Yang, Da, Liming, Yang, Lixing, Yang, Yang, Yang, Jun, Yao, Rong, Yao, Kai, Ye, Ko suke Yoshihara, Yuan, Yuan, Yung, Alfred K., Travis, Zack, Dong, Zeng, Jean Claude Zenklusen, Hailei, Zhang, Jianhua, Zhang, Nianxiang, Zhang, Qunyuan, Zhang, Wei, Zhang, Wei, Zhao, Siyuan, Zheng, Jing, Zhu, Erik, Zmuda, and Lihua, Zou more...
- Subjects
Genetics and Molecular Biology (all) ,Cluster Analysis ,Humans ,Neoplasms ,Transcriptome ,Biochemistry, Genetics and Molecular Biology (all) ,Extramural ,Biochemistry, Genetics and Molecular Biology(all) ,Cancer ,Computational biology ,Disease ,Biology ,medicine.disease ,Bioinformatics ,Biochemistry ,General Biochemistry, Genetics and Molecular Biology ,Article ,3. Good health ,Molecular classification ,TP63 ,CLUSTERS (ANÁLISE) ,medicine ,Head and neck ,Gene
- Abstract
Summary Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.
- Published
- 2014
43. Reconstructing Cancer Pathways and Their Mutation Order from Cross-Sectional Data
- Author
-
Raphael, Benjamin J. and Vandin, Fabio
- Published
- 2014
44. De novo pathway-based biomarker identification.
- Author
-
Alcaraz, Nicolas, List, Markus, Batra, Richa, Vandin, Fabio, Ditzel, Henrik J., and Baumbach, Jan
- Published
- 2017
45. Computational Methods for Characterizing Cancer Mutational Heterogeneity.
- Author
-
Vandin, Fabio
- Subjects
NUCLEOTIDE sequencing ,CANCER genetics ,HETEROGENEITY
- Abstract
Advances in DNA sequencing technologies have allowed the characterization of somatic mutations in a large number of cancer genomes at an unprecedented level of detail, revealing the extreme genetic heterogeneity of cancer at two different levels: inter-tumor, with different patients of the same cancer type presenting different collections of somatic mutations, and intra-tumor, with different clones coexisting within the same tumor. Both inter-tumor and intra-tumor heterogeneity have crucial implications for clinical practices. Here, we review computational methods that use somatic alterations measured through next-generation DNA sequencing technologies for characterizing tumor heterogeneity and its association with clinical variables. We first review computational methods for studying inter-tumor heterogeneity, focusing on methods that attempt to summarize cancer heterogeneity by discovering pathways that are commonly mutated across different patients of the same cancer type. We then review computational methods for characterizing intra-tumor heterogeneity using information from bulk sequencing data or from single cell sequencing data. Finally, we present some of the recent computational methodologies that have been proposed to identify and assess the association between inter- or intra-tumor heterogeneity with clinical variables. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
46. An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer.
- Author
-
Bomersbach, Anna, Chiarandini, Marco, and Vandin, Fabio
- Published
- 2016
- Full Text
- View/download PDF
47. The Second Decade of the International Conference on Research in Computational Molecular Biology (RECOMB).
- Author
-
Hormozdiari, Farhad, Hormozdiari, Fereydoun, Kingsford, Carl, Medvedev, Paul, and Vandin, Fabio
- Published
- 2016
- Full Text
- View/download PDF
48. On the Sample Complexity of Cancer Pathways Identification.
- Author
-
Vandin, Fabio, Raphael, Benjamin J., and Upfal, Eli
- Published
- 2015
- Full Text
- View/download PDF
49. On the Sample Complexity of Cancer Pathways Identification.
- Author
-
Vandin, Fabio, Raphael, Benjamin J., and Upfal, Eli
- Subjects
*NUCLEOTIDE sequencing, *CANCER genetics, *SOMATIC mutation, *MACHINE learning, *GENOMICS
- Abstract
Advances in DNA sequencing technologies have enabled large cancer sequencing studies, collecting somatic mutation data from a large number of cancer patients. One of the main goals of these studies is the identification of all cancer genes, i.e., genes associated with cancer. Achieving this goal is complicated by the extensive mutational heterogeneity of cancer, due to the fact that important mutations in cancer target combinations of genes (i.e., pathways). Recently, a pattern of mutual exclusivity among mutations in a cancer pathway has been observed, and methods that find significant combinations of cancer genes by detecting mutual exclusivity have been proposed. A key question in the analysis of mutual exclusivity is the computation of the minimum number of samples required to reliably find a meaningful set of mutually exclusive mutations in the data, or conclude that there is no such set. In general, the problem of determining the sample complexity of genomic problems, that is, the number of samples required to identify significant combinations of features, is largely unexplored. In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
50. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer.
- Author
-
Leiserson, Mark D. M., Wu, Hsin-Ta, Vandin, Fabio, and Raphael, Benjamin J.
- Published
- 2015
- Full Text
- View/download PDF