48 results on '"Vandin, Fabio"'
Search Results
2. caSPiTa: mining statistically significant paths in time series data from an unknown network
- Author
-
Tonon, Andrea and Vandin, Fabio
- Published
- 2023
3. gRosSo: mining statistically robust patterns from a sequence of datasets
- Author
-
Tonon, Andrea and Vandin, Fabio
- Published
- 2022
4. Identifying Drug Sensitivity Subnetworks with NETPHIX
- Author
-
Kim, Yoo-Ah, Sarto Basso, Rebecca, Wojtowicz, Damian, Liu, Amanda S., Hochbaum, Dorit S., Vandin, Fabio, and Przytycka, Teresa M.
- Published
- 2020
5. Efficient mining of the most significant patterns with permutation testing
- Author
-
Pellegrina, Leonardo and Vandin, Fabio
- Published
- 2020
6. ALLSTAR: inference of reliAble causaL ruLes between Somatic muTAtions and canceR phenotypes.
- Author
-
Simionato, Dario, Collesei, Antonio, Miglietta, Federica, and Vandin, Fabio
- Subjects
SOMATIC mutation, NP-hard problems, PHENOTYPES, DNA sequencing, CAUSAL inference
- Abstract
Motivation: Recent advances in DNA sequencing technologies have allowed the detailed characterization of genomes in large cohorts of tumors, highlighting their extreme heterogeneity, with no two tumors sharing the same complement of somatic mutations. Such heterogeneity hinders our ability to identify somatic mutations important for the disease, including mutations that determine clinically relevant phenotypes (e.g. cancer subtypes). Several tools have been developed to identify somatic mutations related to cancer phenotypes. However, such tools identify correlations between somatic mutations and cancer phenotypes, with no guarantee of highlighting causal relations. Results: We describe ALLSTAR, a novel tool to infer reliable causal relations between somatic mutations and cancer phenotypes. ALLSTAR identifies reliable causal rules highlighting combinations of somatic mutations with the highest impact in terms of average effect on the phenotype. While we prove that the underlying computational problem is NP-hard, we develop a branch-and-bound approach that employs protein–protein interaction networks and novel bounds for pruning the search space, while properly correcting for multiple hypothesis testing. Our extensive experimental evaluation on synthetic data shows that our tool is able to identify reliable causal relations in large cancer cohorts. Moreover, the reliable causal rules identified by our tool in cancer data show that our approach identifies several somatic mutations known to be relevant for cancer phenotypes as well as novel biologically meaningful relations. Availability and implementation: Code, data, and scripts to reproduce the experiments are available at https://github.com/VandinLab/ALLSTAR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Enriched power of disease-concordant twin-case-only design in detecting interactions in genome-wide association studies
- Author
-
Li, Weilong, Baumbach, Jan, Mohammadnejad, Afsaneh, Brasch-Andersen, Charlotte, Vandin, Fabio, Korbel, Jan O., and Tan, Qihua
- Published
- 2019
8. SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds.
- Author
-
Pellegrina, Leonardo and Vandin, Fabio
- Subjects
STATISTICAL learning, APPROXIMATION algorithms
- Abstract
"Sim Sala Bim!" (Silvan, https://en.wikipedia.org/wiki/Silvan_(illusionist))
Betweenness centrality is a popular centrality measure with applications in several domains and whose exact computation is impractical for modern-sized networks. We present SILVAN, a novel, efficient algorithm to compute, with high probability, accurate estimates of the betweenness centrality of all nodes of a graph and a high-quality approximation of the top-k betweenness centralities. SILVAN follows a progressive sampling approach and builds on novel bounds based on Monte Carlo Empirical Rademacher Averages, a powerful and flexible tool from statistical learning theory. SILVAN relies on a novel estimation scheme providing non-uniform bounds on the deviation of the estimates of the betweenness centrality of all the nodes from their true values and a refined characterisation of the number of samples required to obtain a high-quality approximation. Our extensive experimental evaluation shows that SILVAN extracts high-quality approximations while outperforming, in terms of number of samples and accuracy, the state-of-the-art approximation algorithm with comparable quality guarantees. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. CoExpresso: assess the quantitative behavior of protein complexes in human cells
- Author
-
Chalabi, Morteza H., Tsiamis, Vasileios, Käll, Lukas, Vandin, Fabio, and Schwämmle, Veit
- Published
- 2019
10. Differentially mutated subnetworks discovery
- Author
-
Hajkarim, Morteza Chalabi, Upfal, Eli, and Vandin, Fabio
- Published
- 2019
11. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data processing
- Author
-
Almeida Diogo, Skov Ida, Lund Jesper, Mohammadnejad Afsaneh, Silva Artur, Vandin Fabio, Tan Qihua, Baumbach Jan, and Röttger Richard
- Subjects
Biotechnology, TP248.13-248.65
- Abstract
Measuring differential methylation of the DNA is nowadays the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html
- Published
- 2016
12. Integrated Genomic Characterization of Papillary Thyroid Carcinoma
- Author
-
Agrawal, Nishant, Akbani, Rehan, Aksoy, B. Arman, Ally, Adrian, Arachchi, Harindra, Asa, Sylvia L., Auman, J. Todd, Balasundaram, Miruna, Balu, Saianand, Baylin, Stephen B., Behera, Madhusmita, Bernard, Brady, Beroukhim, Rameen, Bishop, Justin A., Black, Aaron D., Bodenheimer, Tom, Boice, Lori, Bootwalla, Moiz S., Bowen, Jay, Bowlby, Reanne, Bristow, Christopher A., Brookens, Robin, Brooks, Denise, Bryant, Robert, Buda, Elizabeth, Butterfield, Yaron S.N., Carling, Tobias, Carlsen, Rebecca, Carter, Scott L., Carty, Sally E., Chan, Timothy A., Chen, Amy Y., Cherniack, Andrew D., Cheung, Dorothy, Chin, Lynda, Cho, Juok, Chu, Andy, Chuah, Eric, Cibulskis, Kristian, Ciriello, Giovanni, Clarke, Amanda, Clayman, Gary L., Cope, Leslie, Copland, John A., Covington, Kyle, Danilova, Ludmila, Davidsen, Tanja, Demchok, John A., DiCara, Daniel, Dhalla, Noreen, Dhir, Rajiv, Dookran, Sheliann S., Dresdner, Gideon, Eldridge, Jonathan, Eley, Greg, El-Naggar, Adel K., Eng, Stephanie, Fagin, James A., Fennell, Timothy, Ferris, Robert L., Fisher, Sheila, Frazer, Scott, Frick, Jessica, Gabriel, Stacey B., Ganly, Ian, Gao, Jianjiong, Garraway, Levi A., Gastier-Foster, Julie M., Getz, Gad, Gehlenborg, Nils, Ghossein, Ronald, Gibbs, Richard A., Giordano, Thomas J., Gomez-Hernandez, Karen, Grimsby, Jonna, Gross, Benjamin, Guin, Ranabir, Hadjipanayis, Angela, Harper, Hollie A., Hayes, D. Neil, Heiman, David I., Herman, James G., Hoadley, Katherine A., Hofree, Matan, Holt, Robert A., Hoyle, Alan P., Huang, Franklin W., Huang, Mei, Hutter, Carolyn M., Ideker, Trey, Iype, Lisa, Jacobsen, Anders, Jefferys, Stuart R., Jones, Corbin D., Jones, Steven J.M., Kasaian, Katayoon, Kebebew, Electron, Khuri, Fadlo R., Kim, Jaegil, Kramer, Roger, Kreisberg, Richard, Kucherlapati, Raju, Kwiatkowski, David J., Ladanyi, Marc, Lai, Phillip H., Laird, Peter W., Lander, Eric, Lawrence, Michael S., Lee, Darlene, Lee, Eunjung, Lee, Semin, Lee, William, Leraas, Kristen M., Lichtenberg, Tara M., Lichtenstein, Lee, Lin, Pei, Ling, Shiyun, Liu, Jinze, Liu, Wenbin, Liu, Yingchun, LiVolsi, Virginia A., Lu, Yiling, Ma, Yussanne, Mahadeshwar, Harshad S., Marra, Marco A., Mayo, Michael, McFadden, David G., Meng, Shaowu, Meyerson, Matthew, Mieczkowski, Piotr A., Miller, Michael, Mills, Gordon, Moore, Richard A., Mose, Lisle E., Mungall, Andrew J., Murray, Bradley A., Nikiforov, Yuri E., Noble, Michael S., Ojesina, Akinyemi I., Owonikoko, Taofeek K., Ozenberger, Bradley A., Pantazi, Angeliki, Parfenov, Michael, Park, Peter J., Parker, Joel S., Paull, Evan O., Pedamallu, Chandra Sekhar, Perou, Charles M., Prins, Jan F., Protopopov, Alexei, Ramalingam, Suresh S., Ramirez, Nilsa C., Ramirez, Ricardo, Raphael, Benjamin J., Rathmell, W. Kimryn, Ren, Xiaojia, Reynolds, Sheila M., Rheinbay, Esther, Ringel, Matthew D., Rivera, Michael, Roach, Jeffrey, Robertson, A. Gordon, Rosenberg, Mara W., Rosenthal, Matthew, Sadeghi, Sara, Saksena, Gordon, Sander, Chris, Santoso, Netty, Schein, Jacqueline E., Schultz, Nikolaus, Schumacher, Steven E., Seethala, Raja R., Seidman, Jonathan, Senbabaoglu, Yasin, Seth, Sahil, Sharpe, Samantha, Shaw, Kenna R. Mills, Shen, John P., Shen, Ronglai, Sherman, Steven, Sheth, Margi, Shi, Yan, Shmulevich, Ilya, Sica, Gabriel L., Simons, Janae V., Sinha, Rileen, Sipahimalani, Payal, Smallridge, Robert C., Sofia, Heidi J., Soloway, Matthew G., Song, Xingzhi, Sougnez, Carrie, Stewart, Chip, Stojanov, Petar, Stuart, Joshua M., Sumer, S. Onur, Sun, Yichao, Tabak, Barbara, Tam, Angela, Tan, Donghui, Tang, Jiabin, Tarnuzzer, Roy, Taylor, Barry S., Thiessen, Nina, Thorne, Leigh, Thorsson, Vésteinn, Tuttle, R. Michael, Umbricht, Christopher B., Van Den Berg, David J., Vandin, Fabio, Veluvolu, Umadevi, Verhaak, Roel G.W., Vinco, Michelle, Voet, Doug, Walter, Vonn, Wang, Zhining, Waring, Scot, Weinberger, Paul M., Weinhold, Nils, Weinstein, John N., Weisenberger, Daniel J., Wheeler, David, Wilkerson, Matthew D., Wilson, Jocelyn, Williams, Michelle, Winer, Daniel A., Wise, Lisa, Wu, Junyuan, Xi, Liu, Xu, Andrew W., Yang, Liming, Yang, Lixing, Zack, Travis I., Zeiger, Martha A., Zeng, Dong, Zenklusen, Jean Claude, Zhao, Ni, Zhang, Hailei, Zhang, Jianhua, Zhang, Jiashan (Julia), Zhang, Wei, Zmuda, Erik, and Zou, Lihua
- Published
- 2014
13. Computational pan-genomics: status, promises and challenges
- Author
-
Marschall, Tobias, Marz, Manja, Abeel, Thomas, Dijkstra, Louis, Dutilh, Bas E, Ghaffaari, Ali, Kersey, Paul, Kloosterman, Wigard P, Mäkinen, Veli, Novak, Adam M, Paten, Benedict, Porubsky, David, Rivals, Eric, Alkan, Can, Baaijens, Jasmijn A, De Bakker, Paul I W, Boeva, Valentina, Bonnal, Raoul J P, Chiaromonte, Francesca, Chikhi, Rayan, Ciccarelli, Francesca D, Cijvat, Robin, Datema, Erwin, Van Duijn, Cornelia M, Eichler, Evan E, Ernst, Corinna, Eskin, Eleazar, Garrison, Erik, El-Kebir, Mohammed, Klau, Gunnar W, Korbel, Jan O, Lameijer, Eric-Wubbo, Langmead, Benjamin, Martin, Marcel, Medvedev, Paul, Mu, John C, Neerincx, Pieter, Ouwens, Klaasjan, Peterlongo, Pierre, Pisanti, Nadia, Rahmann, Sven, Raphael, Ben, Reinert, Knut, de Ridder, Dick, de Ridder, Jeroen, Schlesner, Matthias, Schulz-Trieglaff, Ole, Sanders, Ashley D, Sheikhizadeh, Siavash, Shneider, Carl, Smit, Sandra, Valenzuela, Daniel, Wang, Jiayin, Wessels, Lodewyk, Zhang, Ying, Guryev, Victor, Vandin, Fabio, Ye, Kai, and Schönhuth, Alexander
- Published
- 2018
14. Disease‐Concordant Twins Empower Genetic Association Studies
- Author
-
Tan, Qihua, Li, Weilong, and Vandin, Fabio
- Published
- 2017
15. Differentially Methylated Genomic Regions in Birth-Weight Discordant Twin Pairs
- Author
-
Chen, Mubo, Baumbach, Jan, Vandin, Fabio, Röttger, Richard, Barbosa, Eudes, Dong, Mingchui, Frost, Morten, Christiansen, Lene, and Tan, Qihua
- Published
- 2016
16. MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining.
- Author
-
PELLEGRINA, LEONARDO, COUSINS, CYRUS, VANDIN, FABIO, and RIONDATO, MATTEO
- Subjects
PARTIALLY ordered sets, STATISTICAL learning, STATISTICAL power analysis
- Abstract
We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both (1) statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and (2) approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This flexibility offered by MCRapper is a big advantage over previously proposed solutions, which could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining, by appropriately computing approximations of the negative and positive borders of the collection of patterns of interest, which allow an effective pruning of the pattern space and the computation of strong bounds to the supremum deviation. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Mutational landscape and significance across 12 major cancer types
- Author
-
Kandoth, Cyriac, McLellan, Michael D., Vandin, Fabio, Ye, Kai, Niu, Beifang, Lu, Charles, Xie, Mingchao, Zhang, Qunyuan, McMichael, Joshua F., Wyczalkowski, Matthew A., Leiserson, Mark D. M., Miller, Christopher A., Welch, John S., Walter, Matthew J., Wendl, Michael C., Ley, Timothy J., Wilson, Richard K., Raphael, Benjamin J., and Ding, Li
- Published
- 2013
18. Mining top-K frequent itemsets through progressive sampling
- Author
-
Pietracaprina, Andrea, Riondato, Matteo, Upfal, Eli, and Vandin, Fabio
- Published
- 2010
19. Discovering significant evolutionary trajectories in cancer phylogenies.
- Author
-
Pellegrina, Leonardo and Vandin, Fabio
- Subjects
INTERNET servers, GENE regulatory networks, ACUTE myeloid leukemia, ARBORETUMS
- Abstract
Motivation: Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. Results: We present a new algorithm, MAximal tumor treeS TRajectOries (MASTRO), to discover significantly conserved evolutionary trajectories in cancer. MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations. MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors. We apply MASTRO to data from non-small-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies. Availability and implementation: MASTRO is available at https://github.com/VandinLab/MASTRO. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. Comprehensive molecular characterization of clear cell renal cell carcinoma
- Author
-
Creighton, Chad J., Morgan, Margaret, Gunaratne, Preethi H., Wheeler, David A., Gibbs, Richard A., Robertson, Gordon A., Chu, Andy, Beroukhim, Rameen, Cibulskis, Kristian, Signoretti, Sabina, Hsin-Ta Wu, Fabio Vandin, Raphael, Benjamin J., Verhaak, Roel G. W., Tamboli, Pheroze, Torres-Garcia, Wandaliz, Akbani, Rehan, Weinstein, John N., Reuter, Victor, Hsieh, James J., Brannon, Rose A., Ari Hakimi, A., Jacobsen, Anders, Ciriello, Giovanni, Reva, Boris, Ricketts, Christopher J., Linehan, Marston W., Stuart, Joshua M., Rathmell, Kimryn W., Shen, Hui, Laird, Peter W., Muzny, Donna, Davis, Caleb, Xi, Liu, Chang, Kyle, Kakkar, Nipun, Treviño, Lisa R., Benton, Susan, Reid, Jeffrey G., Morton, Donna, Doddapaneni, Harsha, Han, Yi, Lewis, Lora, Dinh, Huyen, Kovar, Christie, Zhu, Yiming, Santibanez, Jireh, Wang, Min, Hale, Walker, Kalra, Divya, Getz, Gad, Lawrence, Michael S., Sougnez, Carrie, Carter, Scott L., Sivachenko, Andrey, Lichtenstein, Lee, Stewart, Chip, Voet, Doug, Fisher, Sheila, Gabriel, Stacey B., Lander, Eric, Schumacher, Steve E., Tabak, Barbara, Saksena, Gordon, Onofrio, Robert C., Cherniack, Andrew D., Gentry, Jeff, Ardlie, Kristin, Meyerson, Matthew, Chun, Hye-Jung E., Mungall, Andrew J., Sipahimalani, Payal, Stoll, Dominik, Ally, Adrian, Balasundaram, Miruna, Butterfield, Yaron S. N., Carlsen, Rebecca, Carter, Candace, Chuah, Eric, Coope, Robin J. N., Dhalla, Noreen, Gorski, Sharon, Guin, Ranabir, Hirst, Carrie, Hirst, Martin, Holt, Robert A., Lebovitz, Chandra, Lee, Darlene, Li, Haiyan I., Mayo, Michael, Moore, Richard A., Pleasance, Erin, Plettner, Patrick, Schein, Jacqueline E., Shafiei, Arash, Slobodan, Jared R., Tam, Angela, Thiessen, Nina, Varhol, Richard J., Wye, Natasja, Zhao, Yongjun, Birol, Inanc, Jones, Steven J. M., Marra, Marco A., Auman, Todd J., Tan, Donghui, Jones, Corbin D., Hoadley, Katherine A., Mieczkowski, Piotr A., Mose, Lisle E., Jefferys, Stuart R., Topal, Michael D., Liquori, Christina, Turman, Yidi J., Shi, Yan, Waring, Scot, Buda, Elizabeth, Walsh, Jesse, Wu, Junyuan, Bodenheimer, Tom, Hoyle, Alan P., Simons, Janae V., Soloway, Mathew G., Balu, Saianand, Parker, Joel S., Hayes, Neil D., Perou, Charles M., Kucherlapati, Raju, Park, Peter, Triche, Timothy, Jr, Weisenberger, Daniel J., Lai, Phillip H., Bootwalla, Moiz S., Maglinte, Dennis T., Mahurkar, Swapna, Berman, Benjamin P., Van Den Berg, David J., Cope, Leslie, Baylin, Stephen B., Noble, Michael S., DiCara, Daniel, Zhang, Hailei, Cho, Juok, Heiman, David I., Gehlenborg, Nils, Mallard, William, Lin, Pei, Frazer, Scott, Stojanov, Petar, Liu, Yingchun, Zhou, Lihua, Kim, Jaegil, Chin, Lynda, Vandin, Fabio, Wu, Hsin-Ta, Benz, Christopher, Yau, Christina, Reynolds, Sheila M., Shmulevich, Ilya, Verhaak, Roel G.W., Vegesna, Rahul, Kim, Hoon, Zhang, Wei, Cogdell, David, Jonasch, Eric, Ding, Zhiyong, Lu, Yiling, Zhang, Nianxiang, Unruh, Anna K., Casasent, Tod D., Wakefield, Chris, Tsavachidou, Dimitra, Mills, Gordon B., Schultz, Nikolaus, Antipin, Yevgeniy, Gao, Jianjiong, Cerami, Ethan, Gross, Benjamin, Aksoy, Arman B., Sinha, Rileen, Weinhold, Nils, Sumer, Onur S., Taylor, Barry S., Shen, Ronglai, Ostrovnaya, Irina, Berger, Michael F., Ladanyi, Marc, Sander, Chris, Fei, Suzanne S., Stout, Andrew, Spellman, Paul T., Rubin, Daniel L., Liu, Tiffany T., Ng, Sam, Paull, Evan O., Carlin, Daniel, Goldstein, Theodore, Waltman, Peter, Ellrott, Kyle, Zhu, Jing, Haussler, David, Xiao, Weimin, Shelton, Candace, Gardner, Johanna, Penny, Robert, Sherman, Mark, Mallery, David, Morris, Scott, Paulauskis, Joseph, Burnett, Ken, Shelton, Troy, Kaelin, William G., Choueiri, Toni, Atkins, Michael B., Curley, Erin, Tickoo, Satish, Thorne, Leigh, Boice, Lori, Huang, Mei, Fisher, Jennifer C., Vocke, Cathy D., Peterson, James, Worrell, Robert, Merino, Maria J., Schmidt, Laura S., Czerniak, Bogdan A., Aldape, Kenneth D., Wood, Christopher G., Boyd, Jeff, Weaver, JoEllen, Iacocca, Mary V., Petrelli, Nicholas, Witkin, Gary, Brown, Jennifer, Czerwinski, Christine, Huelsenbeck-Dill, Lori, Rabeno, Brenda, Myers, Jerome, Morrison, Carl, Bergsten, Julie, Eckman, John, Harr, Jodi, Smith, Christine, Tucker, Kelinda, Zach, Leigh Anne, Bshara, Wiam, Gaudioso, Carmelo, Dhir, Rajiv, Maranchie, Jodi, Nelson, Joel, Parwani, Anil, Potapova, Olga, Fedosenko, Konstantin, Cheville, John C., Thompson, Houston R., Mosquera, Juan M., Rubin, Mark A., Blute, Michael L., Pihl, Todd, Jensen, Mark, Sfeir, Robert, Kahn, Ari, Chu, Anna, Kothiyal, Prachi, Snyder, Eric, Pontius, Joan, Ayala, Brenda, Backus, Mark, Walton, Jessica, Baboud, Julien, Berton, Dominique, Nicholls, Matthew, Srinivasan, Deepak, Raman, Rohini, Girshik, Stanley, Kigonya, Peter, Alonso, Shelley, Sanbhadti, Rashmi, Barletta, Sean, Pot, David, Sheth, Margi, Demchok, John A., Davidsen, Tanja, Wang, Zhining, Yang, Liming, Tarnuzzer, Roy W., Zhang, Jiashan, Eley, Greg, Ferguson, Martin L., Mills Shaw, Kenna R., Guyer, Mark S., Ozenberger, Bradley A., and Sofia, Heidi J.
- Published
- 2013
21. SPRISS: approximating frequent k-mers by sampling reads, and applications.
- Author
-
Santoro, Diego, Pellegrina, Leonardo, Comin, Matteo, and Vandin, Fabio
- Subjects
SINGLE nucleotide polymorphisms, NUCLEOTIDE sequencing, FRACTIONS
- Abstract
Motivation: The extraction of k-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all k-mers and their frequencies is extremely demanding in terms of running time and memory, owing to the size of the data and to the exponential number of k-mers to be considered. However, in several applications, only frequent k-mers, which are k-mers appearing in a relatively high proportion of the data, are required by the analysis. Results: In this work, we present SPRISS, a new efficient algorithm to approximate frequent k-mers and their frequencies in next-generation sequencing data. SPRISS uses a simple yet powerful reads sampling scheme, which allows extracting a representative subset of the dataset that can be used, in combination with any k-mer counting algorithm, to perform downstream analyses in a fraction of the time required by the analysis of the whole data, while obtaining comparable answers. Our extensive experimental evaluation demonstrates the efficiency and accuracy of SPRISS in approximating frequent k-mers, and shows that it can be used in various scenarios, such as the comparison of metagenomic datasets, the identification of discriminative k-mers, and SNP (single nucleotide polymorphism) genotyping, to extract insights in a fraction of the time required by the analysis of the whole dataset. Availability and implementation: SPRISS [a preliminary version (Santoro et al., 2021) of this work was presented at RECOMB 2021] is available at https://github.com/VandinLab/SPRISS. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
22. The mutational landscape of lethal castration-resistant prostate cancer
- Author
-
Grasso, Catherine S., Wu, Yi-Mi, Robinson, Dan R., Cao, Xuhong, Dhanasekaran, Saravana M., Khan, Amjad P., Quist, Michael J., Jing, Xiaojun, Lonigro, Robert J., Brenner, Chad J., Asangani, Irfan A., Ateeq, Bushra, Chun, Sang Y., Siddiqui, Javed, Sam, Lee, Anstett, Matt, Mehra, Rohit, Prensner, John R., Palanisamy, Nallasivam, Ryslik, Gregory A., Vandin, Fabio, Raphael, Benjamin J., Kunju, Lakshmi P., Rhodes, Daniel R., Pienta, Kenneth J., Chinnaiyan, Arul M., and Tomlins, Scott A.
- Published
- 2012
23. Finding driver pathways in cancer: models and algorithms
- Author
-
Vandin Fabio, Upfal Eli, and Raphael Benjamin J
- Subjects
Cancer, Somatic Mutations, Driver mutations, Pathways, Background mutation rate, Generative models, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
- Published
- 2012
24. Attention-Based Deep Learning Framework for Human Activity Recognition With User Adaptation.
- Author
-
Buffelli, Davide and Vandin, Fabio
- Abstract
Sensor-based human activity recognition (HAR) requires predicting the action of a person based on sensor-generated time series data. HAR has attracted major interest in the past few years, thanks to the large number of applications enabled by modern ubiquitous computing devices. While several techniques based on hand-crafted feature engineering have been proposed, the current state-of-the-art is represented by deep learning architectures that automatically obtain high-level representations and that use recurrent neural networks (RNNs) to extract temporal dependencies in the input. RNNs have several limitations, in particular in dealing with long-term dependencies. We propose a novel deep learning framework, TrASenD, based on a purely attention-based mechanism, that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based architecture is considerably more powerful than previous approaches, with an average increment of more than 7% on the F1 score over the previous best performing model. Furthermore, we consider the problem of personalizing HAR deep learning models, which is of great importance in several applications. We propose a simple and effective transfer-learning based strategy to adapt a model to a specific user, providing an average increment of 6% on the F1 score on the predictions for that user. Our extensive experimental evaluation proves the significantly superior capabilities of our proposed framework over the current state-of-the-art and the effectiveness of our user adaptation technique. [ABSTRACT FROM AUTHOR]
- Published
- 2021
25. Comparison of microbiome samples: methods and computational challenges.
- Author
-
Comin, Matteo, Camillo, Barbara Di, Pizzi, Cinzia, and Vandin, Fabio
- Subjects
METAGENOMICS, NUCLEOTIDE sequencing, SAMPLING methods, PHENOTYPES, GENOMES, MICROBIAL communities
- Abstract
The study of microbial communities crucially relies on the comparison of metagenomic next-generation sequencing data sets, for which several methods have been designed in recent years. Here, we review three key challenges in the comparison of such data sets: species identification and quantification, the efficient computation of distances between metagenomic samples and the identification of metagenomic features associated with a phenotype such as disease status. We present current solutions for such challenges, considering both reference-based methods relying on a database of reference genomes and reference-free methods working directly on all sequencing reads from the samples. [ABSTRACT FROM AUTHOR]
- Published
- 2021
26. MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension.
- Author
-
RIONDATO, MATTEO and VANDIN, FABIO
- Subjects
STATISTICAL learning, LINGUISTICS, STATISTICAL sampling
- Abstract
We present MiSoSouP, a suite of algorithms for extracting high-quality approximations of the most interesting subgroups, according to different popular interestingness measures, from a random sample of a transactional dataset. We describe a new formulation of these measures as functions of averages, which makes it possible to approximate them using sampling. We then discuss how pseudodimension, a key concept from statistical learning theory, relates to the sample size needed to obtain a high-quality approximation of the most interesting subgroups. We prove an upper bound on the pseudodimension of the problem at hand, which depends on characteristic quantities of the dataset and of the language of patterns of interest. This upper bound then leads to small sample sizes. Our evaluation on real datasets shows that MiSoSouP outperforms state-of-the-art algorithms offering the same guarantees, and it vastly speeds up the discovery of subgroups w.r.t. analyzing the whole dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2020
27. Fast Approximation of Frequent k-Mers and Applications to Metagenomics.
- Author
-
Pellegrina, Leonardo, Pizzi, Cinzia, and Vandin, Fabio
- Published
- 2020
28. An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets.
- Author
-
Kirsch, Adam, Mitzenmacher, Michael, Pietracaprina, Andrea, Pucci, Geppino, Upfal, Eli, and Vandin, Fabio
- Subjects
ALGORITHMS ,STATISTICAL significance ,PATTERN recognition systems ,DATA mining ,FALSE discovery rate ,DATABASE searching - Abstract
As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
29. Efficient algorithms to discover alterations with complementary functional association in cancer.
- Author
-
Sarto Basso, Rebecca, Hochbaum, Dorit S., and Vandin, Fabio
- Subjects
CANCER genetics ,PERTURBATION theory ,PHENOTYPES ,ALGORITHMS ,COMPUTATIONAL biology - Abstract
Recent large cancer studies have measured somatic alterations in an unprecedented number of tumours. These large datasets allow the identification of cancer-related sets of genetic alterations by identifying relevant combinatorial patterns. Among such patterns, mutual exclusivity has been employed by several recent methods that have shown its effectiveness in characterizing gene sets associated to cancer. Mutual exclusivity arises because of the complementarity, at the functional level, of alterations in genes which are part of a group (e.g., a pathway) performing a given function. The availability of quantitative target profiles, from genetic perturbations or from clinical phenotypes, provides additional information that can be leveraged to improve the identification of cancer related gene sets by discovering groups with complementary functional associations with such targets. In this work we study the problem of finding groups of mutually exclusive alterations associated with a quantitative (functional) target. We propose a combinatorial formulation for the problem, and prove that the associated computational problem is computationally hard. We design two algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find sets which are better than the ones obtained by the state-of-the-art method, even when sets are evaluated using the statistical score employed by the latter. In addition, our algorithms are much faster than the state-of-the-art, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines. 
We show that on two such datasets, one from project Achilles and one from the Genomics of Drug Sensitivity in Cancer project, UNCOVER identifies several significant gene sets with complementary functional associations with targets. Software available at: . [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
30. NoMAS: A Computational Approach to Find Mutated Subnetworks Associated With Survival in Genome-Wide Cancer Studies.
- Author
-
Altieri, Federico, Hansen, Tommy V., and Vandin, Fabio
- Subjects
SOMATIC mutation ,LOG-rank test ,CANCER - Abstract
Next-generation sequencing technologies allow the measurement of somatic mutations in a large number of patients from the same cancer type: one of the main goals in their analysis is the identification of mutations associated with clinical parameters. The identification of such relationships is hindered by extensive genetic heterogeneity in tumors, with different genes mutated in different patients, due, in part, to the fact that genes and mutations act in the context of pathways: it is therefore crucial to study mutations in the context of interactions among genes. In this work we study the problem of identifying subnetworks of a large gene-gene interaction network with mutations associated with survival time. We formally define the associated computational problem by using a score for subnetworks based on the log-rank statistical test to compare the survival of two given populations. We propose a novel approach, based on a new algorithm, called Network of Mutations Associated with Survival (NoMAS), to find subnetworks of a large interaction network whose mutations are associated with survival time. NoMAS is based on the color-coding technique, which has been previously employed in other applications to find the highest scoring subnetwork with high probability when the subnetwork score is additive. In our case the score is not additive, so our algorithm cannot identify the optimal solution with the same guarantees associated with additive scores. Nonetheless, we prove that, under a reasonable model for mutations in cancer, NoMAS identifies the optimal solution with high probability. We also design a holdout approach to identify subnetworks significantly associated with survival time. We test NoMAS on simulated and cancer data, comparing it to approaches based on single gene tests and to various greedy approaches. We show that our method does indeed find the optimal solution and performs better than the other approaches. 
Moreover, on three cancer datasets our method identifies subnetworks with significant association to survival when none of the genes has significant association with survival when considered in isolation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
31. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia
- Author
-
Ley, Timothy, Miller, Christopher, Ding, Li, Raphael, Benjamin J., Mungall, Andrew J., Robertson, A. Gordon, Hoadley, Katherine, Triche, Timothy J., Laird, Peter W., Baty, Jack D., Fulton, Lucinda L., Fulton, Robert, Heath, Sharon E., Kalicki Veizer, Joelle, Kandoth, Cyriac, Klco, Jeffery M., Koboldt, Daniel C., Kanchi, Krishna Latha, Shashikant, Kulkarni, M. S., P. h. D., F. A. C. M. G., Lamprecht, Tamara L., B. S., Washington, University, Louis, S. t., Larson, David E., P. h. D., Ling, Lin, M. S., Charles, Lu, Mclellan, Michael D., Mcmichael, Joshua F., the Genome Institute at Washington University, Jacqueline, Payton, M. D., P. h. D., Heather, Schmidt, Spencer, David H., Tomasson, Michael H., M. D., Siteman Cancer Center, S. t. Louis, Wallis, John W., Wartman, Lukas D., Watson, Mark A., John, Welch, Wendl, Michael C., Adrian, Ally, B. S. c., Miruna, Balasundaram, B. A. S. c., Inanc, Birol, Yaron, Butterfield, Readman, Chiu, M. S. c., Andy, Chu, Eric, Chuah, Hye Jung Chun, Richard, Corbett, Noreen, Dhalla, Ranabir, Guin, An, He, Carrie, Hirst, Martin, Hirst, Holt, Robert A., Steven, Jones, Aly, Karsan, Darlene, Lee, Haiyan I., Li, Marra, Marco A., Michael, Mayo, Moore, Richard A., Karen, Mungall, Jeremy, Parker, Erin, Pleasance, Patrick, Plettner, Jacquie, Schein, Dominik, Stoll, Lucas, Swanson, Angela, Tam, Nina, Thiessen, Richard, Varhol, Natasja, Wye, Yongjun, Zhao, M. S. c., D. V. M., British Columbia Cancer Agency's Genome Sciences Centre, Vancouver, Canada, Stacey, Gabriel, Gad, Getz, Carrie, Sougnez, Lihua, Zou, Broad Institute of Harvard, Massachusetts Institute of Technology, Cambridge, Ma, Mark D. M. Leiserson, B. 
A., Vandin, Fabio, Hsin Ta Wu, Brown, University, Center for Computational Molecular Biology, Providence, Ri, Frederick, Applebaum, Fred Hutchinson Cancer Research Center, Division of Medical Oncology, Seattle Cancer Care Alliance, Seattle, Baylin, Stephen B., Johns Hopkins University, Baltimore, Rehan, Akbani, Broom, Bradley M., Ken, Chen, Motter, Thomas C., B. A., Khanh, Nguyen, Weinstein, John N., Nianziang, Zhang, Anderson Cancer Center, University of Texas M. D., Houston, Ferguson, Martin L., Mlf, Consulting, Biotechnology Consultant, Boston, Christopher, Adams, Aaron, Black, Jay, Bowen, Julie Gastier Foster, Thomas, Grossman, Tara, Lichtenberg, Lisa, Wise, the Research Institute at Nationwide Children's Hospital, Columbus, Oh, Tanja, Davidsen, Demchok, John A., Mills Shaw, Kenna R., Margi, Sheth, National Cancer Institute, Bethesda, Md, Sofia, Heidi J., P. h. D., M. P. H., National Human Genome Research Institute, Liming, Yang, Downing, James R., Jude Children's Research Hospital, S. t., Memphis, Greg, Eley, Sciementis, Llc, Statham, Ga, Shelley, Alonso, Brenda, Ayala, Julien, Baboud, Mark, Backus, Barletta, Sean P., Berton, Dominique L., M. S. C. S., Chu, Anna L., Stanley, Girshik, Jensen, Mark A., Ari, Kahn, Prachi, Kothiyal, Nicholls, Matthew C., Pihl, Todd D., Pot, David A., Rohini, Raman, B. E., Sanbhadti, Rashmi N., Snyder, Eric E., Deepak, Srinivasan, Jessica, Walton, Yunhu, Wan, Zhining, Wang, Sra, International, Fairfax, Va, Issa, Jean Pierre J., Temple, University, Philadelphia, Michelle Le Beau, University of Chicago, Chicago, Martin, Carroll, University of Pennsylvania, Hagop Kantarjian, M. D., Steven, Kornblau, Bootwalla, Moiz S., B. S. c., M. S., Lai, Phillip H., Hui, Shen, Van Den Berg, David J., Weisenberger, Daniel J., University of Southern California, Epigenome, Center, Los, Angeles, Daniel C. Link, M. 
D., Walter, Matthew J., Ozenberger, Bradley A., Mardis, Elaine R., Peter, Westervelt, Graubert, Timothy A., Dipersio, John F., and Wilson, Richard K.
- Subjects
Myeloid ,Adult ,Epigenomics ,Male ,NPM1 ,Gene Expression ,CpG Islands ,DNA Methylation ,Female ,Gene Fusion ,Genome, Human ,Humans ,Leukemia, Myeloid, Acute ,MicroRNAs ,Middle Aged ,Sequence Analysis, DNA ,Mutation ,Acute ,Enasidenib ,Biology ,CEBPA ,Genetics ,Genome ,Leukemia ,Massive parallel sequencing ,MicroRNA sequencing ,Myeloid leukemia ,DNA ,General Medicine ,KMT2A ,biology.protein ,Sequence Analysis ,Nucleophosmin ,Human ,Comparative genomic hybridization - Abstract
BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation–related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.) The molecular pathogenesis of acute myeloid leukemia (AML) has been studied with the use of cytogenetic analysis for more than three decades. 
Recurrent chromosomal structural variations are well established as diagnostic and prognostic markers, suggesting that acquired genetic abnormalities (i.e., somatic mutations) have an essential role in pathogenesis. 1,2 However, nearly 50% of AML samples have a normal karyotype, and many of these genomes lack structural abnormalities, even when assessed with high-density comparative genomic hybridization or single-nucleotide polymorphism (SNP) arrays 3-5 (see Glossary). Targeted sequencing has identified recurrent mutations in FLT3, NPM1, KIT, CEBPA, and TET2. 6-8 Massively parallel sequencing enabled the discovery of recurrent mutations in DNMT3A 9,10 and IDH1. 11 Recent studies have shown that many patients with
- Published
- 2013
32. De novo pathway-based biomarker identification.
- Author
-
Alcaraz, Nicolas, List, Markus, Batra, Richa, Vandin, Fabio, Ditzel, Henrik J., and Baumbach, Jan
- Published
- 2017
- Full Text
- View/download PDF
33. Computational Methods for Characterizing Cancer Mutational Heterogeneity.
- Author
-
Vandin, Fabio
- Subjects
NUCLEOTIDE sequencing ,CANCER genetics ,HETEROGENEITY - Abstract
Advances in DNA sequencing technologies have allowed the characterization of somatic mutations in a large number of cancer genomes at an unprecedented level of detail, revealing the extreme genetic heterogeneity of cancer at two different levels: inter-tumor, with different patients of the same cancer type presenting different collections of somatic mutations, and intra-tumor, with different clones coexisting within the same tumor. Both inter-tumor and intra-tumor heterogeneity have crucial implications for clinical practices. Here, we review computational methods that use somatic alterations measured through next-generation DNA sequencing technologies for characterizing tumor heterogeneity and its association with clinical variables. We first review computational methods for studying inter-tumor heterogeneity, focusing on methods that attempt to summarize cancer heterogeneity by discovering pathways that are commonly mutated across different patients of the same cancer type. We then review computational methods for characterizing intra-tumor heterogeneity using information from bulk sequencing data or from single cell sequencing data. Finally, we present some of the recent computational methodologies that have been proposed to identify and assess the association between inter- or intra-tumor heterogeneity with clinical variables. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
34. On the Sample Complexity of Cancer Pathways Identification.
- Author
-
Vandin, Fabio, Raphael, Benjamin J., and Upfal, Eli
- Subjects
NUCLEOTIDE sequencing ,CANCER genetics ,SOMATIC mutation ,MACHINE learning ,GENOMICS - Abstract
Advances in DNA sequencing technologies have enabled large cancer sequencing studies, collecting somatic mutation data from a large number of cancer patients. One of the main goals of these studies is the identification of all cancer genes, that is, genes associated with cancer. Its achievement is complicated by the extensive mutational heterogeneity of cancer, due to the fact that important mutations in cancer target combinations of genes (i.e., pathways). Recently, the pattern of mutual exclusivity among mutations in a cancer pathway has been observed, and methods that find significant combinations of cancer genes by detecting mutual exclusivity have been proposed. A key question in the analysis of mutual exclusivity is the computation of the minimum number of samples required to reliably find a meaningful set of mutually exclusive mutations in the data, or conclude that there is no such set. In general, the problem of determining the sample complexity of genomic problems, that is, the number of samples required to identify significant combinations of features, is largely unexplored. In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
35. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer.
- Author
-
Leiserson, Mark D. M., Hsin-Ta Wu, Vandin, Fabio, and Raphael, Benjamin J.
- Published
- 2015
- Full Text
- View/download PDF
36. Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data.
- Author
-
Raphael, Benjamin J. and Vandin, Fabio
- Subjects
SOMATIC mutation ,CANCER cells ,TUMORS ,NUCLEOTIDE sequence ,CANCER patients - Abstract
Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways. In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging on the exclusivity property of driver mutations within a pathway. We define the pathway linear progression model, and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that with enough samples the optimal solution to this problem uniquely identifies the correct model with high probability even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) on large number of samples on colorectal cancer and glioblastoma. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights on the tumor progression at the pathway level. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
37. Accurate Computation of Survival Statistics in Genome-Wide Studies.
- Author
-
Vandin, Fabio, Papoutsaki, Alexandra, Raphael, Benjamin J., and Upfal, Eli
- Subjects
SURVIVAL analysis (Biometry) ,CANCER genetics ,GENOMICS ,LOG-rank test ,SOMATIC mutation ,NUCLEOTIDE sequencing ,ALGORITHMS - Abstract
A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
38. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes.
- Author
-
Leiserson, Mark D M, Raphael, Benjamin J, Thomas, Jacob L, Vandin, Fabio, Ding, Li, Eldridge, Jonathan V, Kim, Younhun, McLellan, Michael, Gonzalez-Perez, Abel, Cheng, Yuwei, Dobson, Jason R, Niu, Beifang, Ryslik, Gregory A, Lopez-Bigas, Nuria, Wu, Hsin-Ta, Getz, Gad, Papoutsaki, Alexandra, Lawrence, Michael S, and Tamborero, David
- Subjects
CANCER ,DNA mutational analysis ,HETEROGENEITY ,SOMATIC hybrids ,PROTEINS - Abstract
Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
39. Efficient detection of differentially methylated regions using DiMmeR.
- Author
-
Almeida, Diogo, Skov, Ida, Silva, Artur, Vandin, Fabio, Qihua Tan, Röttger, Richard, and Baumbach, Jan
- Subjects
CHEMICALS ,DNA methylation ,EPIDEMIOLOGY ,METABOLISM ,GRAPHICAL user interfaces - Abstract
Motivation: Epigenome-wide association studies (EWAS) generate big epidemiological datasets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450 (850) thousand cytosines in the CpG dinucleotide context in a set of patients compared to a control group. Many bioinformatics tools exist for raw data analysis. However, most of them require some knowledge of the programming language R, have no user interface, and do not offer all necessary steps to guide users from raw data all the way down to statistically significant differentially methylated regions (DMRs) and the associated genes. Results: Here, we present DiMmeR (Discovery of Multiple Differentially Methylated Regions), the first free standalone software that interactively guides scientists the whole way through EWAS data analysis with a user-friendly graphical user interface (GUI). It offers parallelized statistical methods for efficiently identifying DMRs in both Illumina 450K and 850K EPIC chip data. DiMmeR computes empirical P-values through randomization tests, even for big datasets of hundreds of patients and thousands of permutations within a few minutes on a standard desktop PC. It is independent of any third-party libraries, computes regression coefficients, P-values and empirical P-values, and it corrects for multiple testing. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
40. Ballast: A Ball-based Algorithm for Structural Motifs.
- Author
-
He, Lu, Vandin, Fabio, Pandurangan, Gopal, and Bailey-Kellogg, Chris
- Subjects
AMINO acid sequence ,PROTEIN structure ,MOLECULAR biology ,DATABASE evaluation ,COMPUTATIONAL complexity ,CHEMICAL warfare agents - Abstract
Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif-matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif-matching algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
41. MADMX: A Strategy for Maximal Dense Motif Extraction.
- Author
-
Grossi, Roberto, Pietracaprina, Andrea, Pisanti, Nadia, Pucci, Geppino, Upfal, Eli, and Vandin, Fabio
- Published
- 2011
- Full Text
- View/download PDF
42. Pan-Cancer Network Analysis Identifies Combinations of Rare Somatic Mutations across Pathways and Protein Complexes
- Author
-
Leiserson, Mark D.M., Vandin, Fabio, Wu, Hsin-Ta, Dobson, Jason R., Eldridge, Jonathan V., Thomas, Jacob L., Papoutsaki, Alexandra, Kim, Younhun, Niu, Beifang, McLellan, Michael, Lawrence, Michael S., Gonzalez-Perez, Abel, Tamborero, David, Cheng, Yuwei, Ryslik, Gregory A., Lopez-Bigas, Nuria, Getz, Gad, Ding, Li, and Raphael, Benjamin J.
- Abstract
Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of the genes and pathways that are significantly mutated in cancer. We perform a Pan-Cancer analysis of mutated networks in 3281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a novel algorithm to find mutated subnetworks that overcomes limitations of existing single gene and pathway/network approaches. We identify 14 significantly mutated subnetworks that include well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer including cohesin, condensin, and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, Pan-Cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.
- Published
- 2014
- Full Text
- View/download PDF
43. Mining Sequential Patterns with VC-Dimension and Rademacher Complexity.
- Author
-
Santoro, Diego, Tonon, Andrea, and Vandin, Fabio
- Subjects
SEQUENTIAL pattern mining ,STATISTICAL learning ,DATA mining ,APPROXIMATION algorithms - Abstract
Sequential pattern mining is a fundamental data mining task with application in several domains. We study two variants of this task—the first is the extraction of frequent sequential patterns, whose frequency in a dataset of sequential transactions is higher than a user-provided threshold; the second is the mining of true frequent sequential patterns, which appear with probability above a user-defined threshold in transactions drawn from the generative process underlying the data. We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets. We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output. Our algorithms are based on novel applications of Vapnik-Chervonenkis dimension and Rademacher complexity, advanced tools from statistical learning theory, to sequential pattern mining. Our extensive experimental evaluation shows that our algorithms provide high-quality approximations for both problems we consider. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
44. Erratum to: CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer.
- Author
-
Leiserson, Mark D. M., Hsin-Ta Wu, Vandin, Fabio, and Raphael, Benjamin J.
- Published
- 2016
- Full Text
- View/download PDF
45. Identifying significant mutations in large cohorts of cancer genomes.
- Author
-
Vandin, Fabio, Upfal, Eli, and Raphael, Ben
- Published
- 2013
- Full Text
- View/download PDF
46. Workshop: Algorithms for discovery of mutated pathways in cancer.
- Author
-
Vandin, Fabio, Upfal, Eli, and Raphael, Benjamin J.
- Abstract
We apply our algorithms to several cancer types including glioblastoma multiforme (GBM), lung adenocarcinoma, and ovarian carcinoma (OV). HotNet identifies significant subnetworks that are part of well-known cancer pathways as well as novel subnetworks. Among the most significant subnetworks identified in OV data is the Notch signaling pathway, and this result appears in the first TCGA OV publication [1]. We also extend HotNet to identify mutated pathways associated with patient survival [2]. In the TCGA OV data, we discover 9 subnetworks containing genes whose mutations are associated with survival. Genes in 4 of these subnetworks overlap pathways known to be associated with survival, including focal adhesion and cell adhesion pathways. In GBM and lung adenocarcinoma, Dendrix finds significant sets of genes that are mutated in large subsets of patients and whose mutations are approximately exclusive, including genes in well-known cancer pathways (e.g., Rb1 and p53 pathways). [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
47. Algorithms and Genome Sequencing: Identifying Driver Pathways in Cancer.
- Author
-
Vandin, Fabio, Upfal, Eli, and Raphael, Benjamin
- Subjects
NUCLEOTIDE sequence ,ALGORITHMS ,CANCER genetics ,GENETIC mutation ,BIOTECHNOLOGY - Abstract
Two proposed algorithms predict which combinations of mutations in cancer genomes are priorities for experimental study. One relies on interaction network data to identify recurrently mutated sets of genes, while the other searches for groups of mutations that exhibit specific combinatorial properties. [ABSTRACT FROM AUTHOR]
- Published
- 2012
48. De novo discovery of mutated driver pathways in cancer.
- Author
-
Vandin, Fabio, Upfal, Eli, and Raphael, Benjamin J.
- Subjects
*NUCLEOTIDE sequence, *GENOMES, *SOMATIC mutation, *CANCER patients, *GENES, *ALGORITHMS
- Abstract
Next-generation DNA sequencing technologies are enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. A major challenge in the interpretation of these data is to distinguish functional "driver mutations" important for cancer development from random "passenger mutations." A common approach for identifying driver mutations is to find genes that are mutated at significant frequency in a large cohort of cancer genomes. This approach is confounded by the observation that driver mutations target multiple cellular signaling and regulatory pathways. Thus, each cancer patient may exhibit a different combination of mutations that are sufficient to perturb these pathways. This mutational heterogeneity presents a problem for predicting driver mutations solely from their frequency of occurrence. We introduce two combinatorial properties, coverage and exclusivity, that distinguish driver pathways, or groups of genes containing driver mutations, from groups of genes with passenger mutations. We derive two algorithms, called Dendrix, to find driver pathways de novo from somatic mutation data. We apply Dendrix to analyze somatic mutation data from 623 genes in 188 lung adenocarcinoma patients, 601 genes in 84 glioblastoma patients, and 238 known mutations in 1000 patients with various cancers. In all data sets, we find groups of genes that are mutated in large subsets of patients and whose mutations are approximately exclusive. Our Dendrix algorithms scale to whole-genome analysis of thousands of patients and thus will prove useful for larger data sets to come from The Cancer Genome Atlas (TCGA) and other large-scale cancer genome sequencing projects. [ABSTRACT FROM AUTHOR]
- Published
- 2012