19 results on '"Kundeti, Vamsi"'
Search Results
2. Efficient algorithms for self assembling non-rectangular nano structures
- Author
-
Kundeti, Vamsi and Rajasekaran, Sanguthevar
- Published
- 2011
- Full Text
- View/download PDF
3. Optimal Algorithms for Some Polygon Enclosure Problems for VLSI Layout Analysis
- Author
-
Kundeti, Vamsi Krishna and Gupta, Prosenjit
- Published
- 2006
- Full Text
- View/download PDF
4. Minimotif miner 2nd release: a database and web system for motif search
- Author
-
Rajasekaran, Sanguthevar, Balla, Sudha, Gradie, Patrick, Gryk, Michael R., Kadaveru, Krishna, Kundeti, Vamsi, Maciejewski, Mark W., Mi, Tian, Rubino, Nicholas, Vyas, Jay, and Schiller, Martin R.
- Published
- 2009
5. PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem
- Author
-
Dinh Hieu, Rajasekaran Sanguthevar, and Kundeti Vamsi K
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Motifs are patterns found in biological sequences that are vital for understanding gene function, human disease, drug design, etc. They are helpful in finding transcriptional regulatory elements, transcription factor binding sites, and so on. As a result, the problem of identifying motifs is very crucial in biology. Results Many facets of the motif search problem have been identified in the literature. One of them is (ℓ, d)-motif search (or Planted Motif Search (PMS)). The PMS problem has been well investigated and shown to be NP-hard. Any algorithm for PMS that always finds all the (ℓ, d)-motifs on a given input set is called an exact algorithm. In this paper we focus on exact algorithms only. All the known exact algorithms for PMS take exponential time in some of the underlying parameters in the worst case scenario. But it does not mean that we cannot design exact algorithms for solving practical instances within a reasonable amount of time. In this paper, we propose a fast algorithm that can solve the well-known challenging instances of PMS: (21, 8) and (23, 9). No prior exact algorithm could solve these instances. In particular, our proposed algorithm takes about 10 hours on the challenging instance (21, 8) and about 54 hours on the challenging instance (23, 9). The algorithm has been run on a single 2.4GHz PC with 3GB RAM. The implementation of PMS5 is freely available on the web at http://www.pms.engr.uconn.edu/downloads/PMS5.zip. Conclusions We present an efficient algorithm PMS5 that uses some novel ideas and combines them with well-known algorithm PMS1 and PMSPrune. PMS5 can tackle the large challenging instances (21, 8) and (23, 9). Therefore, we hope that PMS5 will help biologists discover longer motifs in the futures.
- Published
- 2011
- Full Text
- View/download PDF
6. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs
- Author
-
Vaughn Matthew, Dinh Hieu, Rajasekaran Sanguthevar, Kundeti Vamsi K, and Thapar Vishal
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
- Published
- 2010
- Full Text
- View/download PDF
7. MimoSA: a system for minimotif annotation
- Author
-
Kundeti Vamsi, Gryk Michael R, Kadaveru Krishna, Sargeant David, Meusburger Thomas, Nowling Ronald J, Vyas Jay, Rajasekaran Sanguthevar, and Schiller Martin R
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context.
- Published
- 2010
- Full Text
- View/download PDF
8. Border Length Minimization Problem on a Square Array.
- Author
-
Kundeti, Vamsi, Rajasekaran, Sanguthevar, and Dinh, Hieu
- Subjects
- *
PROTEIN microarrays , *CANCER diagnosis , *BIOMARKERS , *DNA microarrays , *MATHEMATICAL proofs , *DNA probes - Abstract
Protein/peptide microarrays are rapidly gaining momentum in the diagnosis of cancer. High-density and high-throughput peptide arrays are being extensively used to detect tumor biomarkers, examine kinase activity, identify antibodies having low serum titers, and locate antibody signatures. Improving the yield of microarray fabrication involves solving a hard combinatorial optimization problem called the border length minimization problem (BLMP). An important question that remained open for the past 7 years is if the BLMP is tractable or not. We settle this open problem by proving that the BLMP is -hard. We also present a hierarchical refinement algorithm that can refine any heuristic solution for the BLMP and prove that the TSP+1-threading heuristic is an O( N)-approximation. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
9. Secondary Structure, a Missing Component of Sequence-Based Minimotif Definitions.
- Author
-
Sargeant, David P., Gryk, Michael R., Maciejewski, Mark W., Thapar, Vishal, Kundeti, Vamsi, Rajasekaran, Sanguthevar, Romero, Pedro, Dunker, Keith, Shun-Cheng Li, Kaneko, Tomonori, and Schiller, Martin R.
- Subjects
PROTEIN-protein interactions ,CELLS ,GENETIC transduction ,MOLECULAR association ,BIOMOLECULES ,ORGANIC compounds - Abstract
Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis of occurrences of minimotifs in structured and disordered regions of proteins, and review of the literature support a new model for minimotif definitions that includes sequence, structure, and function. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
10. AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.
- Author
-
KUNDETI, VAMSI, RAJASEKARAN, SANGUTHEVAR, and DINH, HEIU
- Abstract
Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph which runs in O(|E|2log2(|V|)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ(p(|V| + |E|)log(|V|) + (dmaxp)3) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{v|din(v) - dout(v) > 0}|, |{v|din(v) - dout(v) < 0}|} and dmax = max{|din(v) - dout(v)}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much less than the nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p/|V| lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A Θ(p(|V| + |E|)) time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134. We also present a Θ((|V| + |E|)log(V)) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
11. Efficient out-of-core sorting algorithms for the Parallel Disks Model
- Author
-
Kundeti, Vamsi and Rajasekaran, Sanguthevar
- Subjects
- *
ALGORITHMS , *MATHEMATICAL models , *COMPUTER storage devices , *SORTING (Electronic computers) , *INFORMATION storage & retrieval systems , *MATHEMATICAL constants - Abstract
Abstract: In this paper, 1 [1] A preliminary version of this paper has been presented in HiPC 2008 Kundeti and Rajasekaran (2008) . we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on the PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper, we first introduce a novel idea called the dirty sequence accumulation that improves the read parallelism. Next, we show analytically that this idea can reduce the number of parallel I/O’s required to sort the input close to the lower bound of . We verify experimentally our dirty sequence idea with the standard -Way merge and show that our idea can reduce the number of parallel I/Os to sort on the PDM significantly. [Copyright &y& Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
12. Efficient Algorithms for Computing With Protein-Based Volumetric Memory Processors.
- Author
-
Rajasekaran, Sanguthevar, Kundeti, Vamsi, Birge, Robert, Kumar, Vipin, and Sahni, Sartaj
- Abstract
With an ever increasing volume of digital data there is a huge increase in the demand for much faster, smaller, and denser storage technologies. Conventional 2-D (surface) storage/memory technologies may soon be replaced with much faster and denser 3-D volumetric (holographic) storage technologies. Photo sensitive protein bacteriorhodopsin has been proven to have great chemical, thermal, and holographic properties and is a good choice for both associative and volumetric memories. Associative memory systems have a wide range of practical applications. However, there is a lack of a formal computational model that can be used to analyze the performance of different algorithms on architectures that support associative memory. We first address this issue by defining a new computational model on protein-based associative memory processors. We also present and analyze algorithms for several fundamental problems on this new model. Secondly, we employ balanced modulated codes in volumetric memories to reduce the bit error rate and improve fidelity. Conventional coding schemes such as 6:8 coding, limit the size of the page to 8 bits and achieve a code rate (utility) of only 75\%. As the technology matures we need efficient algorithms to produce these codes with better utility. In this paper,^1 we address this problem and give algorithms that can generate these codes with superior utility. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
13. A computational tool for identifying minimotifs in protein-protein interactions and improving the accuracy of minimotif predictions.
- Author
-
Rajasekaran, Sanguthevar, Merlin, Jerlin Camilus, Kundeti, Vamsi, Mi, Tian, Oommen, Aaron, Vyas, Jay, Alaniz, Izua, Chung, Keith, Chowdhury, Farah, Deverasatty, Sandeep, Irvey, Tenisha M., Lacambacal, David, Lara, Darlene, Panchangam, Subhasree, Rathnayake, Viraj, Watts, Paula, and Schiller, Martin R.
- Abstract
Protein-protein interactions are important to understanding cell functions; however, our theoretical understanding is limited. There is a general discontinuity between the well-accepted physical and chemical forces that drive protein-protein interactions and the large collections of identified protein-protein interactions in various databases. Minimotifs are short functional peptide sequences that provide a basis to bridge this gap in knowledge. However, there is no systematic way to study minimotifs in the context of protein-protein interactions or vice versa. Here we have engineered a set of algorithms that can be used to identify minimotifs in known protein-protein interactions and implemented this for use by scientists in Minimotif Miner. By globally testing these algorithms on verified data and on 100 individual proteins as test cases, we demonstrate the utility of these new computation tools. This tool also can be used to reduce false-positive predictions in the discovery of novel minimotifs. The statistical significance of these algorithms is demonstrated by an ROC analysis ( P = 0.001). Proteins 2010. © 2010 Wiley-Liss, Inc. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
14. ON THE HARDNESS OF THE BORDER LENGTH MINIMIZATION PROBLEM ON A RECTANGULAR ARRAY.
- Author
-
KUNDETI, VAMSI, RAJASEKARAN, SANGUTHEVAR, and Sahni, S
- Subjects
- *
DNA microarrays , *COMBINATORIAL optimization , *MATHEMATICAL proofs , *ALGORITHMS , *PROBLEM solving , *LINEAR programming , *COMBINATORICS - Abstract
DNA microarray technology has proven to be an invaluable tool for molecular biologists. Microarrays are used extensively in SNP detection, genomic hybridization, alternative splicing and gene expression profiling. However the manufacturers of the microarrays are often stuck with the problem of minimizing the effects of unwanted illumination (border length minimization (BLM)) which is a hard combinatorial problem. In this paper we prove that the BLM problem on a rectangular grid is NP-hard - this however does not mean the BLM problem on a square grid is NP-hard. We also give the first integer linear programming (ILP) formulation to solve BLM problem optimally. Experimental results indicate that our ILP method produces superior results (both in runtime and cost) compared to the current state of the art algorithms to solve the BLM problem optimally. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
15. Fast algorithms for placing large entries along the diagonal of a sparse matrix
- Author
-
Kundeti, Vamsi and Rajasekaran, Sanguthevar
- Subjects
- *
ALGORITHMS , *SPARSE matrices , *NUMERICAL solutions to linear differential equations , *SIMULATION methods & models , *PERMUTATIONS , *ITERATIVE methods (Mathematics) , *FACTORIZATION , *STOCHASTIC convergence - Abstract
Abstract: Solving a sparse system of linear equations is one of the most fundamental operations inside any circuit simulator. The equations/rows in the matrix are often rearranged/permuted before factorization and applying direct or iterative methods to obtain the solution. Permuting the rows of the matrix so that the entries with large absolute values lie on the diagonal has several advantages like better numerical stability for direct methods (e.g., Gaussian elimination) and faster convergence for indirect methods (such as the Jacobi method). Duff (2009) has formulated this as a weighted bipartite matching problem (the MC64 algorithm). In this paper we improve the performance of the MC64 algorithm with a new labeling technique which improves the asymptotic complexity of updating dual variables from to , where is the order of the matrix and is the number of non-zeros. Experimental results from using the new algorithm, when benchmarked with both industry benchmarks and UFL sparse matrix collection, are very promising. Our algorithm is more than 60 times faster (than Duff’s algorithm) for sparse matrices with at least a million non-zeros. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
16. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
- Author
-
Kundeti, Vamsi K., Rajasekaran, Sanguthevar, Hieu Dinh, Vaughn, Matthew, and Thapar, Vishal
- Subjects
- *
GENOMICS , *COMPUTATIONAL biology , *ALGORITHMS , *GRAPHIC methods , *SCALABILITY - Abstract
Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating ϴ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a ϴ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of ϴ (log(n/B)/Blog(M/B) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
17. MimoSA: a system for minimotif annotation.
- Author
-
Vyas, Jay, Nowling, Ronald J., Meusburger, Thomas, Sargeant, David, Kadaveru, Krishna, Gryk, Michael R., Kundeti, Vamsi, Rajasekaran, Sanguthevar, and Schiller, Martin R.
- Subjects
PEPTIDES ,AMINO acid sequence ,DATABASE searching ,MACHINE learning ,MACHINE theory - Abstract
Background: Minimotifs are short peptide sequences within one protein, which are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. There are reports of many minimotifs in the primary literature, which have yet to be annotated, while entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that will facilitate structured annotation and management of minimotif data from the biomedical literature. Results: We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner database, literature tracking, and annotation of new minimotifs. MimoSA enables the visualization, organization, selection and editing functions of minimotifs and their attributes in the MnM database. For the literature components, Mimosa provides paper status tracking and scoring of papers for annotation through a freely available machine learning approach, which is based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions: MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to dynamically rank papers with respect to context. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
18. SPECTRUM BASED TECHNIQUES FOR GRAPH ISOMORPHISM.
- Author
-
Rajasekaran, Sanguthevar and Kundeti, Vamsi
- Subjects
- *
ISOMORPHISM (Mathematics) , *PROPOSITIONAL calculus , *ALGORITHMS , *EIGENVALUES , *GRAPH theory , *POLYNOMIALS - Abstract
The graph isomorphism problem is to check if two given graphs are isomorphic. Graph isomorphism is a well studied problem and numerous algorithms are available for its solution. In this paper we present algorithms for graph isomorphism that employ the spectra of graphs. An open problem that has fascinated many a scientist is if there exists a polynomial time algorithm for graph isomorphism. Though we do not solve this problem in this paper, the algorithms we present take polynomial time. These algorithms have been tested on a good collection of instances. However, we have not been able to prove that our algorithms will work on all possible instances. In this paper, we also give a new construction for cospectral graphs. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
19. AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.
- Author
-
Kundeti V, Rajasekaran S, and Dinh H
- Abstract
Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the k -long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph which runs in O (| E |
2 log2 (| V |)) time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a Θ( p (| V | + | E |) log(| V |) + ( dmax p )3 ) time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where p = max{|{ υ ∣ din ( υ ) - dout ( υ ) > 0}|, |{ υ ∣ din ( υ ) - dout ( υ ) < 0}|} and dmax = max{∣ din ( υ ) - dout ( υ )}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much less than the nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of p /| V | lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A Θ( p (| V | + | E |)) time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134. We also present a Θ((| V | + | E |) log( V )) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.- Published
- 2010
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.