175 results on '"BWT"'
Search Results
2. Longest Common Prefix Arrays for Succinct k-Spectra
- Author
-
Alanko, Jarno N., Biagi, Elena, Puglisi, Simon J., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nardini, Franco Maria, editor, Pisanti, Nadia, editor, and Venturini, Rossano, editor
- Published
- 2023
- Full Text
- View/download PDF
3. Implementation of Speech Enhancement Using Bionic Wavelet Transform
- Author
-
Bhagya, R., Ashwini, P. R., Bharathi, R., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Joshi, Amit, editor, Mahmud, Mufti, editor, and Ragel, Roshan G., editor
- Published
- 2023
- Full Text
- View/download PDF
4. phyBWT2: phylogeny reconstruction via eBWT positional clustering
- Author
-
Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini
- Subjects
Phylogeny, Partition tree, BWT, Positional cluster, Alignment-free, Reference-free, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed input data, e.g., from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is a growing interest in alignment-, assembly-, and reference-free methods that could work on several types of data, including raw read data. Results: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23–12319, 2022). Both of them directly reconstruct phylogenetic trees, bypassing both the alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step-by-step by considering multiple partitions, instead of just one partition at a time, as previously done by the latter.
Conclusions: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves the performance of its previous version phyBWT while preserving the accuracy of the results.
- Published
- 2023
- Full Text
- View/download PDF
5. phyBWT2: phylogeny reconstruction via eBWT positional clustering.
- Author
-
Guerrini, Veronica, Conte, Alessio, Grossi, Roberto, Liti, Gianni, Rosone, Giovanna, and Tattini, Lorenzo
- Subjects
*MOLECULAR phylogeny, *PHYLOGENY, *VIRUS diseases, *NUCLEOTIDE sequencing
- Abstract
Background: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed input data, e.g., from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is a growing interest in alignment-, assembly-, and reference-free methods that could work on several types of data, including raw read data. Results: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23–12319, 2022). Both of them directly reconstruct phylogenetic trees, bypassing both the alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step-by-step by considering multiple partitions, instead of just one partition at a time, as previously done by the latter.
Conclusions: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2 that improves the performance of its previous version phyBWT, while preserving the accuracy of the results. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. Relationships between body weight, body condition score at breeding and reproductive and progeny performance in Kiko meat goats over two breeding cycles
- Author
-
Chukwuemeka Okere, Frank Abrahamsen, and Nar Gurung
- Subjects
bcs, bwt, boer, kiko, reproductive performance, Agriculture
- Abstract
Body condition scores (BCS) and/or body weights (BWT) are often used as visual or tactile management tools to evaluate and improve reproductive competency in pasture-managed small ruminant animals. This study was designed to evaluate relationships between BCS, BWT and reproductive traits (number born alive and weaned, litter size, birth and weaning weights) in 16 purebred Kiko, 11 purebred Boer and 7 Kiko x Boer crossbred meat goat dams that were semi-intensively managed and bred to both Boer and Kiko bucks. BWT was recorded using a scale and palpable BCS scale of 1 to 5 (1= emaciated to 5= obese) and was subjectively determined at breeding, parturition and weaning. Pearson correlation coefficient (r) analysis was used to determine the relationships between residual values of reproductive and progeny performance and BWT or BCS. Pre-partum BCS and weaning BCS had a correlation of r=0.09. A moderate correlation was observed between BWT at breeding and the number born alive (r=0.36) suggesting that pre-partum BWT is the key body conformation measurement linked to the reproductive performance of dams both at birth and weaning. Both at breeding and at weaning BCS were negatively correlated with litter size (r= -0.11) and birth weight (r=-0.32) and weakly correlated with the number born alive (r=0.06). Also, negative correlations were obtained between BCS at weaning and kid weaning weight (r=-0.58) and number weaned (r=-0.26). Although BCS had no significant (P≥0.05) effect on kg kids born per dam, birth weight of kids, and kids weaning weights, it is evident that a BCS score of 3 at the mating time could optimize reproductive performance. The results of this project established the important roles that pre-breeding BWT and/or BCS have on reproductive performance (kidding rate) in meat goat herds. 
We recommend their evaluation as a useful management tool for distinguishing differences in the pre-partum nutritional needs of meat goat herds, especially in the pasture-based production system.
- Published
- 2022
- Full Text
- View/download PDF
7. Analysis and Comparison of Accuracy in Brain Tumor using Berkeley Wavelet Transform and Robust Principal Component Analysis
- Author
-
Reddy Chappidi Sree Teja and Ramalingam Geetha
- Subjects
bwt, robpca, brain tumor detection, magnetic resonance imaging (mri), brain tumors, cnn, tumor, tumor detection, mortality rate, Environmental sciences, GE1-350
- Abstract
The main objective of this study is to compare the Berkeley wavelet transform (BWT) and robust principal component analysis (ROBPCA) in tumor analysis in order to improve the accuracy of image processing. MR images of various brain tumor conditions were gathered, with a sample size of N=16 for BWT and N=16 for ROBPCA. Image segmentation was performed, and textural features were extracted using image processing methods. Both groups were evaluated on the accuracy and sensitivity of tumor detection. The sample size for each group was determined with an enrollment ratio of 1, a threshold alpha of 0.05, a G-power of 80%, and a confidence interval of 95%. An independent-samples t-test found no statistically significant difference between the two groups (p = 0.182). BWT achieved an accuracy of 81.5%, compared with 84% for ROBPCA. For brain tumor detection and analysis, ROBPCA performed well when compared with BWT.
- Published
- 2024
- Full Text
- View/download PDF
8. Estimates of Genomic Heritability and the Marker-Derived Gene for Re(Production) Traits in Xinggao Sheep.
- Author
-
Liu, Zaixia, Fu, Shaoyin, He, Xiaolong, Liu, Xuewen, Shi, Caixia, Dai, Lingli, Wang, Biao, Chai, Yuan, Liu, Yongbin, and Zhang, Wenguang
- Subjects
*EWES, *HERITABILITY, *SHEEP, *SHEEP breeds, *GENOME-wide association studies, *SINGLE nucleotide polymorphisms, *SHEEP breeding
- Abstract
Xinggao sheep are a breed of Chinese domestic sheep that are adapted to the extremely cold climatic features of the Hinggan League in China. The economically vital reproductive trait of ewes (litter size, LS) and productive traits of lambs (birth weight, BWT; weaning weight, WWT; and average daily gain, ADG) are expressed in females and later in life after most of the selection decisions have been made. This study estimated the genetic parameters for four traits to explore the genetic mechanisms underlying the variation, and we performed genome-wide association study (GWAS) tests on a small sample size to identify novel marker trait associations (MTAs) associated with prolificacy and growth. We detected two suggestive significant single-nucleotide polymorphisms (SNPs) associated with LS and eight significant SNPs for BWT, WWT, and ADG. These candidate loci and genes also provide valuable information for further fine-mapping of QTLs and improvement of reproductive and productive traits in sheep. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. String inference from longest-common-prefix array.
- Author
-
Kärkkäinen, Juha, Piątkowski, Marcin, and Puglisi, Simon J.
- Subjects
*LINEAR complementarity problem, *POLYNOMIAL time algorithms, *DATA structures, *SUFFIXES & prefixes (Grammar)
- Abstract
The suffix array, perhaps the most important data structure in modern string processing, is often augmented with the longest common prefix (LCP) array which stores the lengths of the longest common prefixes for lexicographically adjacent suffixes of a string. Together the two arrays are roughly equivalent to the suffix tree with the LCP array representing the tree shape. In order to better understand the combinatorics of LCP arrays, we consider the problem of inferring a string from an LCP array, i.e., determining whether a given array of integers is a valid LCP array, and if it is, reconstructing some string or all strings with that LCP array. There are recent studies of inferring a string from a suffix tree shape but using significantly more information (in the form of suffix links) than is available in the LCP array. We provide two main results. (1) We describe two algorithms for inferring strings from an LCP array when we allow a generalized form of LCP array defined for a multiset of cyclic strings: a linear time algorithm for binary alphabet and a general algorithm with polynomial time complexity for a constant alphabet size. (2) We prove that determining whether a given integer array is a valid LCP array is NP-complete when we require more restricted forms of LCP array defined for a single cyclic or non-cyclic string or a multiset of non-cyclic strings. The result holds whether or not the alphabet is restricted to be binary. In combination, the two results show that the generalized form of LCP array for a multiset of cyclic strings is fundamentally different from the other more restricted forms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
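The LCP array defined in the abstract of result 9 (the lengths of the longest common prefixes of lexicographically adjacent suffixes) can be illustrated with a minimal sketch. It covers only the plain single-string, non-cyclic case with a `$` sentinel, not the generalized cyclic multiset form the paper studies. The function names `suffix_array` and `lcp_array` are hypothetical, and the quadratic construction is chosen for readability rather than speed.

```python
def suffix_array(s):
    """Naive suffix array: start positions of the suffixes of s in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[i] is the longest common prefix length of suffixes sa[i-1] and sa[i]; lcp[0] is 0."""
    def common_prefix_len(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n

    return [0] + [common_prefix_len(sa[i - 1], sa[i]) for i in range(1, len(sa))]


text = "banana$"            # '$' is an assumed sentinel that sorts before every letter
sa = suffix_array(text)     # [6, 5, 3, 1, 0, 4, 2]
lcp = lcp_array(text, sa)   # [0, 0, 1, 3, 0, 0, 2]
```

For example, the adjacent suffixes `ana$` and `anana$` share the prefix `ana`, which gives the entry 3.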
10. Protection of Nine-Phase Transmission Line Using Biorthogonal-2.2 Wavelet Transform
- Author
-
Kapoor, Gaurav, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Shorif Uddin, Mohammad, editor, Sharma, Avdhesh, editor, Agarwal, Kusum Lata, editor, and Saraswat, Mukesh, editor
- Published
- 2021
- Full Text
- View/download PDF
11. The municipality's role in safety-promoting measures in the physical environment of urban space: A qualitative case study in Upplands Väsby municipality
- Author
-
Ernman, Elena
- Abstract
The study examines what measures have been taken within the municipality to prevent crime and to contribute to the safety and well-being of residents by improving the physical environment. The study consists of qualitative semi-structured interviews with politicians, civil servants, crime prevention officers, city architects and other actors with direct or indirect influence on safety-promoting and crime-prevention work. My investigation is limited to physical design and its significance for strategic work to safeguard safety. The results indicate that physical design has a significant influence on perceived safety and on security. In combination with other factors, this relationship affects social sustainability with respect to safety. The results further show that the municipality, in cooperation with several other actors, works to adapt the physical design and layout of urban space so that residents can gain (1) social control and (2) natural surveillance. These are explained through the concepts of (1) eyes on the street and (2) lively streets. The results also show that there is a lack of social sustainability, encompassing both social cohesion and social engagement, in Upplands Väsby municipality, and that the municipality does not always take measures in time.
- Published
- 2024
12. Smaller Fully-Functional Bidirectional BWT Indexes
- Author
-
Belazzougui, Djamal, Cunial, Fabio, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Boucher, Christina, editor, and Thankachan, Sharma V., editor
- Published
- 2020
- Full Text
- View/download PDF
13. Variable-order reference-free variant discovery with the Burrows-Wheeler Transform
- Author
-
Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone
- Subjects
SNP, INDEL, BWT, Alignment-free, Assembly-free, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves the sensitivity and precision of previous de Bruijn graph-based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs, and it was too slow and memory-consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays) besides the BWT. Results: In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extend the framework of [Prezza et al., AMB 2019] to also detect INDELs, and (ii) implement recent algorithmic findings that allow the whole analysis to be performed using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-core machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel. Conclusions: Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve on the 71% of SNPs and 51% of INDELs found by the state-of-the-art tool based on de Bruijn graphs. We furthermore report results on larger (real) Human whole-genome sequencing experiments. In these cases too, our tool exhibits a much higher sensitivity than the state-of-the-art tool.
- Published
- 2020
- Full Text
- View/download PDF
14. Prokrustean Graph: A substring index for rapid k-mer size analysis.
- Author
-
Park A and Koslicki D
- Abstract
Despite the widespread adoption of k-mer-based methods in bioinformatics, understanding the influence of k-mer sizes remains a persistent challenge. Selecting an optimal k-mer size or employing multiple k-mer sizes is often arbitrary, application-specific, and fraught with computational complexities. Typically, the influence of k-mer size is obscured by the outputs of complex bioinformatics tasks, such as genome analysis, comparison, assembly, alignment, and error correction. However, it is frequently overlooked that every method is built above a well-defined k-mer-based object like Jaccard Similarity, de Bruijn graphs, k-mer spectra, and Bray-Curtis Dissimilarity. Despite these objects offering a clearer perspective on the role of k-mer sizes, the dynamics of k-mer-based objects with respect to k-mer sizes remain surprisingly elusive. This paper introduces a computational framework that generalizes the transition of k-mer-based objects across k-mer sizes, utilizing a novel substring index, the Prokrustean graph. The primary contribution of this framework is to compute quantities associated with k-mer-based objects for all k-mer sizes, where the computational complexity depends solely on the number of maximal repeats and is independent of the range of k-mer sizes. For example, counting vertices of compacted de Bruijn graphs for k = 1, …, 100 can be accomplished in mere seconds with our substring index constructed on a gigabase-sized read set. Additionally, we derive a space-efficient algorithm to extract the Prokrustean graph from the Burrows-Wheeler Transform. It becomes evident that modern substring indices, mostly based on longest common prefixes of suffix arrays, inherently face difficulties at exploring varying k-mer sizes due to their limitations at grouping co-occurring substrings. We have implemented four applications that utilize quantities critical in modern pangenomics and metagenomics. The code for these applications and the construction algorithm is available at https://github.com/KoslickiLab/prokrustean.
- Published
- 2024
- Full Text
- View/download PDF
15. Better quality score compression through sequence-based quality smoothing
- Author
-
Yoshihiro Shibuya and Matteo Comin
- Subjects
FASTQ compression, BWT, FM-Index, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Motivation: Current NGS techniques are becoming exponentially cheaper. As a result, there is an exponential growth of genomic data that is unfortunately not matched by an exponential growth of storage, making compression necessary. Most of the entropy of NGS data lies in the quality values associated with each read. Those values are often more diversified than necessary. Because of that, many tools, such as Quartz or GeneCodeq, try to change (smooth) quality scores in order to improve compressibility without altering the important information they carry for downstream analysis like SNP calling. Results: We use the FM-Index, a type of compressed suffix array, to reduce the storage requirements of a dictionary of k-mers, and an effective smoothing algorithm to maintain high precision for SNP calling pipelines while reducing quality score entropy. We present YALFF (Yet Another Lossy Fastq Filter), a tool for quality score compression by smoothing, leading to improved compressibility of FASTQ files. The succinct k-mer dictionary allows YALFF to run on consumer computers with only 5.7 GB of available free RAM. YALFF's smoothing algorithm can improve genotyping accuracy while using fewer resources. Availability: https://github.com/yhhshb/yalff
- Published
- 2019
- Full Text
- View/download PDF
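The abstract of result 15 describes storing a k-mer dictionary in an FM-index. As a rough illustration of how an FM-index answers "does this k-mer occur?", here is a toy, uncompressed sketch of backward search. It is not YALFF's implementation: the names `build_fm_index` and `count` are hypothetical, the occurrence table is stored in full rather than succinctly, and the `$` sentinel is assumed to be absent from the text and to sort before every other character.

```python
from collections import Counter


def build_fm_index(text, sentinel="$"):
    """Build a toy FM-index (C table and full occurrence table) for text + sentinel."""
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # for i == 0, s[-1] is the sentinel

    # C[c] = number of characters in s that are strictly smaller than c
    counts = Counter(s)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(s)


def count(pattern, index):
    """Count occurrences of pattern by backward search over the suffix-array interval."""
    C, occ, n = index
    lo, hi = 0, n  # half-open interval [lo, hi) of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("banana")
ana_hits = count("ana", index)  # 2
```

A k-mer membership test is then simply `count(kmer, index) > 0`. A practical index would replace the full `occ` table with sampled or succinct rank structures.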
16. SNPs detection by eBWT positional clustering
- Author
-
Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone
- Subjects
BWT, LCP array, SNPs, Reference-free, Assembly-free, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Background: Sequencing technologies keep getting cheaper and faster, thus putting growing pressure on data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variant calling methods that only make use of (suitably indexed) raw read data. Results: We develop the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position, (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP array based procedure to locate such clusters in the eBWT. Based on this theory, we designed and implemented an alignment-free and reference-free SNP calling method, and we devised a consequent SNP calling pipeline. Experiments on both synthetic and real data show that SNPs can be detected with a simple scan of the eBWT and LCP arrays as, in accordance with our theoretical framework, they are within clusters in the eBWT of the reads. Finally, our tool intrinsically performs a reference-free evaluation of its accuracy by returning the coverage of each SNP. Conclusions: Based on the results of the experiments on synthetic and real data, we conclude that the positional clustering framework can be effectively used for the problem of identifying SNPs, and it appears to be a promising approach for calling other types of variants directly on raw sequencing data. Availability: The software ebwt2snp is freely available for academic use at: https://github.com/nicolaprezza/ebwt2snp.
- Published
- 2019
- Full Text
- View/download PDF
17. A Lightweight Algorithm for Computing BWT from Suffix Array in Disk
- Author
-
Xie, Jing Yi, Lao, Bin, Nong, Ge, Barbosa, Simone Diniz Junqueira, Series editor, Chen, Phoebe, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Yuan, Junsong, Series editor, Zhou, Lizhu, Series editor, Chen, Guoliang, editor, Shen, Hong, editor, and Chen, Mingrui, editor
- Published
- 2017
- Full Text
- View/download PDF
18. Greedy Shortest Common Superstring Approximation in Compact Space
- Author
-
Alanko, Jarno, Norri, Tuukka, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fici, Gabriele, editor, Sciortino, Marinella, editor, and Venturini, Rossano, editor
- Published
- 2017
- Full Text
- View/download PDF
19. EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices
- Author
-
Pockrandt, Christopher, Ehrhardt, Marcel, Reinert, Knut, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Sahinalp, S. Cenk, editor
- Published
- 2017
- Full Text
- View/download PDF
20. FMLRC: Hybrid long read error correction using an FM-index
- Author
-
Jeremy R. Wang, James Holt, Leonard McMillan, and Corbin D. Jones
- Subjects
de novo assembly, Hybrid error correction, Long read, Pacbio, BWT, FM-Index, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. Results: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high-quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read-only de novo assembly methods. Conclusion: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Higher throughput and computational efficiency than existing methods will help make more economical use of emerging long read sequencing technologies.
- Published
- 2018
- Full Text
- View/download PDF
21. Enhanced imperialist competitive algorithm based efficient reversible data hiding technique.
- Author
-
Prabha, K. R. and Jagadeeswari, M.
- Subjects
IMPERIALIST competitive algorithm, DISCRETE cosine transforms, VECTOR quantization, MATHEMATICAL notation, DATA compression, IMAGE compression, DATA extraction
- Abstract
This paper presents a novel reversible data hiding technique that embeds high-capacity secret bits into an image compressed with Vector Quantization (VQ) and Side Match Vector Quantization (SMVQ) and recovers the cover image after data extraction. To optimize embedding capacity and achieve exact recovery of the cover image, the paper uses the Enhanced Imperialist Competitive Algorithm (EICA). The threshold value is determined by the contrast-sensitivity fitness function in EICA in order to indicate the embedding rate of each region in the cover image based on the size of the secret message. During data hiding, the output size of the code stream is preserved by hiding two secret bits in a single index value. The Discrete Cosine Transform (DCT) and the Burrows-Wheeler Transform (BWT) are applied before quantization for exact recovery of the cover image and to achieve a high compression ratio. DCT provides excellent energy compaction, and BWT reorders the symbols according to their context. The proposed method thus provides a novel technique that embeds secret bits into the cover image and compresses the embedded image. The output takes the form of code streams with preserved size. The experimental results show that the proposed technique achieves high embedding capacity and compression rate. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
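The remark in the abstract of result 21 that the BWT "reorders the symbols according to their context" can be made concrete with a small sketch of the textbook transform and its inverse. This is the naive rotation-sorting construction, not the pipeline used in that paper. The names `bwt` and `inverse_bwt` are hypothetical, and the `$` sentinel is assumed to be absent from the input and to sort before every other character.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert the BWT, using the fact that equal characters keep their relative
    order between the first (sorted) column and the last column."""
    # order[j] is the position in `last` of the j-th character of the first column;
    # Python's sort is stable, which preserves the relative order of equal characters.
    order = sorted(range(len(last)), key=lambda i: last[i])
    row = last.index(sentinel)  # the row holding the unrotated text + sentinel
    chars = []
    for _ in range(len(last) - 1):
        row = order[row]
        chars.append(last[row])
    return "".join(chars)


transformed = bwt("banana")            # "annb$aa"
restored = inverse_bwt(transformed)    # "banana"
```

In the output `annb$aa`, characters that precede the same context end up adjacent (the two `n`s both precede `a`), which is what makes the transformed string easier to compress.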
22. Fast Online Lempel-Ziv Factorization in Compressed Space
- Author
-
Policriti, Alberto, Prezza, Nicola, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Iliopoulos, Costas, editor, Puglisi, Simon, editor, and Yilmaz, Emine, editor
- Published
- 2015
- Full Text
- View/download PDF
23. BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads
- Author
-
Wang, Heng, Peng, Shaoliang, Lu, Yutong, Wu, Chengkun, Wen, Jiajun, Liu, Jie, Zhu, Xiaoqian, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Kunkel, Julian M., editor, and Ludwig, Thomas, editor
- Published
- 2015
- Full Text
- View/download PDF
24. Built Environment Wind Turbine Roadmap
- Author
-
Oteri, F. [National Renewable Energy Lab. (NREL), Golden, CO (United States)]
- Published
- 2012
- Full Text
- View/download PDF
25. Efficient construction of the BWT for repetitive text using string compression.
- Author
-
Díaz-Domínguez, Diego and Navarro, Gonzalo
- Subjects
*PATTERN matching, *HUMAN genome, *SHORT-term memory, *SUFFIXES & prefixes (Grammar), *GRAMMAR
- Abstract
We present a new semi-external algorithm that builds the Burrows–Wheeler transform variant of Bauer et al. (a.k.a., BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is massive and repetitive. Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form. Our compression format not only saves space but also speeds up the required computations. Our experiments show important space and computation time savings when the text is repetitive. In moderate-size collections of real human genome assemblies (14.2 GB - 75.05 GB), our memory peak is, on average, 1.7x smaller than the peak of the state-of-the-art BCR BWT construction algorithm (ropebwt2), while running 5x faster. Our current implementation was also able to compute the BCR BWT of 400 real human genome assemblies (1.2 TB) in 41.21 hours using 118.83 GB of working memory (around 10% of the input size). Interestingly, the results we report in the 1.2 TB file are dominated by the difficulties of scanning huge files under memory constraints (specifically, I/O operations). This fact indicates we can perform much better with a more careful implementation of our method, thus scaling to even bigger sizes efficiently. • We introduce a new algorithm to build the Burrows-Wheeler Transform on massive and highly repetitive text collections. • We build on Induced Suffix Sorting and use grammar compression to store intermediate results. • Our experiments demonstrate that our particular format saves significant space and computation time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. Implementing Efficient Updates in Compressed Big Text Databases
- Author
-
Böttcher, Stefan, Bültmann, Alexander, Hartel, Rita, Schlüßler, Jonathan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Decker, Hendrik, editor, Lhotská, Lenka, editor, Link, Sebastian, editor, Basl, Josef, editor, and Tjoa, A Min, editor
- Published
- 2013
- Full Text
- View/download PDF
27. Suffixes, Conjugates and Lyndon Words
- Author
-
Bonomo, Silvia, Mantaci, Sabrina, Restivo, Antonio, Rosone, Giovanna, Sciortino, Marinella, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Béal, Marie-Pierre, editor, and Carton, Olivier, editor
- Published
- 2013
- Full Text
- View/download PDF
28. Lightweight LCP Construction for Next-Generation Sequencing Datasets
- Author
-
Bauer, Markus J., Cox, Anthony J., Rosone, Giovanna, Sciortino, Marinella, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, Raphael, Ben, editor, and Tang, Jijun, editor
- Published
- 2012
- Full Text
- View/download PDF
29. Memory-Aware BWT by Segmenting Sequences to Support Subsequence Search
- Author
-
Wang, Jiaying, Yang, Xiaochun, Wang, Bin, Zhu, Huaijie, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Sheng, Quan Z., editor, Wang, Guoren, editor, Jensen, Christian S., editor, and Xu, Guandong, editor
- Published
- 2012
- Full Text
- View/download PDF
30. Lightweight BWT Construction for Very Large String Collections
- Author
-
Bauer, Markus J., Cox, Anthony J., Rosone, Giovanna, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Giancarlo, Raffaele, editor, and Manzini, Giovanni, editor
- Published
- 2011
- Full Text
- View/download PDF
31. A string indexing method with length and position constraints.
- Author
-
于长永, 高明, 柏禄一, and 赵宇海
- Abstract
An index method for string collections based on the BWT (Burrows-Wheeler transform) is proposed for answering exact substring queries with constraints on string length and matching position. First, the BWT and BWT-based exact string queries are discussed. Then the impact of the string collection, string length, and substring position on the original BWT index is analyzed. Finally, the problem of quickly mapping the position of a matching suffix to the string ID and the position of the matching substring within that string is discussed and solved. Approximate string matching was conducted on three real string collections, and the results of the proposed index method were compared with those of the original one. The experimental results show that the proposed BWT-based method considerably speeds up exact substring queries with string length and matching position constraints while maintaining the index size. The proposed method is therefore suitable for indexing large-scale string collections for string similarity matching and joins. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
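The "exact string query based on BWT" that the abstract of entry 31 builds on is commonly implemented as backward search. The following is a minimal sketch of plain backward search only, not the constrained index proposed in that paper; the function names `build_index` and `find`, the `$` sentinel, and the naive suffix sorting and counting are assumptions chosen for brevity.

```python
def build_index(text: str, sentinel: str = "$"):
    """Build a toy BWT index: suffix array, BWT string, and C table.

    Assumes the sentinel does not occur in text and sorts before every
    other character. Suffix sorting is naive and only suits short inputs.
    """
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for suffix i is the character preceding it (cyclically).
    bwt = "".join(s[i - 1] for i in sa)
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[ch] = number of characters in s strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    return sa, bwt, c_table


def find(pattern: str, sa, bwt, c_table):
    """Return the sorted start positions of pattern via backward search."""
    lo, hi = 0, len(bwt)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        # str.count stands in for a rank structure (occurrences in a prefix).
        lo = c_table[ch] + bwt.count(ch, 0, lo)
        hi = c_table[ch] + bwt.count(ch, 0, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, c_table = build_index("banana")
print(find("ana", sa, bwt, c_table))
```

A real index would replace `str.count` with a rank data structure and sample the suffix array; mapping a matched suffix position back to a string ID and an offset within that string, as the abstract describes, is a separate step not shown here.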
32. COMPRESSION OF TEXTUAL COLUMN-ORIENTED DATA.
- Author
-
GARCIA, Vinicius Fulber and MERGEN, Sergio Luis Sardi
- Subjects
DATA compression ,INFORMATION retrieval ,ENTROPY (Information theory) ,SKEWNESS (Probability theory) ,DATABASES - Abstract
Column-oriented data are well suited for compression. Since values of the same column are stored contiguously on disk, the information entropy is lower if compared to the physical data organization of conventional databases. There are many useful light-weight compression techniques targeted at specific data types and domains, like integers and small lists of distinct values, respectively. However, compression of textual values formed by skewed and high-cardinality words is usually restricted to variations of the LZ compression algorithm. So far there are no empirical evaluations that verify how other sophisticated compression methods address columnar data that store text. In this paper we shed a light on this subject by revisiting concepts of those algorithms. We also analyse how they behave in terms of compression and speed when dealing with textual columns where values appear in adjacent positions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
33. Diatoms from the late Holocene of the western Chukchi Sea, Arctic Ocean: environmental signals and palaeoceanography
- Author
-
Browaldh, Erik
- Abstract
The sediment Core SWERUS-L2-2-PC1 (2PC) retrieved from the Chukchi Sea, Arctic Ocean sits in an oceanographically dynamic location at the Arctic-Pacific Ocean gateway. The 8.3 m-long core was retrieved in Herald Canyon at the marginal ice zone at 57 m depth. Core 2PC is well-positioned to record variability in inflow of Bering Sea Water (BSW) and Pacific Water (PW) in Herald Canyon. With the 2PC high sedimentation rate (200 cm/kyr), two independent age models (radiocarbon and palaeomagnetism) based on tephra age markers, and a richness in well-preserved siliceous sediment, validate 2PC as an outstanding sequence for applying diatom assemblage analysis as a proxy for ocean-climate change back to 4250 years BP, including the past few hundred years where global warming and sea ice decline is recorded by instrumental records. These characteristics make Core-2PC a useful record for investigating the role of PW on sea ice variability in the Chukchi Sea, both in the past and predicting the future. To investigate the impact of PW on ocean and sea ice conditions in the Chukchi Sea, diatom assemblage analysis was performed on 49 samples through the Late Holocene. The over-arching goal was to test the hypothesis, suggested by existing research on 2PC using benthic foraminifera Mg/Ca palaeothermometry, that the strength of PW inflow into the Chukchi Sea via Herald Canyon has varied on a time scales of ~500-1000 years in the past 4000 years. PW is slightly warmer than resident Arctic surface waters and is known to be an important control on Arctic sea-ice. The diatom assemblage approach assumes that there are recognizable differences between end-member diatom assemblages that are characteristic of PW versus Arctic Ocean type environments associated with extensive sea-ice conditions. The mapping of species in the Herald Canyon was used to test the idea of variability of sea-ice extent and the role of the Pacific Ocean forcings into the western Chukchi Sea. The results reveal dive
- Published
- 2022
34. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering
- Author
-
Guerrini, Veronica, Conte, Alessio, Grossi, Roberto, Liti, Gianni, Rosone, Giovanna, and Tattini, Lorenzo
- Abstract
Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) for a collection of DNA sequences to directly reconstruct phylogeny, bypassing the alignment against a reference genome or de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding to use a distance matrix to infer phylogeny. The preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that found in the benchmark phylogeny.
- Published
- 2022
- Full Text
- View/download PDF
35. Efficient Construction of the BWT for Repetitive Text Using String Compression
- Author
-
Díaz-Domínguez, Diego and Navarro, Gonzalo
- Abstract
We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a., BCR BWT) in linear expected time. Our method uses compression techniques to reduce the computational costs when the input is massive and repetitive. Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form. Our compression format not only saves space, but it also speeds up the required computations. Our experiments show important savings in both space and computation time when the text is repetitive. On average, we are 3.7x faster than the baseline compressed approach, while maintaining a similar memory consumption. These results make our method stand out as the only one (to our knowledge) that can build the BCR BWT of a collection of 25 human genomes (75 GB) in about 7.3 hours, and using only 27 GB of working memory.
- Published
- 2022
- Full Text
- View/download PDF
36. Measuring the clustering effect of BWT via RLE.
- Author
-
Mantaci, Sabrina, Restivo, Antonio, Rosone, Giovanna, Sciortino, Marinella, and Versari, Luca
- Subjects
CLUSTER analysis (Statistics) ,MATHEMATICAL transformations ,BIOINFORMATICS ,COMBINATORIAL optimization ,MEASURE theory ,COMPUTATIONAL complexity - Abstract
The Burrows–Wheeler Transform (BWT) is a reversible transformation on which are based several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often create runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest number of clusters (cf. [5,16,21,23] ). In this paper we are concerned about the cases when the clustering effect of the BWT is not achieved. For this purpose we introduce a complexity measure that counts the number of equal-letter runs of a word. This measure highlights that there exist many words for which BWT gives an “un-clustering effect”, that is BWT produce a great number of short clusters. More in general we show that the application of BWT to any word at worst doubles the number of equal-letter runs. Moreover, we prove that this bound is tight by exhibiting some families of words where such upper bound is always reached. We also prove that for binary words the case in which the BWT produces the maximal number of clusters is related to the very well known Artin's conjecture on primitive roots. The study of some combinatorial properties underlying this transformation could be useful for improving indexing and compression strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
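The run-count measure described in the abstract of entry 36 can be illustrated with a few lines of code. This is a rough sketch, assuming the sentinel-free BWT defined as the last column of the sorted cyclic rotations of a word; the function names `bwt` and `count_runs` are hypothetical, and the quadratic-space construction is for illustration only.

```python
from itertools import groupby


def bwt(word: str) -> str:
    """Naive BWT: last column of the sorted cyclic rotations of word.

    No end-of-string sentinel is appended (an assumption of this sketch).
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(word: str) -> int:
    """Count maximal runs of equal letters, e.g. 'aabbb' has 2 runs."""
    return sum(1 for _ in groupby(word))


word = "banana"
transformed = bwt(word)
print(word, count_runs(word))
print(transformed, count_runs(transformed))
```

For `banana` the transform groups equal letters together, so the run count drops; the abstract's result concerns the opposite case, where the transformed word has more runs than the input.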
37. Variable-order reference-free variant discovery with the Burrows-Wheeler Transform
- Author
-
Nadia Pisanti, Nicola Prezza, Marinella Sciortino, Giovanna Rosone, University of Pisa - Università di Pisa, Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Università degli studi di Palermo - University of Palermo, Prezza N., Pisanti N., Sciortino M., and Rosone G.
- Subjects
Burrows–Wheeler transform ,Computer science ,[SDV]Life Sciences [q-bio] ,Value (computer science) ,SNP ,Assembly-free ,0102 computer and information sciences ,lcsh:Computer applications to medicine. Medical informatics ,01 natural sciences ,Biochemistry ,Polymorphism, Single Nucleotide ,03 medical and health sciences ,BWT ,Chromosome (genetic algorithm) ,Structural Biology ,Humans ,Sensitivity (control systems) ,Molecular Biology ,lcsh:QH301-705.5 ,Alignment-free ,INDEL ,030304 developmental biology ,De Bruijn sequence ,0303 health sciences ,Settore INF/01 - Informatica ,Applied Mathematics ,Research ,Genomics ,Sequence Analysis, DNA ,Data structure ,Graph ,Computer Science Applications ,Variable (computer science) ,lcsh:Biology (General) ,010201 computation theory & mathematics ,Adjacency list ,lcsh:R858-859.7 ,Suffix ,[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] ,Algorithm ,Algorithms - Abstract
Background In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays), besides the BWT. Results In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extend the framework of [Prezza et al., AMB 2019] to detect also INDELs, and (ii) implements recent algorithmic findings that allow to perform the whole analysis using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-cores machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel. Conclusions Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve the 71% of SNPs and 51% of INDELs found by the state-of-the art tool based on de Bruijn graphs. We furthermore report results on larger (real) Human whole-genome sequencing experiments. Also in these cases, our tool exhibits a much higher sensitivity than the state-of-the art tool.
- Published
- 2020
- Full Text
- View/download PDF
38. Efficient Construction of the BWT for Repetitive Text Using String Compression
- Author
-
Díaz-Domínguez, Diego, Navarro, Gonzalo, Bannai, Hideo, Holub, Jan, Department of Computer Science, and Algorithmic Bioinformatics
- Subjects
FOS: Computer and information sciences ,BWT ,string compression ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Data_CODINGANDINFORMATIONTHEORY ,repetitive text ,113 Computer and information sciences ,Theory of computation → Data compression - Abstract
We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a., BCR BWT) in linear expected time. Our method uses compression techniques to reduce the computational costs when the input is massive and repetitive. Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form. Our compression format not only saves space, but it also speeds up the required computations. Our experiments show important savings in both space and computation time when the text is repetitive. On average, we are 3.7x faster than the baseline compressed approach, while maintaining a similar memory consumption. These results make our method stand out as the only one (to our knowledge) that can build the BCR BWT of a collection of 25 human genomes (75 GB) in about 7.3 hours, and using only 27 GB of working memory. Comment: Accepted at CPM'22
- Published
- 2022
- Full Text
- View/download PDF
39. Diatoms from the late Holocene in the western Chukchi Sea, Arctic Ocean: environmental signals and palaeoceanography
- Author
-
Browaldh, Erik
- Subjects
bottom water temperature ,Shionodiscus oestrupii ,cryophilic ,Paralia sulcata ,warm water diatom ,benthic foraminifera ,Chaetoceros sp. 7 ,Geology ,Chukchi Sea ,Chaetoceros ,Herald Canyon ,diatoms ,sympagic ,Bering Sea Water species ,BWT ,palaeothermometry ,SWERUS-L2-2-PC1 ,late holocene ,Fragilariopsis ,Thalassiosira simonsenii ,ice-algae ,Geologi ,Fossula arctica ,ice dynamics - Abstract
The sediment Core SWERUS-L2-2-PC1 (2PC) retrieved from the Chukchi Sea, Arctic Ocean sits in an oceanographically dynamic location at the Arctic-Pacific Ocean gateway. The 8.3 m-long core was retrieved in Herald Canyon at the marginal ice zone at 57 m depth. Core 2PC is well-positioned to record variability in inflow of Bering Sea Water (BSW) and Pacific Water (PW) in Herald Canyon. With the 2PC high sedimentation rate (200 cm/kyr), two independent age models (radiocarbon and palaeomagnetism) based on tephra age markers, and a richness in well-preserved siliceous sediment, validate 2PC as an outstanding sequence for applying diatom assemblage analysis as a proxy for ocean-climate change back to 4250 years BP, including the past few hundred years where global warming and sea ice decline is recorded by instrumental records. These characteristics make Core-2PC a useful record for investigating the role of PW on sea ice variability in the Chukchi Sea, both in the past and predicting the future. To investigate the impact of PW on ocean and sea ice conditions in the Chukchi Sea, diatom assemblage analysis was performed on 49 samples through the Late Holocene. The over-arching goal was to test the hypothesis, suggested by existing research on 2PC using benthic foraminifera Mg/Ca palaeothermometry, that the strength of PW inflow into the Chukchi Sea via Herald Canyon has varied on a time scales of ~500-1000 years in the past 4000 years. PW is slightly warmer than resident Arctic surface waters and is known to be an important control on Arctic sea-ice. The diatom assemblage approach assumes that there are recognizable differences between end-member diatom assemblages that are characteristic of PW versus Arctic Ocean type environments associated with extensive sea-ice conditions. The mapping of species in the Herald Canyon was used to test the idea of variability of sea-ice extent and the role of the Pacific Ocean forcings into the western Chukchi Sea. 
The results reveal diverse diatom assemblages throughout the past 4000 years in Herald Canyon, showing this core to be very useful for diatom palaeoclimate reconstructions. A total of 126 species with abundance >1% are recognized. Several generalist species typically dominate assemblages especially Chaetoceros, ice-algae, marine-neritic and near ice or cold-water planktic centric diatoms. Distinct changes in stratigraphy are illustrated by changes in identified diatom assemblage zones. The 2PC diatom assemblages were contrasted with records from Chukchi-, Laptev-, East Siberian- and Bering Sea and North Pacific Ocean. At 2PC, sympagic (sea-ice related), planktic and neritic species abundance varies on time scales of ~500-1000 years. Importantly, there is a clear similarity between the timing of diatom assemblage changes and the 2PC benthic foraminifera Mg/Ca bottom water temperature (BWT) reconstruction. In particular, abundance changes in the warm water species Thalassionema nitzschioides, Shionodiscus oestrupii and Thalassionema simonsenii, tychoplanktic Paralia sulcata, Ice algae- and sympagic assemblages and cold-water indicators correspond best to BWT fluctuations shown by the Mg/Ca reconstruction. These oscillations are suggestive of changes in warmer PW inflow. Other aspects of the diatom data appear to correlate with colder and warmer climate events and suggest that changes in PW inflow amplified the effects of these events in the Chukchi Sea region through the Late Holocene in the Northern Hemisphere. It can thus, be concluded that diatoms from 2PC, support the palaeoceanographic reconstruction suggested by the benthic foraminifera Mg/Ca palaeothermometry and that variations in PW inflow through Herald Canyon is an important driver of sea ice variability on thousand-year time scales.
- Published
- 2022
40. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering
- Author
-
Guerrini, Veronica, Conte, Alessio, Grossi, Roberto, Liti, Gianni, Rosone, Giovanna, Tattini, Lorenzo, University of Pisa - Università di Pisa, Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Inria Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), and CNRS UMR 7284, Inserm U 1081, Université Côte d'Azur
- Subjects
positional cluster ,BWT ,Bioinformatics ,[SDV]Life Sciences [q-bio] ,partition tree ,assembly-free ,[INFO]Computer Science [cs] ,Mathematics of computing → Combinatorial algorithms ,alignment-free ,reference-free ,referencefree ,Phylogeny ,Applied computing → Bioinformatics - Abstract
Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) for a collection of DNA sequences to directly reconstruct phylogeny, bypassing the alignment against a reference genome or de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding to use a distance matrix to infer phylogeny. The preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that found in the benchmark phylogeny. LIPIcs, Vol. 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022), pages 23:1-23:19
- Published
- 2022
- Full Text
- View/download PDF
41. Estimates of Genomic Heritability and the Marker-Derived Gene for Re(Production) Traits in Xinggao Sheep
- Author
-
Zaixia Liu, Shaoyin Fu, Xiaolong He, Xuewen Liu, Caixia Shi, Lingli Dai, Biao Wang, Yuan Chai, Yongbin Liu, and Wenguang Zhang
- Subjects
genetic parameter evaluation ,BWT ,LS ,Genetics ,Xinggao sheep ,MATs ,Genetics (clinical) ,WWT ,ADG - Abstract
Xinggao sheep are a breed of Chinese domestic sheep that are adapted to the extremely cold climatic features of the Hinggan League in China. The economically vital reproductive trait of ewes (litter size, LS) and productive traits of lambs (birth weight, BWT; weaning weight, WWT; and average daily gain, ADG) are expressed in females and later in life after most of the selection decisions have been made. This study estimated the genetic parameters for four traits to explore the genetic mechanisms underlying the variation, and we performed genome-wide association study (GWAS) tests on a small sample size to identify novel marker trait associations (MTAs) associated with prolificacy and growth. We detected two suggestive significant single-nucleotide polymorphisms (SNPs) associated with LS and eight significant SNPs for BWT, WWT, and ADG. These candidate loci and genes also provide valuable information for further fine-mapping of QTLs and improvement of reproductive and productive traits in sheep.
- Published
- 2023
- Full Text
- View/download PDF
42. Burrows-Wheeler based JPEG
- Author
-
Yair Wiseman
- Subjects
BWT ,JPEG ,Data Compression ,Science (General) ,Q1-390 - Abstract
Recently, the use of the Burrows-Wheeler method for data compression has been expanded. A method of enhancing the compression efficiency of the common JPEG standard is presented in this paper, exploiting the Burrows-Wheeler compression technique. The paper suggests a replacement of the traditional Huffman compression used by JPEG by the Burrows-Wheeler compression. When using high quality images, this replacement will yield a better compression ratio. If the image is synthetic, even a poor quality image can be compressed better.
- Published
- 2007
- Full Text
- View/download PDF
43. A Comparative Analysis of Compression Techniques – The Sparse Coding and BWT.
- Author
-
Pradhan, Annapurna, Pati, Nibedita, Rup, Suvendu, Panda, Avipsa S., and Kanoje, Lalit Kumar
- Subjects
COMPARATIVE studies ,PIXELS ,ALGORITHMIC randomness ,ALGEBRA ,ORTHOGONAL arrays - Abstract
The process of image compression has been the most researched area for decades. Image compression is a necessity for the transmission of images and the storage of images in an efficient manner. This is because image compression represents image having less correlated pixels, eliminates redundancy and also removes irrelevant pixels. The most commonly known techniques for image compression are JPEG and JPEG 2000. But these two have certain drawbacks and thus various other techniques have been popping up, of late. Recently, a growing interest has been marked for the use of basis selection algorithms for signal approximation and compression. In the recent past, the orthogonal and bi-orthogonal complete dictionaries (like the Discrete Cosine Transform (DCT) or wavelets) have been the dominant transform domain representations. But, the DCT and the wavelet transform techniques experience blocking and ringing artefacts and also these are not capable of capturing directional information. Hence, sparse coding method (by Orthogonal Matching Pursuit (OMP) algorithm) comes into picture. Another, novel technique that has taken up recent interests of the image compression area is the Burrows-Wheeler transform (BWT). BWT is generally applied prior to entropy encoding for a better regularity structure. The paper puts forth the comparison results of the methods of sparse approximation and BWT. The comparison analysis was done using the two techniques on various images, out of which one has been given in the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
44. COMPARISON OF OPEN SOURCE COMPRESSION ALGORITHMS ON VHR REMOTE SENSING IMAGES FOR EFFICIENT STORAGE HIERARCHY.
- Author
-
Akoguz, A., Bozkurt, S., Gozutok, A. A., Alp, G., Turan, E. G., Bogaz, M., and Kent, S.
- Subjects
REMOTE sensing ,IMAGE processing - Abstract
High resolution in satellite imagery brings a fundamental problem: a large amount of telemetry data that must be stored after the downlink operation. Moreover, after the post-processing and image enhancement steps that follow acquisition, file sizes increase even further, making the data harder to store and slower to transmit from one source to another; compressing the raw data and the various levels of processed data is therefore a necessity for archiving stations that need to save space. The lossless data compression algorithms examined in this study aim to provide compression without any loss of the data holding spectral information. To this end, well-known open source programs supporting the relevant compression algorithms were applied to processed GeoTIFF images from Airbus Defence & Space's SPOT 6 & 7 satellites with a GSD of 1.5 m, which were acquired and stored by the ITU Center for Satellite Communications and Remote Sensing (ITU CSCRS). The algorithms were Lempel-Ziv-Welch (LZW), Lempel-Ziv-Markov chain Algorithm (LZMA & LZMA2), Lempel-Ziv-Oberhumer (LZO), Deflate & Deflate 64, Prediction by Partial Matching (PPMd or PPM2), and Burrows-Wheeler Transform (BWT); the goal was to observe their compression performance on sample datasets in terms of how much the image data can be compressed while ensuring lossless compression. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
45. A multi-core parallel substring matching algorithm using BWT.
- Author
-
王佳英, 王斌, 李晓华, and 杨晓春
- Abstract
To address the limitation that P-BWT (Burrows-Wheeler transform) supports only short queries and runs on a single processor, a multi-core parallel exact matching algorithm that supports queries of any length is proposed. First, the search process on the P-BWT index is modified: when a query spans multiple data segments, the search is first performed on the last segment and then verified on the other segments. Further, a parallel algorithm is proposed to reduce the iterations in the search-and-verify process. Finally, the experimental study shows that, using the proposed algorithm, the substring matching task can be accomplished efficiently in parallel. [ABSTRACT FROM AUTHOR]
- Published
- 2016
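For readers unfamiliar with exact matching on a BWT index, the standard single-index backward search can be sketched as follows. This is not the paper's P-BWT or its parallel search-and-verify algorithm: the function names are hypothetical, the transform is built by naive rotation sorting, and occurrence counts use a linear scan rather than the sampled tables a real index would keep.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for short texts)."""
    text += "$"  # terminator, assumed absent from text; sorts before letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact matches of pattern with FM-index style backward search."""
    # first_row[c]: index of the first sorted rotation that starts with c.
    first_row = {}
    for i, c in enumerate(sorted(bwt_text)):
        first_row.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i] (linear scan for simplicity).
        return bwt_text.count(c, 0, i)

    lo, hi = 0, len(bwt_text)  # half-open range of matching rotations
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index, count_occurrences(index, "ana"))  # prints: annb$aa 2
```

Each loop iteration narrows the range of sorted rotations prefixed by the current pattern suffix, so the number of steps depends on the pattern length rather than the text length.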
46. Fast, accurate, and lightweight analysis of BS-treated reads with ERNE 2.
- Author
-
Prezza, Nicola, Vezzi, Francesco, Käller, Max, and Policriti, Alberto
- Subjects
*EPIGENETICS, *DNA methylation, *SULFITES, *BIOINFORMATICS, *MEMORY
- Abstract
Background: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be somehow compressed, due to its significant size. Methods: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast and memory consuming. The latter are much slower and light. In this paper, we try to close this gap proposing a data structure for aligning bisulfite-treated reads which is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas [ABSTRACT FROM AUTHOR]
- Published
- 2016
47. Better quality score compression through sequence-based quality smoothing
- Author
-
Matteo Comin and Yoshihiro Shibuya
- Subjects
Quality Control, Compressed suffix array, FASTQ format, BWT, FASTQ compression, FM-Index, Computer science, Data_CODINGANDINFORMATIONTHEORY, Lossy compression, computer.software_genre, lcsh:Computer applications to medicine. Medical informatics, Polymorphism, Single Nucleotide, Biochemistry, Exponential growth, Structural Biology, Humans, Entropy (information theory), Molecular Biology, lcsh:QH301-705.5, Base Sequence, Research, Applied Mathematics, High-Throughput Nucleotide Sequencing, Data Compression, Computer Science Applications, ROC Curve, lcsh:Biology (General), Quality Score, lcsh:R858-859.7, Data mining, computer, Algorithms, Software, Smoothing, FM-index
- Abstract
Motivation: Current NGS techniques are becoming exponentially cheaper. As a result, there is an exponential growth of genomic data that is unfortunately not matched by an exponential growth of storage, making compression a necessity. Most of the entropy of NGS data lies in the quality values associated with each read. Those values are often more diversified than necessary. Because of that, many tools, such as Quartz or GeneCodeq, try to change (smooth) quality scores in order to improve compressibility without altering the important information they carry for downstream analysis like SNP calling. Results: We use the FM-Index, a type of compressed suffix array, to reduce the storage requirements of a dictionary of k-mers, and an effective smoothing algorithm to maintain high precision for SNP calling pipelines while reducing quality score entropy. We present YALFF (Yet Another Lossy Fastq Filter), a tool for quality score compression by smoothing, leading to improved compressibility of FASTQ files. The succinct k-mer dictionary allows YALFF to run on consumer computers with only 5.7 GB of available free RAM. The YALFF smoothing algorithm can improve genotyping accuracy while using fewer resources. Availability: https://github.com/yhhshb/yalff
- Published
- 2019
48. Variable-order reference-free variant discovery with the Burrows-Wheeler Transform
- Author
-
Prezza, Nicola, Pisanti, Nadia, Sciortino, Marinella, and Rosone, Giovanna
- Published
- 2020
49. The pseudo-distance technique for parallel lossless compression of color-mapped images.
- Author
-
Koc, Basar, Arnavut, Ziya, and Koçak, Hüseyin
- Subjects
*PSEUDODISTANCES, *PARALLEL computers, *DATA compression, *IMAGE processing, *COLOR image processing
- Abstract
Data compression is a challenging process with important practical applications. Specialized techniques for lossy and lossless data compression have been the subject of numerous investigations during the last several decades. Previously, we studied the use of the pseudo-distance technique (PDT) in lossless compression of color-mapped images and its parallel implementation. In this paper we present a new technique (PDT2) to improve the compression gain of PDT. We also present a parallelized implementation of the new technique, which results in substantial gains in compression time while providing the desired compression efficiency. We demonstrate that on non-dithered images PDT2 outperforms PDT by 22.4% and PNG by 29.3%. On dithered images, PDT2 achieves compression gains of 7.1% over PDT and 23.8% over PNG. We also show that the parallel implementation of PDT2, while compromising compression by less than 0.3%, achieves near-linear speedup, and that utilization of Intel Hyper-Threading technology on supported systems improves speedup by 18% on average. [ABSTRACT FROM AUTHOR]
- Published
- 2015
50. Surface defect detection and classification in mandarin fruits using fuzzy image thresholding, binary wavelet transform and linear classifier model.
- Author
-
Kamalakannan, Anandhanarayanan and Rajamanickam, Govindaraj
- Abstract
Machine vision systems with effective image processing methods are used in quality grading of agricultural products. A pattern recognition technique was developed to detect and classify surface defects such as pitting, splitting and stem-end rot found in images of mandarin fruits. The developed technique employs fuzzy thresholding for image segmentation, binary wavelet transform (BWT) for feature extraction and a rule-based linear classifier model for detection and classification of the defects. The moment invariants computed from the detail subimage of the BWT were taken as feature values. This paper describes the pattern recognition algorithm and its implementation in detail. The detection and classification results obtained from the algorithm are reported and discussed. [ABSTRACT FROM PUBLISHER]
- Published
- 2012