1. fLPS 2.0: rapid annotation of compositionally-biased regions in biological sequences
- Author
- Paul M. Harrison
- Subjects
Low-complexity; Intrinsic disorder; Bioinformatics; Computer science; Annotation; Computational biology; General Biochemistry, Genetics and Molecular Biology; DNA sequencing; Low complexity; 03 medical and health sciences; 0302 clinical medicine; Filter (mathematics); Molecular Biology; 030304 developmental biology; Sequence (medicine); 0303 health sciences; Compositional bias; General Neuroscience; Protein; Skew; Computational Biology; General Medicine; DNA; Masking; Domains; Tool; Medicine; General Agricultural and Biological Sciences; 030217 neurology & neurosurgery
- Abstract
Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed ‘low-complexity’), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized ‘dark matter’ of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A ‘thorough’ option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.
- Published
- 2021
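- Illustrative sketch
The abstract describes CB regions as stretches enriched for a subset of residue types. The Python sketch below is only a toy illustration of that idea and is not the fLPS 2.0 algorithm or its interface: it slides a fixed-length window along a sequence and scores enrichment for one residue with a binomial tail probability against an assumed background frequency. The function names, the example sequence, the window length, and the background value of 0.05 are all hypothetical choices made for this example.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def most_biased_window(seq, residue, window, background):
    """Find the fixed-length window most enriched for one residue.

    Returns (start, count, p_value), where p_value is the binomial
    probability of seeing at least `count` copies of `residue` in
    `window` positions given the assumed `background` frequency.
    Returns None if the sequence is shorter than the window.
    """
    best = None
    for start in range(len(seq) - window + 1):
        count = seq[start:start + window].count(residue)
        p_value = binomial_tail(count, window, background)
        # Keep the first window with the smallest probability.
        if best is None or p_value < best[2]:
            best = (start, count, p_value)
    return best


# Hypothetical protein fragment containing a glutamine-rich tract.
example_seq = "MKTAYIAKQRQQQQQQQQPQQQLSDEVK"
start, count, p_value = most_biased_window(
    example_seq, "Q", window=12, background=0.05
)
print(f"start={start} count={count} p={p_value:.3g}")
```

A real annotation tool would also have to consider variable window lengths, combinations of several residue types, and sequence-specific background compositions, none of which this sketch attempts.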