
SATIN: a micro and mini satellite mining tool of total genome and coding regions with analysis of perfect repeats polymorphism in coding regions.

Authors :
Dantas, Carlos Willian Dias
da Costa Neto, Sebastião Rodrigues
Alves, Sandy Ingrid Aguiar
da Costa Pinheiro, Kenny
De Los Santos, Edian Franklin Franco
Ramos, Rommel Thiago Jucá
Source :
BMC Bioinformatics; 6/18/2024, Vol. 25 Issue 1, p1-14, 14p
Publication Year :
2024

Abstract

Background: Tandem repeats (TRs) are sequences in genomic DNA repeated in tandem that are present in all organisms. Among the subcategories of TRs are satellite repeats, which are divided into macrosatellites, minisatellites, and microsatellites; the last two are of particular interest because their instability allows them to reveal polymorphisms between organisms. Currently, most mining tools focus on Simple Sequence Repeat (SSR) mining, and only a few can identify SSRs in coding regions. Results: We developed a microsatellite mining software called SATIN (Micro and Mini SATellite IdentificatioN tool) based on a new sliding window algorithm written in C and Python. It represents a new approach to SSR mining by addressing the limitations of existing tools, particularly in coding region SSR mining. SATIN is available at https://github.com/labgm/SATIN.git. It was shown to be the second fastest for perfect and compound SSR mining. It can identify SSRs from coding regions as well as SSRs with motif sizes larger than 6. Besides SSR mining, SATIN can also analyze SSR polymorphism in coding regions from pre-determined groups and identify SSRs that are differentially abundant among them on a per-gene basis. To validate the tool, we analyzed SSRs from two groups of Escherichia coli (K12 and O157) and compared the results with 5 known SSRs from coding regions. SATIN identified all 5 SSRs from 237 genes, each with at least one SSR. Conclusions: SATIN is a novel microsatellite search software that uses an innovative sliding window technique based on a numerical list for repeat region search to identify perfect and composite SSRs while generating comprehensible and analyzable outputs. It accepts files in FASTA or GenBank format as input for microsatellite mining and can also identify SSRs present in coding regions for GenBank files. In conclusion, we expect SATIN to help identify potential SSRs to be used as genetic markers. [ABSTRACT FROM AUTHOR]
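
For readers unfamiliar with perfect SSR mining, the following minimal Python sketch shows what a perfect-repeat scan can look like. It is a naive left-to-right scan and not SATIN's numerical-list sliding window algorithm, which the abstract does not describe in detail. The function name `find_perfect_ssrs` and the parameters `min_motif`, `max_motif`, and `min_repeats` are hypothetical, chosen only for illustration.

```python
def find_perfect_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Naive scan for perfect tandem repeats (illustrative only).

    Returns a list of (start_index, motif, repeat_count) tuples.
    All names and default thresholds here are assumptions, not SATIN's.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(min_motif, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for a motif of this size
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" = "AT" x 2)
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            # Count how many consecutive copies of the motif start at position i
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            # Keep the candidate that covers the longest stretch of sequence
            if count >= min_repeats and (best is None or count * k > best[2] * len(best[1])):
                best = (i, motif, count)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits


example_hits = find_perfect_ssrs("GGATATATATCC")
```

Raising `max_motif` above 6 extends the same scan to minisatellite-sized motifs, at the cost of more comparisons per position.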

Details

Language :
English
ISSN :
14712105
Volume :
25
Issue :
1
Database :
Complementary Index
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
177963659
Full Text :
https://doi.org/10.1186/s12859-024-05842-2