A proposed metric set for evaluation of genome assembly quality.
- Author
- Wang, Peng and Wang, Fei
- Subjects
- *TANDEM repeats, *CENTROMERE, *QUALITY control, *TELOMERES, *CHROMOSOMES
- Abstract
Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies. The lack of a consistent language highlights the value for consensus of the metric set for quality evaluation of genome assembly. We propose a metric set covering eight metrics to measure assembly quality. N50 is not robust enough to measure assembly contiguity. We suggest the ratio of contig counting to chromosome pair number to compensate for the flaws of N50. To assess completeness, the major challenge lies in tandem repeats, including centromeres, telomeres, and ribosomal loci. Also, organellar genomes should be concerned. Correctness is frequently overlooked. To measure it, we suggest assessing both base-level and structural error. Technologies are reaching adulthood to generate a finished eukaryotic assembly. We provide a score of a finished genome for each metric as a reference to measure assembly quality. [ABSTRACT FROM AUTHOR]
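The contiguity point in the abstract can be illustrated with a minimal Python sketch. It computes N50 in the usual way and, alongside it, a contig-to-chromosome ratio. The abstract does not give the exact formula or the reference score for a finished assembly, so the function `contig_to_chromosome_ratio`, its reading of the metric as "number of contigs divided by number of chromosome pairs", and the toy contig lengths are all assumptions made for illustration only.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the length of the contig at which the running sum of
    contigs, sorted longest first, reaches at least half the assembly size."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def contig_to_chromosome_ratio(contig_lengths: Sequence[int],
                               chromosome_pairs: int) -> float:
    """Hypothetical helper: contig count divided by chromosome pair number.
    This reading of the metric is an assumption, not the authors' definition."""
    if chromosome_pairs <= 0:
        raise ValueError("chromosome_pairs must be positive")
    return len(contig_lengths) / chromosome_pairs


# Two toy assemblies of an organism assumed to have 2 chromosome pairs.
assembly_a = [100, 100]                  # one contig per chromosome
assembly_b = [100, 100, 20, 20, 20, 20]  # same long contigs plus fragments

for name, lengths in (("A", assembly_a), ("B", assembly_b)):
    print(name, n50(lengths), contig_to_chromosome_ratio(lengths, 2))
```

In this toy case both assemblies have the same N50 (100), while the ratio separates them (1.0 versus 3.0), which is the kind of gap in N50 that the abstract says the ratio is meant to compensate for.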
- Published
- 2023