Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR [version 2; peer review: 2 approved]
- Source :
- F1000Research. 11:ELIXIR-231
- Publication Year :
- 2022
- Publisher :
- London, UK: F1000 Research Limited, 2022.
Abstract
- In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
- Subjects :
- Opinion Article
Articles
FAIR
plant
genotyping
snp
vcf
data management
phenotyping
ELIXIR
Details
- ISSN :
- 20461402
- Volume :
- 11
- Database :
- F1000Research
- Journal :
- F1000Research
- Notes :
- Revised. Amendments from Version 1: In version 2 of this article, we have revised the Abstract and added larger sections to both the Introduction and the Conclusion. In particular, we have addressed the reviewers' comments on the introduction of the VCF recommendation in the broader community as well as various aspects of the FAIRness of the adapted metadata. Throughout the article, we have adjusted and clarified some unclear passages and taken greater care in the correct designation of pronouns and gender-neutral language. We have also submitted a sample dataset to EVA that meets the VCF metadata specifications in this article and added guidance in the FAIR Cookbook on submitting genomic and genotypic data to EMBL-EBI.
- Publication Type :
- Academic Journal
- Accession number :
- edsfor.10.12688.f1000research.109080.2
- Document Type :
- opinion-article
- Full Text :
- https://doi.org/10.12688/f1000research.109080.2