1. SARS-CoV-2 consensus genome reconstruction, quality control, and lineage analysis v1
- Author
- Benjamin Schwessinger
- Abstract
This protocol is part of the ANU Biosecurity mini-research project #2 "An SARS-COV2 incursion scenario: Genomics, phylogenetics, and incursions." The mini-research project is modeled on the yearly Quality Assurance Program of The Royal College of Pathologists of Australia (RCPAQAP), which we take part in together with ACT Pathology. The research project is split into two major parts, identical to how the official RCPAQAP is run every year.

Part #1 focuses on the 'wet-lab' by sequencing SARS-CoV-2 from real-world RNA samples provided by ACT Pathology especially for our ANU biosecurity course (Thank YOU!). Here you will amplify and sequence five (5) RNA samples per research group. You will assess the SARS-CoV-2 genome sequences for their lineage assignments using online programs, put the sequences into a global context, estimate the collection date based on genetic information, and describe mutations in the spike protein.

Part #2 focuses on the 'dry-lab' by investigating a hypothetical incursion scenario in the fictional city of Fantastica. You will combine genomic surveillance of SARS-CoV-2 with case interview data to trace the spread of SARS-CoV-2 in the community and into high-risk settings. We will provide you with real, publicly available SARS-CoV-2 genomes and fictional case interviews. You will put these two together to trace the spread and suggest potential improvements in containment strategies, with a focus on high-risk settings.

This protocol describes the analysis component of Part #1. The metrics you are supposed to report for each of your samples are mostly borrowed from the official SARS-CoV-2 QAP. Don't worry if not all of these mean something to you at the moment, as we will explain them again during the prac. If all or most of your samples have < 50% genome coverage, please also include the analysis of MakeUp for points 1 to 7 and TimeMakeUp for point 8. You can access the MakeUp data here (ANU only).
Make sure to read the README file so you understand what each item relates to. The metrics you have to report for each of your samples (or MakeUps) include the following:

1. Consensus genome coverage.
2. Average read depth. You might want to include detailed read depth plots here as well.
3. Pangolin lineage.
4. NextClade lineage.
5. Base pair differences relative to the original SARS-CoV-2 genome.
6. Amino acid replacements and deletions in the S (spike) protein sequence.
7. Evaluation of whether your and/or the MakeUp samples would pass the QC cut-off of 90% genome coverage, along with other metrics that you deem important for QC. Would you 'flag' any of your samples as standing out, e.g. being a negative control?
8. Approximate sampling date of your sequences (month and year).

You must report the versions of all tools used in your report and the day the analysis was performed. This is extremely important for reporting, as lineage names and similar classifications change VERY frequently during a pandemic or any outbreak.

This protocol is applicable for week 9.

The following links might be useful for your report:

The original publication that describes the sequencing protocol: https://academic.oup.com/biomethods/article/5/1/bpaa014/5873518?login=true

Original sources describing the consensus reconstruction from raw reads: https://artic.network/ncov-2019, https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html, https://labs.epi2me.io/, https://www.nature.com/articles/s41467-020-20075-6

Other websites and resources used in the protocol: https://igv.org/, https://clades.nextstrain.org/, https://pangolin.cog-uk.io/, https://genome.ucsc.edu/cgi-bin/hgPhyloPlace
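To make the coverage, read depth, and QC cut-off metrics more concrete, here is a minimal Python sketch of how they could be computed by hand. It is not the method used by the protocol's pipeline and rests on several assumptions: that consensus genome coverage means the percentage of reference positions with an unambiguous base call (A, C, G, or T) in the consensus, that the reference is the 29,903 bp Wuhan-Hu-1 genome (MN908947.3), and that per-position depths are already available as a list of integers (for example, the third column of `samtools depth -a` output). All function names are hypothetical and chosen for illustration only.

```python
# Assumed reference: Wuhan-Hu-1 (MN908947.3), 29,903 bp.
REFERENCE_LENGTH = 29903


def read_fasta_sequence(fasta_text):
    """Return the upper-cased sequence of the first record in FASTA text."""
    chunks = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        chunks.append(line)
    return "".join(chunks).upper()


def genome_coverage(consensus, reference_length=REFERENCE_LENGTH):
    """Percentage of reference positions with an unambiguous base call.

    Assumption: masked or ambiguous positions (N and other IUPAC codes)
    do not count as covered.
    """
    called = sum(1 for base in consensus if base in "ACGT")
    return 100.0 * called / reference_length


def mean_depth(depths):
    """Arithmetic mean of per-position read depths (0.0 if empty)."""
    depths = list(depths)
    return sum(depths) / len(depths) if depths else 0.0


def passes_qc(coverage_percent, cutoff=90.0):
    """True if coverage meets the 90% genome coverage cut-off."""
    return coverage_percent >= cutoff


if __name__ == "__main__":
    # Toy 10 bp example, not real data.
    toy = read_fasta_sequence(">toy\nACGTNNNNAC\n")
    cov = genome_coverage(toy, reference_length=10)
    print(f"coverage: {cov:.1f}%  passes QC: {passes_qc(cov)}")
    print(f"mean depth: {mean_depth([10, 20, 30]):.1f}")
```

For a real sample you would read the consensus FASTA from disk and keep the default reference length. If your pipeline reports these metrics directly, use those values and treat this sketch only as a cross-check of what the numbers mean.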
- Published
- 2023