Defining the human reference protein-coding gene set
- Source :
- Genome Biology
- Publication Year :
- 2010
- Publisher :
- BioMed Central, 2010.
Abstract
- The number of coding genes in the human genome is still under debate [1]. Here, we present a proposal to define the human reference gene set that takes into account the inter-individual differences in gene numbers arising from gene inactivation events, such as premature termination or aberrant splicing due to nonsense SNPs or SNPs at essential splice sites, respectively. We analyzed nonsense SNPs and SNPs affecting essential splice sites from 23 personal genomes and exomes. We see a wide range in the number of SNPs in each of the categories surveyed, and a large fraction of these SNPs are singletons. Using a data set of high-confidence SNPs obtained by intersecting SNPs from dbSNP and the personal genomes, we identify a common set of 279 genes predicted to be pseudogenic (non-functional) in some individuals and functional in others. We focus on two key questions arising from these considerations: (i) Which criteria should be used for inclusion and exclusion of genes from the reference set? (ii) What sequence should be used as the reference for genes that are non-functional in some humans? For the first question, we propose to include all genes that are functional in at least one individual, producing a maximally inclusive set of genes. For the second, we propose the use of the ancestral allele as the reference allele. This will provide a uniform basis for gene annotation and ensure that the reference gene set and sequence will be relatively stable as more individual genomes are sequenced. In the few cases where an ancestral state assignment is unavailable or ambiguous, we propose that genes be annotated with the functional allele.
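The two proposed rules can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `GeneObservation` record, its field names, and the example gene and individual identifiers are hypothetical, and the sketch assumes that a per-individual functional call and an optional ancestral allele are already available for each gene.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GeneObservation:
    """Hypothetical per-gene record; all field names are illustrative."""
    # individual id -> True if the gene is predicted functional in that individual
    functional_in: Dict[str, bool]
    # allele that leaves the gene functional
    functional_allele: str
    # ancestral allele, or None when the assignment is unavailable or ambiguous
    ancestral_allele: Optional[str] = None


def include_in_reference(obs: GeneObservation) -> bool:
    """Rule (i): keep a gene if it is functional in at least one individual."""
    return any(obs.functional_in.values())


def reference_allele(obs: GeneObservation) -> str:
    """Rule (ii): prefer the ancestral allele, else fall back to the functional allele."""
    if obs.ancestral_allele is not None:
        return obs.ancestral_allele
    return obs.functional_allele


def build_reference_set(genes: Dict[str, GeneObservation]) -> Dict[str, str]:
    """Map each included gene to the allele chosen as its reference."""
    return {
        name: reference_allele(obs)
        for name, obs in genes.items()
        if include_in_reference(obs)
    }


# Made-up example data, for illustration only.
genes = {
    "GENE_A": GeneObservation({"ind1": False, "ind2": True}, "C", ancestral_allele="C"),
    "GENE_B": GeneObservation({"ind1": True, "ind2": True}, "G", ancestral_allele=None),
    "GENE_C": GeneObservation({"ind1": False, "ind2": False}, "A", ancestral_allele="T"),
}
reference = build_reference_set(genes)
```

In this sketch `GENE_C` is excluded because no individual carries a functional copy, while `GENE_B` falls back to its functional allele because no ancestral assignment is given.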
- Subjects :
- Genetics
0303 health sciences
dbSNP
Selected Oral Presentation
Single-nucleotide polymorphism
Gene Annotation
Biology
Genome
Human genetics
03 medical and health sciences
0302 clinical medicine
Human genome
Allele
10. No inequality
Gene
030217 neurology & neurosurgery
030304 developmental biology
Details
- Language :
- English
- ISSN :
- 1465-6914 and 1465-6906
- Volume :
- 11
- Issue :
- Suppl 1
- Database :
- OpenAIRE
- Journal :
- Genome Biology
- Accession number :
- edsair.doi.dedup.....30d3341b7ea2421115ee3df0e7f0bba3