Loose ends: almost one in five human genes still have unresolved coding status
- Source :
- Nucleic Acids Research, Repisalud, Instituto de Salud Carlos III (ISCIII)
- Publication Year :
- 2018
- Publisher :
- Oxford University Press, 2018.
Abstract
- Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation of the 2764 genes classified as coding by one or more sets of manual curators and as non-coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task, since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
- Funding: National Institutes of Health [2 U41 HG007234 to I.J., L.M., J.M.R. and M.L.T., R01 HG004037 to I.J.]. Funding for open access charge: NIH [2 U41 HG007234].
- Subjects :
- 0301 basic medicine
INTEGRATED MAP
DNA Copy Number Variations
PREDICTION
DATABASE
Pseudogene
Computational biology
Data Resources and Analyses
Biology
EVOLUTIONARY INFORMATION
Genome
Antibodies
03 medical and health sciences
NUMBER
0302 clinical medicine
Human proteome project
RefSeq
Genetics
Ensembl
HUMAN GENOME
TOPOLOGY
Humans
GENCODE
Genome, Human
Genetic Variation
Proteins
Molecular Sequence Annotation
FUNCTIONALLY IMPORTANT
PROTEOME
030104 developmental biology
Genes
030211 gastroenterology & hepatology
Human genome
UniProt
Corrigendum
PROJECT
Pseudogenes
Coding (social sciences)
Details
- Language :
- English
- ISSN :
- 1362-4962 and 0305-1048
- Volume :
- 46
- Issue :
- 22
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research
- Accession number :
- edsair.doi.dedup.....24e4ed4f76c681f06d769d665cd13da5