Using native and syntenically mapped cDNA alignments to improve de novo gene finding
Journal article
Abstract: Motivation: Computational annotation of protein coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and do poorly with certain types of genes. Including additional sources of evidence of the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated, and aligned to the target genome provide evidence of the existence and structure of genes. Results: We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are gene and transcript annotations from related species syntenically mapped to the target genome using TransMap, evolutionary conservation of DNA, mRNA and ESTs of the target species, and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we were able to predict at least one splice form exactly correctly in 57% of human genes. When evidence from other species and human mRNAs is also used, this number rises to 77%. Syntenic mapping is well suited to annotating genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than independent positionwise information. Availability: AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu). Contact: mstanke@gwdg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Year of publication: 2008
Authors: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler
Publisher: Oxford University Press
Source: Bioinformatics
Keywords: Genomics and Phylogenetic Studies, RNA and protein synthesis mechanisms, Machine Learning in Bioinformatics
Other links: Bioinformatics (PDF)
Bioinformatics (HTML)
PubMed (HTML)
Open access: hybrid
Volume: 24
Issue: 5
Pages: 637–644