Welcome to the open resource platform for integrated proteogenomics databases (iPtgxDBs)

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies in the number of annotated CDSs, missed functional short open reading frames (ORFs), and overprediction of spurious ORFs remain serious limitations.

Workflow

Our proteogenomics [1,2] strategy for accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome [3].
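The consolidation step can be illustrated with a minimal Python sketch. It assumes, for illustration only, that CDSs on the same strand that share a stop-codon position describe one locus, and that the resource listed first in a priority order supplies the anchor annotation. The `Cds` record, the `consolidate` function and the source names are hypothetical and are not taken from the iPtgxDB software.

```python
from collections import defaultdict
from typing import NamedTuple


class Cds(NamedTuple):
    """One predicted CDS (hypothetical record used only in this sketch)."""
    source: str   # annotation resource or predictor that reported the CDS
    strand: str   # "+" or "-"
    start: int    # genomic position of the first base of the start codon
    stop: int     # genomic position of the last base of the stop codon


def consolidate(cdss, source_priority):
    """Group CDSs by (strand, stop) and pick one anchor per group.

    Returns a list of (anchor, alternative_starts) tuples, where the anchor
    comes from the highest-priority source in the group and
    alternative_starts lists the other start positions reported for it.
    """
    rank = {source: i for i, source in enumerate(source_priority)}
    clusters = defaultdict(list)
    for cds in cdss:
        clusters[(cds.strand, cds.stop)].append(cds)

    consolidated = []
    for key in sorted(clusters):
        members = sorted(clusters[key], key=lambda c: rank[c.source])
        anchor = members[0]
        alternative_starts = sorted({c.start for c in members} - {anchor.start})
        consolidated.append((anchor, alternative_starts))
    return consolidated


# Assumed priority order: reference annotation first, in silico ORFs last.
example = [
    Cds("RefSeq", "+", 100, 400),
    Cds("Prodigal", "+", 130, 400),
    Cds("in_silico", "+", 100, 400),
    Cds("Prodigal", "-", 900, 600),
]
for anchor, alt_starts in consolidate(example, ["RefSeq", "Prodigal", "in_silico"]):
    print(anchor.source, anchor.strand, anchor.start, anchor.stop, alt_starts)
```

Keeping the alternative start positions next to the anchor means that a peptide supporting a longer or shorter protein form can still be traced back to the prediction that proposed it.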

Creating iPtgxDBs

iPtgxDBs address an unmet need of the research community: an open source DB that provides integrated annotations, predictions and a six-frame translation for a given genome sequence in an easily usable format, both as a search DB (FASTA format) with informative identifiers and as a GFF file that integrates all annotations and identifiers. The search DB is highly informative. By extending the PeptideClassifier concept of unambiguous peptides [4], close to 95% of the mass spectrometry-identifiable peptides imply one distinct protein, which greatly simplifies downstream analysis and removes the need to disentangle protein groups implied by shared peptides [5].
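The idea of unambiguous peptides can be sketched in a few lines of Python. This is a simplified illustration, not the PeptideClassifier implementation: it assumes a plain tryptic digest (cleavage after K or R unless followed by P, no missed cleavages) and exact sequence matching, and the function names, the minimum peptide length and the protein identifiers are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K or R, but not before P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def classify_peptides(proteins):
    """Split peptides into those implying one protein and those shared.

    proteins maps a protein identifier to its amino acid sequence.
    Returns (unambiguous, shared): unambiguous maps peptide -> protein id,
    shared maps peptide -> sorted list of protein ids.
    """
    owners = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            owners[peptide].add(protein_id)

    unambiguous = {p: next(iter(ids)) for p, ids in owners.items() if len(ids) == 1}
    shared = {p: sorted(ids) for p, ids in owners.items() if len(ids) > 1}
    return unambiguous, shared


# Two hypothetical database entries that share one tryptic peptide.
db = {
    "annotA": "MKTAYIAKQRLLDEGSPR",
    "orfB": "MSSKLLDEGSPRGGWTAAK",
}
unambiguous, shared = classify_peptides(db)
print(unambiguous)
print(shared)
```

In this toy database, a peptide listed in `unambiguous` identifies its protein directly, whereas a peptide in `shared` would require protein inference.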

Novelties

By using our precomputed iPtgxDBs or generating their own, researchers can swiftly identify novel short ORFs (sORFs; [6]), new start sites, or wrongly annotated pseudogenes.

Our proteogenomics strategy is broadly applicable to key model organisms, where a wealth of reference genome annotations exist, and to newly sequenced genomes.

**Figure 1. Generating iPtgxDBs for key model organisms and newly sequenced genomes.**
We first release open source iPtgxDBs for several key model organisms, here for *B. henselae* strain Houston-1, *E. coli* BW25113 and *B. diazoefficiens* USDA 110 (left panel). Using proteomics data from any condition or knockout strain (yellow boxes, here schematically shown for *E. coli*), researchers can identify novelties and iteratively improve the genome annotation, e.g., in a community-driven genome Wiki approach [7].
We also enable users to create iPtgxDBs with our software by following a step-by-step protocol, even for newly sequenced genomes (right panel). In this way, results from *ab initio* gene prediction algorithms like Prodigal [8] or ChemGenome [9] and *in silico* predictions (a six-frame translation considering alternative start codons) can help to improve the annotation of newly sequenced genomes.
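A six-frame translation that considers alternative start codons can be sketched as follows. This is a minimal illustration under stated assumptions, not the iPtgxDB software: it assumes the standard bacterial codon assignments, treats ATG, GTG and TTG as possible start codons, translates every start codon as methionine, and reports one ORF per in-frame start upstream of each stop codon. The function names and the `min_aa` parameter are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid, built in the conventional TCAG ordering.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of alternative start codons


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_orfs(genome, min_aa=30):
    """Yield (strand, frame, start, protein) for every start-to-stop ORF.

    start is the 0-based offset of the start codon on the scanned strand,
    i.e. on the reverse complement for strand "-". Codons that contain
    ambiguous bases are translated as "X".
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            open_starts = []  # start codons seen since the previous stop
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if CODON_TABLE.get(codon, "X") == "*":
                    for start in open_starts:
                        body = "".join(
                            CODON_TABLE.get(seq[i:i + 3], "X")
                            for i in range(start + 3, pos, 3)
                        )
                        protein = "M" + body  # any start codon is read as Met
                        if len(protein) >= min_aa:
                            yield strand, frame, start, protein
                    open_starts = []
                elif codon in START_CODONS:
                    open_starts.append(pos)


# Toy sequence with an ATG start and an in-frame alternative TTG start.
for orf in six_frame_orfs("ATGAAATTGCCCTAA", min_aa=2):
    print(orf)
```

Reporting every in-frame start rather than only the longest ORF keeps alternative start sites in the search space, at the cost of a larger and more redundant database.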

References

[1]
A. I. Nesvizhskii. 2014. Proteogenomics: concepts, applications and computational strategies. Nature Methods 11: 1114-1125. 10.1038/nmeth.3144.
[2]
C. H. Ahrens, E. Brunner, E. Qeli, K. Basler, and R. Aebersold. 2010. Generating and navigating proteome maps using mass spectrometry. Nature Reviews Molecular Cell Biology 11: 789-801. 10.1038/nrm2973.
[3]
U. Omasits, A. R. Varadarajan, M. Schmid, S. Goetze, D. Melidis, M. Bourqui, O. Nikolayeva, M. Quebatte, A. Patrignani, C. Dehio, J. E. Frey, M. D. Robinson, B. Wollscheid, and C. H. Ahrens. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Research 27: 2083-2095. 10.1101/gr.218255.116.
[4]
E. Qeli and C. H. Ahrens. 2010. PeptideClassifier for protein inference and targeted quantitative proteomics. Nature Biotechnology 28: 647-650. 10.1038/nbt0710-647.
[5]
A. I. Nesvizhskii and R. Aebersold. 2005. Interpretation of shotgun proteomic data: the protein inference problem. Molecular & Cellular Proteomics 4: 1419-1440. 10.1074/mcp.R500012-MCP200.
[6]
G. Storz, Y. Wolf, and K. Ramamurthi. 2014. Small proteins can no longer be ignored. Annual Review of Biochemistry 83: 753-777. 10.1146/annurev-biochem-070611-102400.
[7]
S. L. Salzberg. 2007. Genome re-annotation: a wiki solution? Genome Biology 8: 102. 10.1186/gb-2007-8-1-102.
[8]
D. Hyatt, G.-L. Chen, P. F. LoCascio, M. L. Land, F. W. Larimer, and L. J. Hauser. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119. 10.1186/1471-2105-11-119.
[9]
P. Singhal, B. Jayaram, S. B. Dixit, and D. L. Beveridge. 2008. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophysical Journal 94: 4173-4183. 10.1529/biophysj.107.116392.