Integrated proteogenomics database

E. coli BW25113_tryptic

This is the parental strain (GenBank accession CP009273) of the widely used Escherichia coli Keio gene knockout collection [1].

An iPtgxDB was created by hierarchically integrating protein coding sequences from the following annotation resources:

Hierarchy | Resource | Link
1 | NCBI RefSeq | CP009273.1; from 30/10/2014
2 | IMG [2] | Integrated Microbial Genomes (IMG) initiative of the Joint Genome Institute (JGI); Ga0058822, from 12/08/2014
3 | Prodigal [3] | Ab initio gene predictions from Prodigal (v2.6)
4 | ChemGenome [4] | Ab initio gene predictions from ChemGenome (v2.0, http://www.scfbio-iitd.res.in/chemgenome/chemgenomenew.jsp; with parameters: method, Swissprot space; length threshold, 70 nt; initiation codons, ATG, CTG, TTG, GTG)
5 | in silico ORFs | In silico ORF annotations generated as described by Omasits and Varadarajan et al., 2017 [5]

Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq > IMG (JGI) > Prodigal > ChemGenome > in silico. The annotation files were parsed to extract the identifiers, coordinates and sequences of bona fide protein-coding sequences (CDS) and pseudogene entries.
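The length filter and the resource hierarchy can be illustrated with a minimal Python sketch. It is not the iPtgxDB implementation: the Orf record, the integrate function and the rule of keeping one entry per identical (strand, start, stop) coordinate set are assumptions made for illustration, and "above" the threshold is read here as strictly longer than 18 aa.

```python
from dataclasses import dataclass

# Hierarchy from the text: a lower rank means a higher priority.
HIERARCHY = {"RefSeq": 1, "IMG": 2, "Prodigal": 3, "ChemGenome": 4, "in silico": 5}
MIN_LENGTH_AA = 18  # threshold stated in the text


@dataclass(frozen=True)
class Orf:
    """Hypothetical record holding the fields parsed from an annotation file."""
    resource: str
    identifier: str
    start: int
    stop: int
    strand: str
    protein: str


def integrate(orfs, min_length_aa=MIN_LENGTH_AA):
    """Drop short ORFs, then keep the top-ranked resource per coordinate set.

    Assumption: two annotations are treated as the same ORF only if strand,
    start and stop are identical; the real integration may be more elaborate.
    """
    best = {}
    for orf in orfs:
        # Assumption: "above the threshold" means strictly longer than it.
        if len(orf.protein) <= min_length_aa:
            continue
        key = (orf.strand, orf.start, orf.stop)
        current = best.get(key)
        if current is None or HIERARCHY[orf.resource] < HIERARCHY[current.resource]:
            best[key] = orf
    return sorted(best.values(), key=lambda o: (o.start, o.stop, o.strand))
```

With this sketch, a RefSeq and a Prodigal entry sharing the same coordinates collapse into the RefSeq entry, while a 10 aa in silico ORF is discarded by the length filter.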

References

  1. Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L., and Mori, H. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2: 2006.0008.
  2. Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I.M., Chu, K., and Kyrpides, N.C. 2009. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271-2278.
  3. Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
  4. Singhal, P., Jayaram, B., Dixit, S.B., and Beveridge, D.L. 2008. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophys J 94: 4173-4183.
  5. Omasits, U., Varadarajan, A.R., Schmid, M., Goetze, S., Melidis, D., Bourqui, M., Nikolayeva, O., Quebatte, M., Patrignani, A., Dehio, C., Frey, J.E., Robinson, M.D., Wollscheid, B., and Ahrens, C.H. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. bioRxiv, Cold Spring Harbor Labs Journals.

iPtgxDB Release Info

Version: 1
Date: 26.09.2016

Downloads

TAR.GZ
Size: 7.8 MB
MD5: ace445e4bb24637e02a4531c6fc2842d
SHA1: a8b488accabdaf613567aa6ec2980c5a8193c26a

ZIP
Size: 8.0 MB
MD5: a56433e085b8d73170d6162c2351258c
SHA1: 0442774be41a488dafd41aa1a2b5b4d0ed93cad5
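
The checksums listed above can be used to check that a downloaded archive is intact. The following Python sketch uses only the standard hashlib module; the function names and the archive file name in the usage note are hypothetical, since the page does not state the download file names.

```python
import hashlib

# Checksums as listed for the TAR.GZ download of version 1.
EXPECTED_TAR_GZ = {
    "md5": "ace445e4bb24637e02a4531c6fc2842d",
    "sha1": "a8b488accabdaf613567aa6ec2980c5a8193c26a",
}


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Map each algorithm name to True if the file matches the expected digest."""
    return {
        algorithm: file_digest(path, algorithm) == value.lower()
        for algorithm, value in expected.items()
    }
```

For example, verify_download("iptgxdb_download.tar.gz", EXPECTED_TAR_GZ) would return a dictionary with one boolean per algorithm, where the file name is a placeholder for wherever the archive was saved.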