Integrated proteogenomics database

B. subtilis 168_Lys-C

Bacillus subtilis strain 168 (GenBank #NC_000964.3) is one of the well-studied strains of this widely used Gram-positive prokaryotic model organism [1].

iPtgxDB was created by hierarchically integrating protein coding sequences from the following annotation resources:

Hierarchy | Resource | Link
1 | NCBI RefSeq 2018 | GCA_000009045.1_ASM904v1; from 15/01/2018
2 | NCBI RefSeq 2017 | GCF_000009045.1_ASM904v1; from 21/05/2017
3 | Genoscope [2] | v2.7.3, accessed 17/07/2018
4 | IMG [3] | Integrated Microbial Genomes (IMG) initiative of the Joint Genome Institute (JGI); Taxon ID: 646311909, from 17/07/2018
5 | Prodigal [4] | Ab initio gene predictions from Prodigal (v2.6)
6 | ChemGenome [5] | Ab initio gene predictions from ChemGenome (v2.0, http://www.scfbio-iitd.res.in/chemgenome/chemgenomenew.jsp; with parameters: method, Swissprot space; length threshold, 70 nt; initiation codons, ATG, CTG, TTG, GTG)
7 | in silico ORFs | The in silico ORF annotations were generated as described by Omasits and Varadarajan et al., 2017 [6]
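
As a rough illustration of what integrating annotations by hierarchy can look like, the sketch below picks, for each group of overlapping annotations, the entry from the highest-ranked resource. It is a minimal sketch under stated assumptions, not the iPtgxDB implementation: the `Annotation` record, the `pick_anchors` helper, the coordinate convention, and the rule that annotations sharing strand and stop coordinate describe the same ORF are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple

# Resource order copied from the table above (rank 1 = highest priority).
HIERARCHY = ["RefSeq 2018", "RefSeq 2017", "Genoscope", "IMG",
             "Prodigal", "ChemGenome", "in silico"]
RANK = {name: i for i, name in enumerate(HIERARCHY)}


class Annotation(NamedTuple):
    """Hypothetical record for one annotated ORF."""
    resource: str  # one of HIERARCHY
    strand: str    # "+" or "-"
    start: int     # start coordinate (illustrative convention)
    stop: int      # stop coordinate (illustrative convention)


def pick_anchors(annotations):
    """Group annotations by (strand, stop) and rank each group by resource.

    Assumption: entries sharing strand and stop describe the same ORF.
    Returns {(strand, stop): (top_ranked_entry, remaining_entries)}.
    """
    clusters = defaultdict(list)
    for ann in annotations:
        clusters[(ann.strand, ann.stop)].append(ann)
    result = {}
    for key, members in clusters.items():
        ordered = sorted(members, key=lambda a: RANK[a.resource])
        result[key] = (ordered[0], ordered[1:])
    return result


# Made-up coordinates, for illustration only.
example = [
    Annotation("Prodigal", "+", 100, 400),
    Annotation("RefSeq 2018", "+", 130, 400),
    Annotation("in silico", "+", 70, 400),
    Annotation("ChemGenome", "-", 900, 600),
]
anchors = pick_anchors(example)
```

In this toy input, the RefSeq 2018 entry becomes the top-ranked annotation for the plus-strand group, and the Prodigal and in silico entries remain as alternative start sites for the same ORF.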

Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq 2018 > RefSeq 2017 > Genoscope > JGI > Prodigal > ChemGenome > in silico. Files were parsed to extract the identifiers, coordinates and sequences of bona fide protein-coding sequences (CDSs) and pseudogene entries. For extensions or reductions of already annotated CDSs, sequences were included only up to the first Lys-C cleavage site, so that such proteins can be identified from proteomics data obtained with this alternative protease.
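
The length filter and the truncation at the first Lys-C cleavage site can be sketched as follows. This is a minimal sketch under assumptions, not the code used to build the database: Lys-C is modelled simply as cleaving after every lysine (K), the function names and the `search_from` parameter are hypothetical, the example sequences are made up, and treating "above 18 aa" as strictly longer than 18 residues is one possible reading of the threshold.

```python
def truncate_at_first_lysc_site(seq: str, search_from: int = 0) -> str:
    """Return seq up to and including the first lysine (K) at or after search_from.

    Simplification: Lys-C is assumed to cleave C-terminal to every K.
    If no K is found, the full sequence is returned unchanged.
    """
    pos = seq.find("K", search_from)
    return seq if pos == -1 else seq[: pos + 1]


def passes_length_threshold(seq: str, threshold_aa: int = 18) -> bool:
    """Keep ORFs strictly longer than the threshold (boundary handling is assumed)."""
    return len(seq) > threshold_aa


# Made-up sequences, for illustration only.
reduction = "MSTAVLKGGHEKLLR"    # variant starting inside an annotated CDS
extension = "MKPLLSAMTTQRKAVE"   # variant with 7 extra N-terminal residues

reduced_entry = truncate_at_first_lysc_site(reduction)
# For the extension, the search starts where the shared region begins
# (index 7 here); where to start searching is an assumption of this sketch.
extended_entry = truncate_at_first_lysc_site(extension, search_from=7)
```

With these inputs, `reduced_entry` is `MSTAVLK` and `extended_entry` is `MKPLLSAMTTQRK`, so each variant contributes only its distinguishing N-terminal stretch instead of a second full-length copy of the protein.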

References

  1. Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G. et al. 1997. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390: 249-256.
  2. Vallenet, D., Belda, E., Calteau, A., Cruveiller, S., Engelen, S., Lajus, A., Le Fevre, F., Longin, C., Mornico, D., Roche, D. et al. 2013. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res 41: D636-647.
  3. Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I.M., Chu, K., and Kyrpides, N.C. 2009. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271-2278.
  4. Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
  5. Singhal, P., Jayaram, B., Dixit, S.B., and Beveridge, D.L. 2008. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophys J 94: 4173-4183.
  6. Omasits, U., Varadarajan, A.R., Schmid, M., Goetze, S., Melidis, D., Bourqui, M., Nikolayeva, O., Quebatte, M., Patrignani, A., Dehio, C., Frey, J.E., Robinson, M.D., Wollscheid, B., and Ahrens, C.H. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 27: 2083-2095.

iPtgxDB Release Info

Version: 1
Date: 06.02.2020

Downloads

TAR.GZ
Size: 6.8 MB
MD5: 24190890799a0b3fb5fefed1165e8383
SHA1: 90c58b6c3e64e3bd6f029898ae1b0cb2a945e8e3

ZIP
Size: 7.0 MB
MD5: 984fb1fa443102b13dc1b18936ff8128
SHA1: c198148adf907725729e0bf935d00a064378aeca
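
A downloaded archive can be checked against the MD5 and SHA1 values listed above before use. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `file_digest` and `verify` are hypothetical, and the archive file name is not given on this page, so any path passed to `verify` is a placeholder for wherever the download was saved.

```python
import hashlib

# Checksums copied from the download listing above.
EXPECTED_TAR_GZ = {
    "md5": "24190890799a0b3fb5fefed1165e8383",
    "sha1": "90c58b6c3e64e3bd6f029898ae1b0cb2a945e8e3",
}
EXPECTED_ZIP = {
    "md5": "984fb1fa443102b13dc1b18936ff8128",
    "sha1": "c198148adf907725729e0bf935d00a064378aeca",
}


def file_digest(path: str, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: dict) -> bool:
    """Return True only if every listed checksum matches the file at path."""
    return all(file_digest(path, algo) == value
               for algo, value in expected.items())
```

For example, `verify("path/to/downloaded.tar.gz", EXPECTED_TAR_GZ)` would return `True` only for an intact copy of the TAR.GZ archive; the path shown is a placeholder.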