Integrated proteogenomics database

P. stutzeri ATCC 14405 tryptic

Pseudomonas stutzeri strain ATCC 14405 (= CCUG 16156) is a gamma-proteobacterium that was originally isolated from a marine environment and has served as a model organism for denitrification studies [1]. Because it is highly homologous to Pseudomonas aeruginosa, an important human pathogen, and has a similar metabolism, it has also been used as a model system for that species. Recent phylogenomic analyses of the genus Pseudomonas suggest splitting it into several genera, including Stutzerimonas, with Stutzerimonas stutzeri as the type species [2].

In our study [3], we used the first complete genome of ATCC 14405 as the basis for bottom-up proteomics and for establishing a digest-free, direct sequencing proteomics approach to study cells grown under aerobic and oxygen-limiting conditions. We showed that the digest-free approach had some advantages for the identification of novel, small proteins: it detected more PSMs and achieved a higher protein coverage than the standard shotgun proteomics approach. It thus represents a promising option for the detection of these small proteins.

To identify missed protein-coding genes using a proteogenomics approach, we created an iPtgxDB by hierarchically integrating protein-coding sequences from the following annotation resources:

Hierarchy | Resource | Link / Description
1 | NCBI RefSeq 2019 | GCF_015291885.1_ASM1529188v1; from 25.02.2019
2 | Prodigal [4] | Ab initio gene predictions from Prodigal 2.6.3
3 | ChemGenome [5] | Ab initio gene predictions from ChemGenome 2.0 with method: Swissprot, length threshold: 70 nt, initiation codons: ATG, CTG, TTG, GTG
4 | in silico ORFs | The in silico ORF annotations were generated as described by Omasits and Varadarajan et al., 2017 [6]
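
To illustrate what hierarchical integration can look like, the following is a minimal Python sketch and not the actual iPtgxDB pipeline. It assumes (this page does not state it) that annotations sharing the same strand and stop coordinate are grouped together and that the highest-ranked resource in each group provides the anchor annotation. The function name `integrate`, the tuple layout, and the resource labels are hypothetical; see reference [6] for the real procedure.

```python
from collections import defaultdict

# Resource order taken from the hierarchy table; the labels are illustrative.
HIERARCHY = ["RefSeq", "Prodigal", "ChemGenome", "in_silico"]
RANK = {name: i for i, name in enumerate(HIERARCHY)}


def integrate(annotations):
    """Group annotations and pick the highest-ranked one per group.

    annotations: iterable of (source, strand, start, stop) tuples.
    ASSUMPTION: entries with the same (strand, stop) describe the same locus.
    """
    clusters = defaultdict(list)
    for source, strand, start, stop in annotations:
        clusters[(strand, stop)].append((RANK[source], source, start))

    result = {}
    for key, members in clusters.items():
        members.sort()  # lowest rank number = highest priority
        _, anchor_source, anchor_start = members[0]
        # Starts that differ from the anchor are candidate extensions/reductions.
        alt_starts = sorted({s for _, _, s in members if s != anchor_start})
        result[key] = {
            "anchor": anchor_source,
            "start": anchor_start,
            "alt_starts": alt_starts,
        }
    return result


example = [
    ("Prodigal", "+", 100, 400),
    ("RefSeq", "+", 130, 400),
    ("in_silico", "+", 70, 400),
    ("ChemGenome", "-", 900, 600),
]
integrated = integrate(example)
```

In this toy input, the RefSeq entry becomes the anchor for the locus ending at position 400, while the Prodigal and in silico starts are kept as alternative start sites.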

Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq 2019 > Prodigal > ChemGenome > in silico. Files were parsed to extract the identifiers, coordinates and sequences of bona fide protein-coding sequences (CDS) and pseudogene entries. For extensions or reductions of already annotated CDSs, sequences were only included up to the first tryptic cleavage site, which allows such proteins to be identified in proteomics data obtained with this protease. For more detail on how we generate iPtgxDBs and how the identifiers can be interpreted, please see reference [6].
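
The truncation at the first tryptic cleavage site can be sketched as follows. This is a hedged illustration, not the code used to build the database: it assumes that trypsin cleaves after K or R unless the next residue is P (a common simplification), and that for an N-terminal extension the search for the cleavage site starts at the first residue of the already annotated CDS. The function names and the example sequence are hypothetical.

```python
def first_tryptic_site(seq, start=0):
    """Return the index just after the first tryptic cleavage site at or
    after `start`, or len(seq) if there is none.

    ASSUMPTION: cleavage after K/R, but not when followed by P.
    """
    for i in range(start, len(seq)):
        if seq[i] in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            return i + 1
    return len(seq)


def truncate_extension(extended_seq, n_extra):
    """Keep an extended protein only up to the first tryptic site that lies
    in the part shared with the annotated CDS.

    n_extra: number of residues added upstream of the annotated start.
    """
    return extended_seq[:first_tryptic_site(extended_seq, n_extra)]


# Hypothetical example: 7 extra N-terminal residues, then the annotated CDS.
extension = "MSTLAAQ"
annotated = "MKPLTRAVGK"
truncated = truncate_extension(extension + annotated, len(extension))
print(truncated)  # MSTLAAQMKPLTR
```

Here the K in the annotated part is skipped because it is followed by P, so the entry ends at the next R. Keeping only this stretch avoids duplicating the sequence shared with the annotated protein while retaining the peptide that distinguishes the extension.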

References

  1. Peña, A., Busquets, A., Gomila, M., Bosch, R., Nogales, B., García-Valdés, E., Lalucat, J. and Bennasar, A. 2012. Draft Genome of Pseudomonas stutzeri Strain ZoBell (CCUG 16156), a Marine Isolate and Model Organism for Denitrification Studies. Journal of Bacteriology. 194: 1277-1278.
  2. Lalucat, J., Gomila, M., Mulet, M., Zaruma, A. and García-Valdés, E. 2022. Past, present and future of the boundaries of the Pseudomonas genus: Proposal of Stutzerimonas gen. nov. Syst Appl Microbiol. 45: 126289.
  3. Meier-Credo, J., Heiniger, B., Schori, C., Rupprecht, F., Michel, H., Ahrens, C. H. and Langer, J. D. 2023. Detection of known and novel small proteins in Pseudomonas stutzeri using a combination of bottom-up and digest-free proteomics and proteogenomics. Analytical Chemistry. submitted.
  4. Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
  5. Singhal, P., Jayaram, B., Dixit, S.B., and Beveridge, D.L. 2008. Prokaryotic gene finding based on physicochemical characteristics of codons calculated from molecular dynamics simulations. Biophys J 94: 4173-4183.
  6. Omasits, U., Varadarajan, A. R., Schmid, M., Goetze, S., Melidis, D., Bourqui, M., Nikolayeva, O., Quebatte, M., Patrignani, A., Dehio, C., Frey, J. E., Robinson, M. D., Wollscheid, B., and Ahrens, C. H. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Research. 27: 2083-2095.

iPtgxDB Release Info

Version: 1
Date: 26.02.2019

Downloads

Format | Size | MD5 | SHA1
TAR.GZ | 11.3 MB | 4410de84225590a7dbab20763c91f9f2 | 99ba4587f736c4fd84cb9942293d3b9a9d6d357a
ZIP | 11.6 MB | 5395f173d693bf17e400b07abc488fe7 | 42a81083d2dc58a1f8b52c826e157e8063b2126b
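
After downloading, the archive can be checked against the checksums listed above. The following is a minimal Python sketch using only the standard library; the helper names `file_digest` and `verify` and the file name in the usage comment are hypothetical, since this page does not give the name of the downloaded file.

```python
import hashlib

# Checksums for the TAR.GZ archive as listed on this page.
EXPECTED_TAR_GZ = {
    "md5": "4410de84225590a7dbab20763c91f9f2",
    "sha1": "99ba4587f736c4fd84cb9942293d3b9a9d6d357a",
}


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected):
    """Return True if every listed digest matches the file at `path`."""
    return all(file_digest(path, algo) == value
               for algo, value in expected.items())


# Usage (the file name is a placeholder for wherever the archive was saved):
# ok = verify("iptgxdb_download.tar.gz", EXPECTED_TAR_GZ)
# print("checksums match" if ok else "checksum mismatch, download again")
```

For the ZIP archive, the same functions apply with the MD5 and SHA1 values from the ZIP entry.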