Pseudomonas stutzeri strain ATCC 14405 (= CCUG 16156) is a gamma-proteobacterium that was originally isolated from a marine environment and has served as a model organism for denitrification studies [1]. Because it is highly homologous to Pseudomonas aeruginosa, an important human pathogen, and has a similar metabolism, it has also been used as a model system for that species. Recent phylogenomic analyses of the genus Pseudomonas suggest splitting it into several genera, including Stutzerimonas, with Stutzerimonas stutzeri as the type species [2].
In our study [3], we used the first complete genome of ATCC 14405 as the basis for conducting bottom-up proteomics and for establishing a digest-free, direct-sequencing proteomics approach to study cells grown under aerobic and oxygen-limiting conditions. We showed that the digest-free approach has advantages for identifying novel small proteins: it detected more peptide-spectrum matches (PSMs) and achieved higher protein coverage than the standard shotgun proteomics approach. It thus represents a promising option for detecting these small proteins.
To identify missed protein-coding genes using a proteogenomics approach, we created an iPtgxDB by hierarchically integrating protein-coding sequences from the following annotation resources:
Hierarchy | Resource | Details |
---|---|---|
1 | NCBI RefSeq 2019 | GCF_015291885.1_ASM1529188v1; from 25.02.2019 |
2 | Prodigal [4] | Ab initio gene predictions from Prodigal 2.6.3 |
3 | ChemGenome [5] | Ab initio gene predictions from ChemGenome 2.0 with method: Swissprot, length threshold: 70 nt, initiation codons: ATG, CTG, TTG, GTG |
4 | in silico ORFs | The in silico ORF annotations were generated as described by Omasits and Varadarajan et al., 2017 [6] |
Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq 2019 > Prodigal > ChemGenome > in silico. Files were parsed to extract the identifiers, coordinates and sequences of bona fide protein-coding sequences (CDSs) and pseudogene entries. For extensions or reductions of already annotated CDSs, sequences were included only up to the first tryptic cleavage site, so that such proteins can be identified from proteomics data generated with this protease. For more detail on how we generate iPtgxDBs and how the identifiers can be interpreted, please see reference [6].
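The integration steps described above can be illustrated with a minimal Python sketch. It is not the iPtgxDB implementation: the `Orf` class, the resource keys, and the helper names are hypothetical, and the sketch assumes that annotations sharing a strand and stop coordinate belong to one cluster, that the 18 aa threshold is inclusive, and that the common trypsin rule applies (cleavage after K or R unless followed by P).

```python
from dataclasses import dataclass

# Ranks follow the hierarchy stated in the text (lower number = higher priority).
# The dictionary keys are hypothetical labels chosen for this sketch.
HIERARCHY = {"refseq": 1, "prodigal": 2, "chemgenome": 3, "insilico": 4}
MIN_LENGTH_AA = 18  # threshold from the text; treating it as inclusive is an assumption


@dataclass(frozen=True)
class Orf:
    resource: str  # key into HIERARCHY
    strand: str    # "+" or "-"
    start: int     # genomic coordinate of the start codon
    stop: int      # genomic coordinate of the stop codon
    protein: str   # translated sequence without the stop


def cluster_by_stop(orfs):
    """Group ORFs sharing strand and stop; sort each group by resource rank."""
    clusters = {}
    for orf in orfs:
        if len(orf.protein) < MIN_LENGTH_AA:
            continue  # drop ORFs below the length threshold
        clusters.setdefault((orf.strand, orf.stop), []).append(orf)
    for members in clusters.values():
        # The first member after sorting comes from the highest-ranked resource.
        members.sort(key=lambda o: HIERARCHY[o.resource])
    return clusters


def first_tryptic_site(protein, start=0):
    """Index just past the first K/R (not followed by P) at or after `start`."""
    for i in range(start, len(protein)):
        if protein[i] in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            return i + 1
    return len(protein)  # no cleavage site found: keep the full sequence


def truncate_extension(extended, n_extra):
    """Keep the n_extra additional N-terminal residues plus the already
    annotated part up to its first tryptic cleavage site."""
    if not 0 < n_extra < len(extended):
        raise ValueError("n_extra must lie inside the extended sequence")
    return extended[: first_tryptic_site(extended, n_extra)]


if __name__ == "__main__":
    demo = [
        Orf("prodigal", "+", 100, 400, "M" + "A" * 30),
        Orf("refseq", "+", 130, 400, "M" + "A" * 20),
        Orf("insilico", "+", 500, 530, "MAAAA"),  # too short, filtered out
    ]
    for key, members in cluster_by_stop(demo).items():
        print(key, [m.resource for m in members])
    print(truncate_extension("MSTAAMKPLLRAAK", 5))
```

Sorting each cluster by rank keeps the highest-priority annotation first, so lower-ranked entries with a different start can be treated as extensions or reductions of it. The number of extra residues is passed explicitly to `truncate_extension` rather than inferred from the sequences, because an internal start codon may translate differently from an initiating one.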
iPtgxDB Release Info | |
---|---|
Version | 1 |
Date | 26.02.2019 |