Integrated proteogenomics database

L. monocytogenes EGD-e_tryptic

Listeria monocytogenes strain EGD-e (serovar 1/2a; GenBank #CP023861) was derived from strain EGD, which was originally isolated from guinea pigs and used in studies of cell-mediated immunity [1], and differs substantially from EGD [2]. Although the strain is already available as a RefSeq genome (GenBank #NC_003210) from the NCBI, that genome was assembled using a mixed strategy of de novo and reference-based assembly. We therefore re-sequenced the strain and assembled its genome with a purely de novo hybrid strategy, combining PacBio and Illumina MiSeq reads, to obtain a complete genome sequence for subsequent proteogenomic studies [3].

An iPtgxDB was created by hierarchically integrating protein coding sequences from these annotation resources:

Hierarchy   Resource          Link
1           NCBI RefSeq       CP023861.1; from 23/10/2017
2           Prodigal [4]      Ab initio gene predictions from Prodigal (v1.12)
3           in silico ORFs    The in silico ORF annotations were generated as described by Omasits, Varadarajan et al., 2017 [5]

Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq > Prodigal > in silico. Files were parsed to extract the identifier, coordinates and sequences of bona fide protein-coding sequences (CDS) and pseudogene entries.
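The hierarchical integration described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build this iPtgxDB. It assumes that annotations on the same strand with the same stop coordinate describe the same locus, that the length threshold applies to every resource, and that "above" means at least 18 aa. The `Annotation` record, the `integrate` function and the resource keys are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Priority of each resource, taken from the hierarchy RefSeq > Prodigal > in silico.
# Lower number = higher priority. The dictionary keys are illustrative labels.
HIERARCHY = {"RefSeq": 1, "Prodigal": 2, "in_silico": 3}

# Length threshold from the text (18 aa); treated here as an inclusive minimum,
# which is an assumption.
MIN_LENGTH_AA = 18


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one parsed CDS or pseudogene entry."""
    resource: str    # one of the HIERARCHY keys
    identifier: str
    strand: str      # "+" or "-"
    start: int
    stop: int
    protein: str     # translated amino acid sequence


def integrate(annotations, min_length=MIN_LENGTH_AA):
    """Keep one annotation per (strand, stop) locus, chosen by resource priority."""
    anchors = {}
    for ann in annotations:
        # Skip entries shorter than the length threshold.
        if len(ann.protein) < min_length:
            continue
        key = (ann.strand, ann.stop)
        best = anchors.get(key)
        # Replace the current anchor only if this resource ranks higher.
        if best is None or HIERARCHY[ann.resource] < HIERARCHY[best.resource]:
            anchors[key] = ann
    return sorted(anchors.values(), key=lambda a: (a.stop, a.strand))
```

With this sketch, a RefSeq entry and a longer in silico ORF that share a stop codon collapse to the RefSeq entry, while an in silico ORF at a locus with no other annotation is kept as long as it passes the length threshold.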

References

  1. Glaser, P., Frangeul, L., Buchrieser, C., Rusniok, C., Amend, A., Baquero, F., et al. 2001. Comparative genomics of Listeria species. Science 294: 849–852.
  2. Bécavin, C., Bouchier, C., Lechat, P., Archambaud, C., Creno, S., Gouin, E., et al. 2014. Comparison of widely used Listeria monocytogenes strains EGD, 10403S, and EGD-e highlights genomic variations underlying differences in pathogenicity. MBio 5: e00969–14.
  3. Varadarajan, A. R., Pavlou, M., Goetze, S., Grosboillot, V., Shen, Y., Loessner, M. H., Ahrens, C., Wollscheid, B. 2019. A proteogenomic resource enabling rapid quantitative proteotype profiling of Listeria strains using DIA/SWATH. PLoS Pathog. (manuscript in preparation).
  4. Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
  5. Omasits, U., Varadarajan, A. R., Schmid, M., Goetze, S., Melidis, D., Bourqui, M., Nikolayeva, O., Quebatte, M., Patrignani, A., Dehio, C., Frey, J. E., Robinson, M. D., Wollscheid, B., and Ahrens, C. H. 2017. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Research 27: 2083–2095.

iPtgxDB Release Info

Version: 1
Date: 23.10.2017

Downloads

TAR.GZ
Size: 3.9 MB
MD5: 94d5bf2accf49a51d87dbce365598911
SHA1: 0a718938ecda2e6b8c65d01683e2af8cf5bdad54

ZIP
Size: 4.0 MB
MD5: 93d85003136f5a43b5ea32553777e56a
SHA1: ab88bf478ccfd1220924035b2d6e06c7c8fa814f
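
The MD5 and SHA1 values listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `file_digest` and `verify` are hypothetical, and the path of the downloaded archive is whatever it was saved as locally, since this page does not state the file names.

```python
import hashlib

# Checksums as listed for the two archives of this release.
EXPECTED = {
    "TAR.GZ": {
        "md5": "94d5bf2accf49a51d87dbce365598911",
        "sha1": "0a718938ecda2e6b8c65d01683e2af8cf5bdad54",
    },
    "ZIP": {
        "md5": "93d85003136f5a43b5ea32553777e56a",
        "sha1": "ab88bf478ccfd1220924035b2d6e06c7c8fa814f",
    },
}


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, archive_type):
    """Return True only if both the MD5 and SHA1 digests match the listed values."""
    expected = EXPECTED[archive_type]
    return all(file_digest(path, alg) == value for alg, value in expected.items())
```

For example, `verify("iptgxdb_download.tar.gz", "TAR.GZ")` (the file name is a placeholder) returns `True` only when both digests match. A mismatch usually means the download was incomplete or corrupted.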