Listeria monocytogenes strain EGD-e (serovar 1/2a; GenBank #CP023861) was derived from strain EGD, which was originally isolated from guinea pigs and used in studies of cell-mediated immunity [1], and differs quite substantially from EGD [2]. Although the strain is already available as a RefSeq strain (GenBank #NC_003210) from the NCBI, that genome was assembled using a mixed strategy of de novo and reference-based assembly. We therefore re-sequenced the strain and assembled its genome with a purely de novo hybrid strategy, combining PacBio and Illumina MiSeq reads, to obtain a complete genome sequence for subsequent proteogenomic studies [3].
An iPtgxDB was created by hierarchically integrating protein-coding sequences from the following annotation resources:
Hierarchy | Resource | Link |
---|---|---|
1 | NCBI RefSeq | CP023861.1; from 23/10/2017 |
2 | Prodigal [4] | Ab initio gene predictions from Prodigal (v1.12) |
3 | in silico ORFs | The in silico ORF annotations were generated as described by Omasits and Varadarajan et al., 2017 |
Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq > Prodigal > in silico. Files were parsed to extract the identifier, coordinates and sequences of bona fide protein-coding sequences (CDS) and pseudogene entries.
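The hierarchical integration described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build this iPtgxDB. It assumes, purely for illustration, that annotations on the same strand sharing the same stop coordinate describe the same locus, that the length threshold is applied strictly (longer than 18 aa) to every resource, and that the highest-ranked resource supplies the retained entry. The `Orf` record, the `stop_key` and `integrate` helpers, and the source labels are hypothetical names introduced here.

```python
from collections import namedtuple

# Priority order taken from the text: RefSeq > Prodigal > in silico.
HIERARCHY = ["RefSeq", "Prodigal", "in_silico"]
MIN_LENGTH_AA = 18  # length threshold stated in the text

# Hypothetical record: 1-based inclusive coordinates, strand "+" or "-".
Orf = namedtuple("Orf", "source identifier start end strand protein")


def stop_key(orf):
    """Return (strand, stop coordinate); the grouping rule is an assumption."""
    stop = orf.end if orf.strand == "+" else orf.start
    return (orf.strand, stop)


def integrate(orfs, min_length=MIN_LENGTH_AA):
    """Keep one entry per stop position, preferring the highest-ranked resource."""
    rank = {name: i for i, name in enumerate(HIERARCHY)}
    best = {}
    for orf in orfs:
        # Assumption: "above the threshold" means strictly longer than it.
        if len(orf.protein) <= min_length:
            continue
        key = stop_key(orf)
        current = best.get(key)
        if current is None or rank[orf.source] < rank[current.source]:
            best[key] = orf
    # Report the retained entries in genomic order.
    return sorted(best.values(), key=lambda o: (o.start, o.end))
```

Given three annotations that end at the same stop position, one from each resource, `integrate` keeps only the RefSeq entry. A Prodigal or in silico entry is kept only where no higher-ranked resource annotates that stop position.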
iPtgxDB Release Info | |
---|---|
Version | 1 |
Date | 23.10.2017 |