Listeria monocytogenes strain EGD-e (serovar 1/2a; GenBank #CP023861) was derived from strain EGD, which was originally isolated from guinea pigs and used in studies of cell-mediated immunity; EGD-e differs substantially from EGD. Although the strain is already available from the NCBI as a RefSeq strain (GenBank #NC_003210), that genome was assembled using a mixed strategy of de novo and reference-based assembly. We therefore re-sequenced the strain and assembled its genome purely de novo with a hybrid strategy that combines PacBio and Illumina MiSeq reads, to obtain a complete genome sequence for subsequent proteogenomic studies.
An iPtgxDB was created by hierarchically integrating protein-coding sequences from the following annotation resources:
|1||NCBI RefSeq||CP023861.1; from 23/10/2017|
|2||Prodigal||Ab initio gene predictions from Prodigal (v1.12)|
|3||in silico ORFs||In silico ORF annotations were generated as described by Omasits and Varadarajan et al., 2017|
Only ORFs above a selectable length threshold (here 18 aa) were considered. The iPtgxDB was created using the hierarchy RefSeq > Prodigal > in silico. Files were parsed to extract the identifier, coordinates and sequences of bona fide protein-coding sequences (CDS) and pseudogene entries.
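The length filter and the hierarchy described above can be illustrated with a minimal Python sketch. It is not the actual iPtgxDB pipeline. The `Orf` record, the `integrate` function and the example identifiers are hypothetical. Grouping entries by strand and stop coordinate, and reading "above 18 aa" as strictly longer than 18 residues, are both assumptions made for illustration.

```python
from typing import NamedTuple

# Priority order taken from the text: earlier in the list = higher rank.
HIERARCHY = ["RefSeq", "Prodigal", "in silico"]
MIN_LENGTH_AA = 18  # threshold stated in the text


class Orf(NamedTuple):
    """Hypothetical minimal record for one parsed annotation entry."""
    source: str      # one of the names in HIERARCHY
    identifier: str
    strand: str      # "+" or "-"
    start: int
    stop: int
    protein: str     # translated amino acid sequence


def integrate(orfs):
    """Keep one entry per locus, preferring the higher-ranked resource.

    Assumption: a locus is identified by (strand, stop coordinate), and
    ORFs must be strictly longer than MIN_LENGTH_AA to be considered.
    """
    rank = {name: i for i, name in enumerate(HIERARCHY)}
    anchors = {}
    for orf in orfs:
        if len(orf.protein) <= MIN_LENGTH_AA:
            continue  # too short, not considered
        key = (orf.strand, orf.stop)
        current = anchors.get(key)
        if current is None or rank[orf.source] < rank[current.source]:
            anchors[key] = orf
    return sorted(anchors.values(), key=lambda o: (o.stop, o.strand))


if __name__ == "__main__":
    # Invented example entries, not real EGD-e annotations.
    example = [
        Orf("Prodigal", "prod_1", "+", 100, 400, "M" * 100),
        Orf("RefSeq", "ref_1", "+", 130, 400, "M" * 90),
        Orf("in silico", "is_1", "-", 500, 560, "M" * 18),
        Orf("in silico", "is_2", "-", 600, 700, "M" * 30),
    ]
    print([orf.identifier for orf in integrate(example)])
    # prints ['ref_1', 'is_2']
```

In this toy input the RefSeq entry wins over the Prodigal prediction at the same locus, and the 18-residue in silico ORF is dropped by the length filter.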
|iPtgxDB Release Info|