Automatic annotation of E. coli H112180280 strain (HPA sequence and assembly)
The Oh no sequences! (Era7) automatic annotation of the E. coli H112180280 strain is now available. It is based on the assembly provided by the Health Protection Agency (HPA), which is a de novo assembly (not confirmed) of 454 reads. The assembly is available from the repository: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/tree/master/strains/H112180280/seqProject/HealthProtectionAgencyUK/assemblies/HealthProtectionAgencyUK
Annotation was done with BG7 using this set of reference proteins (137,063 proteins in total):
- The representative UniProt proteins corresponding to all UniRef90 clusters for all Escherichia coli proteins
- All UniProt proteins from organisms whose names include the terms “EHEC” or “EAEC”
- All UniProt proteins from bacteria that have the term “toxin” in any UniProt field
- All UniProt proteins from bacteria that have the term “hemolysin” in any UniProt field
- All the proteins from Salmonella typhi, Yersinia pestis and Shigella dysenteriae
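The selection criteria above can be read as a simple membership test. The following is only a minimal sketch of that logic, not the actual BG7 or Era7 selection code: the `ProteinRecord` type, its fields, the accessions, and the example organisms are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical record type; real UniProt entries have many more fields.
@dataclass
class ProteinRecord:
    accession: str          # made-up identifiers, not real UniProt accessions
    organism: str
    is_bacterial: bool
    annotation_text: str    # assumed: all free-text fields joined together
    ecoli_uniref90_representative: bool = False

# Species whose proteins are all included, as listed above.
TARGET_SPECIES = ("Salmonella typhi", "Yersinia pestis", "Shigella dysenteriae")

def in_reference_set(rec: ProteinRecord) -> bool:
    """Return True if the record satisfies any of the five listed criteria."""
    text = rec.annotation_text.lower()
    return (
        rec.ecoli_uniref90_representative
        or any(term in rec.organism for term in ("EHEC", "EAEC"))
        or (rec.is_bacterial and ("toxin" in text or "hemolysin" in text))
        or any(rec.organism.startswith(species) for species in TARGET_SPECIES)
    )

# Illustrative records only.
records = [
    ProteinRecord("EX0001", "Escherichia coli (EAEC example)", True, "hypothetical protein"),
    ProteinRecord("EX0002", "Bacillus subtilis", True, "Putative toxin component"),
    ProteinRecord("EX0003", "Homo sapiens", False, "toxin receptor"),
    ProteinRecord("EX0004", "Yersinia pestis", True, "ribosomal protein"),
]

selected = [rec.accession for rec in records if in_reference_set(rec)]
print(selected)
```

The criteria are combined with a logical OR, so a protein matching several of them is still counted once in the reference set.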
5,916 genes were detected:
- 5,792 protein-encoding genes
- 124 RNA genes
4,912 out of the 5,792 (84.80%) protein-encoding genes have canonical start and stop codons and contain neither frameshifts nor intragenic stop codons.
615 out of the 5,792 (10.61%) protein-encoding genes have frameshifts or intragenic stop codons in their sequences, probably caused by errors inherent to the sequencing technology.
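The counts above can be cross-checked with a few lines of arithmetic. This is just a sanity check on the quoted figures; the variable names are illustrative, and the "uncategorised" remainder is simply what is left after subtracting the two quoted groups, since the text does not describe those genes.

```python
# Counts quoted in the annotation summary.
total_genes = 5916
protein_genes = 5792
rna_genes = 124
clean_genes = 4912     # canonical start/stop, no frameshifts or intragenic stops
damaged_genes = 615    # frameshifts or intragenic stop codons

def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

# Protein-encoding and RNA genes should add up to the total.
assert protein_genes + rna_genes == total_genes

clean_pct = percent(clean_genes, protein_genes)
damaged_pct = percent(damaged_genes, protein_genes)

# Protein-encoding genes that fall into neither quoted group.
uncategorised = protein_genes - clean_genes - damaged_genes

print(f"clean: {clean_pct:.4f}%")
print(f"frameshift/intragenic stop: {damaged_pct:.4f}%")
print(f"not covered by either group: {uncategorised}")
```

The computed percentages agree with the quoted 84.80% and 10.61% to within 0.01 percentage points.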
The annotation results are available from the repository: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/tree/master/strains/H112180280/seqProject/HealthProtectionAgencyUK/annotations/era7bioinformatics/era7_HPA_H112180280_annotations