PFAM domain comparison
- who Konrad Paszkiewicz, khp204 at ex . ac . uk
- what PFAM domain comparison
- date 09/06/2011
#PFAM classification comparison
As HMMER3 is relatively quick, I put TY2482, LB226692 and seven other Escherichia coli genomes through PFAM. The TY2482 data has gone through two iterations: one with Ion Torrent data only and the other incorporating HiSeq reads. The MIRA assembly from Nick Loman was used, as well as BGI's latest assembly incorporating the HiSeq reads. The idea is that we can see at a glance which domains are shared by the outbreak strains, and whether there are any significant differences between them. This follows in the footsteps of excellent detective work done by Kat Holt and David Studholme. I hope this will be complementary to, and additional confirmation of, the in-depth analyses by Kat, David and others.
Assembly errors can come into play here. This is one reason why the original Ion Torrent-only assembly of TY2482 has been included, so that we can compare like with like.
##Method
For each genome, ORFs were called using the EMBOSS getorf program, with a minimum ORF size of 102 nt and the bacterial codon usage table. These were then passed to the pfam_scan.pl program with default parameters and the February 2011 PFAM release.
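The two steps above can be sketched as command lines. This is a hedged reconstruction, not the exact invocation used: the file and directory names are hypothetical, `-table 11` is assumed to be what "bacterial codon usage table" refers to (EMBOSS genetic code table 11), and getorf's `-find` option is left at its default because the method does not mention it. The helper functions only build the argument lists, so nothing is run unless you pass them to `subprocess.run`.

```python
import shlex


def getorf_command(genome_fasta, orf_fasta, min_size=102, table=11):
    """Build a getorf call: ORFs of at least `min_size` nt, genetic code `table`.

    Assumption: table 11 is the EMBOSS "bacterial" genetic code table.
    """
    return [
        "getorf",
        "-sequence", genome_fasta,
        "-outseq", orf_fasta,
        "-minsize", str(min_size),
        "-table", str(table),
    ]


def pfam_scan_command(orf_fasta, pfam_dir, out_file):
    """Build a pfam_scan.pl call with otherwise default parameters."""
    return [
        "pfam_scan.pl",
        "-fasta", orf_fasta,
        "-dir", pfam_dir,      # directory holding the Pfam HMM library
        "-outfile", out_file,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = [
        getorf_command("TY2482_MIRA.fasta", "TY2482_MIRA.orfs.fasta"),
        pfam_scan_command("TY2482_MIRA.orfs.fasta", "pfam_db", "TY2482_MIRA.pfam.txt"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually run a step, something like `subprocess.run(cmd, check=True)` would do, assuming both tools are on the `PATH`.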
##Results
The results were then parsed and grouped according to which samples they appeared in. The results are available here. The original PFAM result files are provided along with the ORFs identified.
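The parsing and grouping step could look roughly like the sketch below. It is not the script that produced these results. It assumes the whitespace-delimited pfam_scan.pl output, with the HMM accession in the sixth column, and the example row and sample names are placeholders rather than real hits.

```python
from collections import defaultdict


def parse_pfam_scan(lines):
    """Return the set of Pfam accessions (version suffix stripped) in the output."""
    accessions = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 7:
            continue  # not a hit row
        accessions.add(fields[5].split(".")[0])  # e.g. PF00161.14 -> PF00161
    return accessions


def group_by_presence(domains_by_sample):
    """Map each presence pattern (frozenset of samples) to the domains showing it."""
    samples_by_domain = defaultdict(set)
    for sample, domains in domains_by_sample.items():
        for acc in domains:
            samples_by_domain[acc].add(sample)
    groups = defaultdict(list)
    for acc, samples in samples_by_domain.items():
        groups[frozenset(samples)].append(acc)
    return {pattern: sorted(accs) for pattern, accs in groups.items()}


if __name__ == "__main__":
    # Placeholder row: coordinates and scores are made up for illustration.
    toy_row = "orf_1 10 200 8 205 PF00161.14 RIP Domain 1 190 206 150.2 1.2e-45 1 No_clan"
    toy = {
        "sample_A": parse_pfam_scan([toy_row]),
        "sample_B": {"PF00161", "PF_X"},  # PF_X is a hypothetical accession
    }
    for pattern, accs in group_by_presence(toy).items():
        print(sorted(pattern), accs)
```

Each key of the returned dictionary is one "appears in exactly these samples" pattern, which is the form in which the lists below are reported.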
###The following PFAM domains were identified as being exclusive to both TY2482 assemblies and LB226692.
- PF00161 Ribosome Inactivating Protein
- PF02258 Shiga-like toxin beta subunit
- PF02342 TerD Bacterial stress protein
- PF02411 MerT Mercuric transport protein
- PF02517 Abi CAAX amino terminal protease self-immunity
- PF02677 DUF208
- PF02794 HlyC RTX toxin acyltransferase family
- PF03203 MerC mercury resistance protein
- PF03412 Peptidase C39 family
- PF05052 MerE Mercury resistance operon
- PF05063 MT-A70 S-adenosylmethionine-binding subunit of human mRNA
- PF05268 GP38 Phage tail fibre adhesin Gp38
- PF05696 DUF826 Enterobacterial and siphoviral sequences of unknown function
- PF05766 NinG Bacteriophage Lambda NinG protein
- PF05775 AfaD Enterobacteria AfaD invasin protein
- PF06504 RepC Replication protein C
- PF06616 BsuBI_PstI_RE BsuBI/PstI restriction endonuclease C-terminus
- PF06722 DUF1205
- PF07125 DUF1378
- PF07669 Eco57I Restriction-modification methylase
- PF08273 Prim_Zn_Ribbon Zinc-binding domain of primase-helicase
- PF08861 DUF1828
- PF09048 Cro Gene repressor
- PF09848 DUF2075
- PF10065 DUF2303
- PF10138 Tellurium resistance protein
- PF10828 DUF2570
- PF11202 DUF2983
- PF11809 DUF3330
- PF12500 DUF3706
###PFAM domains unique to just the LB226692 isolate:
- PF03845 Spore_permease (Spore germination protein, APC transporter)
- PF05969 PSII_Ycf12 (Photosystem II complex subunit Ycf12)
- PF07495 Y_Y_Y (Beta propeller)
The presence of photosystem II is likely spurious, or due to misassembly or contamination.
###PFAM domains unique to TY2482
These were obtained from the TY2482 MIRA assembly. Interestingly, the TY2482 BGI HiSeq+Ion Torrent assembly did not have any unique PFAM domains. This could indicate that these are false matches due to an assembly error, or that the most recent assembly was too stringent.
- PF00689 Cation transporting ATPase, C-terminus
- PF01490 Aa_trans UNC 47 Amino acid transporter/MTR methyltryptophan resistance
- PF02030 Lipoprotein_8
- PF03552 Cellulose synthase
- PF04893 Yip1 Golgi apparatus protein involved in vesicular transport
- PF10569 Alpha-macro-globulin thiol-ester bond-forming region
Again, the presence of eukaryote-specific PFAM hits is likely to be spurious, or due to misassembly or contamination.
###Restricting the analysis to just 55989 and the two outbreak strains
This increases sensitivity but may introduce spurious hits. It also allows us to check that other bacteria in the comparison don't contain virulence factors which would have been masked in the above analysis.
####Present in LB226692 and TY2482 hiseq assembly but absent from 55989 and MIRA assembly
- PF01896 Eukaryotic and archaeal DNA primase small subunit
- PF10439 Bacteriocin class II with double-glycine leader peptide
####Present in LB226692 and TY2482 MIRA assembly but absent from 55989 and hiseq assembly
- PF08378 NERD Nuclease-related domain
####Present in TY2482 MIRA assembly only
- [PF09039](http://pfam.sanger.ac.uk/family?acc=PF09039) Mu DNA binding, I gamma subdomain
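To illustrate why restricting the comparison changes what shows up, here is a small sketch with toy data. The accessions `PF_A`, `PF_B`, `PF_C` and the sample `other_strain` are hypothetical. A domain counts as exclusive to a target group only if, among the samples being compared, it occurs in exactly that group.

```python
def presence_patterns(domains_by_sample, samples):
    """For the chosen samples only, map each domain to the samples containing it."""
    patterns = {}
    for sample in samples:
        for acc in domains_by_sample[sample]:
            patterns.setdefault(acc, set()).add(sample)
    return patterns


def exclusive_to(domains_by_sample, samples, targets):
    """Domains found in every target sample and in no other compared sample."""
    targets = set(targets)
    patterns = presence_patterns(domains_by_sample, samples)
    return sorted(acc for acc, found in patterns.items() if found == targets)


if __name__ == "__main__":
    # Toy presence data, not taken from the real results.
    toy = {
        "TY2482_MIRA": {"PF_A", "PF_B"},
        "TY2482_hiseq": {"PF_A", "PF_B"},
        "LB226692": {"PF_A", "PF_B"},
        "55989": {"PF_C"},
        "other_strain": {"PF_B", "PF_C"},
    }
    outbreak = ["TY2482_MIRA", "TY2482_hiseq", "LB226692"]

    # Full comparison: PF_B is hidden because other_strain also carries it.
    print(exclusive_to(toy, list(toy), outbreak))
    # Restricted comparison: with other_strain dropped, PF_B reappears.
    print(exclusive_to(toy, outbreak + ["55989"], outbreak))
```

The same function covers the mixed patterns listed above by changing `targets`, for example to LB226692 plus a single TY2482 assembly.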
##Conclusion
It's reassuring to see tellurium resistance and other resistance and toxin proteins detected. It seems as though the two isolates are identical at a functional level. Anything further is beyond my competence to interpret.
What is clear is that there does not appear to be a functional difference between the two isolates. However, it has to be borne in mind that SNPs and indels may enhance or suppress traits in ways that would be undetectable by this analysis. More on that later, I hope!
##Addendum - Marcin Grynberg (mr.cingg < at > gmail.com) 20/06/2011
Marcin has very kindly added some more detailed annotations to the Domains of Unknown Function (DUFs) using HHpred. Options were left at default, except for compositional bias correction, which was switched off.
###Present in TY2482 assemblies and LB226692
- PF02677 DUF208 PP-loop ATP-utilizing enzyme PIRSF006661 UCP006661_PP-loop Probab=99.79 E-value=2.6e-18 Score=144.82 Aligned_cols=143 Identities=18% Similarity=0.161 Sum_probs=0.0
- PF05696 DUF826 KOG3229 Vacuolar sorting protein VPS24 [Intracellular trafficking, secretion, and vesicular transport] Probab=95.87 E-value=0.082 Score=43.96 Aligned_cols=79 Identities=33% Similarity=0.368 Sum_probs=0.0
- PF06722 2p6p_A Glycosyl transferase; GT-B family, X-RAY-diffraction,urdamycina- biosynthesis; 1.88A {Streptomyces fradiae} Probab=99.58 E-value=6.2e-15 Score=115.55 Aligned_cols=91 Identities=22% Similarity=0.295 Sum_probs=0.0
- PF07125 TraQ type-F conjugative transfer system pilin chaperone TraQ Probab=95.42 E-value=0.19 Score=32.27 Aligned_cols=44 Identities=16% Similarity=0.514 Sum_probs=0.0
- PF08861 Mga helix-turn-helix domain Probab=93.25 E-value=1.1 Score=29.08 Aligned_cols=38 Identities=21% Similarity=0.325 Sum_probs=0.0
- PF09848 DUF2075 [According to Pfam, belongs to the AAA clan (CL0023)]
- PF10828 PHA02047 phage lambda Rz1-like protein Probab=99.31 E-value=4e-10 Score=79.99 Aligned_cols=92 Identities=22% Similarity=0.219 Sum_probs=0.0
- PF11202 Pyrimidine operon regulator PyrR {Thermus thermophilus [TaxId: 274]}; PRK13600 putative ribosomal protein L7Ae-like Probab=96.62 E-value=0.036 Score=52.54 Aligned_cols=122 Identities=17% Similarity=0.147 Sum_probs=0.0; Probab=95.98 E-value=0.063 Score=46.86 Aligned_cols=56 Identities=18% Similarity=0.270 Sum_probs=0.0
- PF11809 pfam08394 Arc_trans_TRASH Archaeal TRASH domain. This region is found in the C-terminus of a number of archaeal transcriptional regulators. It is thought to function as a metal-sensing regulatory module. Probab=97.24 E-value=2.5e-05 Score=45.37 Aligned_cols=36 Identities=25% Similarity=0.438 Sum_probs=0.0
- PF12500 DUF3706
- PF10065 DUF2303
Interestingly, a TraQ protein is inferred with HHpred. This could be part of the same system which transferred the virulence factors and Shiga toxin to this strain.
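The HHpred statistics quoted in the list above are easier to compare once they are pulled into a structured form. The sketch below is only one way to do it. It assumes the `key=value` layout shown in those lines and starts a new hit at every `Probab=` field, so entries that report two hits, such as PF11202, yield two records.

```python
import re

# Keys as they appear in the HHpred summary lines quoted above.
STAT = re.compile(
    r"(Probab|E-value|Score|Aligned_cols|Identities|Similarity|Sum_probs)=(\S+)"
)


def parse_hhpred_stats(text):
    """Return one dict of numeric statistics per hit found in `text`."""
    hits = []
    for key, raw in STAT.findall(text):
        if key == "Probab":
            hits.append({})  # each hit's statistics begin with Probab
        if hits:
            # Strip a trailing "%" (Identities) or ";" (hit separator).
            hits[-1][key] = float(raw.rstrip("%;"))
    return hits


if __name__ == "__main__":
    line = (
        "PF02677 DUF208 PP-loop ATP-utilizing enzyme PIRSF006661 UCP006661_PP-loop "
        "Probab=99.79 E-value=2.6e-18 Score=144.82 Aligned_cols=143 "
        "Identities=18% Similarity=0.161 Sum_probs=0.0"
    )
    for hit in parse_hhpred_stats(line):
        print(hit["Probab"], hit["E-value"])
```

Sorting the parsed hits by `Probab` or `E-value` then gives a quick ranking of the inferred annotations by how well supported they are.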