GENE : TPS
GENE DICTIONARY:
-
Vasant and Giulia started the gene dictionary.
-
This eo_Gene dictionary contains 11951 TPS genes.
TOTAL GENES: 11951
#Creation of eo_Gene dictionary
-
I obtained 7572 UniProtKB AC/IDs from the following links.
UNIPROTKB AC/IDs: Terpene synthase
UNIPROTKB AC/IDs: Terpene synthase C
Please find both of the above combined.
-
I also searched UniProt for "terpene synthase AND reviewed:yes" and found 945 new TPS genes.
After removing duplicates (points 1 and 2) and entries annotated as "deleted", I have a total of 8409 TPS genes.
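The merge step can be sketched roughly as below. This is only an illustration of a set union followed by dropping deleted entries, not the procedure actually used. The function name `merge_accessions` and the accession strings are hypothetical placeholders. The last two lines use the counts quoted in the text, which imply that 108 entries were dropped in total.

```python
def merge_accessions(list_one, list_two, deleted):
    """Union two accession lists, then drop entries flagged as deleted."""
    merged = set(list_one) | set(list_two)
    return sorted(merged - set(deleted))


# Toy accessions (placeholders, not taken from the real UniProt lists).
list_one = ["ACC001", "ACC002", "ACC003"]
list_two = ["ACC003", "ACC004"]
deleted = ["ACC002"]

kept = merge_accessions(list_one, list_two, deleted)
print(kept)

# Counts from the text: 7572 + 945 candidates, 8409 kept.
removed = 7572 + 945 - 8409
print(removed)
```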
-
Gene names, synonyms, organism names (and IDs), protein information, Enzyme Commission numbers and Gene Ontology (molecular information) were retrieved from UniProt (https://www.uniprot.org/uploadlists/).
-
Gene identifiers such as AT5G23960 (TPS21 in Arabidopsis) are used in the literature. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268506/table/t02_01/)
-
So, I mapped primary and secondary identifiers from Phytomine. (https://phytozome.jgi.doe.gov/phytomine/template.do?name=Proteins%20with%20Two%20PFAM%20Domains&scope=all).
-
Found 7 new plant species from (https://digitalcommons.wustl.edu/cgi/viewcontent.cgi?filename=5&article=9240&context=open_access_pubs&type=additional)
-
From these 7 species and the PFAM domain information, I retrieved gene identifiers for 510 new TPS genes.
-
Collected 3032 new TPS genes from:
Phytomine https://phytozome.jgi.doe.gov/phytomine/begin.do
http://www.nipgr.ac.in/terzyme.html
http://radish.kazusa.or.jp/cgi-bin/keyword.cgi
https://viggs.dna.affrc.go.jp/
-
Finally, the dictionary now contains 11951 TPS genes (8409 + 510 + 3032).
-
TPS GENE CLASSIFICATION
Monoterpene synthase: 1062
Sesquiterpene synthase: 2273
Diterpene synthase: 681
Prenyl transferase: 179
Triterpenoid synthase: 94
Uncharacterized: 3237
Total above: 7526
Terpene synthase domain-containing protein, unannotated genome, species-specific terpene: 4425
Total TPS genes: 11951
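As a quick consistency check, the counts above can be summed in a few lines. The dictionary keys are just labels copied from the list, and the variable names are illustrative.

```python
# Class counts as listed in the TPS gene classification.
classified = {
    "Monoterpene synthase": 1062,
    "Sesquiterpene synthase": 2273,
    "Diterpene synthase": 681,
    "Prenyl transferase": 179,
    "Triterpenoid synthase": 94,
    "Uncharacterized": 3237,
}
# Domain-containing, unannotated-genome and species-specific entries.
other = 4425

subtotal = sum(classified.values())
total = subtotal + other
print(subtotal, total)  # 7526 11951

# The total also matches the three collection steps.
assert total == 8409 + 510 + 3032
```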
-
TPS Gene distribution in different species
-
Creating eo_Gene dictionary and minicorpus
A) I created a txt file containing a list of all gene names.
B) I created a txt file containing a list of all species names and terms such as TPS, TPS1, TPS2 and so on.
amidict -v --dictionary eo_Gene --directory gene --input genee.txt create --informat list --outformats xml
Please find the dictionary here.
getpapers -q "(terpene synthase)" -o corporaTPS -x -p -k 500 -f corporaTPS/log.txt
Downloaded 500 papers.
getpapers -q "(terpene synthase) AND (characterisation) AND (characterization)" -o corpusTPS -x -p -k 500 -f corporaTPS/log.txt
Downloaded around 35 papers.
-
Testing eo_Gene dictionary
ami -p "corporaTPS" section
ami -p "corporaTPS" search --dictionary eo_Gene.xml
-
eo_Gene1
eo_Gene
-
Difficulties:
a) Papers name the same genes inconsistently: TPS in Vitis vinifera appears as both VvTPS and VviTPS, and some papers use TPS1 while others use TPS01.
b) Gene names are in tables, figures or supplementary files.
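One possible way to reduce the naming variants in (a) is to normalise names before matching. The sketch below is an assumption, not part of the eo_Gene workflow: the pattern `GENE_RE` and the function `normalize_tps` are hypothetical. It strips an optional two- or three-letter species prefix and leading zeros from the number. Because the prefix is discarded, the species would need to be tracked separately.

```python
import re

# Optional species prefix (capital + 1-2 lowercase letters), "TPS",
# then an optional number whose leading zeros are ignored.
GENE_RE = re.compile(r"^(?:[A-Z][a-z]{1,2})?TPS0*(\d*)$")


def normalize_tps(name):
    """Return a canonical 'TPS<n>' form, or None if the name does not fit."""
    match = GENE_RE.match(name)
    if match is None:
        return None
    return "TPS" + match.group(1)


for raw in ["VvTPS", "VviTPS", "TPS1", "TPS01", "VviTPS01"]:
    print(raw, "->", normalize_tps(raw))
```

With this pattern, VvTPS and VviTPS both map to TPS, and TPS1, TPS01 and VviTPS01 all map to TPS1.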
Creating TPS gold standard:
-
I queried https://europepmc.org/ with the following searches and got these results:
Query: Number of hits
terpene synthase: 4308
terpene synthase plant: 3447
terpene synthase plant volatile: 1200
terpene synthase plant TPS: 650
terpene synthase TPS plant volatile: 376
terpene synthase TPS plant volatile compounds: 355 (Research articles 312)
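Hit counts like these could also be collected programmatically. The sketch below assumes the Europe PMC REST search endpoint and its `hitCount` response field. It only builds the request URLs and reads the count from an already decoded JSON payload, so it makes no network call. The helper names `build_search_url` and `hit_count` are hypothetical, and counts returned today will likely differ from those recorded above.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Europe PMC REST search service.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query):
    """Build a search URL that asks for JSON and a minimal result page."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "idlist",
        "pageSize": 1,
    }
    return BASE + "?" + urlencode(params)


def hit_count(payload):
    """Read the total number of hits from a decoded JSON response."""
    return int(payload["hitCount"])


queries = [
    "terpene synthase",
    "terpene synthase plant",
    "terpene synthase plant volatile",
    "terpene synthase plant TPS",
    "terpene synthase TPS plant volatile",
    "terpene synthase TPS plant volatile compounds",
]
urls = [build_search_url(q) for q in queries]
print(urls[0])
```

To run it for real, each URL could be fetched with `urllib.request.urlopen`, decoded with `json.load`, and the result passed to `hit_count`.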