GENE : TPS
GENE DICTIONARY:
-
Vasant and Giulia started the Gene dictionary.
-
The eo_Gene dictionary contains 11951 TPS genes.
TOTAL GENES: 11951
#Creation of eo_Gene dictionary
-
I got 7572 UniProtKB AC/IDs from the following links.
UNIPROTKB AC/IDs:Terpene synthase
UNIPROTKB AC/IDs:Terpene synthase C
Both sets, combined, can be found above.
-
I also searched UniProt for "terpene synthase AND reviewed:yes" and found 945 new TPS genes.
After removing duplicates between the two sets above and entries annotated as "deleted", I have a total of 8409 TPS genes.
-
Gene names, synonyms, organism names (and IDs), protein information, Enzyme Commission numbers and Gene Ontology (molecular information) were retrieved from UniProt (https://www.uniprot.org/uploadlists/).
-
Gene identifiers such as AT5G23960 (TPS21 in Arabidopsis) are used in the literature. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268506/table/t02_01/)
-
So I mapped primary and secondary identifiers from Phytomine (https://phytozome.jgi.doe.gov/phytomine/template.do?name=Proteins%20with%20Two%20PFAM%20Domains&scope=all). This gave gene identifiers for 1607 genes (around 18 species, including Arabidopsis).
-
I found 7 new plant species in (https://digitalcommons.wustl.edu/cgi/viewcontent.cgi?filename=5&article=9240&context=open_access_pubs&type=additional).
-
Using these 7 species and Pfam domain information, I retrieved gene identifiers for 510 new TPS genes.
-
I collected 3032 new TPS genes from:
Phytomine https://phytozome.jgi.doe.gov/phytomine/begin.do
http://www.nipgr.ac.in/terzyme.html
http://radish.kazusa.or.jp/cgi-bin/keyword.cgi
https://viggs.dna.affrc.go.jp/
-
Finally, the dictionary now contains 11951 TPS genes (8409 + 510 + 3032).
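The merge of the two UniProt result sets (7572 and 945 entries, reduced to 8409 after removing duplicates and deleted entries) can be sketched in Python. This is only a minimal sketch, assuming both sets are available as plain lists of accessions and the deleted entries as a separate collection. The function name merge_accessions and the sample accessions are hypothetical placeholders, not the real data or the exact procedure used.

```python
def merge_accessions(first_set, second_set, deleted=()):
    """Combine two accession lists, dropping duplicates and deleted entries.

    The order of first appearance is preserved.
    """
    deleted = set(deleted)
    seen = set()
    merged = []
    for acc in list(first_set) + list(second_set):
        acc = acc.strip()
        # Skip blanks, accessions already kept, and deleted entries.
        if not acc or acc in seen or acc in deleted:
            continue
        seen.add(acc)
        merged.append(acc)
    return merged


# Hypothetical placeholder accessions, for illustration only.
linked_hits = ["ACC0001", "ACC0002", "ACC0003"]
reviewed_hits = ["ACC0002", "ACC0004"]
deleted_entries = {"ACC0003"}

merged = merge_accessions(linked_hits, reviewed_hits, deleted_entries)
print(len(merged), merged)
```

With the real lists, len(merged) would be the number to compare against the 8409 reported above.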
-
TPS GENE CLASSIFICATION
Monoterpene synthase: 1062
Sesquiterpene synthase: 2273
Diterpene synthase: 681
Prenyl transferase: 179
Triterpenoid synthase: 94
Uncharacterized: 3237
Total above: 7526
Total TPS Genes: 11951
Terpene synthase domain-containing protein, unannotated genome, species-specific terpene: 4425
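The class counts can be checked against the totals with a few lines of Python. This is a small sanity-check sketch that only uses the numbers listed above; the variable names are illustrative.

```python
# Counts copied from the classification list above.
classified = {
    "Monoterpene synthase": 1062,
    "Sesquiterpene synthase": 2273,
    "Diterpene synthase": 681,
    "Prenyl transferase": 179,
    "Triterpenoid synthase": 94,
    "Uncharacterized": 3237,
}
total_genes = 11951

# Sum of the six listed classes.
subtotal = sum(classified.values())
# Genes that fall outside the six listed classes.
remainder = total_genes - subtotal

print(subtotal, remainder)  # 7526 4425
```

Both values agree with the figures in the list: 7526 for the six classes and 4425 for the remaining entries.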
-
TPS Gene distribution in different species
-
Creating eo_Gene dictionary and minicorpus
A) I created a txt file containing a list of all gene names.
B) I created a txt file containing a list of all species names and terms such as TPS, TPS1, TPS2 and so on.
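The term list from step B could be generated rather than typed by hand. The following Python sketch assumes one term per line in the txt file and an arbitrary upper bound for the TPS numbering. The function names, the example species and the bound of 3 are illustrative assumptions, not the values actually used.

```python
def tps_terms(max_index):
    """Return 'TPS' followed by 'TPS1' ... 'TPS<max_index>'."""
    return ["TPS"] + [f"TPS{i}" for i in range(1, max_index + 1)]


def build_term_list(species_names, max_index):
    """Species names first, then TPS terms, without duplicates or blanks."""
    seen = set()
    terms = []
    for term in list(species_names) + tps_terms(max_index):
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


# Example species and upper bound chosen for illustration only.
terms = build_term_list(["Arabidopsis thaliana", "Vitis vinifera"], max_index=3)

# One term per line; redirect the output to a txt file if needed.
print("\n".join(terms))
```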
amidict -v --dictionary eo_Gene --directory gene --input genee.txt create --informat list --outformats xml
Please find the dictionary here.
getpapers -q "(terpene synthase)" -o corporaTPS -x -p -k 500 -f corporaTPS/log.txt
Downloaded 500 papers.
getpapers -q "(terpene synthase) AND (characterisation) AND (characterization)" -o corpusTPS -x -p -k 500 -f corporaTPS/log.txt
Downloaded around 35 papers.
-
Testing eo_Gene dictionary
ami -p "corporaTPS" section
ami -p "corporaTPS" search --dictionary eo_Gene.xml
-
eo_Gene1
eo_Gene
-
Difficulties:
a) Naming is inconsistent across papers. Some mention TPS in Vitis vinifera as VvTPS, others as VviTPS. Some number genes as TPS1, others as TPS01.
b) Gene names are in tables, figures or supplementary files.
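One possible way to handle difficulty (a) is to normalize name variants before matching. The Python sketch below assumes a species prefix of at most three letters and treats leading zeros as insignificant. Both are assumptions, and the function normalize_tps_name is a hypothetical helper, not part of ami or amidict.

```python
import re

# Optional species prefix of up to three letters (assumption), then TPS and a number.
_TPS_NAME = re.compile(r"^([A-Za-z]{0,3})TPS(\d+)$")


def normalize_tps_name(name):
    """Map variants such as VvTPS01 or VviTPS1 to a common form, here TPS1.

    Returns None when the name does not look like a numbered TPS gene.
    """
    match = _TPS_NAME.match(name.strip())
    if match is None:
        return None
    # int() drops leading zeros, so TPS01 and TPS1 become the same term.
    return f"TPS{int(match.group(2))}"


variants = ["VvTPS01", "VviTPS1", "TPS1", "TPS01"]
normalized = {v: normalize_tps_name(v) for v in variants}
print(normalized)
```

Dropping the prefix also discards the species information, so in practice the prefix (match.group(1)) would probably need to be kept alongside the normalized name.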