GENE : TPS

Sagar Jadhav edited this page Jul 29, 2021 · 69 revisions

GENE DICTIONARY:

  1. Vasant and Giulia started the gene dictionary.

  2. eo_Gene in Excel

  3. The eo_Gene dictionary contains 11951 TPS genes from 483 species.

    TOTAL GENES: 11951

    UNIPROT IDs: 8420

    GENE IDENTIFIER IDs: 5178

  4. TPS GENE CLASSIFICATION

    Monoterpene synthase: 1062

    Sesquiterpene synthase: 2273

    Diterpene synthase: 681

    Prenyl transferase: 179

    Triterpenoid synthase: 94

    Uncharacterized: 3237

    Total above: 7526

    Total TPS Genes: 11951

    Terpene synthase domain-containing protein, unannotated genome, species-specific terpene: 4425

    TPS Classification
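
The tallies in item 4 can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the list above, and the variable names are illustrative, not part of any eo_Gene tooling.

```python
# Class counts copied from the TPS gene classification above.
class_counts = {
    "Monoterpene synthase": 1062,
    "Sesquiterpene synthase": 2273,
    "Diterpene synthase": 681,
    "Prenyl transferase": 179,
    "Triterpenoid synthase": 94,
    "Uncharacterized": 3237,
}
total_genes = 11951  # total TPS genes reported for eo_Gene

# Sum of the six named classes.
classified_total = sum(class_counts.values())

# Genes left over: domain-containing proteins, unannotated genomes, etc.
remainder = total_genes - classified_total

print(classified_total)  # 7526
print(remainder)         # 4425
```

Both printed values agree with the "Total above" and remainder figures reported in the list.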

# Creation of the eo_Gene dictionary

  1. I obtained 7572 UniProtKB AC/IDs from the following links.

    UNIPROTKB AC/IDs:Terpene synthase.

    UNIPROTKB AC/IDs:Terpene synthase C

    UNIPROTKB AC/ID retrieval for TPS

  2. I also searched UniProt for "terpene synthase AND reviewed:yes" and found 945 additional TPS genes. After removing duplicates, I had a total of 8409 TPS genes.

  3. GENE TPS UniprotKB AC/IDs for 8409 genes

  4. Gene names, synonyms, organism names (and IDs), protein information, Enzyme Commission numbers, PFAM domains and Gene Ontology (molecular function) annotations were retrieved from UniProt (https://www.uniprot.org/uploadlists/).

    uniprot

  5. Gene identifiers are an important component of the literature. Gene identifier IDs such as AT5G23960 for Arabidopsis (TPS21) are used in papers. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268506/table/t02_01/)

  6. So, I mapped primary and secondary identifiers from Phytomine (https://phytozome.jgi.doe.gov/phytomine/template.do?name=Proteins%20with%20Two%20PFAM%20Domains&scope=all). I retrieved gene identifier IDs for 1607 genes (around 18 species, including Arabidopsis).

  7. I found 7 new plant species in (https://digitalcommons.wustl.edu/cgi/viewcontent.cgi?filename=5&article=9240&context=open_access_pubs&type=additional).

  8. Using these 7 species and the PFAM domain information, I retrieved gene identifier IDs for 539 new TPS genes.

  9. Collected new TPS genes from

    Phytomine (leftover species)

    http://www.nipgr.ac.in/terzyme.html

    http://radish.kazusa.or.jp/cgi-bin/keyword.cgi

    www.rosaceae.org

    www.solgenomics.net

    www.citrusgenomedb.org

    www.pulsedb.org

    https://viggs.dna.affrc.go.jp/

    www.cucurbitgenomics.org

    www.banana-genome-hub.southgreen.fr

    www.morus.swu.edu.cn

  10. Using the above-mentioned genome browsers, I retrieved 3003 genes. The dictionary now contains 11951 TPS genes (8409+539+3003).

  11. TPS Gene distribution in different species

    all taxons

    monocots

    Dicots

    Dicots

    Gymnosperms and others

    gene density monocot

  12. Creating eo_Gene dictionary and minicorpus

    A) I created a txt file containing a list of all gene names.

    B) I created a txt file containing a list of all species names and terms such as TPS, TPS1, TPS2 and so on.

    amidict -v --dictionary eo_Gene --directory gene --input genee.txt create --informat list --outformats xml

    Please find the dictionary here

    Gene1

    eogene

    getpapers -q "(terpene synthase)" -o corporaTPS -x -p -k 500 -f corporaTPS/log.txt

    Downloaded 500 papers.

    getpapers -q "(terpene synthase) AND (characterisation) AND (characterization)" -o corpusTPS -x -p -k 500 -f corporaTPS/log.txt
    Downloaded around 35 papers.

  13. Testing eo_Gene dictionary

    ami -p "corporaTPS" section

    ami -p "corporaTPS" search --dictionary eo_Gene.xml

  14. eo_Gene1

    Gene

    eo_Gene

    Gene

  15. Difficulties:

    a) Some papers mention TPS in Vitis vinifera as VvTPS and VviTPS. Some use TPS1 or TPS01.

    b) Gene names are in tables, figures or supplementary files.
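
One possible way to reduce the naming problem in 15 a) is to normalize gene names before matching them against the dictionary. The sketch below is an assumption-laden illustration, not part of ami or amidict: the function name `normalize_tps_name`, the regular expression, and the `PREFIX_ALIASES` table are all hypothetical, and only the Vitis vinifera prefixes (Vv, Vvi) and the TPS1/TPS01 variants come from the notes above.

```python
import re

# Hypothetical pattern: an optional species prefix (e.g. Vv, Vvi), the literal
# "TPS", then an optional number that may carry leading zeros.
TPS_NAME = re.compile(r"^(?P<prefix>[A-Za-z]*?)TPS(?P<number>\d*)$")

# Assumed alias table; only the Vitis vinifera prefixes come from the notes.
PREFIX_ALIASES = {"Vvi": "Vv"}


def normalize_tps_name(name):
    """Return a canonical form such as 'VvTPS1', or None for non-TPS names."""
    match = TPS_NAME.match(name.strip())
    if match is None:
        return None
    raw_prefix = match.group("prefix")
    prefix = PREFIX_ALIASES.get(raw_prefix, raw_prefix)
    number = match.group("number")
    # int() drops leading zeros, so "01" and "1" collapse to the same number.
    number = str(int(number)) if number else ""
    return f"{prefix}TPS{number}"


for raw in ["VvTPS1", "VviTPS01", "TPS01"]:
    print(raw, "->", normalize_tps_name(raw))
```

With this sketch, VvTPS1 and VviTPS01 both map to VvTPS1, and TPS01 maps to TPS1. It does not help with 15 b), since names that appear only in tables, figures or supplementary files never reach the text search.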
