Progress Summary : Utkarsha Pande

utkarsha-05 edited this page Oct 28, 2021 · 20 revisions

Maintained by : Utkarsha Pande

System specifications :

  • MacBook Air (13-inch, 2017)
  • Processor: 1.8 GHz Dual-Core Intel Core i5
  • Software: macOS Big Sur version 11.0.1

Date : 08/09/2021

pygetpapers

pygetpapers is a Python package, developed from getpapers, that downloads full-text scientific papers by querying the APIs of open-access repositories. It is run from a command-line interface (terminal).

To know more details on pygetpapers: https://github.com/petermr/pygetpapers

Installing pygetpapers

  1. Make sure you have Python and pip installed.
  2. Open a terminal and run the command pip install pygetpapers.
  3. Check that the installation was successful by typing pygetpapers --help.
  4. This should print the pygetpapers help message in the terminal.
  • Usage of pygetpapers

Enter your query on the terminal, replacing plant_name with the crop of interest : pygetpapers -q "terpene synthase volatile plant_name" -o plant_nameTPS -x -p -s
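The query above follows a fixed pattern, so it can be generated per crop. The following is a minimal sketch, not part of pygetpapers itself: the function name build_pygetpapers_command is hypothetical, and replacing spaces with underscores in the output folder name is my own assumption. The flags are the ones used in the command above (as I understand the help text, -x requests XML, -p PDF and -s supplementary files).

```python
import shlex


def build_pygetpapers_command(plant_name):
    """Return the pygetpapers argument list for one plant (hypothetical helper)."""
    query = f"terpene synthase volatile {plant_name}"
    # Assumption: avoid spaces in the output folder name.
    output_dir = plant_name.replace(" ", "_") + "TPS"
    return ["pygetpapers", "-q", query, "-o", output_dir, "-x", "-p", "-s"]


command = build_pygetpapers_command("Citrus")
# Print a shell-safe version of the command that can be pasted into the terminal.
print(shlex.join(command))
```

The list form can also be passed to subprocess.run(command, check=True) to launch the download from a script.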

ami3

ami3 is a toolkit for managing scholarly documents. It divides a document (for example a research paper) into sections such as abstract, materials and methods, results, ethical statements and acknowledgements, which can then be used for text search and classification. ami3 is written in Java and is designed as a declarative system, with commands and data modules.

To know more about ami3 and its working: https://github.com/petermr/ami3

Date - 09/09/2021

Installing ami3

  1. Install Java
  • Install the latest version of Java on your system from https://www.java.com/en/download/
  • Test if the installation was successful by typing java -version on command line.
  2. Install JDK
  3. Install git
  4. Install Maven
  • Install maven for your system from https://maven.apache.org/download.cgi. I installed apache-maven-3.8.2-bin.zip from this link.
  • Unzip the apache-maven-3.8.2-bin.zip file in the downloads folder using unzip apache-maven-3.8.2-bin.zip.
  • Move the apache-maven-3.8.2 folder to /Applications :
pwd
cd ~/Downloads
mv apache-maven-3.8.2 /Applications/
cd /Applications
ls
  • Close the terminal and open it again and set the path for mvn installation.
  • For setting path :

-> Open the bash profile with open -e .bash_profile (I did this in my home directory). This opens the file in a text editor.

-> If the bash profile is not present in the directory, create it using touch .bash_profile and then open it using the above command.

-> Go to a new line and paste the following into the file :

export M2_HOME=/Applications/apache-maven-3.8.2
export PATH=$PATH:$M2_HOME/bin

-> Save the file and close it.

-> Type source .bash_profile on the command line.

-> No separate install command is needed: once the path is set, the mvn command is available.

You can also follow this youtube link for maven installation : https://youtu.be/j0OnSAP-KtU

  • Check the Maven installation by typing mvn -version on the command line. It should report Apache Maven 3.8.2.
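Before moving on to ami3, it can help to confirm that the tools installed so far are actually visible on the PATH. This is a small sketch using only the Python standard library; the helper name find_tools is hypothetical.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# java, git and mvn are the commands installed in the steps above.
for name, path in find_tools(["java", "git", "mvn"]).items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If mvn is reported as not found, the export lines in .bash_profile are the first thing to re-check.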

Date - 10/09/2021

  5. Install ami3
  • Close and reopen the terminal, then clone the ami3 repository :
git clone https://github.com/petermr/ami3.git
  • Drag and drop the ami3 folder from the home directory to the /Applications directory.

  • Set the path for ami3: open the bash profile with open -e .bash_profile and add the following lines :

export A2_HOME=/Applications/ami3/target/appassembler
export PATH=$PATH:$A2_HOME/bin
  • Save and close the bash profile.

  • On the terminal change directory :

cd /Applications
cd ami3
  • Run the following command to build ami3. It will take some time.
mvn install -Dmaven.test.skip=true
  • Check ami3 installation by typing ami --help. It should open the help message.

  • Usage of ami3

  1. Go through each paper manually to look for TPS (terpene synthase) genes related to a crop.
  2. Collect gene names such as CsTPS, MonoTPS and so on. Put those terms into an Excel file as a list, save it as gene.txt, and keep it in the same directory as ami3.
  3. Enter your query on the terminal : amidict -v --dictionary eo_Gene --directory gene --input gene.txt create --informat list --outformats xml
  4. This generates an XML file containing the eo_Gene dictionary.
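Step 2 above can also be done without Excel. The sketch below writes a cleaned term list to gene.txt, assuming that the list input format expects one term per line (that is my assumption, not something stated in the ami3 documentation). The helper name write_term_list is hypothetical; the two gene names are the examples from the step above.

```python
from pathlib import Path


def write_term_list(terms, path):
    """Write unique, non-empty terms to path, one per line, keeping first-seen order."""
    seen = set()
    unique_terms = []
    for term in terms:
        cleaned = term.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique_terms.append(cleaned)
    Path(path).write_text("\n".join(unique_terms) + "\n", encoding="utf-8")
    return unique_terms


# Duplicates and blank entries are dropped before the file is written.
terms = write_term_list(["CsTPS", "MonoTPS", "CsTPS ", ""], "gene.txt")
print(terms)
```

Removing duplicates and stray whitespace before running amidict keeps repeated entries out of the generated dictionary.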

Date- 13/09/2021

  • Ran metadata analysis on the Tomato corpus and generated a CSV file for the 'result' and 'abstract' sections.

Running Metadata analysis script developed by Shweata Hegde

To know further: Metadata analysis Wikipage

  • Install Visual Studio on the system and add the following extensions: python, prettiercode, excel-to-csv. Then install the required Python packages :
pip install pandas 
pip install yake
pip install scispacy
pip install bs4
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz

  • Copy and paste the metadata script into a new Python file in Visual Studio, named Metadata_analysis.py.
  • Uncomment this line #querying_pygetpapers_sectioning("('terpene synthase') AND ('volatile') AND ('Citrus') AND (((SRC:MED OR SRC:PMC OR SRC:AGR OR SRC:CBA) NOT (PUB_TYPE:'Review')))",'200',CPROJECT) and replace 'Citrus' with your crop of interest.
  • Run the code in the same environment in which all the requirements stated above have been pip-installed.
  • The script generates a tps_corpus and a CSV file for the crop containing all the TPS gene names and IDs from a particular section.
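Since only the crop name changes between runs, the query string passed to querying_pygetpapers_sectioning can be built by a small helper. This is a sketch only: build_tps_query is a hypothetical name and is not part of the metadata analysis script.

```python
def build_tps_query(crop):
    """Return the query string from the script with the crop name swapped in."""
    return (
        "('terpene synthase') AND ('volatile') AND ('" + crop + "') AND "
        "(((SRC:MED OR SRC:PMC OR SRC:AGR OR SRC:CBA) NOT (PUB_TYPE:'Review')))"
    )


query = build_tps_query("Vitis")
print(query)
# Inside Metadata_analysis.py the result would replace the hard-coded string, e.g.
# querying_pygetpapers_sectioning(query, '200', CPROJECT)
```

Keeping the source filters and the review exclusion in one place makes it less likely that they are edited by accident when switching crops.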

Date - 17/09/2021 - 25/09/2021

Problems faced :

  • ImportError: cannot import name 'Deque'. Tried solving it with pip install websockets, but Deque (from the typing module) is not available in Python 3.6.0 and below.
  • Updated python version to 3.6.7 in Ragheshwari's system.
  • Re-ran metadata analysis script for Vitis and generated CSV file with TPS names in 'result' section - Vitis metadata script CSV file
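A quick way to tell whether a given interpreter will hit the Deque problem is to try the import directly before running the metadata script. The sketch below assumes the failing import is from typing import Deque, which the Python typing documentation lists as new in 3.6.1; the helper name has_typing_deque is hypothetical.

```python
import sys


def has_typing_deque():
    """Return True if this interpreter can import typing.Deque."""
    try:
        from typing import Deque  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


version = ".".join(str(part) for part in sys.version_info[:3])
if has_typing_deque():
    print(f"Python {version}: typing.Deque is available")
else:
    print(f"Python {version}: typing.Deque is missing, upgrade Python")
```

Running this with the same interpreter that runs Metadata_analysis.py shows whether an upgrade is needed.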

Date - 12/10/2021

Validation Software

Pyami code for developing the dictionary software: test_pyamidict_tdd.py

  1. Clone pyami repo using git clone https://github.com/petermr/pyami.git
  2. Set the path: run open -e .bash_profile to open the file in a text editor, paste the lines below, then save and close the editor.
export P2_HOME=/Users/sagar/pyami
export PATH=$PATH:$P2_HOME/py4ami
  3. Re-open the terminal and go to the test folder in pyami with cd utkarsha/pyami/test.
  4. Install pytest with pip install pytest.
  5. Run pytest test_pyamidict_tdd.py in the test folder.
  6. If it shows ImportError: cannot import name 'AMIDict' from 'py4ami.dict_lib', open test_pyamidict_tdd.py in any code editor and add .. before 'py4ami' at lines 14 and 15. Lines 14 and 15 should then be :
from ..py4ami.xml_lib import XmlLib
from ..py4ami.dict_lib import AMIDict, AMIDictError, Synonym, Entry
  7. Run pytest test_pyamidict_tdd.py in the test folder again. 58 tests passed.

Date - 12/10/2021

Testing Dictionaries

Pyami code for testing dictionaries pyami/test/test_cevopen_tps_dictionaries.py.

  1. Go to test folder in pyami by cd utkarsha/pyami/test.
  2. Install pytest by pip install pytest if not installed.
  3. Open pyami/test/test_cevopen_tps_dictionaries.py in any code editor and add .. before 'py4ami' at line 1. Line 1 should then be :
from ..py4ami.dict_lib import AMIDict
  4. Run pytest test_cevopen_tps_dictionaries.py in the test folder. Of the 4 dictionary tests, 3 passed and 1 failed :
pyami.py4ami.dict_lib.AMIDictError: Failed to read URL: https://raw.githubusercontent.com/petermr/crops/main/Vitis%20vinifera/eo_Gene.xml; reason = Not Found

../py4ami/dict_lib.py:496: AMIDictError
FAILED test_cevopen_tps_dictionaries.py::test_vitis_is_valid - pyami.py4ami.dict_lib.AMIDictError: Failed to read URL: https://raw.githubusercontent.com/petermr/crops/main/Vitis%20vinifera/eo_Gene.xml;...
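The failure above is a Not Found response for a raw GitHub URL, so the first thing to check is the URL itself. The sketch below rebuilds it from the crop name so that it can be compared with the actual folder and file names in the crops repository. The helper name dictionary_url is hypothetical, and the assumption that every crop uses the same folder/eo_Gene.xml layout is mine.

```python
from urllib.parse import quote

# Base URL taken from the error message above.
RAW_BASE = "https://raw.githubusercontent.com/petermr/crops/main"


def dictionary_url(crop, filename="eo_Gene.xml"):
    """Build the raw GitHub URL for a crop dictionary, percent-encoding spaces."""
    return f"{RAW_BASE}/{quote(crop)}/{filename}"


print(dictionary_url("Vitis vinifera"))
```

If the printed URL matches the one in the error, the dictionary file is probably missing or named differently in the repository, rather than the test being wrong.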

Project Ideas

Title : Analysis of Semantic Terpene Synthase Dictionaries

  • Demonstrating the working of all the intern dictionaries (including those of previous interns)

  • Integration of all the dictionaries into the enzyme name dictionary to create a knowledgebase for terpene synthases.

  • Adding EC numbers to the enzyme_name dictionary and crops dictionaries (Karya interns, Sagar, me)
  • Adding Wikidata IDs to all the dictionaries (Karya interns, Sagar, me)
  • Calculating co-occurrences of enzyme names in each of the crop dictionaries (TBD)
  • Comparative analysis of tps_enzymes between each of the crops (TBD)
  • Entity-relationship modeling using the full datatables to relate tps to other entities like volatile compound emission (TBD)

Slides - https://docs.google.com/presentation/d/1D4EXVkYUjqQtr_56dt_ORE215fX_U5koQUpskCd45qQ/edit?usp=sharing

Work done:

  1. Created dictionaries using Wikidata IDs for camellia and tomato TPS.
  2. Generated XML and CSV dictionaries: camelia_wikidata.csv, tomato_wikidata.csv, camelia_sparql, tomato_sparql.
  3. Converted the CSV dictionaries to pandas DataFrames.
  4. Removed duplicates to obtain a list of the TPS enzymes for camellia and tomato.
  5. Identified the TPS enzymes common to both crops with camelia_tps[camelia_tps['itemLabel'].isin(tomato_tps['itemLabel'])].
  6. Plotted a correlation matrix between enzyme names and crop TPS enzymes (the attached images were generated from incomplete dictionaries).

Caption - Common TPS in tomato and camellia

Caption - Correlation plot for camellia and tomato TPS
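Steps 3 to 5 of the work list can be sketched with pandas as follows. The enzyme labels are placeholders invented for illustration, not values from the real dictionaries; in practice the DataFrames would be loaded from camelia_wikidata.csv and tomato_wikidata.csv with pd.read_csv. Only the itemLabel column name and the isin expression come from the notes above.

```python
import pandas as pd

# Placeholder rows standing in for the CSV dictionaries (not real results).
camelia_tps = pd.DataFrame({"itemLabel": ["enzyme_a", "enzyme_b", "enzyme_a"]})
tomato_tps = pd.DataFrame({"itemLabel": ["enzyme_a", "enzyme_c"]})

# Step 4: remove duplicate enzyme names within each crop.
camelia_tps = camelia_tps.drop_duplicates(subset="itemLabel")
tomato_tps = tomato_tps.drop_duplicates(subset="itemLabel")

# Step 5: keep the camellia rows whose label also occurs in the tomato table.
common_tps = camelia_tps[camelia_tps["itemLabel"].isin(tomato_tps["itemLabel"])]
print(common_tps["itemLabel"].tolist())
```

Note that isin matches labels exactly, so differences in case or spelling between the two dictionaries would hide a shared enzyme.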