Progress Summary : Utkarsha Pande
Maintained by : Utkarsha Pande
- System specifications :
- pygetpapers
- ami3
- Running Metadata analysis script developed by Shweata Hegde
- Validation Software
- Testing Dictionaries
- Project Ideas
Table of contents generated with markdown-toc
- MacBook Air (13-inch, 2017)
- Processor: 1.8 GHz Dual-Core Intel Core i5
- Software: macOS Big Sur version 11.0.1
Date : 08/09/2021
pygetpapers is a command-line tool, distributed as a Python package and developed from getpapers, that downloads full-text scientific articles from open access repositories through their APIs. It is run from a terminal.
For more details on pygetpapers: https://github.com/petermr/pygetpapers
- Make sure you have `pip` and Python installed.
- After installing Python, open a terminal and run the command `pip install pygetpapers`.
- Check that your installation was successful by typing `pygetpapers --help`.
- `pygetpapers --help` should print the pygetpapers help text in the terminal.
Enter your query in the terminal, replacing plant_name with the crop of interest: `pygetpapers -q "terpene synthase volatile plant_name" -o plant_nameTPS -x -p -s`
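The query pattern above can be filled in per crop from Python. This is a minimal sketch that reuses only the flags shown in the command above; `build_pygetpapers_command` is a hypothetical helper written for this page and is not part of pygetpapers.

```python
import shlex


def build_pygetpapers_command(plant_name):
    """Return the pygetpapers argument list for one crop (hypothetical helper)."""
    query = f"terpene synthase volatile {plant_name}"
    # The output directory follows the plant_nameTPS pattern from the command above;
    # spaces are replaced so the directory name stays a single token.
    output_dir = f"{plant_name.replace(' ', '_')}TPS"
    return ["pygetpapers", "-q", query, "-o", output_dir, "-x", "-p", "-s"]


if __name__ == "__main__":
    cmd = build_pygetpapers_command("Citrus")
    # Print the shell-quoted command so it can be inspected before running.
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell-quoting problems when a crop name contains a space.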
ami3 is a toolkit for managing scholarly documents. It divides a document (such as a research paper) into sections like abstract, materials and methods, results, ethical statements and acknowledgements, which can then be used for text search and classification. ami3 is written in Java and is designed to be a declarative system, with commands and data modules.
To know more about ami3 and how it works: https://github.com/petermr/ami3
Date - 09/09/2021
- Install Java
- Install the latest version of Java on your system from https://www.java.com/en/download/
- Test that the installation was successful by typing `java -version` on the command line.
- Install JDK
- Install the latest JDK on your system from https://www.oracle.com/java/technologies/javase-jdk16-downloads.html
- Check the installed JDK version by typing `ls /Library/Java/JavaVirtualMachines` in the terminal.
- Install git
- Install git for your system from https://git-scm.com/downloads
- Check that git is successfully installed by typing `git --version`.
- Install Maven
- Download Maven for your system from https://maven.apache.org/download.cgi. I installed `apache-maven-3.8.2-bin.zip` from this link.
- Unzip the `apache-maven-3.8.2-bin.zip` file in the downloads folder using `unzip apache-maven-3.8.2-bin.zip`.
- Move the `apache-maven-3.8.2` folder to `/Applications` by :
pwd
cd downloads/
mv apache-maven-3.8.2 /Applications/
cd /Applications
ls
- Close the terminal, open it again and set the path for the mvn installation.
- For setting the path :
-> Open the bash profile with `open -e .bash_profile` (in my case I did it in the home directory). This opens the file in TextEdit.
-> If the bash profile is not present in the directory, create it using `touch .bash_profile` and then open it using the above command.
-> Press enter to go to a new line and paste this into the file :
export M2_HOME=/Applications/apache-maven-3.8.2
export PATH=$PATH:$M2_HOME/bin
-> Save the file and close it.
-> Run `source .bash_profile` on the command line.
-> No separate install command is needed after this: with `$M2_HOME/bin` on the PATH, `mvn` is ready to use.
You can also follow this YouTube video for Maven installation : https://youtu.be/j0OnSAP-KtU
- Check the Maven installation by typing `mvn -version` on the command line; it should show `apache-maven-3.8.2`.
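After editing `.bash_profile`, it can help to confirm that every tool installed so far resolves on the PATH. The following is a small sketch using only the Python standard library; the tool list and the `find_tools` name are assumptions made for this page.

```python
import shutil

# Command names installed in the steps above (assumed to be the ones needed).
REQUIRED_TOOLS = ("java", "git", "mvn")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        status = path if path else "NOT FOUND (check PATH in .bash_profile)"
        print(f"{tool}: {status}")
```

A tool reported as not found usually means the terminal was not restarted or `source .bash_profile` was not run after the edit.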
Date - 10/09/2021
- ami3
- Close and reopen the terminal, then clone the `ami3` repository:
git clone https://github.com/petermr/ami3.git
- Drag and drop the `ami3` folder from the home directory to the `/Applications` directory.
- Set the path for ami3: open the bash profile with `open -e .bash_profile` and add the following lines :
export A2_HOME=/Applications/ami3/target/appassembler
export PATH=$PATH:$A2_HOME/bin
- Save and close the bash profile.
- In the terminal, change directory :
cd /Applications
cd ami3
- Paste the following command; the installation will take some time.
mvn install -Dmaven.test.skip=true
- Check the `ami3` installation by typing `ami --help`. It should print the help message.
- Go through each paper manually to look for TPS related to a crop.
- Collect gene names such as CsTPS, MonoTPS and so on. Put those terms into an Excel file as a list, save it as a gene.txt file, and place it in the same directory as `ami3`.
- Enter your query in the terminal : `amidict -v --dictionary eo_Gene --directory gene --input gene.txt create --informat list --outformats xml`.
- This generates an XML file containing the eo_Gene dictionary.
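The gene list can also be written directly from Python instead of going through Excel. This sketch assumes that `--informat list` expects one term per line in plain text; `write_gene_list` is a hypothetical helper, and the example terms are the two named above.

```python
from pathlib import Path


def write_gene_list(terms, path="gene.txt"):
    """Write unique, non-empty terms to `path`, one per line, in first-seen order."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()  # drop stray whitespace copied from a spreadsheet cell
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Example call: write_gene_list(["CsTPS", "MonoTPS"], "gene.txt")
```

Removing duplicates before running `amidict` keeps repeated gene names from appearing more than once in the input list.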
Date - 13/09/2021
- Ran metadata analysis on the Tomato corpus and generated CSV files for the 'result' and 'abstract' sections.
To know further: Metadata analysis Wikipage
- Installed Visual Studio on the system and added the following extensions: python, prettiercode, excel-to-csv.
- Installed the required Python packages :
pip install pandas
pip install yake
pip install scispacy
pip install bs4
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz
- Copy and paste the metadata script into a new Python file in Visual Studio, Metadata_analysis.py.
- Uncomment this line and edit the code with your crop of interest :
#querying_pygetpapers_sectioning("('terpene synthase') AND ('volatile') AND ('Citrus') AND (((SRC:MED OR SRC:PMC OR SRC:AGR OR SRC:CBA) NOT (PUB_TYPE:'Review')))",'200',CPROJECT)
- Run the code in the same environment in which all the requirements stated above have been pip-installed.
- This generates a tps_corpus and a CSV file for the crop containing all the tps gene names and ids from a particular section.
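Editing the crop name inside the long query string by hand is error-prone because of the nested parentheses. A minimal sketch that rebuilds the same query string for any crop is shown below; `build_crop_query` is a hypothetical helper and is not part of the metadata script.

```python
def build_crop_query(crop):
    """Rebuild the query string from the script's example with `crop` swapped in."""
    sources = "SRC:MED OR SRC:PMC OR SRC:AGR OR SRC:CBA"
    return (
        f"('terpene synthase') AND ('volatile') AND ('{crop}') "
        f"AND ((({sources}) NOT (PUB_TYPE:'Review')))"
    )


if __name__ == "__main__":
    # The result can be passed as the first argument of
    # querying_pygetpapers_sectioning in place of the hard-coded string.
    print(build_crop_query("Citrus"))
```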
Date - 17/09/2021 - 25/09/2021
- Vitis dictionary by Ragheshwari - eo_Gene.xml
- Helped Ragheshwari with the metadata analysis script for Vitis vinifera - Vitis metadata script CSV file
Problems faced :
- ImportError (cannot import name Deque): tried solving it using `pip install websockets`, but `Deque` is not available in Python versions 3.6.0 and below.
- Updated the Python version to 3.6.7 on Ragheshwari's system.
- Re-ran the metadata analysis script for Vitis and generated a CSV file with TPS names in the 'result' section - Vitis metadata script CSV file
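Before running the script on a new machine, the interpreter can be checked for this problem up front. This sketch assumes the missing name is `typing.Deque` from the standard library, which matches the version limit described above; it tests for the attribute directly rather than comparing version numbers.

```python
import sys
import typing


def has_typing_deque():
    """Return True if this interpreter's typing module provides Deque."""
    return hasattr(typing, "Deque")


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    if has_typing_deque():
        print(f"Python {version}: typing.Deque is available")
    else:
        print(f"Python {version}: typing.Deque is missing, upgrade Python")
```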
Date - 12/10/2021
Pyami code for developing dictionary software: test_pyamidict_tdd.py
- Clone the pyami repo using `git clone https://github.com/petermr/pyami.git`
- Set the path: run `open -e .bash_profile`, which opens the text editor, then paste the lines below. Save and close the text editor.
export P2_HOME=/Users/sagar/pyami
export PATH=$PATH:$P2_HOME/py4ami
- Re-open the terminal and go to the test folder in pyami with `cd utkarsha/pyami/test`.
- Install pytest with `pip install pytest`.
- Run `pytest test_pyamidict_tdd.py` in the test folder.
- If it shows `ImportError: cannot import name 'AMIDict' from 'py4ami.dict_lib'`, open the `test_pyamidict_tdd.py` code in any code editor and add `..` at lines 14 and 15 before 'py4ami'. Lines 14 and 15 should then be :
from ..py4ami.xml_lib import XmlLib
from ..py4ami.dict_lib import AMIDict, AMIDictError, Synonym, Entry
- Run `pytest test_pyamidict_tdd.py` again in the test folder. Result: 58 tests passed.
Date - 12/10/2021
Pyami code for testing dictionaries: pyami/test/test_cevopen_tps_dictionaries.py
- Go to the test folder in pyami with `cd utkarsha/pyami/test`.
- Install pytest with `pip install pytest` if it is not already installed.
- Open the `pyami/test/test_cevopen_tps_dictionaries.py` code in any code editor and add `..` at line 1 before 'py4ami'. Line 1 should then be :
from ..py4ami.dict_lib import AMIDict
- Run `pytest test_cevopen_tps_dictionaries.py` in the test folder. It ran 4 dictionary tests: 3 passed and 1 failed, with this error :
pyami.py4ami.dict_lib.AMIDictError: Failed to read URL: https://raw.githubusercontent.com/petermr/crops/main/Vitis%20vinifera/eo_Gene.xml; reason = Not Found
../py4ami/dict_lib.py:496: AMIDictError
FAILED test_cevopen_tps_dictionaries.py::test_vitis_is_valid - pyami.py4ami.dict_lib.AMIDictError: Failed to read URL: https://raw.githubusercontent.com/petermr/crops/main/Vitis%20vinifera/eo_Gene.xml;...
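The failure above is a missing file at the dictionary URL, not a problem in the dictionary code itself. A short sketch for checking a dictionary URL before running the tests follows; the base URL is taken from the error message, while `dictionary_url` and `url_exists` are hypothetical helpers and not part of pyami.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path copied from the URL in the error message above.
RAW_BASE = "https://raw.githubusercontent.com/petermr/crops/main"


def dictionary_url(crop_dir, filename="eo_Gene.xml"):
    """Build the raw GitHub URL, percent-encoding spaces in the crop directory."""
    return f"{RAW_BASE}/{quote(crop_dir)}/{filename}"


def url_exists(url, timeout=10):
    """Return True if a HEAD request succeeds, False on HTTP or network errors."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # HTTPError, URLError and socket timeouts all derive from OSError
        return False


if __name__ == "__main__":
    url = dictionary_url("Vitis vinifera")
    print(url, "->", "found" if url_exists(url) else "not found")
```

If the URL is reported as not found, the file is probably missing from the repository, or the folder or file name differs from the one the test expects.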
Title : Analysis of Semantic Terpene Synthase Dictionaries
- Demonstrating the working of all the intern dictionaries (including those of the previous interns)
- Integration of all the dictionaries into the enzyme name dictionary to create a knowledge base for terpene synthases.
- Adding EC numbers to the enzyme_name dictionary and crop dictionaries (Karya interns, Sagar, me)
- Adding Wikidata IDs to all the dictionaries (Karya interns, Sagar, me)
- Calculating co-occurrences of enzyme names in each of the crop dictionaries (TBD)
- Comparative analysis of tps_enzymes between each of the crops (TBD)
- Entity-relationship modeling using the full datatables to relate tps to other entities like volatile compound emission (TBD)
Slides - https://docs.google.com/presentation/d/1D4EXVkYUjqQtr_56dt_ORE215fX_U5koQUpskCd45qQ/edit?usp=sharing
Work done:
- Created dictionaries using Wikidata IDs for Camellia and tomato tps
- Generated XML and CSV dictionaries: camelia_wikidata.csv, tomato_wikidata.csv, camelia_sparql, tomato_sparql
- Converted the CSV dictionaries to pandas DataFrames.
- Removed duplicates to obtain a list of the tps enzymes for Camellia and tomato.
- Identified the tps enzymes common to both crops: `camelia_tps[camelia_tps['itemLabel'].isin(tomato_tps['itemLabel'])]`.
- Plotted a correlation matrix between enzyme names and crop tps enzymes (the attached images are generated from incomplete dictionaries)
Caption - Common tps in tomato and Camellia
Caption - Correlation plot for Camellia and tomato tps
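The de-duplication and common-enzyme steps listed under "Work done" can be sketched with pandas as follows. The `itemLabel` column name comes from the expression above; the enzyme labels are made-up placeholders, standing in for rows that would normally be loaded with `pd.read_csv` from camelia_wikidata.csv and tomato_wikidata.csv.

```python
import pandas as pd

# Toy stand-ins for the CSV dictionaries; the labels are placeholders, not real data.
camelia_tps = pd.DataFrame({"itemLabel": ["enzyme A", "enzyme B", "enzyme B", "enzyme C"]})
tomato_tps = pd.DataFrame({"itemLabel": ["enzyme B", "enzyme C", "enzyme D"]})

# Drop repeated enzyme names within each crop.
camelia_tps = camelia_tps.drop_duplicates(subset="itemLabel").reset_index(drop=True)
tomato_tps = tomato_tps.drop_duplicates(subset="itemLabel").reset_index(drop=True)

# Keep the camelia rows whose label also appears in the tomato dictionary.
common = camelia_tps[camelia_tps["itemLabel"].isin(tomato_tps["itemLabel"])]
print(common["itemLabel"].tolist())  # ['enzyme B', 'enzyme C']
```

`isin` compares labels as exact strings, so differences in case or spacing between the two dictionaries would hide a genuine match.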