Progress Summary: Anuv

Anubhab Chakraborty edited this page Sep 29, 2021 · 4 revisions

Software Installation

System Information

Hardware

  • Asus X406UA Laptop
  • Processor: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
  • RAM: 8GB

Software

  • Operating System: Ubuntu 20.04 (Windows 10 WSL2)
  • Instruction set architecture: x86_64
  • python3 --version: 3.8.10
  • pip --version: 20.0.2

pygetpapers

pygetpapers is a fetch tool written in Python, developed by Ayush Garg. It is used to fetch freely available scientific papers from select repositories.

To install pygetpapers, run: pip install pygetpapers

To check that pygetpapers is properly installed, run: pygetpapers --help

Adding pygetpapers to path

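On Ubuntu, pip installs command-line programs such as pygetpapers in ~/.local/bin by default. If this directory is not on the system path, the shell cannot find pygetpapers. To add it to the path for the current shell session, execute the command below; to make the change permanent, append the same line to ~/.bashrc.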

export PATH="$HOME/.local/bin:$PATH"
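To confirm that the shell would find pygetpapers, a short Python check can help. This is only a minimal sketch: the helper name find_command is made up for illustration, and it relies on nothing beyond shutil.which from the standard library.

```python
import os
import shutil


def find_command(name, extra_dir=None):
    """Return the full path of `name` if it is found on PATH, else None.

    `extra_dir` (optional) is searched first, mimicking
    export PATH="<extra_dir>:$PATH" without changing the environment.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = os.path.expanduser(extra_dir) + os.pathsep + search_path
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Is pygetpapers reachable with the current PATH?
    print(find_command("pygetpapers"))
    # Would it be reachable once ~/.local/bin is added?
    print(find_command("pygetpapers", extra_dir="~/.local/bin"))
```

If the first call prints None and the second prints a path, the export line above is what is missing from the current session.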

ami3

ami is a sectioning tool written in Java, created by Dr. Peter Murray-Rust. It splits a scientific paper into sections according to their relative position in the document and their usage.

Dependencies

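  • Java: sudo apt install default-jre

To check that Java is installed successfully, run java --version

  • Maven: sudo apt install maven

After Java and Maven are installed, clone the repository and build it. If the build stops with a "No compiler is provided in this environment" error, only the Java runtime is present; installing the JDK with sudo apt install default-jdk should resolve it.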

git clone https://github.com/petermr/ami3.git
cd ami3
mvn install -Dmaven.test.skip=true

To add ami to the system path, execute the following command (this assumes ami3 was cloned into your home directory):

export PATH="$HOME/ami3/target/appassembler/bin:$PATH"

20210916

scilitanalysis

git clone https://github.com/ShweataNHegde/scilitanalysis.git

Create a virtual environment by following the instructions here: Working with a virtual environment
Move into the cloned directory with cd scilitanalysis/scilitanalysis
Create a requirements file with the following data:

yake
scispacy
spacy==3.0.1
pygetpapers
bs4
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz

Install the requirements with pip install -r requirements
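After the installation it can be useful to check which of the listed packages are actually present in the active environment. The sketch below is one possible way to do that with importlib.metadata (standard library since Python 3.8). The helper names are illustrative, and lines that are direct URLs are skipped because the package name cannot be read reliably from them.

```python
from importlib import metadata  # standard library since Python 3.8


def requirement_names(lines):
    """Extract package names from simple requirement lines.

    Handles only plain names and 'name==version' pins; blank lines,
    comments and direct URLs are skipped.
    """
    names = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "://" in line:
            continue
        names.append(line.split("==")[0].strip())
    return names


def installed_versions(names):
    """Map each name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The same entries as in the requirements file above (URL line omitted).
    listed = ["yake", "scispacy", "spacy==3.0.1", "pygetpapers", "bs4"]
    for name, version in installed_versions(requirement_names(listed)).items():
        print(name, version or "NOT INSTALLED")
```

Run it with the virtual environment activated, otherwise it reports on the system-wide packages instead.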


20210915

Working with a virtual environment

Sometimes we use software that requires a specific version of a package, or we need to run several programs that require conflicting package versions. For such cases, and for software development in general, it is useful to work in a virtual environment. When a Python virtual environment is active, the packages available in it are independent of the packages installed on the system; as a result, commonly used packages often have to be installed again in the virtual environment after creating it. You can create as many virtual environments as you want, and you would typically create a separate one for every project.

Creating a virtual environment

python3 -m venv /path/to/virtual/environment

The path also includes the name of the virtual environment. For example, to create a virtual environment named 'scilit_venv' in /home/anuv/scilitanalysis/, I would run the command:

python3 -m venv /home/anuv/scilitanalysis/scilit_venv

Activating virtual environment

source path/to/venv/bin/activate

You need to run this command every time you want to enter the virtual environment. If you are not using bash, use the activate script for your shell instead; for example, with the fish shell use source venv/bin/activate.fish.

Continuing the above example, we can activate the scilit_venv by running:

source /home/anuv/scilitanalysis/scilit_venv/bin/activate
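To confirm from inside Python that the virtual environment is really active, one option is to compare sys.prefix with sys.base_prefix, which differ inside an environment created with venv. A minimal sketch (the function name is illustrative):

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

With scilit_venv activated, the printed prefix should be the scilit_venv directory.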

Deactivating virtual environment

To leave the virtual environment simply run:

deactivate

20210920

Running scilitanalysis

  • Activate virtual environment for the current project, and navigate to scilitanalysis/scilitanalysis/.
  • Open analysis.py with your text editor, and navigate to the end of the file. Set the query parameters as per your requirement.

In scoping_analysis

  • Uncomment (remove the preceding #) the line self.querying_pygetpapers_sectioning(QUERY,HITS,CPROJECT)
  • Change the following lines as per your requirement:
self.look_for_a_word(SECTION, metadata_dictionary=metadata_dictionary,  search_for="C.")
self.look_for_a_word(SECTION, metadata_dictionary=metadata_dictionary, search_for='Citrus')
  • Run the script with python3 analysis.py

The script writes its output as a .csv file in the current working directory.
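Since the name of the output file depends on the script, the following sketch simply looks at every .csv file in a directory and reports its column headers and row count. It uses only the standard library; the function name is illustrative and nothing here is part of scilitanalysis itself.

```python
import csv
from pathlib import Path


def summarise_csv_files(directory="."):
    """Return {file name: (header row, number of data rows)} per .csv file."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        # Every row after the header counts as a data row.
        summary[path.name] = (header, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    for name, (header, count) in summarise_csv_files().items():
        print(f"{name}: {count} rows, columns = {header}")
```

Running it in scilitanalysis/scilitanalysis/ after the analysis gives a quick check that the output was written and is not empty.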

Project Idea

Reaction pathways in the scientific literature are commonly represented as diagrams in images. The reaction information encoded in such unstructured images could be far more useful in a machine-readable, structured format. Chemical Markup Language (CML) is an application of XML that provides a tagset for encoding chemical information, and it may be suitable for representing reaction pathways found in the literature. Machines cannot simply read and understand an image the way humans do, but by carefully processing these information-rich images we may be able to identify the important, chemically relevant information and encode it as CML.

Why it would be useful:

There is a vast repository of organic reaction pathway information locked away in images in scientific literature. The information can be easily deciphered by a chemist, but such a process cannot scale in time and cost when analyzing large amounts of scientific literature. Having such information in CML would make analysis and use of chemistry and biochemistry literature scalable.

Goal:

To identify the components of an image containing reaction pathways and encode the information in CML.

Proposed usage:

Image:

Extracted information:

We should be able to distinguish the different parts of the image that show different pathways. For each step, the product is to be identified, along with the arrow showing the direction of the reaction and the enzyme carrying out the reaction.

All of this information is to be encoded in a structured data format such as CML.
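As a rough illustration of the target format, the sketch below builds a CML-style record for a single reaction step with Python's xml.etree.ElementTree. Several things here are assumptions rather than a finished design: the element names (reaction, reactantList, productList, molecule) follow CML conventions as I understand them, recording the enzyme as a substance with role "catalyst" is one possible choice, and the hexokinase step is a textbook example used for illustration only, not data extracted from an image.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("cml", CML_NS)  # serialise with a cml: prefix


def cml_tag(name):
    """Return `name` qualified with the CML namespace."""
    return f"{{{CML_NS}}}{name}"


def reaction_to_cml(reaction_id, reactants, products, enzyme=None):
    """Build a CML-style <reaction> element from plain compound names."""
    reaction = ET.Element(cml_tag("reaction"), {"id": reaction_id})

    reactant_list = ET.SubElement(reaction, cml_tag("reactantList"))
    for name in reactants:
        reactant = ET.SubElement(reactant_list, cml_tag("reactant"))
        ET.SubElement(reactant, cml_tag("molecule"), {"title": name})

    product_list = ET.SubElement(reaction, cml_tag("productList"))
    for name in products:
        product = ET.SubElement(product_list, cml_tag("product"))
        ET.SubElement(product, cml_tag("molecule"), {"title": name})

    if enzyme:
        # Assumption: the enzyme is stored as a catalyst substance.
        substances = ET.SubElement(reaction, cml_tag("substanceList"))
        ET.SubElement(substances, cml_tag("substance"),
                      {"role": "catalyst", "title": enzyme})
    return reaction


if __name__ == "__main__":
    step = reaction_to_cml(
        "r1",
        reactants=["D-glucose", "ATP"],
        products=["D-glucose 6-phosphate", "ADP"],
        enzyme="hexokinase",
    )
    print(ET.tostring(step, encoding="unicode"))
```

A whole pathway could then be a sequence of such reaction elements, one per arrow identified in the image.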


20210929

Setting up git with GitHub

git is a version control program that runs locally on your PC. GitHub is an online service that hosts git repositories. It is often useful to connect your local git to GitHub so that you can push changes to the online repository directly from the command line, for everyone else to pull.

Q: What are push and pull?

push - to copy the changes you've made in your local clone of the repository to the global repository.
pull - to download the latest changes in the global repository to your local clone.

Pulling changes

To pull the latest changes to the repository, you simply need to:

  1. Navigate to the repository with cd <repository>
  2. Update your local repository with
git pull

To make changes to the global repository, however, you need to:

  1. Have permission from the owner of the repository to make changes to it
  2. Supply the correct credentials to git, so that it is allowed to make changes to the global repo.

Assuming you are using GitHub to host the global repo, you need to create a Personal Access Token, because GitHub no longer accepts account passwords for git operations over HTTPS.

Creating personal access tokens in GitHub

To create a personal access token:

  1. Log in to your GitHub account
  2. Go to Settings > Developer settings > Personal access tokens and generate a new token.
  3. Add a note, set an expiry for the token, tick the repo checkbox, then generate the token.
  4. Copy the generated token and keep it somewhere secure.

Pushing changes to GitHub

To push changes to a repository on GitHub:

  1. Clone the repository to your local machine. (Skip this step if you've already cloned it.)
git clone <repository-url>
  2. Navigate to the cloned directory with cd <repository>
  3. Make the required changes to the repository
  4. Stage every file or directory you've added or modified, so that git includes it in the next commit
git add <file-or-directory>
  5. Commit the staged changes to your local copy of the repository, describing what you've changed with the -m flag, like this
git commit -m "I've changed ... in the ... files"
  6. Push the changes you've committed to GitHub with
git push
  7. If asked, supply your GitHub username in the username field and the personal access token you copied in the password field
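The add, commit and push steps above can also be scripted. The sketch below is one possible way to do so from Python with subprocess; the helper names and the example file name are hypothetical, and the code only wraps the same git add, git commit -m and git push commands.

```python
import subprocess


def push_commands(paths, message):
    """Return the git commands for staging, committing and pushing."""
    return [
        ["git", "add", "--", *paths],      # stage the given files/directories
        ["git", "commit", "-m", message],  # commit what was staged
        ["git", "push"],                   # send the commit to the remote
    ]


def run_commands(commands, repository="."):
    """Run each command inside `repository`, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repository, check=True)


if __name__ == "__main__":
    # README.md is a hypothetical file name used for illustration.
    # The commands are only printed here; pass them to run_commands()
    # from inside a cloned repository to execute them.
    for command in push_commands(["README.md"], "Update README"):
        print(" ".join(command))
```

Because check=True raises an error as soon as one command fails, a failed commit is never followed by a push.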