Activities Summary: Anuv
- Operating System: Ubuntu 20.04
- python3 --version: 3.9
- pip --version: 20.0.2
To install pygetpapers, run:
pip install pygetpapers
Check if pygetpapers is properly installed:
pygetpapers --help
On Ubuntu, the binaries are installed in ~/.local/bin by default. Adding this directory to the system path lets us run pygetpapers from the console. To add it for the current shell session, execute:
export PATH="$HOME/.local/bin:$PATH"
- JAVA
sudo apt install default-jre
To check if the software is successfully installed, run
java --version
- Maven
sudo apt install maven
After Java and Maven are installed, clone the repository and build it:
git clone https://github.com/petermr/ami3.git
cd ami3
mvn install -Dmaven.test.skip=true
To add ami to the system path, execute the following command:
export PATH="$HOME/ami3/target/appassembler/bin:$PATH"
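As a quick sanity check after the PATH changes above, the following sketch reports which commands the current shell can resolve. The helper name `find_tools` is hypothetical, and the list of commands is only an assumption about what is worth checking here.

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Commands installed in the steps above; extend the list as needed.
    for tool, path in find_tools(["pygetpapers", "java", "mvn"]).items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

A command reported as not found usually means the corresponding `export PATH=...` line has not been run in this shell session.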
Scientific literature commonly describes chemical reactions in paragraph form. Reaction information held in unstructured paragraphs would be more useful in a machine-readable, structured format. Chemical Markup Language (CML) is an application of XML that provides a tagset for encoding chemical information, which makes it a candidate for representing reactions found in the literature. Machines cannot read and understand a paragraph of plain text the way humans do, but with NLP we might be able to identify the chemically relevant information in a paragraph and encode it as CML.
A vast amount of chemical information is locked away in paragraphs that describe reactions in the scientific literature. A chemist can decipher this information easily, but manual reading does not scale in time or cost when analysing large amounts of literature. Having the information in CML would make the analysis and use of chemistry and biochemistry literature scalable.
The goal is to identify the components of a paragraph rich in chemical reaction information and to encode that information correctly in CML.
- We can get a sense of the structure of a reaction by looking for certain words or word groups.
- Look for words such as ‘reacts with’, ‘undergoes reaction’, ‘undergoes elimination’, ‘combusts’, etc. These words or phrases might indicate the presence of a chemical reaction and also tell us about the products and the type of reaction.
- 0.5M: a number followed by M indicates a concentration.
- ‘Catalysed by’ and ‘in presence of’ indicate catalysts and reaction conditions.
- ‘At ... K’, ‘atm’, ‘temperature’, ‘pressure’, ‘NTP’, etc. indicate reaction conditions.
- ‘Gives’ and ‘to form’ are usually followed by the reaction product.
- We can match words against a dictionary of chemical names to check whether each one is a valid compound or element.
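The cue-word heuristics above can be sketched with plain string matching and regular expressions. This is a minimal illustration, not a full NLP pipeline: the function name `find_cues`, the grouping of cues into three lists, and the exact unit patterns are assumptions, and the sentence in the usage section is made up for demonstration.

```python
import re

# Cue phrases copied from the list above; the three-way grouping is an assumption.
REACTION_CUES = ["reacts with", "undergoes reaction", "undergoes elimination", "combusts"]
CATALYST_CUES = ["catalysed by", "in presence of"]
PRODUCT_CUES = ["gives", "to form"]

# Number followed by a unit. Matching is case-sensitive so that "M" (molar)
# is not confused with the "m" in "atm".
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*K\b")
PRESSURE = re.compile(r"(\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?)\s*atm\b")
CONCENTRATION = re.compile(r"(\d+(?:\.\d+)?)\s*M\b")


def find_cues(sentence):
    """Return the cue phrases and numeric conditions found in one sentence."""
    lowered = sentence.lower()
    return {
        "reaction_cues": [cue for cue in REACTION_CUES if cue in lowered],
        "catalyst_cues": [cue for cue in CATALYST_CUES if cue in lowered],
        "product_cues": [cue for cue in PRODUCT_CUES if cue in lowered],
        "temperature_K": TEMPERATURE.findall(sentence),
        "pressure_atm": PRESSURE.findall(sentence),
        "concentration_M": CONCENTRATION.findall(sentence),
    }


if __name__ == "__main__":
    # Made-up sentence, used only to exercise the patterns.
    example = "Phenol reacts with 0.5M NaOH at 300K and 1atm to form sodium phenoxide."
    for key, value in find_cues(example).items():
        print(key, value)
```

Substring matching keeps the sketch short but will also fire inside longer words; checking the surrounding tokens against a dictionary of chemical names would be the next refinement.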
Phenol reacts with NaOH and CO2 at 400K and 4-7atm to give Sodium Salicylate.
<reaction>
<reactant>
<formula>C6 H6 O</formula>
<name>Phenol</name>
</reactant>
<reactant>
<formula>Na O H</formula>
<name>Sodium Hydroxide</name>
</reactant>
<reactant>
<formula>C O2</formula>
<name>Carbon Dioxide</name>
</reactant>
<product>
<formula>C7 H5 Na O3</formula>
<name>Sodium Salicylate</name>
</product>
<reaction-conditions>
<temperature>400K</temperature>
<pressure>4-7atm</pressure>
</reaction-conditions>
</reaction>
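Once the components of a reaction have been identified, the structure above can be generated programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `build_reaction` is hypothetical, and the element names simply mirror the simplified example above and have not been checked against the CML schema.

```python
import xml.etree.ElementTree as ET


def build_reaction(reactants, products, conditions):
    """Build a <reaction> element.

    reactants, products: lists of (formula, name) tuples.
    conditions: dict mapping a condition tag (e.g. "temperature") to its text.
    """
    reaction = ET.Element("reaction")
    for tag, species in (("reactant", reactants), ("product", products)):
        for formula, name in species:
            node = ET.SubElement(reaction, tag)
            ET.SubElement(node, "formula").text = formula
            ET.SubElement(node, "name").text = name
    cond_node = ET.SubElement(reaction, "reaction-conditions")
    for key, value in conditions.items():
        ET.SubElement(cond_node, key).text = value
    return reaction


reaction = build_reaction(
    reactants=[
        ("C6 H6 O", "Phenol"),
        ("Na O H", "Sodium Hydroxide"),
        ("C O2", "Carbon Dioxide"),
    ],
    products=[("C7 H5 Na O3", "Sodium Salicylate")],
    conditions={"temperature": "400K", "pressure": "4-7atm"},
)

if __name__ == "__main__":
    ET.indent(reaction)  # pretty-printing helper, available from Python 3.9
    print(ET.tostring(reaction, encoding="unicode"))
```

Keeping reactants and products as simple tuples makes it easy to feed the function from whatever extraction step produces them.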
- Identify passages containing descriptions of chemical reactions
- Convert molecule descriptions into CML
- Identify images depicting chemical molecules and reactions
- Convert chemical molecules or reactions presented as images into CML
- Encoding metabolic pathways as XML