
Tutorial: plant oils

Emanuel Faria edited this page Nov 29, 2021 · 21 revisions

Tutorial on Plant oils

design of project

  • Create a minicorpus of about 100 entries related to plant essential oils in Brazil.
  • Make semantic sections with ami.
  • Use docanalysis (unsupervised) to extract the main terms and areas of interest.
  • Use supervised dictionaries for plants, countries, organizations, compounds, etc.
  • Run an initial smoke test.
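The supervised step in the plan comes down to looking up dictionary terms in each article's text. A minimal sketch in plain Python, assuming each dictionary is just a collection of term strings (real ami dictionaries are XML files, which this does not parse). The function match_dictionary and the sample terms are hypothetical, for illustration only:

```python
import re
from collections import Counter
from typing import Iterable


def match_dictionary(text: str, terms: Iterable[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each dictionary term in text."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        # \b on both sides stops "lime" from matching inside "limestone"
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[term] = hits
    return counts


# Hypothetical sample dictionary and sentence, for illustration only
plants = {"Lippia alba", "Ocimum basilicum"}
sentence = "Essential oil from Lippia alba leaves collected in Brazil."
print(match_dictionary(sentence, plants))
```

The same function can be run once per dictionary (plants, countries, organizations, compounds) to get per-category counts for each article.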

file structure

  • CProject = oil_brazil
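A CProject such as oil_brazil is a directory whose subdirectories (CTrees) each hold one article. As a sketch, assuming each CTree contains a fulltext.xml file (an assumption about the layout, not something this page confirms), the helper below lists the CTrees so the minicorpus size can be checked against the target of about 100 entries. The name list_ctrees is hypothetical:

```python
from pathlib import Path
from typing import List, Union


def list_ctrees(cproject: Union[str, Path]) -> List[str]:
    """Return the sorted names of subdirectories that contain a fulltext.xml file."""
    root = Path(cproject)
    if not root.is_dir():
        raise FileNotFoundError(f"CProject directory not found: {root}")
    return sorted(
        child.name
        for child in root.iterdir()
        # Only directories holding a fulltext.xml are counted as CTrees here
        if child.is_dir() and (child / "fulltext.xml").is_file()
    )
```

Usage: len(list_ctrees("oil_brazil")) gives the number of articles currently in the CProject.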

create minicorpus

split sections

run docanalysis

log

2021-11-29

  1. Followed the instructions to install docanalysis:
mannyrules@Mannys-MacBook-Pro-2021 ~ % git clone https://github.com/petermr/docanalysis.git 
Cloning into docanalysis...
remote: Enumerating objects: 1793, done.
remote: Counting objects: 100% (1793/1793), done.
remote: Compressing objects: 100% (1242/1242), done.
remote: Total 1793 (delta 594), reused 1717 (delta 539), pack-reused 0
Receiving objects: 100% (1793/1793), 1.59 MiB | 4.93 MiB/s, done.
Resolving deltas: 100% (594/594), done.
cd docanalysis
python setup.py install

This gave an error:

Traceback (most recent call last):
  File "setup.py", line 8, in <module>
    import configparser
ImportError: No module named configparser
mannyrules@Mannys-MacBook-Pro-2021 docanalysis % 
  2. Did not install cleanly.

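After switching to python3 and reinstalling spaCy, the install seems to work so far. (configparser is part of the Python 3 standard library, so the ImportError above indicates that the python command was running Python 2.)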

pip install configparser
mannyrules@Mannys-MacBook-Pro-2021 docanalysis % pip install configparser
Collecting configparser
  Downloading configparser-5.1.0-py3-none-any.whl (19 kB)
Installing collected packages: configparser
Successfully installed configparser-5.1.0
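Under Python 3 the pip configparser package is only a backport, since the module already ships with the standard library. A guard like the sketch below, placed at the top of a setup script, would report the real problem (wrong interpreter) instead of a missing module. The function name require_python3 is hypothetical; docanalysis's setup.py does not necessarily contain anything like it. It deliberately uses only syntax that Python 2.7 can also parse (no f-strings), so the message is shown rather than a SyntaxError:

```python
import sys


def require_python3():
    """Exit with a clear message when run under Python 2 (parses on both 2.7 and 3.x)."""
    if sys.version_info[0] < 3:
        sys.exit("This needs Python 3 (found %d.%d); rerun with python3"
                 % sys.version_info[:2])
    # On Python 3, configparser is in the standard library, so no pip install is needed
    import configparser  # noqa: F401
    return tuple(sys.version_info[:2])
```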

The end of the setup.py dependency output:

[...]

Using /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages
Searching for sgmllib3k==1.0.0
Best match: sgmllib3k 1.0.0
Adding sgmllib3k 1.0.0 to easy-install.pth file

Using /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages
Finished processing dependencies for docanalysis==0.0.3
mannyrules@Mannys-MacBook-Pro-2021 docanalysis %
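To confirm which version of docanalysis the active interpreter actually sees, the standard library's importlib.metadata (Python 3.8+) can be queried. A small sketch; the helper name installed_version is hypothetical:

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Usage: run installed_version("docanalysis") with the same python3 used for setup.py and compare the result with the docanalysis==0.0.3 named in the log above. A None result suggests the package went into a different interpreter's site-packages.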
  • Ran demo.py as described in the README:
(base) pm286macbook:docanalysis pm286$ python3 demo.py
Collecting en-core-web-sm==3.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl (13.7 MB)
     |████████████████████████████████| 13.7 MB 2.1 MB/s            
Requirement already satisfied: spacy<3.1.0,>=3.0.0 in /opt/anaconda3/lib/python3.8/site-packages (from en-core-web-sm==3.0.0) (3.0.6)
Requirement already satisfied: catalogue<2.1.0,>=2.0.3 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.0.4)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.4.1)
Requirement already satisfied: thinc<8.1.0,>=8.0.3 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (8.0.7)
Requirement already satisfied: pydantic<1.8.0,>=1.7.1 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.7.4)
Requirement already satisfied: typer<0.4.0,>=0.3.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.3.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.20.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.0.5)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.4 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (3.0.7)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (3.0.5)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.8.2)
Requirement already satisfied: setuptools in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (57.4.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.0.5)
Requirement already satisfied: numpy>=1.15.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.19.1)
Requirement already satisfied: packaging>=20.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (20.4)
Requirement already satisfied: jinja2 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.11.2)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (4.49.0)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.7.4)
Requirement already satisfied: pathy>=0.3.5 in /opt/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.6.0)
Requirement already satisfied: six in /opt/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in /opt/anaconda3/lib/python3.8/site-packages (from pathy>=0.3.5->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (5.1.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (3.0.4)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /opt/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.24.3)
Requirement already satisfied: idna<2.8,>=2.5 in /opt/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.7)
Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2020.6.20)
Requirement already satisfied: click<7.2.0,>=7.1.1 in /opt/anaconda3/lib/python3.8/site-packages (from typer<0.4.0,>=0.3.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (7.1.2)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/anaconda3/lib/python3.8/site-packages (from jinja2->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.1.1)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.0.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
0it [00:00, ?it/s]
INFO:root:Found 0 sentences
Traceback (most recent call last):
  File "demo.py", line 4, in <module>
    dict_for_entities = ethic_statement_creator.extract_entities_from_papers(
  File "/Users/pm286/workspace/docanalysis/docanalysis/extract_entities.py", line 68, in extract_entities_from_papers
    self.convert_dict_to_csv(
  File "/Users/pm286/workspace/docanalysis/docanalysis/extract_entities.py", line 179, in convert_dict_to_csv
    df.to_csv(path, encoding='utf-8', line_terminator='\r\n')
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 3384, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/io/formats/format.py", line 1083, in to_csv
    csv_formatter.save()
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 228, in save
    with get_handle(
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/io/common.py", line 639, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/Users/pm286/workspace/docanalysis/corpus/e_cancer_clinical_trial_50/entities.csv'
(base) pm286macbook:docanalysis pm286$
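The demo found 0 sentences and then crashed because corpus/e_cancer_clinical_trial_50 does not exist: pandas' to_csv, like open(), does not create missing parent directories. A pre-flight sketch, assuming the demo reads its input from that corpus directory; check_corpus and write_csv are hypothetical helpers, not part of docanalysis. It fails early if the corpus is missing or empty, and creates the output directory before writing:

```python
import csv
from pathlib import Path


def check_corpus(corpus_dir):
    """Fail early, with a clear message, if the corpus directory is missing or empty."""
    root = Path(corpus_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"corpus directory not found: {root.resolve()}")
    if not any(root.iterdir()):
        raise ValueError(f"corpus directory is empty: {root.resolve()}")
    return root


def write_csv(path, header, rows):
    """Write rows to a CSV file, creating any missing parent directories first."""
    out = Path(path)
    # open() (and therefore DataFrame.to_csv) will not create missing directories
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as fh:
        # "\r\n" mirrors the line_terminator used in convert_dict_to_csv
        writer = csv.writer(fh, lineterminator="\r\n")
        writer.writerow(header)
        writer.writerows(rows)
    return out
```

Creating the directory only avoids the crash; with 0 sentences the CSV would still be empty, so the corpus itself has to be present first. Separately, on pandas 2.x the line_terminator argument in convert_dict_to_csv will fail: pandas 1.5 renamed it to lineterminator and 2.0 removed the old name.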