
Progress Report: Prathmesh Sharma


INSTALLATIONS: pygetpapers, ami, Maven, JDK

WORK DONE TILL NOW:

  • Helped Prachi install metadata_analysis: We followed the steps provided in the documentation, but they did not work for us, so Sagar advised us to install Anaconda and PyCharm on Prachi's device. However, Prachi had very little free space left on her 256 GB SSD, so installing Anaconda (roughly a 10 GB installation) did not make sense; we needed something lighter. We decided to go with VSCode as the editor and to manage packages with pip inside a virtual environment.

  • The steps we took for installation were as follows:

  • 1.1) Download the metadata_analysis repository from GitHub: https://github.com/petermr/crops.git

  • 1.2) Navigate to ...\crops\metadata_analysis and open metadata_analysis.py using VSCode

  • 1.3) We tried to run it directly and, as expected, it threw an error. So we created a new Python virtual environment using the command: python -m venv myenv1

  • 1.4) We activated this virtual environment using the command: .\myenv1\Scripts\activate

  • 1.5) This put us inside the Python virtual environment. We decided to install the dependencies directly from requirements.txt, but when we did so and ran the program, it threw an error.

  • 1.6) We thought this might be due to scispacy, so we commented that line out to check whether the rest of the program would run (the requirements are installed sequentially, and scispacy was listed at the top, so we did not know whether bs4 had actually been installed). As it turned out, none of the dependencies had been installed: the program threw an error saying that bs4 was missing.

  • 1.7) As a result, within this environment, we manually installed bs4, yake, and scispacy (using pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz for the scispacy model). The full command sequence is summarised after this list.

  • 1.8) Then we ran metadata_analysis again, and it ran without throwing any errors.

  • Troubleshot the ami3 installation with Sagar (ongoing).
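For reference, here is a consolidated sketch of the commands from steps 1.3–1.7, assuming a Windows machine (hence the Scripts\activate path) and the environment name myenv1; it is a reconstruction from the notes above, not a recorded transcript.

```
python -m venv myenv1
.\myenv1\Scripts\activate
pip install bs4 yake scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz
python metadata_analysis.py
```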

PROJECT IDEA:

Problem statement: The research papers returned by pygetpapers from the Europe PMC database may not all be relevant to what the user is trying to find. Hence, it would be useful to assign each paper a relevance score so that the user does not have to go through every paper retrieved. Steps:

  1. Specify a query in pygetpapers and download 5 research papers.
  2. Mine relevant phrases/keywords from the abstract/introduction of those 5 downloaded papers, treating them as a small seed dataset.
  3. Create a dictionary of keywords from these 5 papers, assigning a priority to each word based on its frequency and location of occurrence.
  4. Now query a large number of research papers, apply the dictionary model built in the previous step to each of them, and assign a relevance score to each PDF.
  5. Reject the research-paper PDFs whose relevance score falls below the specified threshold (a rough code sketch of steps 3–5 follows this list).
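Below is a minimal Python sketch of what steps 3–5 could look like. The function names and the simple frequency-based weighting are placeholders of mine, not existing metadata_analysis code; a real version would use proper keyword extraction (e.g. yake) and a full stopword list, and would also weight terms by where they occur (title, abstract, introduction).

```python
from collections import Counter
import re

# Tiny illustrative stopword list; a real run would use a proper one.
STOPWORDS = {"the", "and", "with", "from", "that", "this", "were", "have"}

def build_keyword_dictionary(seed_abstracts, top_n=50):
    """Step 3: count word frequencies across the seed abstracts and keep
    the top_n non-stopword terms, weighted by how often they occur."""
    counts = Counter()
    for text in seed_abstracts:
        words = re.findall(r"[a-z]{4,}", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return dict(counts.most_common(top_n))

def relevance_score(paper_text, keyword_weights):
    """Step 4: sum the weights of the dictionary keywords that appear in the paper."""
    words = set(re.findall(r"[a-z]{4,}", paper_text.lower()))
    return sum(wt for kw, wt in keyword_weights.items() if kw in words)

def filter_papers(paper_texts, keyword_weights, threshold):
    """Step 5: keep only the papers whose score meets the threshold."""
    return [p for p in paper_texts
            if relevance_score(p, keyword_weights) >= threshold]
```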

Problem with this idea: Ayush and Shweata have already worked on a similar idea (https://github.com/ShweataNHegde/snowball/blob/main/snowball_opendiagram_3.ipynb), and Shweata told me that the problem is that the code tends to pick up generic words rather than the specific words that would actually determine a paper's relevance score. That makes sense, and one way around it is to query about 100 research papers from Europe PMC: since Europe PMC ranks results by how well they match, we can take a larger set of 'word pointers' from the first 5 papers and a smaller set of 'word pointers' from the last few papers, and subtract the second set from the first to leave a set of words specific enough to score against. However, pygetpapers already lets the user specify multiple keywords, sections, etc., and Europe PMC already filters results accordingly, so it does not make sense to write an entire program for such marginal improvements.
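That said, the set-subtraction workaround is easy to sketch: reusing the hypothetical build_keyword_dictionary() from the previous sketch, build one keyword set from the best-matching papers and one from the worst-matching papers, and subtract the second (the generic vocabulary) from the first.

```python
def specific_keywords(top_abstracts, bottom_abstracts, top_n=100, bottom_n=50):
    """Keep terms that are frequent in the best-matching papers but absent from
    the worst-matching ones, so generic scientific vocabulary drops out."""
    top_terms = set(build_keyword_dictionary(top_abstracts, top_n))
    generic_terms = set(build_keyword_dictionary(bottom_abstracts, bottom_n))
    return top_terms - generic_terms
```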