
miniproject: viral epidemics and organization

ShweataNHegde edited this page May 10, 2021 · 1 revision

What organizations fund research on viral epidemics?


Vaishali Arora, Shweata N. Hegde


Simranleen Singh

Mini project summary:

  • The scientific objective is to find out which organizations are the most active in viral epidemic research.
  • A further objective is to retrieve valuable information about them from the scientific literature.

Methodology: 📌

  • Using the communal corpus Epidemic50noCov of 50 articles. 🟩DONE

  • Subjecting them to binary classification based on various parameters: related to viral epidemics or not, funders named or not, and so on. 🟩DONE

  • Rerunning the query to get a corpus of 950 articles. 🟩DONE

  • Working on sectioning to filter out the Acknowledgement or Funding section of each paper, as it is the part of a scientific paper where funders are most likely to occur. 🟩DONE

  • Creating the dictionary Funder using ami and the SPARQL/Wikidata Query Service. 🟩DONE

  • Using machine learning tools for entity extraction so that we can look for specific phrases, words and regexes in those scientific papers. 🟪NOT STARTED

  • Subjecting the spreadsheets to analysis in order to find which funders are the most active. 🟪NOT STARTED
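
The last step above (finding the most active funders) could be sketched in Python as a simple tally. This is only a minimal sketch, not the project's actual analysis: the `count_funders` helper, the sample funding sentences and the two funder names are hypothetical, and matching is a plain case-insensitive substring search rather than ami search.

```python
from collections import Counter


def count_funders(texts, funder_names):
    """Count in how many texts each funder name appears (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for name in funder_names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


# Hypothetical acknowledgement/funding sections, for illustration only.
sample_sections = [
    "This work was supported by the Wellcome Trust.",
    "Funded by the National Institutes of Health and the Wellcome Trust.",
    "The authors received no specific funding.",
]
# Hypothetical dictionary terms; the real dictionary has ~17k entries.
sample_funders = ["Wellcome Trust", "National Institutes of Health"]

# Funders ordered from most to least frequently mentioned.
ranking = count_funders(sample_sections, sample_funders).most_common()
print(ranking)
```

Counting each funder once per paper keeps a paper that repeats a funder's name from inflating that funder's rank.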

Corpora: 📂

❓ How I committed my corpus 950:

Scroll down to the section on releasing the corpus 950 using Github desktop.


Dictionary update: 🆕

  • Updated on: September 18, 2020

  • Source: Crossref

  • Number of entries: ~17k

  • Method: SPARQL/Wikidata Query Service

  • Attributes: term, name, description, wikidataID, wikidataURL, wikipediaURL, crossrefID, country, synonyms

  • SPARQL query used:

 SELECT DISTINCT ?Funder ?FunderLabel ?FunderDescription ?FunderAltLabel ?Country ?CountryLabel ?instanceofLabel ?crossrefid ?wikipedia WHERE {
   ?Funder wdt:P3153 ?crossrefid;
     wdt:P31 ?instanceof;
     wdt:P17 ?Country.
   OPTIONAL { ?wikipedia schema:about ?Funder; schema:isPartOf <> }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
 }
 LIMIT 20000000

Syntax used:

amidict -vv --dictionary funders --directory mydictionaries --input funder.sparql.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Funder,term=FunderLabel,name=FunderLabel,country=CountryLabel,crossrefid=crossrefid,description=FunderDescription,wikipediaURL=wikipedia,wikidataURL=Funder --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel
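
The `--transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))` option above derives the Wikidata ID from the entity URL with a regular expression. The Python sketch below shows what that pattern captures. It assumes the pattern has to match the whole URL, which may differ from how amidict applies it internally, and the `extract_wikidata_id` helper and the example URL are hypothetical.

```python
import re

# Same pattern as in the --transformName option: the greedy ".*/" consumes
# everything up to the last slash, and the group captures the remainder.
PATTERN = re.compile(r".*/(.*)")


def extract_wikidata_id(wikidata_url):
    """Return the part of the URL after the last slash, or None if there is no slash."""
    match = PATTERN.fullmatch(wikidata_url)
    return match.group(1) if match else None


# Hypothetical entity URL, for illustration only.
print(extract_wikidata_id("http://www.wikidata.org/entity/Q12345"))
```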

Final dictionary:

Refined sparql query

Click here to open the refined query in the Wikidata SPARQL query service.

Updated syntax to create the dictionary

amidict -vv --dictionary organization --directory _sparlendpoint  --input sparql_organization.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Organization,term=OrganizationLabel,name=OrganizationLabel,country=CountryLabel,crossrefIDs=crossrefIds,description=description,wikidataURL=Organization --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))

Dictionary validation : ✅

amidict --dictionary C:\Users\myPC\mydictionaries\funders(1).xml -v display --fields --validate

Generic values (DictionaryDisplayTool)
--testString        : d      null
--wikilinks         : d [$WikiLink;@1ae7dc0
--fields            : m        []
--files             : d        []
--maxEntries        : d         3
--remote            : d []
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [C:\Users\myPC\mydictionaries\funders(1).xml]
--directory         : d      null

Specific values (DictionaryDisplayTool)
list all fields
dictionaries from C:\Users\myPC\ContentMine\dictionaries

❓ Result: I checked the dictionaries folder given in the path above. The folder was empty. Should I do something else, or is the software built that way?

Tools & Software: 🛠

1. ami for the creation of the dictionary and for sectioning: 🟩DONE

  • To download my corpus of 950 articles in XML format into the directory miniproject, open the Command Prompt and run:

        `getpapers -q "Funders in viral epidemic research" -o miniproject -x -f mycorpus/log.txt -k 950`
  • To divide the CProject into sections, open the Command Prompt again and run:

           `ami -p miniproject section`
  • This creates a sections subfolder in the folder of each scientific paper in your directory.

  • Open the sections folder to find subfolders such as Front, Body and Back.

  • This completes the sectioning of my CProject.
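
Once the CProject is sectioned, the files most likely to name funders can be listed with a short script. This is a minimal sketch under stated assumptions: it assumes each paper folder contains a `sections` subfolder as described above, and that acknowledgement or funding files have "ack" or "fund" in their file names. The `find_funding_files` helper and those keywords are hypothetical, so check them against the file names ami actually produced.

```python
from pathlib import Path


def find_funding_files(cproject, keywords=("ack", "fund")):
    """Return section files whose file name contains any of the keywords."""
    hits = []
    # Look inside <cproject>/<paper>/sections/ at any depth.
    for path in sorted(Path(cproject).glob("*/sections/**/*")):
        if path.is_file() and any(k in path.name.lower() for k in keywords):
            hits.append(path)
    return hits


if __name__ == "__main__":
    # "miniproject" is the output directory used in the getpapers command.
    for hit in find_funding_files("miniproject"):
        print(hit)
```

Only the file name is tested, so a folder such as Back does not cause every file inside it to match.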

2. ami search, with _cooccurrence created for the dictionary funder: 🟩DONE

3. Jupyter Notebook for machine learning and data mining. 🟨STARTED

4. Later, R for analysis and to display the results graphically. 🟪NOT STARTED

Releasing the corpus 950 using Github desktop : 🟩DONE

  • Installed Github desktop from :
  • Cloned the openVirus repository to my system using the Git Bash command line : git clone
  • Open the folder where you want to upload your CProject.
  • Paste your project into the folder of the openVirus repository (our remote repository) where you want to commit the files.
  • Open the Github desktop.
  • Go to 'File', then 'Add Local Repository'.
  • Now, choose the openVirus repository from your system.
  • Add a commit message and go to 'Commit to master'.
  • After committing, go to 'Push to origin'.
  • Once the push completes, your uploaded files can be viewed in the Github repository.

💡 Tip: Committing the corpus in parts of five will make the uploading easy.

Updating ami : 🟩DONE

  • Open command prompt and type :

     `cd ami3`
     `git pull`
     `mvn clean install -Dmaven.test.skip=true `
  • Wait for the command to finish running.

  • A BUILD SUCCESS message then appears in the command prompt.

💡 Tip: If you get BUILD FAILURE, close any other command prompt that is open on your system.

Blockers: 🚫

Software usage: 🔗

  1. Core software:
  • Node
  • getpapers
  • Java jdk
  • Maven
  • ami
  2. Optional software:
  • R graphics
  • Jupyter Notebook
  • Github desktop


  • Binary classification of Corpus 950 into True and False positives using different libraries in Python.
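
A keyword rule checked against manual labels is one simple way to start on this classification. The sketch below is an illustration only and uses no external library: the `classify` and `tally` helpers, the keywords, the sample titles and the manual labels are all hypothetical.

```python
def classify(text, keywords=("virus", "viral", "epidemic")):
    """Label a text True if it mentions any keyword (case-insensitive)."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def tally(predictions, manual_labels):
    """Count true and false positives among the items predicted True."""
    pairs = list(zip(predictions, manual_labels))
    true_pos = sum(1 for p, m in pairs if p and m)
    false_pos = sum(1 for p, m in pairs if p and not m)
    return true_pos, false_pos


# Hypothetical titles with hypothetical manual labels, for illustration only.
titles = [
    "Funding of viral epidemic research in Asia",
    "Computer virus detection with neural networks",
    "Soil chemistry of alpine lakes",
]
manual = [True, False, False]

predicted = [classify(t) for t in titles]
print(tally(predicted, manual))
```

The second title shows why manual checking matters: a bare keyword match cannot tell a computer virus from a biological one.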


  • Working on the usage of Jupyter Notebook by looking into tutorials on the internet
  • Maintaining the dictionary FUNDERS so that merging the dictionaries becomes easier



  • Creating the corpus 950
  • Ami search on the corpus 950
  • Sectioning the corpus 950
  • Creating the ami dictionary funder
  • Creating the SPARQL dictionary funder
  • Manual binary classification of corpus 50 "EpidemicnoCov50"
  • Corpus 950 released
  • Dictionary funder released
  • Dictionary validation using ami
  • Classifying first 50 papers from corpus 950 into True and False positives
  • Smoke test on Jupyter Notebook
  • Jupyter Notebook to create dictionary from a text file of funders
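
The last item, creating a dictionary from a text file of funders, could look roughly like the sketch below. It assumes one funder name per line and a simple XML layout with a `dictionary` root and `entry` elements carrying `term` and `name` attributes. That layout is an assumption here and should be checked against a dictionary produced by amidict. The `build_dictionary` helper and the sample names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dictionary(lines, title="funders"):
    """Build a <dictionary> element with one <entry> per distinct, non-empty line."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for line in lines:
        name = line.strip()
        key = name.lower()
        # Skip blank lines and case-insensitive duplicates.
        if not name or key in seen:
            continue
        seen.add(key)
        ET.SubElement(root, "entry", {"term": name, "name": name})
    return root


# Hypothetical lines as they might be read from a text file of funders.
sample_lines = ["Wellcome Trust\n", "\n", "wellcome trust\n", "Gates Foundation\n"]
dictionary = build_dictionary(sample_lines)
print(ET.tostring(dictionary, encoding="unicode"))
```

Passing an open file object as `lines` works the same way, since iterating over a file yields one line at a time.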


Submitted by-

Simranleen Singh


  • Under this project we are collecting useful data from authentic, globally accessible websites and tabulating it so that it is clear to everyone who visits.
  • My miniproject is on viral epidemics and funders, so it deals with funders from all over the world that fund viral epidemic research.

Preliminary work:

  • My first task is to download software that gives me easy access to the data I am looking for, so that I can download it and segregate it according to whether it is useful to me.
  • Initially I had to install Node, which is required for installing the other software.
  • One of these is getpapers, installed using the link and information provided by my mentor (given below in the reference). Reference :

Installation of getpapers:

getpapers is necessary software for this project, as we have to download several papers on our subject in one go, and getpapers does exactly that.


Installing getpapers.

Current work

Currently I am maintaining the dictionary of funders manually until my issue gets solved.
