
miniproject: viral epidemics and disease

dheerajdhingani edited this page Jul 12, 2020 · 34 revisions

What diseases co-occur with viral epidemics?

owner:

Priya

collaborators:

Dheeraj kumar

miniproject summary

proposed activities

  • Use the communal corpus epidemic50noCov consisting of 50 articles.
  • Scrutinize the 50 articles to identify true and false positives, that is, whether each article is about a viral epidemic or not.
  • Use ami search to find whether the articles mention any comorbidity during a viral epidemic.
  • Section the articles using ami:section to extract the relevant information on comorbidity, and annotate them with dictionaries to create ami DataTables.
  • Refine and rerun the query to build a corpus of 950 articles.
  • Use a suitable ML technique to classify whether the articles are about a viral epidemic, and which diseases/disorders co-occur with it.
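The sectioning step (ami:section) splits each article's JATS XML into its sections. As a rough illustration only, and not how ami itself is implemented, the sketch below uses Python's standard xml.etree.ElementTree to map each top-level <sec> title in a JATS <body> to its text. The function name section_texts and the sample article are hypothetical.

```python
import xml.etree.ElementTree as ET


def section_texts(jats_xml: str) -> dict[str, str]:
    """Map each top-level <sec> title in the JATS <body> to that section's plain text."""
    root = ET.fromstring(jats_xml)
    body = root.find("body")
    sections: dict[str, str] = {}
    if body is None:
        return sections
    for sec in body.findall("sec"):
        title_el = sec.find("title")
        title = "".join(title_el.itertext()).strip() if title_el is not None else "untitled"
        # Collect text from every child except the title (covers <p>, <italic>, nested <sec>)
        parts = [" ".join(child.itertext()) for child in sec if child.tag != "title"]
        sections[title] = " ".join(" ".join(parts).split())  # normalise whitespace
    return sections


# Hypothetical, made-up article fragment
sample = """<article><front/><body>
<sec><title>Introduction</title><p>Influenza outbreaks recur.</p></sec>
<sec><title>Results</title><p>Patients with <italic>diabetes</italic> fared worse.</p></sec>
</body></article>"""

print(section_texts(sample))
```

Once the text is grouped by section, a comorbidity search can be limited to, say, the results sections rather than the whole article.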

outcomes

  • A spreadsheet and a graph of the comorbidities found during viral epidemics and their counts.
  • An ML model for data classification, evaluated for accuracy.
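A minimal sketch of how the comorbidity counts for that spreadsheet could be tabulated, assuming the article texts have already been extracted and the disease dictionary is available as a plain list of terms. It counts how many articles mention each term at least once; ami search reports its own frequencies, which may be counted differently. The function names and sample texts are hypothetical. The CSV output can be opened in a spreadsheet and plotted as a bar graph.

```python
import csv
import io
import re
from collections import Counter


def articles_mentioning(articles: dict[str, str], terms: list[str]) -> Counter:
    """For each dictionary term, count how many articles mention it at least once."""
    counts: Counter = Counter()
    for text in articles.values():
        lowered = text.lower()
        for term in terms:
            # \b word boundaries stop "flu" from matching inside "influenza"
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                counts[term] += 1
    return counts


def counts_to_csv(counts: Counter) -> str:
    """Render the counts as CSV text, most frequently mentioned disease first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["disease", "articles"])
    for term, n in counts.most_common():
        writer.writerow([term, n])
    return buffer.getvalue()


# Made-up texts and terms, for illustration only
articles = {
    "PMC0001": "Dengue patients with diabetes had longer admissions.",
    "PMC0002": "Influenza outcomes in asthma and diabetes.",
    "PMC0003": "No comorbidity was reported.",
}
terms = ["diabetes", "asthma", "hypertension", "flu"]
counts = articles_mentioning(articles, terms)
print(counts_to_csv(counts))
```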

corpora

  • Initially the communal corpus epidemic50noCov will be used.
  • Later a corpus of 950 articles will be created.

dictionaries

software

  • getpapers to create the corpus of 950 articles from EuPMC.
  • AMI for creating and using dictionaries, and for sectioning.
  • SPARQL for creating dictionaries.
  • KNIME for workflow and analytics.

constraints


Initial Summary

(by collaborator Dheeraj)

The aim of the mini-project

Our underlying aim is that if we can recognize diseases, we can treat them with the right medicines. In this mini-project, we will identify diseases mentioned in articles about viral epidemics, using a disease dictionary and the ContentMine software (getpapers and ami).

Resources

Dictionary

  • The disease dictionary contains the names of diseases, much as an ordinary dictionary contains a store of words.
  • Its sources are ICD-10 (by WHO) and the Wikidata Query Service, and it was created using ami.
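The dictionary itself was created with ami; the sketch below only illustrates the idea. It assumes a SPARQL query, run on the Wikidata Query Service, for items that are instances of disease (Q12136) and have an ICD-10 code (property P494), and converts the standard SPARQL JSON results into an ami-style dictionary XML. The query, the function name and the entry attributes (especially icd10) are assumptions rather than ami's documented format, and the sample results are made up.

```python
import xml.etree.ElementTree as ET

# Query to paste into the Wikidata Query Service (an assumption, not the project's own query):
# items that are instances of "disease" (Q12136) and have an ICD-10 code (P494).
DISEASE_QUERY = """
SELECT ?disease ?diseaseLabel ?icd10 WHERE {
  ?disease wdt:P31 wd:Q12136 ;
           wdt:P494 ?icd10 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def results_to_dictionary(results: dict, title: str = "disease") -> str:
    """Convert SPARQL JSON results into an ami-style <dictionary> XML string."""
    grouped: dict[str, tuple[str, list[str]]] = {}  # Wikidata ID -> (label, ICD-10 codes)
    for row in results["results"]["bindings"]:
        qid = row["disease"]["value"].rsplit("/", 1)[-1]  # ".../entity/Q123" -> "Q123"
        _label, codes = grouped.setdefault(qid, (row["diseaseLabel"]["value"], []))
        code = row["icd10"]["value"]
        if code not in codes:
            codes.append(code)  # one disease may come back with several ICD-10 codes
    root = ET.Element("dictionary", title=title)
    for qid, (label, codes) in grouped.items():
        # Attribute names are assumptions about the dictionary format
        ET.SubElement(root, "entry", term=label, name=label,
                      wikidataID=qid, icd10=" ".join(codes))
    return ET.tostring(root, encoding="unicode")


# Made-up results in the SPARQL 1.1 JSON results format
sample_results = {
    "head": {"vars": ["disease", "diseaseLabel", "icd10"]},
    "results": {"bindings": [
        {"disease": {"type": "uri", "value": "http://www.wikidata.org/entity/Q123"},
         "diseaseLabel": {"type": "literal", "value": "example disease"},
         "icd10": {"type": "literal", "value": "X00"}},
        {"disease": {"type": "uri", "value": "http://www.wikidata.org/entity/Q123"},
         "diseaseLabel": {"type": "literal", "value": "example disease"},
         "icd10": {"type": "literal", "value": "X01"}},
    ]},
}
print(results_to_dictionary(sample_results))
```

Grouping by Wikidata ID keeps one entry per disease even when the query returns a row for each of its ICD-10 codes.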

Corpus 950

  • This is a collection of articles about epidemics and diseases. The disease information in these articles is to be extracted and simplified.
  • The 950 articles were downloaded from EuPMC via getpapers.
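getpapers saves each downloaded article in its own folder. Assuming one folder per article holding eupmc_result.json metadata, plus fulltext.xml and fulltext.pdf when -x and -p are used, a short script like this could recount the files in the mpc output folder and check them against the numbers reported under Progress done. These file names are our assumption about getpapers' output, and count_corpus_files is a hypothetical helper; adjust both if the folders differ.

```python
import tempfile
from pathlib import Path

# Assumed getpapers layout: one folder per article holding these files
# (fulltext.xml only with -x, fulltext.pdf only with -p).
EXPECTED = ("eupmc_result.json", "fulltext.xml", "fulltext.pdf")


def count_corpus_files(outdir: str) -> dict[str, int]:
    """Count how many article folders in outdir contain each expected file."""
    counts = {name: 0 for name in EXPECTED}
    for article in Path(outdir).iterdir():
        if not article.is_dir():
            continue  # skip top-level files such as log.txt
        for name in EXPECTED:
            if (article / name).is_file():
                counts[name] += 1
    return counts


# Demo on a tiny made-up corpus; for the real corpus call count_corpus_files("mpc")
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "log.txt").write_text("log")
    full = root / "PMC0001"
    full.mkdir()
    for name in EXPECTED:
        (full / name).write_text("x")
    partial = root / "PMC0002"
    partial.mkdir()
    (partial / "eupmc_result.json").write_text("{}")
    demo_counts = count_corpus_files(tmp)

print(demo_counts)
```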

EuPMC

Europe PMC (EuPMC) is an online database of life-sciences literature holding millions of articles. We are analyzing some of these articles, and with their help we are also creating a dictionary.
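To our understanding, getpapers retrieves EuPMC results through Europe PMC's public REST search service. To show what such a search looks like, the sketch below only builds the request URL and makes no network call. query, format, pageSize and cursorMark are parameters of that service; the helper search_url is hypothetical. Opening the printed URL in a browser shows the JSON list of hits.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def search_url(query: str, page_size: int = 25, cursor_mark: str = "*") -> str:
    """Build a Europe PMC search URL returning JSON; cursorMark '*' requests the first page."""
    params = {
        "query": query,
        "format": "json",
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return BASE_URL + "?" + urlencode(params)


url = search_url('"viral epidemics" AND human')
print(url)
```

Each response carries a nextCursorMark value, which is passed back as cursorMark to fetch the following page.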

Tools

getpapers

  • getpapers is a command-line tool that can download large numbers of articles from EuPMC and other sources.
  • To install it on your computer:
  1. Install nvm.
  2. Install node.
  3. Install getpapers. For more information, see https://github.com/petermr/openVirus/wiki/getpapers.

ami

  • ami is also a software tool. It is used to process downloaded articles (for example, sectioning them) and to extract information from them, and it can also create dictionaries. In this mini-project, we used it to create the disease dictionary.

Work done

  • I have read about getpapers and EuPMC, including EuPMC's advanced search, and have been reading its articles.
  • I am studying Wikidata and learning how to update the dictionary.

My goal

  • As stated above, if diseases are known, they can be treated accordingly. Our main goal is therefore to find the names of the diseases that occur during viral epidemics.
  • In this mini-project, my main goal is to update the disease dictionary with ICD-10 using Wikidata.
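Updating the dictionary with ICD-10 could, for example, mean looking up each entry's Wikidata ID and attaching its ICD-10 code. A minimal sketch, assuming each dictionary entry carries a wikidataID attribute and that a mapping from Wikidata IDs to ICD-10 codes has already been fetched (for instance with a Wikidata Query Service query on property P494). The function name, the icd10 attribute and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_icd10_codes(dictionary_xml: str, icd10_by_qid: dict[str, str]) -> str:
    """Set an icd10 attribute on each entry whose wikidataID has a known code."""
    root = ET.fromstring(dictionary_xml)
    for entry in root.iter("entry"):
        code = icd10_by_qid.get(entry.get("wikidataID", ""))
        if code is not None:
            entry.set("icd10", code)  # attribute name is an assumption
    return ET.tostring(root, encoding="unicode")


# Made-up dictionary and codes
existing = (
    '<dictionary title="disease">'
    '<entry term="disease one" name="disease one" wikidataID="Q1"/>'
    '<entry term="disease two" name="disease two"/>'
    "</dictionary>"
)
updated = add_icd10_codes(existing, {"Q1": "X00"})
print(updated)
```

Entries without a Wikidata ID or without a known code are left unchanged, so they can be reviewed by hand.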

Progress done

  1. The 50 articles in the communal corpus epidemic50noCov were manually classified as true or false positives, and a spreadsheet of the results was created.
  2. ami search was run on the 50-article corpus with the disease dictionary, and the HTML DataTables were created.
  3. The corpus was sectioned using ami section, following https://github.com/petermr/openVirus/wiki/ami:section.
  4. getpapers was used to create a corpus of 950 articles on human viral epidemics (except COVID-19). The command was getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o mpc -f mpc/log.txt -k 950 -x -p, which produced 950 JATS files, a log text file, 949 XML files and 903 PDF files.
  5. ami search was run successfully on the 950-article corpus, which was split into 4 folders of 200-250 articles each.
  6. The 950-article corpus was sectioned successfully using ami section.
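The 950-article corpus was split into 4 folders. One way to do that split, sketched under assumptions (the article folders sit directly inside the corpus folder, and the _partN folder names are hypothetical), is to sort the article folders and move them in batches of at most 250, which for 950 articles yields folders of 250, 250, 250 and 200.

```python
import shutil
from pathlib import Path


def split_into_batches(items: list, batch_size: int = 250) -> list[list]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def move_into_batch_folders(corpus_dir: str, batch_size: int = 250) -> list[Path]:
    """Move article folders from corpus_dir into sibling folders <corpus>_part1, _part2, ..."""
    corpus = Path(corpus_dir)
    articles = sorted(p for p in corpus.iterdir() if p.is_dir())
    targets = []
    for n, batch in enumerate(split_into_batches(articles, batch_size), start=1):
        target = corpus.parent / f"{corpus.name}_part{n}"  # hypothetical naming scheme
        target.mkdir(exist_ok=True)
        for article in batch:
            shutil.move(str(article), str(target / article.name))
        targets.append(target)
    return targets


# 950 article IDs in batches of 250 give 4 folders of 250, 250, 250 and 200 articles
sizes = [len(b) for b in split_into_batches(list(range(950)))]
print(sizes)
```

Sorting the folders first keeps the split reproducible, so the same articles land in the same part on every run.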

Things to be done

(in the 950 article corpus)

  1. To upload the 950-article corpus to GitHub (an issue is being fixed).
  2. To manually classify the articles as true or false positives.
  3. To use the KNIME software for binary classification.
  4. To test the accuracy of the data classification.
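Testing the classification on accuracy means comparing the classifier's labels with the manual true/false-positive labels. A minimal sketch, assuming both sets of labels are available as article ID to True/False mappings (True meaning the article is about a viral epidemic); the function names and labels are made up. Here tp/fp/tn/fn refer to the classifier's decisions, with the manual labels as ground truth. Precision and recall are included because accuracy alone can look good when one class dominates the corpus.

```python
def confusion_counts(manual: dict[str, bool], predicted: dict[str, bool]) -> dict[str, int]:
    """Compare predicted labels with manual labels (True = article is about a viral epidemic)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for article_id, truth in manual.items():
        guess = predicted[article_id]
        if guess and truth:
            counts["tp"] += 1
        elif guess:          # predicted relevant, manually irrelevant
            counts["fp"] += 1
        elif not truth:      # both say irrelevant
            counts["tn"] += 1
        else:                # predicted irrelevant, manually relevant
            counts["fn"] += 1
    return counts


def accuracy(c: dict[str, int]) -> float:
    return (c["tp"] + c["tn"]) / sum(c.values())


def precision(c: dict[str, int]) -> float:
    predicted_positive = c["tp"] + c["fp"]
    return c["tp"] / predicted_positive if predicted_positive else 0.0


def recall(c: dict[str, int]) -> float:
    actual_positive = c["tp"] + c["fn"]
    return c["tp"] / actual_positive if actual_positive else 0.0


# Made-up labels for five articles
manual = {"PMC1": True, "PMC2": True, "PMC3": True, "PMC4": False, "PMC5": False}
predicted = {"PMC1": True, "PMC2": True, "PMC3": False, "PMC4": True, "PMC5": False}
c = confusion_counts(manual, predicted)
print(c, accuracy(c), precision(c), recall(c))
```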

Blocking

  1. Learning KNIME for use in binary classification.