miniproject: viral epidemics and disease
Lakshmi Devi Priya edited this page Jun 30, 2020
Priya
- Use the communal corpus `epidemic50noCov` of 50 articles.
- Scrutinize the 50 articles manually to identify true positives and false positives, i.e. whether each article is about a viral epidemic or not.
- Use `ami search` to find whether the articles mention any comorbidity in a viral epidemic.
- Section the articles with `ami:section` to extract the relevant information on comorbidity; annotate with dictionaries to create ami DataTables.
- Refine and rerun the query to obtain a corpus of 950 articles.
- Use a relevant ML technique to classify whether the articles are about a viral epidemic and which diseases/disorders co-occur.
- Develop a spreadsheet and a graph of the comorbidities occurring during a viral epidemic and their counts.
- Develop the ML model for data classification and evaluate it on accuracy.
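The `ami search`/DataTable step above can be sketched in plain Python. This is a minimal, illustrative analogue only: the hard-coded term list stands in for a real ami/SPARQL comorbidity dictionary, and the sample text stands in for a sectioned article.

```python
import re
from collections import Counter

# Hypothetical comorbidity terms -- in the project these would come from
# an ami dictionary built via SPARQL, not from a hard-coded list.
COMORBIDITY_TERMS = ["diabetes", "hypertension", "asthma", "obesity"]

def count_comorbidities(section_text, terms):
    """Count case-insensitive whole-word hits per term, DataTable-style."""
    hits = Counter()
    for term in terms:
        hits[term] = len(re.findall(r"\b%s\b" % re.escape(term),
                                    section_text, flags=re.IGNORECASE))
    return hits

# Stand-in for text extracted from one sectioned article.
text = ("Patients with diabetes and hypertension showed worse outcomes; "
        "Diabetes was the most frequent comorbidity.")
print(count_comorbidities(text, COMORBIDITY_TERMS))
```

A table of such per-article counts is essentially what the planned spreadsheet and graph would be built from.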
- Initially the communal corpus `epidemic50noCov` will be used.
- Later a corpus of 950 articles will be created.
- `AMI` for creating and using dictionaries, and for sectioning.
- `SPARQL` for creating dictionaries.
- `KNIME` for workflow and analytics.
- `Python` and relevant libraries (`keras`) for ML and data visualization.
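The dictionary-creation step can be sketched as follows. This assumes a SPARQL query against Wikidata has already returned disease labels and Q-identifiers (hard-coded placeholders here), and serializes them into an ami-style dictionary XML; the exact attribute set ami expects should be checked against the ami documentation before use.

```python
import xml.etree.ElementTree as ET

# Placeholder rows standing in for SPARQL results from Wikidata.
ROWS = [
    ("diabetes mellitus", "Q12206"),
    ("hypertension", "Q41861"),
]

def build_dictionary(title, rows):
    """Serialize (label, Q-id) pairs into an ami-style XML dictionary.
    Attribute names follow the common ami dictionary layout; verify
    against the ami docs, as this is an assumption, not the spec."""
    root = ET.Element("dictionary", title=title)
    for name, qid in rows:
        ET.SubElement(root, "entry", term=name, name=name, wikidataID=qid)
    return ET.tostring(root, encoding="unicode")

xml_out = build_dictionary("comorbidity", ROWS)
print(xml_out)
```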
- The 50 articles in the communal corpus `epidemic50noCov` were manually binary-classified and a spreadsheet was developed.
- The corpus was sectioned with `ami section`, following https://github.com/petermr/openVirus/wiki/ami:section.
- `getpapers` was used to create a corpus of 950 articles on human viral epidemics (except COVID-19) with the command `getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o disease_mp -f ve/log.txt -k 955 -x -p`. 954 XML files and 913 PDF files were created.
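The planned binary classification (viral epidemic vs. not) can be sketched without the keras model itself. The toy stdlib Naive Bayes below is a stand-in for the planned keras classifier, and the hard-coded titles are purely illustrative, not real corpus data.

```python
import math
from collections import Counter

# Toy labelled titles (1 = viral epidemic, 0 = not) -- illustrative
# stand-ins for the manually classified epidemic50noCov spreadsheet.
TRAIN = [
    ("influenza epidemic spreads among human populations", 1),
    ("ebola outbreak and viral transmission dynamics", 1),
    ("zika virus epidemic and neurological comorbidity", 1),
    ("crop yield improvement in arid climates", 0),
    ("machine tooling advances in manufacturing", 0),
    ("soil chemistry of temperate forests", 0),
]

def train(data):
    """Count word frequencies per class for multinomial Naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter()
    for text, label in data:
        priors[label] += 1
        counts[label].update(text.split())
    return counts, priors

def predict(text, counts, priors):
    """Return the more likely class under Laplace-smoothed Naive Bayes."""
    vocab = set(counts[0]) | set(counts[1])
    scores = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(priors[label] / sum(priors.values()))
        for word in text.split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

counts, priors = train(TRAIN)
print(predict("viral epidemic in human cohorts", counts, priors))  # → 1
```

The real classifier would be trained on the 950-article corpus and evaluated on accuracy, as planned above.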