What is openVirus? (OpenPublishingTalk)
An 8-minute presentation for https://openpublishingfest.org/calendar.html#event-178 on 2020-05-27.
Sparked by the realisation, around 2020-02, that there was no simple way for citizens to find scientific information on the COVID-19 epidemic. A group of activists released about 5000 papers from SciHub [1]. Possibly in response, a number of closed-access publishers released a few thousand articles into the "CORD-19" database. This was restricted to viruses or COVID. (It has since developed into a larger user community.) The content was largely JSON.
We were working on OpenClimateKnowledge (OCK), for citizens to extract knowledge from the distributed scientific literature. When COVID-19 hit, we decided to use the same technology to tackle viral epidemics.
We felt that releasing a very narrow section of the scientific literature, selected by commercial publishers, was a minimal response. With simple searches we found that 60-90% of the literature was still closed for topics such as aerosols, masks, ventilators, social distancing, legal issues and many others. Citizens are confined to information on:
- topics selected by publishers
- sources of content restricted by current systems
openVirus was created as a citizen volunteer community to create tools and sources for citizens to ask their own questions of their own sources.
- to welcome Open (free to use, re-use and re-distribute)
- to create a single point of entry for searching the Open Literature
- to provide a toolset that citizens could download, modify and use
- to create a Wikidata-based query, using simple dictionaries that citizens can create and modify
- to create an atmosphere where a community can grow.
- to emphasize global reach, such as multilinguality and Global South publications.
- to use the most appropriate Open solutions. Collaborate not compete.
largely carried out by users on their own machines.
Many resources are server-centric and offer limited chance of systematic download.
- build scrapers or API query tools for Openly readable sources.
- query or scrape user questions
- download raw content (PDF, HTML, images) for 10 to 10,000 articles
- clean and semantify
- annotate with dictionaries
- expose, analyze, display.
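The "annotate with dictionaries" step can be pictured with a minimal sketch. This is not the `ami` implementation; the function name `annotate`, the three-term dictionary and the sample sentence are all hypothetical, chosen only to show how a citizen-editable word list can be matched against cleaned text.

```python
import re
from collections import Counter


def annotate(text, dictionary):
    """Count case-insensitive whole-word matches of each dictionary term."""
    counts = Counter()
    for term in dictionary:
        # \b keeps "mask" from matching inside a longer word such as "masked"
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


# Hypothetical dictionary and text, for illustration only
terms = ["mask", "aerosol", "ventilator"]
sample = "Aerosol transmission may be reduced when a mask is worn; the mask must fit."
print(annotate(sample, terms))
```

A real dictionary would also need synonyms, plurals and multilingual labels, which is one reason for basing dictionaries on Wikidata entries rather than bare word lists.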
- EuropePMC
- biorxiv and medrxiv
- DOAJ
- EThOS
- Redalyc (MX)
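As an illustration of an API query tool for one of these sources, the sketch below builds a search URL for the EuropePMC REST search service. It is not how `getpapers` does it; the helper name `build_search_url` and its defaults are assumptions, and the `OPEN_ACCESS:y` filter reflects our reading of the EuropePMC query syntax, so check it against the current API documentation before relying on it. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public EuropePMC REST search endpoint
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(question, page_size=25, open_access_only=True):
    """Turn a user question into a EuropePMC search URL (hypothetical helper)."""
    query = question
    if open_access_only:
        # Assumed filter syntax for restricting results to Open Access articles
        query = f"({question}) AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUROPEPMC_SEARCH + "?" + urlencode(params)


url = build_search_url("coronavirus AND masks", page_size=10)
print(url)
```

The resulting URL can be fetched with `curl` or any HTTP client, and the JSON response saved to local storage for the later steps.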
Any tool can be included as long as it can communicate through files on local storage in our CProject format. This is not an exclusive list.
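The idea of tools communicating only through files on local storage can be sketched as follows. This is a simplified stand-in, not the CProject specification: the one-directory-per-article layout, the file names `fulltext.txt` and `wordcount.json`, and the article identifiers are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_article(project, article_id, text):
    """'Scraper' step: store one article in its own subdirectory."""
    tree = Path(project) / article_id
    tree.mkdir(parents=True, exist_ok=True)
    (tree / "fulltext.txt").write_text(text, encoding="utf-8")
    return tree


def word_counts(project):
    """'Analysis' step: read every article back from disk and write a result file."""
    totals = {}
    for fulltext in sorted(Path(project).glob("*/fulltext.txt")):
        n = len(fulltext.read_text(encoding="utf-8").split())
        # The result is written next to the input, so a later tool can pick it up
        (fulltext.parent / "wordcount.json").write_text(json.dumps({"words": n}))
        totals[fulltext.parent.name] = n
    return totals


# Hypothetical article identifiers and content
with tempfile.TemporaryDirectory() as project:
    write_article(project, "PMC0000001", "masks reduce aerosol spread")
    write_article(project, "PMC0000002", "ventilator supply")
    totals = word_counts(project)
print(totals)
```

Because the two steps share nothing but the directory, either one could be replaced by a tool written in another language, which is the property that lets tools be varied or swapped.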
- framework: ami + CProject data
- scrapers: getpapers, Ferret, curl, scrapy
- cleaners: PDFBox, Tidy/Jsoup, Grobid, etc.
- transformers: xml2html, ami ocr, KNIME
- dictionaries: ami dictionary
- indexing and annotation: Solr, ami
- analysis and display: R, KNIME
The central philosophy is a defined semantic universal data structure, CProject. The tools can be varied or swapped.
- Remko Popma,
- Lezan Hawizy, Tim Voronov,
- Andy Jackson,
- Clyde Davies,
- Thomas Shafee,
- Priya JK, Kareena Singh,
- Simon Worthington, (check omissions)
- toolkit
- dictionaries
- tutorials
- citizen openVirus, downloadable or boxed
====
[1] Bender, Maddie (3 February 2020). "'It's a Moral Imperative:' Archivists Made a Directory of 5,000 Coronavirus Studies to Bypass Paywalls". Vice. https://www.vice.com/en_us/article/z3b3v5/archivists-are-bypassing-paywalls-to-share-studies-about-coronaviruses
[CORD-19] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge