Meeting Records
Previous meeting records can be found here: https://github.com/petermr/openVirus/wiki/Records-and-Reports
- Schematron - A tool to validate dictionaries.
- Everybody should come up with a set of rules that their dictionary should comply with.
- Briefly discussed the progress and the future directions for the project.
- GY: New interns are to join us from January.
- Test-Driven Development
- Write tests that the dictionary editors should comply with.
- Discussed the structure of projects in Python. We looked at the project structure of 'Pymatgen', a project in which PMR is involved.
- Welcome Radhu, the new NIPGR intern!
- History and a bit of Context:
- 1.5 years ago, NIPGR-ContentMine internships started.
- Plant literature into a database.
- openVirus started in March with a focus on viral epidemics.
- Gita ma'am's group has developed EssOilDB, which is now being converted to use newer technology. New interns will be working on this.
- We will continue to work on openVirus dictionaries as well.
- We follow 'open notebook philosophy'
- Review the EssOilDB and Emanuel work to make sure it's compatible.
- Review:
- Plant Taxonomy
- A typical paper about a plant has
- Place
- Part
- Chemicals
- Project Management
- Regular Meetings
- Agenda
- Run the Meetings
- Meeting Account
- Brief recap of the 'openVirus work' so far. (By Shweata, Ayush, PMR, and Dheeraj)
- Requirements for `pygetpapers` are documented here: https://github.com/petermr/openVirus/wiki/pygetpapers
- `pyamidict`: We discussed the general workings of the code PMR developed. https://github.com/petermr/dictionary/blob/main/pythoncode/pyamidict/editor/amidict.py
- Coordinate the review for `pygetpapers` -> Ayush
- Revise dictionaries
- prototype code to search using Dictionary
- Proper labeling and naming of rules for dictionary validation
- Introduction to Giulia Arsuffi (Ph.D., Plant Science, Cambridge) and Emanuel Faria (Brazil), and their work
- Discussion of Manny's work so far
- Dictionaries
- (Country, Disease, Drug, Organization)
- Plant(essential oil-producing), Extraction Methods, Phytochemicals
- Brief about tigr2ess Program to Radhu (https://github.com/petermr/tigr2ess)
- Explanation of `pygetpapers` by Ayush
- Test-Driven Development: Collectively drafted some tests to implement for `pyamidict`
- Branching: As multiple people are going to be involved in the development of the software, it's important to make sure that we don't change each other's code without knowing. Branching, therefore, becomes important. Radhu and Ayush are going to write a guide together to help all of us understand how branching works.
- Welcome Kanishka and Manisha, the new NIPGR interns!
- Getting to know each other's computational backgrounds.
- Introduction is given to each intern and a brief overview of their role in the project.
- We got an overview of the people, the project, the resources, how they are used, and how to build them.
- Discussion about the agenda behind openVirus (wikicite presentation) By PMR, to build a system so that anybody can understand the science behind the current pandemic.
- Gita ma'am's group has developed EssOilDB, which is now being converted to use newer technology. New interns will be working on this.
- Aim to start "4-projects" - for new interns with different criteria including Parts of plant, Chemicals, Extraction method & Analytic method.
- Introduction about working of "Slack" and interns to new interns.
- Introduction about working of Github.
- PMR introduced 3 Software projects (getpapers, pygetpapers, Ami Dictionary).
- learn how to use the wiki and contribute.
- Discussion on running getpapers to retrieve papers from medrxiv.
- Running maven to build ami.
- be able to run getpapers and ami with existing dictionaries.
- explore dictionaries in https://github.com/petermr/dictionary/tree/main/openVirus202011
- Welcoming our new intern, Prashant
- A brief introduction of new interns
- A short review of the CEV Open project
- Brainstorming and discussing potential project ideas, feasible in 6 months
- Project Ideas generated at the end of the session:
  - Phytochemistry-specific Projects (tentative):
    - Medicinal activities of plant essential oils
    - Volatile Terpenes and genes
    - Essential oils from invasive species
  - Tech/Independent Projects:
    - `pygetpapers`
    - `pyamidict`
    - `search`
      - Location (Country)
      - Organization
      - Disease
      - Drug(?)
- Interns' standup
- `pygetpapers`: Reviewing Ayush's code
- Shweata: Email all the volunteers of the openVirus team informing the new developments and project ideas.
- Ambreen, Anugrah, Rajan, Vanisha, Vaishali, Aishwarya, Mukul
- Priya, Kareena, Sana
- Scoping Review: Preliminary searching and readings. Mainly to figure out the feasibility.
- Radhu: Medicinal activities of plant essential oils
- Kanishka: Volatile Terpenes and genes
- Prashant: Essential oils from invasive species
- Brief introduction by each intern and get started.
- Briefly discussed the progress of Ambreen's work.
- A short review of the CEV Open project.
- Discussion of the last 3-4 weeks' work.
- 4 core dictionaries:
- country (Ambreen) => location => geolocation services
- disease (Dheeraj) - human disease
- drugs (Rajan)
- organizations (Shweata, Vaishali)
- `pygetpapers`: Reviewing Ayush's code.
- Explanation of `pygetpapers` by Ayush to Ambreen and Vaishali.
- Review of PMC papers to explain to the interns about sectioning and output of XML files.
- Review of dictionaries(Plants and openvirus).
- Review of Shweata's and Dheeraj's excellent developments on "instance of" and OPTIONAL work.
- Discussion on SPARQL queries and downloading the .xml file.
- Extended discussion on dictionaries, editing the wiki page dict schema, different terms, and elements of wikidata.
- Review on Poster of "WikiFactMine for Phytochemistry".
- Briefly discussed the progress and the future directions for the project.
- Discussed the new and exciting directions to the project.
- Standup by interns (those who are present)
- Extended discussion on dictionaries, editing the wiki page dict schema, different terms, and elements of wikidata
- Some requirements we identified
- merge entries with the same WikidataID
- detect and eliminate scholarly articles, books, etc.
- add language Wikipedia pages from wikidataID
- (SH) post-SPARQL filtering, or query refinement
- translate attributes into wikidata properties where possible (crossrefid => _p3153_crossrefid)
- remove unwanted terms (term value or wikidataID)
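Two of the requirements listed above (merging entries that share a WikidataID, and renaming attributes to property-prefixed names) could be prototyped roughly as follows. This is only a sketch: it assumes a `<dictionary>` root with `<entry>` children carrying a `wikidataID` attribute, as described elsewhere in these notes. The function names, the sample data and its IDs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def merge_entries_by_wikidata_id(dictionary):
    """Merge <entry> elements sharing a wikidataID; the first entry is kept."""
    kept = {}
    for entry in list(dictionary.findall("entry")):
        qid = entry.get("wikidataID")
        if not qid:
            continue  # entries without an ID cannot be matched
        if qid not in kept:
            kept[qid] = entry
            continue
        keeper = kept[qid]
        # Copy only the attributes the kept entry does not already have.
        for key, value in entry.attrib.items():
            keeper.attrib.setdefault(key, value)
        # Move children (e.g. synonyms) across, skipping exact duplicates.
        existing = {(child.tag, child.text) for child in keeper}
        for child in list(entry):
            if (child.tag, child.text) not in existing:
                keeper.append(child)
                existing.add((child.tag, child.text))
        dictionary.remove(entry)
    return dictionary


def property_attribute(name, property_id):
    """Build a property-prefixed attribute name, e.g. crossrefid + P3153."""
    return f"_{property_id.lower()}_{name}"


# Placeholder IDs for illustration only, not real Wikidata items.
SAMPLE = """<dictionary title="demo">
  <entry term="limonene" wikidataID="Q0001"><synonym>dipentene</synonym></entry>
  <entry term="Limonene" wikidataID="Q0001" description="a terpene"><synonym>dipentene</synonym><synonym>cinene</synonym></entry>
  <entry term="linalool" wikidataID="Q0002"/>
</dictionary>"""

root = merge_entries_by_wikidata_id(ET.fromstring(SAMPLE))
```

Keeping the first entry and only filling in what it lacks is one possible merge policy; a real tool might instead report conflicts for an editor to resolve.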
- Introduction about a planner for new interns
- Dictionaries
- Plant(Radhu)
- Compound(Kanishka)
- Gene(Prashant)
- Incomplete dictionaries(?): Activity, extraction method, plant parts [Ask Emanuel]
- Missing Wikidata items
- Revise descriptions and extract synonyms.
- Some terms have the Wikidata ID of a scholarly article
- Do the Wikidata IDs of terms still exist? (Some items might have been moved or deleted since the dictionary was created)
- Write Python code to go through the IDs and check whether they exist
- PMR: Write software to convert SPARQL output into the dictionary.
- volatile_compound
- PROBLEM: Chemicals have commas in them. AltLabel gives synonyms separated by a comma.
- PMR: Query all the chemicals automatically and look them up.
- Find out if the compounds are in ChEBI
- "found in taxon" property: P703
- Genes dictionary: contact Giulia
- PMR's network dropped and he has a meeting, so there is no meeting on Thursday.
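The check proposed above, whether the Wikidata IDs stored in a dictionary still exist, might be sketched as below. It uses the MediaWiki `wbgetentities` action of the Wikidata API. The response shape assumed here (a `missing` key for deleted items, a `redirects` object for merged ones) and the batch limit mentioned in the comments should be verified against the API documentation; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def classify_entities(response):
    """Map each requested ID to 'ok', 'redirect' or 'missing'."""
    status = {}
    for key, entity in response.get("entities", {}).items():
        if "missing" in entity:
            status[key] = "missing"
        elif "redirects" in entity:
            # Assumption: a redirected item is keyed by its target ID and
            # records the requested ID under redirects["from"].
            status[entity["redirects"].get("from", key)] = "redirect"
        else:
            status[key] = "ok"
    return status


def fetch_entities(ids):
    """Ask Wikidata about a batch of IDs (assumed limit: 50 per request)."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "info",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{API}?{params}",
        headers={"User-Agent": "dictionary-id-check (example script)"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

A typical use would be `classify_entities(fetch_entities(["Q42", "Q64"]))`, run over the dictionary's IDs in batches; anything reported as `missing` or `redirect` is a candidate for manual review.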
- Welcome Vasant and Talha, the new NIPGR interns!
- Getting to know each other's Backgrounds.
- General instruction given to all by Gitanjali Ma'am.
- We got an overview of the people, the project, the resources, how they are used, and how to build them.
- Discussion about the agenda behind openVirus By PMR, to build a system so that anybody can understand the science behind the current pandemic.
- Introduction about working of "Slack" and interns to new interns.
- Introduction about working of Github.
- Mini Projects: Each project should have a scientific target. It must involve technology development.
- Medicinal oils (Emanuel (Manny) Faria) (Radhu Ladani)
- Genes (Giulia Arsuffi) (Talha Hasan)
- Invasive species ( Gitanjali Yadav) (Kanishka Parashar)
- PMR introduced 3 Software projects (getpapers, pygetpapers, Ami Dictionary).
- Briefly discussed the progress and the future directions for the project.
- Explanation of a dictionary given by PMR, "Ocimum sanctum". Introduction to the XML markup language and its components such as 'elements' and 'attributes', and to the Q number for Wikidata.
- Scientific strategy discussion over "Why we are doing these Projects?" To build an organized system using technological tools
- Review of each intern's dictionary. Each one to create their dictionary's own wiki page.
- Review of each mini-project (update your project pages with the progress made, Create pages for tools which you use)
- Everyone to commit their mini project data on Github ( https://github.com/petermr/CEVOpen/wiki)
- Main components of intern activity:
- Technology - (getpapers, ami, wikidata/SPARQL) - search
- Dictionaries (1 existing dictionary, 1 new dictionary) approx.
- Miniproject chemotype, genotype, activities (medicinal) phenotype - invasive species
- Integration - how these fit together - an atlas
- All new topics of the project will be discussed in Slack, and Wiki pages will be created on CEVOpen, as Slack is for immediate conversations and GitHub is for structured technical conversations.
- Brief project review by PMR to GY regarding the progress made and upcoming tasks.
- Standup by each intern.
- Introduction to interns about `getpapers` and `ami` search queries by PMR.
- Review of Vasant and Talha's miniprojects.
- Review of dictionaries: does each contain name, term, Wikidata ID, Wikidata label, description, and Wikipedia URL?
- Update the dictionary: do all entries have synonyms and Wikipedia pages? If not, we have to add them; we discussed how to retrieve these using the SPARQL `VALUES` clause.
- Briefly discussed the progress and the future directions for the project.
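Retrieving labels, synonyms and Wikipedia pages for a known list of items with `VALUES`, as discussed above, could look something like this. The sketch only builds the query text for the Wikidata Query Service; the function name is hypothetical and the selected variables are one possible choice.

```python
import re


def build_values_query(qids, language="en"):
    """Build a SPARQL query restricted to the given Wikidata IDs."""
    for qid in qids:
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?itemLabel ?itemAltLabel ?article WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  OPTIONAL {\n"
        "    ?article schema:about ?item ;\n"
        f"             schema:isPartOf <https://{language}.wikipedia.org/> .\n"
        "  }\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )


query = build_values_query(["Q1", "Q2"])
```

The `OPTIONAL` block keeps items that have no Wikipedia page in the results, which makes the gaps in a dictionary visible.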
- Interns' standup
- Discussion of organization documents of Github and blockage of interns.
- PMR explained the Shuttleworth Foundation, including its fellows and fellowships, and presented the ideas of people who are open, innovative, global, etc.
- Open: “A piece of data or content is open if anyone is free to use, reuse, and redistribute it - subject only, at most, to the requirement to attribute and/or share-alike.”
- Discussed the outline of the "FlashForward presentation"
- ISC report and overview (PMR)
- Shweata overview
- Tigr2ess -> open climate -> open virus -> CEVopenPlant (-> crops, -> plant technology)
- Ayush pygetpapers (launch)
- demo of Wikidata/SPARQL - Dheeraj? + interns?
- (miniprojects) -> matplotlib displays
- (background photos - farms, landscapes, etc.) - stress mobile?
- Theme finalization about the event: "Scientific knowledge for Global Challenges"
- Review of `pygetpapers` and explanation of the flowchart by Ayush and PMR.
- Difference between `pygetpapers` and `getpapers`: https://github.com/petermr/dictionary/blob/main/pygetpapersdev/oldgetpapers.js
- Standup by interns (those who are present)
- Planning for the event (Flash Forward Workshop). It is going to be a hands-on session for people to try out our software and give us feedback. We are also going to discuss briefly our current projects, their motives, and so on.
- Presentation Discussion Regarding OpenVirus which is a team of volunteers who build software to query the scientific literature automatically in large amounts.
- We will explore the technology and the issues:
- where can you find science?
- what's its value? How can it be used?
- how much is hidden by the publishers?
- how can you do this yourself?
- Flash Forward Workshop. This will be a workshop where
- we will demo the technology
- discuss what the world would like to be able to do. This includes non-English languages.
- PMR will introduce current project EO, past projects climate, and epidemics? the demo is the specific topic of invasive species? theme invasive species? introductions? [St Edmunds Game - Gita. Present science to non-scientists. Fun! Involved. May 2021]
- Presentation discussion of openVirus Overview of the project:
- Shweata (Textmining software and community - Introduction) - 5 min.
- Ambreen (Her experience, initial results, and machine learning 101) - 5 min.
- What interns can present (CEVOpen - Invasive species theme):
- Kanishka - Intro to Invasive species project - 2 min. recorded video
- Talha - Megapublishers and Manual search of scientific literature - 2 min.
- Radhu - Intro to Wikidata, Invasive species dictionary creation demo - 2 min.
- Vasant - Jupyter Notebook demo - data display (histograms, cooccurrence) - 2 min
- Ayush - pygetpapers demonstration - 5 min.
- Discuss the time slot for the presentation
- interns gave a demo for their particular topic
- PMR and other members suggest points and corrections so the presentation would be more precise.
- get familiar with BigBlueButton
- Day of Flash Forward Workshop
- review of FlashForward presentations
- review of personal commitments (interns) including their College Schedule, Exams, and other things.
- alpha testing -> Preliminary tests for software.
- Every intern has to test `pygetpapers` and record in detail what they did. Well-documented testing is one of the essential aspects of the software.
- `pyamisearch` works and needs alpha-testing. It will be used on `sections`. Currently, we can extract the given sections quite well. There are several short, important subsections and each intern can take one:
- acknowledgments
- conflict of interest
- ethics statements
- author contributions
- CEVOpen at StEds. The event in Cambridge ca 2021-05-01 showing our projects and software to Cambridge students, researchers, and faculty. Many are NOT scientists. This will be interactive and may have a game format. We need all interns to be completely fluent in
- installation
- tutorial materials and examples
- management (perhaps within teams)
- There may be a "dry run" about 2021-04-01 with volunteers (e.g. from Wikipedia)
- Everyone must be prepared to alpha-test `pygetpapers` and give feedback (pygetpapers: test reports).
- review Ayush's work: Updated pygetpapers with options to make JSON and make CSV on demand. Also added time elapsed to pygetpapers.
- Review of Wiki pages (Dictionary wikipages) for dictionaries. Everyone should have 1-2 dictionaries and summarize progress and problems.
  - Radhu: `plant` and `activity`
  - Kanishka: `Invasive species`
  - Talha: `compound` and `plant material history`
  - Vasant: `Plant parts` and `gene`
- Some skills will be needed frequently, so we have to learn them through web tutorials and community self-help and set up Wiki pages for each:
- Programming: Regular Expressions
- Programming: Globbing
- Programming: Xpath
- Programming: JSON
- Programming: XML
- Programming: grep
- Each page should help newcomers learn these techniques. We don't have to write a tutorial; it's more useful to point to good (often interactive) tutorials. It's also useful to point out the things which we found difficult.
- Discussed the structure of projects in Python. We looked at the project structure of `pyami.ini`, a project in which PMR is involved.
- Kanishka: Project Manager (maintain a wiki page for the testing of `pyamisearch`, ask people to add their tests, and keep these documents updated)
- Every intern has to check whether their dictionaries work with `ami search`.
- Verify you can run /physchem/python/util.py
- Install `pyami.ini` in your "home" directory. I don't know where this is on Windows. This will need to be customized to define:
  - where your dictionaries are
  - where your project/s are
  - try to use symbols where possible
- Run: `python <your path>/util.py`
- Keep editing till it shows your dictionaries
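On the question above of where the "home" directory is on Windows: Python can report it on any platform through `pathlib.Path.home()`. The sketch below locates a `pyami.ini` there and reads it with the standard `configparser` module. The `[DIRS]` section and its keys are hypothetical placeholders, not the actual layout of `pyami.ini`.

```python
import configparser
from pathlib import Path


def config_path(filename="pyami.ini"):
    # Path.home() is the user's home directory on every platform,
    # typically /home/<user> on Linux and C:/Users/<user> on Windows.
    return Path.home() / filename


def parse_dirs(text, section="DIRS"):
    """Return the key/value pairs of one section of an INI document."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[section]) if parser.has_section(section) else {}


def read_dirs(path):
    """Read the directory settings from an INI file on disk."""
    return parse_dirs(Path(path).read_text(encoding="utf-8"))


# Hypothetical content; the real pyami.ini may use other names.
EXAMPLE = """[DIRS]
dict_dir = /tmp/dictionary
project_dir = /tmp/projects
"""
settings = parse_dirs(EXAMPLE)
```

Printing `config_path()` on each intern's machine is a quick way to see where the file is expected to live.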
- If anyone has any error then create an issue on Github https://github.com/petermr/dictionary/issues and give a link to the issue on Slack.
- After an issue has been resolved please add the problem encountered and the accepted solution to the wiki. Also please don't delete issues even after the issue has been resolved because other fellows might encounter the same problem.
- Standup by interns
- Review of the individual tasks, Interns to come up with their own dictionaries and projects.
- creation with SPARQL
- editing and problems of existing dictionaries
- Potential use of a dictionary of genes.
- identifying articles (IR)
- annotating genes in articles (IE)
- linking to Wikidata
- translating synonyms
- Discussed the recent problem that we encountered with the SPARQL query, as reported by Talha & Radhu (we cannot download a very large result set from the endpoint in one go, so we have to download it in several parts and then merge them manually)
- Every intern's dictionary wiki page has been created in the "CEVOpen" repository and should record updates and issues related to the dictionary
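The workaround described above (downloading large SPARQL results in parts and merging them) could be automated along the following lines. This sketch only generates the paged queries and merges already-downloaded rows; the function names and the `item` key are assumptions, and the base query should end with an `ORDER BY` so that the pages are consistent between requests.

```python
def paged_queries(base_query, page_size, pages):
    """Return one query per page, using LIMIT and OFFSET."""
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {index * page_size}"
        for index in range(pages)
    ]


def merge_pages(pages, key="item"):
    """Concatenate pages of result rows, dropping rows already seen."""
    merged, seen = [], set()
    for rows in pages:
        for row in rows:
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged


queries = paged_queries("SELECT ?item WHERE { ?item wdt:P31 wd:Q1 } ORDER BY ?item", 1000, 3)
rows = merge_pages([[{"item": "Q1"}, {"item": "Q2"}], [{"item": "Q2"}, {"item": "Q3"}]])
```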
- Standup by each intern.
- Review of dictionaries
  - whether all SPARQL results have been downloaded and have been converted with `amidict`
  - each entry should have attributes:
    - term
    - name
    - wikidataID
    - wikidataURL
    - en-wikidataURL
    - en-description
  - each entry may have children:
    - EN `synonym`s (optional `xml:lang` attribute)
    - non-EN `synonym`s (`xml:lang` mandatory)
    - non-EN `description` (one per language, with `xml:lang`)
    - `related` - e.g. non-EN Wikipedia pages
  - `entry`s may also have
    - `p` attributes for properties
    - `q` attributes for items
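The entry rules reviewed above lend themselves to an automated check. The following is a minimal sketch using the standard `xml.etree.ElementTree` module; it tests only a subset of the listed attributes plus the "one `description` per language, with `xml:lang`" rule. The function name and the choice of required attributes are assumptions to adapt once the rules are agreed.

```python
import xml.etree.ElementTree as ET

# A subset of the attributes listed in the notes; extend as agreed.
REQUIRED_ATTRIBUTES = ("term", "name", "wikidataID")
# ElementTree expands the xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_entry(entry):
    """Return a list of problems found in one <entry> element."""
    problems = [
        f"missing attribute {name}"
        for name in REQUIRED_ATTRIBUTES
        if not entry.get(name)
    ]
    seen_languages = set()
    for description in entry.findall("description"):
        language = description.get(XML_LANG)
        if language is None:
            problems.append("description without xml:lang")
        elif language in seen_languages:
            problems.append(f"more than one description for {language}")
        seen_languages.add(language)
    return problems


good = ET.fromstring(
    '<entry term="a" name="A" wikidataID="Q1">'
    '<description xml:lang="hi">x</description></entry>'
)
bad = ET.fromstring('<entry term="a"><description>x</description></entry>')
good_report = check_entry(good)
bad_report = check_entry(bad)
```

Returning a list of messages rather than raising on the first failure lets an editor see every problem in an entry at once.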
- Please check out `openDiagram` and run the latest `search_lib.py`:
  - `cd physchem/python`
  - `python search_lib.py`
- This should create graphs of occurrences of chemicals. Click on the destroy-window button to move to the next one.
- Edit the file `search_lib` to reference your dictionary and your corpus.
- Choose sections that are likely to contain words.
- Then run your search and be prepared to demonstrate it to us. We want to see all 4(6) dictionaries in action.
- Discussed the current status of each member's Dictionary
- Review of each dictionary if each contains name, term, wikidata ID, wikidata label, description, Wikipedia URL
- Reviewed the `invasive species` dictionary updated by Kanishka. Synonyms, IUCN status, and taxon id need to be added.
- Reviewed the `Activity` dictionary updated by Radhu. Synonyms and language equivalents need to be added.
- PMR debugged people's problems while running `search_lib.py`, using screen sharing
- A brief discussion on Regular expressions (RegExp)
- Communal review of pygetpapers including installation and alpha testing review of each intern:
  - The `--update` option is unclear about what it is meant to do.
  - Downloading additional types of file: for example, `--pdf` could download PDF files to an existing repository.
- It is essential that all the options common to `getpapers` and `pygetpapers` are IDENTICAL.
  - In `getpapers` the `-f <filename>` option creates a LOG file. In `pygetpapers` this has a completely different operation, "from pickle".
- Log levels and messages:
  - The current `pygetpapers` is verbose, relatively uninformative, and cannot be altered. Most of it would be `debug D` or `trace T`.
- Users of `getpapers` will expect 16 flags to be present in `pygetpapers`.
  - If present in `pygetpapers`, these should have the SAME syntax as in `getpapers`. The operation should ideally be the same. If enhanced or restricted, this should be noted.
  - If NOT present in `pygetpapers`, these flags should be reserved for future use.
  - Flags should NOT be used for different purposes.
- Standup by interns (those who are present)
- Review of `pygetpapers` with the new version and new flag addition
- Review of `search_lib` (in `openDiagram`)
- `ami` `search_lib` is working with facets of dictionary, corpus and section and runs quickly on a small corpus
- We tested Talha Hasan's, Radhu Ladani's and Shweata Hegde's dictionaries and they worked excellently!
- We understand Concepts of Data Science in our project
- Explanation of Natural Language ToolKit and Natural Language Processing by PMR
- Talha Hasan please make a Wiki page for EPMC(Explore synonyms on EPMC)
- Radhu Ladani make a wiki page for Natural Language ToolKit(NLTK) https://github.com/petermr/openDiagram/wiki/Natural-Language-Toolkit-(NLTK)
- Standup by each intern
- Minicorpora Review with each intern's dictionary
- We downloaded 200 papers for each dictionary's topic using `getpapers` and then sectioned them for `ami search`
  - Radhu: Tested the `Plant` & `Activity` dictionaries with `ami search_lib`. The `Activity` dictionary worked perfectly, but we need to update the `Plant` dictionary to work with `ami search_lib`
  - Kanishka: Tested the `Invasive_species` dictionary with the minicorpora `oil186` & `Invasive Plant species`
  - Vasant: Tested `Plant Parts`
  - Talha: Tested `Plant Compound`
- PMR explained database systems and the Five Laws of Library Science of S. R. Ranganathan
- Review of Kanishka's & Radhu's work on their dictionaries
- Review of `pygetpapers` by Ayush
- Review of `Search_lib` by PMR
- Update on the St Edmunds Game & the project's software to Gita ma'am by PMR, including `search_lib`, `pygetpapers` and how they work
- Review of multilingual `search_lib` (English, Hindi, Urdu) with the different dictionaries `Activity`, `Plant_Part`, `Plant_Compound`, `Plant_genus`
- Review of `pygetpapers` by Ayush, including the latest version, debug log-level, supplementary files, and a CSV file that contains title & full column as well.
- Every intern's standup.
- Debugging `Search_lib` for the command line so everybody can test it on their system
  - Query: `python search_lib.py --dict --sect --proj`
  - Example: `python search_lib.py --dict country --sect introduction method --proj oil186`
- `--help` will give you the following output to understand the query structure
C:\Users\DELL\Radhu\openDiagram\physchem\python>python search_lib.py --help
running search main
usage: search_lib.py [-h] [--dict DICT [DICT ...]] [--sect SECT [SECT ...]] [--proj PROJ [PROJ ...]] [--patt PATT [PATT ...]]
Search sections with dictionaries and patterns
optional arguments:
-h, --help show this help message and exit
--dict DICT [DICT ...]
dictionaries to search with (lookup table from JSON (NYI); empty gives list
--sect SECT [SECT ...]
sections to search; empty gives all (Not yet tested
--proj PROJ [PROJ ...]
projects to search; empty will exit
--patt PATT [PATT ...]
patterns to search with; regex may need quoting
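For readers unfamiliar with how such a help message arises, the options shown above can be reproduced with Python's standard `argparse` module. This is an illustrative reconstruction from the help output only, not the actual `search_lib.py` source; the function name `make_parser` is hypothetical.

```python
import argparse


def make_parser():
    """Declare the four multi-valued options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="search_lib.py",
        description="Search sections with dictionaries and patterns",
    )
    # nargs="+" accepts one or more values, shown as DICT [DICT ...] in help.
    parser.add_argument("--dict", nargs="+", help="dictionaries to search with")
    parser.add_argument("--sect", nargs="+", help="sections to search")
    parser.add_argument("--proj", nargs="+", help="projects to search")
    parser.add_argument("--patt", nargs="+", help="patterns to search with")
    return parser


# The example command from the notes, parsed without running any search.
args = make_parser().parse_args(
    "--dict country --sect introduction method --proj oil186".split()
)
```

Each option collects its values into a list, so `args.sect` holds both `introduction` and `method`, while an option that was not given (here `--patt`) is `None`.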
- The dictionaries `Activity`, `Plant_Parts`, `Plant_compound`, `country` work excellently with `search_lib`
- Everyone needs to do alpha testing of `pygetpapers` and write a report on the wiki page
- Everyone needs to run `search_lib` with their dictionary on their own system
- PMR discussed UTF-8, a variable-width character encoding used for electronic communication.
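"Variable-width" means that different characters take different numbers of bytes, which matters for the multilingual (English, Hindi, Urdu) dictionaries. A small sketch in Python, with the sample characters chosen here purely for illustration:

```python
# One character from each of several scripts, written as escapes so the
# code points are unambiguous.
samples = [
    "a",           # Latin small letter a (ASCII)
    "\u00e9",      # Latin small letter e with acute
    "\u0905",      # Devanagari letter a (used in Hindi)
    "\u0639",      # Arabic letter ain (used in Urdu)
    "\U0001F600",  # an emoji outside the Basic Multilingual Plane
]

# Number of bytes each character occupies when encoded as UTF-8.
byte_lengths = [len(char.encode("utf-8")) for char in samples]

for char, length in zip(samples, byte_lengths):
    print(f"U+{ord(char):04X} takes {length} byte(s)")
```

This is why files should always be opened with an explicit `encoding="utf-8"`: the number of bytes and the number of characters in a dictionary file generally differ.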
- Every intern's Stand up
- Feedback on `pygetpapers` by Ayush based on the alpha test reports; he is continuing to work on `--restart` and `--update`
- We will introduce tutorial documentation for `pygetpapers` so people can understand it well! (Volunteers and interns worked on this with Ayush)
- Review of `Search_lib.py` with different dictionaries and the demo set.
  - The scheme is: `search SECTIONS in PROJECTS with (DICTIONARIES and/or PATTERNS) with (DISPLAY and/or ANALYSIS) options`
  - `python search_lib.py --demo` offers different projects, and we have to choose from `'ethics', 'luke', 'plant_parts', 'worcester', 'word'`
  - Alpha code for searching document corpus: SEARCH TUTORIAL
- PMR explained the file manager of `search_lib` and standard graphs (matplotlib, seaborn) for graphical visualization
- PMR explained supervised learning concepts in our project, a powerful way of classifying sections
- Dictionary review: `activity`, `plant`, `plant_part`, `plant_compound`, `disease`
  - Everyone makes sure they have a dictionary that works and has standard attributes:
    - `name`
    - `term`
    - `wikidataID` (if known)
    - `wikipediaPage` (if known)
    - `description` (EN)
  - Every dictionary should have a name which is `lowercase_underscore` and a title which contains this value
- Everyone makes sure they have only ONE top-level *.xml file for their dictionary.
  - This should have a name which is LOWERCASE (and optional UNDERSCORES) ONLY
  - It should be the same as the title attribute in the dictionary
- We are starting to come up with a Dictionary Naming Scheme
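The naming rules above (a lowercase name with optional underscores, identical to the `title` attribute) are easy to check mechanically. A minimal sketch follows; the exact pattern, for instance whether digits are allowed, is an assumption, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Lowercase words joined by single underscores, e.g. plant_part.
NAME_PATTERN = re.compile(r"[a-z]+(_[a-z]+)*")


def is_valid_name(name):
    """True if the name is lowercase letters with optional underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


def title_matches_filename(filename, xml_text):
    """True if the file stem is valid and equals the dictionary title."""
    stem = Path(filename).stem
    title = ET.fromstring(xml_text).get("title")
    return is_valid_name(stem) and title == stem


ok = title_matches_filename("plant_part.xml", '<dictionary title="plant_part"/>')
mismatch = title_matches_filename("plant_part.xml", '<dictionary title="Plant_Part"/>')
```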
- Review of the dictionaries `activity`, `plant`, `invasive_plant`, `plant_compound`
  - We make sure that the dictionary works and has standard attributes:
    - `name`
    - `term`
    - `wikidataID`
    - `wikipediaPage`
    - `description`
  - and children:
    - `<description xml:lang...>` (optional)
    - `<synonym>` (optional), `<synonym xml:lang...>`
    - `<related ...>` (optional)
- Every dictionary should also have a mini corpus that contains content enriched in its terms. Please check that your mini corpus works with your dictionary.
- Talha: create a document and maintain the current record of each dictionary including minicorpus, file name, location, etc.
- Standup by each intern.
- `pygetpapers` review by Ayush and demonstration on the command line to Gita ma'am.
- Ayush explained `--restart` and `--update` and created tutorial documentation for pygetpapers (Documentation)
- Dictionary review (`activity`, `plant_part`, `invasive_plant`, `plant_compound`) of each intern's dictionary, and conversion into the standard format
- `search_lib` review by each intern to PMR by screen sharing.
- THESIS strategy for interns (Radhu, Kanishka, Vasant, Talha)
- Everyone needs to test `search_lib` and write a report on the wiki page (Wikipage for the report)
- Learn the `grep` tool, as it will help in this project
- Everyone makes sure that their respective dictionary wiki pages are up to date.
- Reports from alpha testers on `pyami (search_lib) - commandline`: https://github.com/petermr/openDiagram/wiki/Test-Report-for-Search_lib
- Each intern explained the working of `search-lib` with their dictionary and also gave feedback
- PMR asked us to analyze each `false positive` and `false negative` value in the `search_lib` results and list them
- Reviewed the current state of the dictionaries; PMR told each owner what changes their dictionary needs.
- PMR suggested tools such as `WEKA`, `R programming`, `python pandas`, `Excel` for statistical analysis including frequency annotations
- Review of the `AMI gui.py` code (this is experimental but will develop into a GUI for The Game) for quick result analysis
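Listing false positives and false negatives, as PMR asked above, comes down to comparing the terms a search found with the terms a human reader expected. A minimal sketch, assuming both are available as plain lists of strings (the function name and sample terms are hypothetical):

```python
def score(found, expected):
    """Compare search hits with a hand-checked list of expected terms."""
    found, expected = set(found), set(expected)
    true_positives = found & expected
    return {
        # Reported by the search but not expected by the reader.
        "false_positives": sorted(found - expected),
        # Expected by the reader but missed by the search.
        "false_negatives": sorted(expected - found),
        "precision": len(true_positives) / len(found) if found else 0.0,
        "recall": len(true_positives) / len(expected) if expected else 0.0,
    }


report = score(
    found=["leaf", "oil", "root"],
    expected=["leaf", "root", "stem", "bark"],
)
```

Here `oil` is a false positive and `bark` and `stem` are false negatives; precision and recall summarize the two kinds of error in one number each.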
- Review of pygetpapers by Ayush. He is focusing on Europe-pmc.py, where we can run the command without giving a specific output file, and on multiprocessing, so that the code becomes more precise
- Ayush also discussed points about the `gui.py` tool
- PMR discussed the `gui.py` tool, including the different parts of the code such as `tkinter`, `button`, `dictionary`, `label`
- In `gui.py` we define the dictionaries `invasive_plant`, `eoplant_part` and `country`; by altering these, we are going to modify the program as well as its framework
- Review of each intern's dictionary:
  - adding synonyms. What software do we need?
  - updating
    - Dictionaries `Activity` and `Plant`: up to date, but the English-language synonyms need to be checked once
    - Dictionary `eoplant_part`: Description, Wikidata URL, Wikidata ID all should be present
    - Dictionary `plant_compound`: Delete all the synonyms and re-add new appropriate synonyms
    - Dictionary `invasive_plant`: Either remove languages other than English from the taxon common name, or replace the comma with another separator
  - annotations and tooltips in non-EN languages
- Review of miniprojects (minicorpora): https://github.com/petermr/CEVOpen/tree/master/minicorpora
- Vasant will create wiki pages for dictionary structure, including the metadata of each dictionary
- PMR suggested learning data analysis tools such as `matplotlib` and `R programming`, as they will help in the project
- Everyone has to add a `README.md` page for their respective dictionary and minicorpus
- General discussion about the current situation of the world due to covid pandemic, lockdown, vaccine and all.
- We are concentrating very heavily on dictionaries, minicorpora, and GUI interfaces. One goal is to support "TheGame" in May.
- Dictionary review by screen sharing: every intern explained their work update and demonstrated `gui.py` on their system, to see whether it works on different operating systems.
- We need to add any of `genus`, `geographic region`, `types of plant`, `common name` to the `plant` dictionary
- We reviewed `plant_genus` with a SPARQL query [https://w.wiki/3DpD] for taxon common name, images
- PMR explained the importance of metadata; we need to add it to our dictionaries
- PMR also added the file ethic.xml to the dictionary so that `search_lib` runs without any errors
- In the `gui.py` module, we discussed the different options that give the desired result, including checkboxes, an additional file browser, etc.
- Everyone needs to add metadata to their dictionary
- Everyone needs to test `gui.py` and write a report
- PMR discussed a Code of Conduct, everyone should agree with the document https://www.contributor-covenant.org/version/2/0/code_of_conduct/
- Explanation of the `gui.py` interface by PMR, including all the parameters regarding dictionary and section
- Ayush added `Html-links` to `pygetpapers` and explained how it works
- `pygetpapers` working demonstration on Jupyter notebook by Ayush
- Every intern explained their respective dictionary by screen sharing and gave current updates
- PMR suggested writing a SPARQL query for the additional features which we want to add to the dictionary; we will then merge the results with the current dictionary
- Everyone comes up with a new SPARQL query for their dictionary
- Alpha testing of `gui.py` and write a report: https://github.com/petermr/pygetpapers/wiki/Test-report-of-gui.py
- PMR gives an update of the project to Gita ma'am and briefly discussed the progress and the future directions for the project
- GY: New Interns are to join us from next week
- PMR discuss the role of "volunteer" in the project https://github.com/petermr/CEVOpen/blob/master/VOLUNTEERING.md
- Standup by interns (those who are present)
- PMR explained search engine optimization by searching for "Lantana Camara" and showed how different scientific portals give different hits
- Review of pygetpapers by Ayush with different flags such as `--references` and `--synonym`. Ayush also explained how we use specific date criteria to search papers: `AND (FIRST_PDATE:[2006-05-24 TO 2021-05-19])`
  - `pygetpapers -q "(Lantana) AND (FIRST_PDATE:[2006-05-24 TO 2021-05-19])" -n`
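The date-restricted query shown above follows a fixed pattern, so it can be generated rather than typed by hand. A small sketch that builds only the query string passed to `-q`; the function name is hypothetical and nothing here calls pygetpapers itself.

```python
from datetime import date


def dated_query(term, start, end):
    """Build a search query limited to a first-publication date range."""
    return f"({term}) AND (FIRST_PDATE:[{start.isoformat()} TO {end.isoformat()}])"


query = dated_query("Lantana", date(2006, 5, 24), date(2021, 5, 19))
```

Using `date` objects instead of strings means an impossible date such as 2021-02-30 is rejected before the query is ever sent.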
- PMR explained `ami-gui.py`: its `launch`, `browser`, and its different categories of section
- Everybody should come up with a SPARQL query for the respective dictionary
- Welcome Leeja, the new NIPGR intern!
- Introduction given by each intern and a brief overview of their role in the project.
- Brief introduction was given to Leeja about getting started, projects, and dictionaries.
- Welcome Daniel Mietchen
- Review of Wikidata/Scholia concerning CEVOpen with Daniel
- https://github.com/Daniel-Mietchen/ideas/issues/499
- https://scholia.toolforge.org/topic/Q202864
- https://scholia.toolforge.org/venue/Q3359737
- https://scholia.toolforge.org/work/Q21090025
- List of newly described/ redescribed taxa http://tb.plazi.org/GgServer/static/newToday.html
- Also lexemes: https://www.wikidata.org/wiki/Wikidata:Lexicographical_coverage
- Re FAIR ethics, see also http://doi.org/10.5281/zenodo.2559998 and http://doi.org/10.5281/zenodo.4720432
- for generic questions about open science, you can use https://ask-open-science.org/
- Daniel discussed his Zika virus work with Wikidata and recently published works on this topic, as well as Ethics Statements for PMC articles
- Daniel also discussed **Plazi**, an association supporting and promoting the development of persistent and openly accessible digital taxonomic literature.
- PMR created an initial project for Leeja (especially a dictionary, `Essential oil compound`)
- Brief recap of the 'CEVOpen work' so far to Leeja
- PMR gave a demonstration of our software `pygetpapers`, `pyami` and `gui.py` to Leeja
- Review of each intern's SPARQL query and dictionary as well
- PMR also discussed dictionary enhancements, especially to help select terms (description, images, categories (e.g. different subtypes))
- Standup by interns
- what I did
- what I plan to do
- what is blocking me
- Discuss project of Leeja
- Leeja's role is to help create a phytochemistry resource that integrates the dictionaries:
- plants (Radhu)
- their essential oils and the compounds in them (Talha)
- geographical information (Ambreen)
- biological and other activity (Radhu)
- This will be driven by text from the phytochemical/EO literature and Wikidata. In general, the papers will report:
- what plant/s were used
- where they were found/harvested
- the oils extracted from them
- the activity reported
- Review of immediate priorities
- dictionaries
- searching using `ami-gui`
- Shweata gave a brief overview of the `Ethics subproject`
- Ayush discussed automated documentation of `pygetpapers` and a few new features
- Ayush introduced `a logo`, `table of contents` and `architecture diagrams`
- Ayush added other things to the README https://github.com/petermr/pygetpapers/blob/main/README.md
- PMR developed the update_from_Sparql function for dictionaries
- PMR discussed the code in `search_lib.SearchDictionary.test()`
- The components are:
- `id_name`: the field containing the wikidataURL
- `sparql_name`: the `name` of the `binding` in the SPARQL file
- `dict_name`: name of the new child element in the dictionary
- pyami update dictionaries from SPARQL
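How the three components `id_name`, `sparql_name` and `dict_name` might fit together is sketched below. This is not the `pyami` implementation. The function name `add_sparql_values`, the `sparql_id_name` parameter (the binding assumed to hold the Wikidata URL), the lowercase `dictionary` root tag, and the sample data with `example.org` URLs are all assumptions made for illustration. The only fixed detail is the namespace of the standard SPARQL XML results format.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical sample inputs, for illustration only.
DICTIONARY_XML = """<dictionary title="eoplant">
  <entry term="lantana camara" wikidataURL="http://example.org/entity/PLANT_A"/>
</dictionary>"""

SPARQL_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <results>
    <result>
      <binding name="item"><uri>http://example.org/entity/PLANT_A</uri></binding>
      <binding name="image"><uri>http://example.org/images/plant_a.jpg</uri></binding>
    </result>
  </results>
</sparql>"""


def add_sparql_values(dictionary, sparql, id_name, sparql_name, dict_name,
                      sparql_id_name="item"):
    """Add a <dict_name> child to each entry matched by its id; return the count added."""
    # Index dictionary entries by the attribute that holds the Wikidata URL.
    entries_by_id = {
        entry.get(id_name): entry
        for entry in dictionary.iter("entry")
        if entry.get(id_name)
    }
    added = 0
    for result in sparql.iterfind("sr:results/sr:result", SPARQL_NS):
        # Each binding wraps a single value element (<uri> or <literal>).
        values = {
            binding.get("name"): (binding[0].text or "").strip()
            for binding in result.iterfind("sr:binding", SPARQL_NS)
            if len(binding)
        }
        entry = entries_by_id.get(values.get(sparql_id_name))
        value = values.get(sparql_name)
        if entry is None or not value:
            continue  # no matching entry, or nothing to add
        ET.SubElement(entry, dict_name).text = value
        added += 1
    return added


dictionary = ET.fromstring(DICTIONARY_XML)
sparql = ET.fromstring(SPARQL_XML)
count = add_sparql_values(dictionary, sparql, id_name="wikidataURL",
                          sparql_name="image", dict_name="image_link")
print(count, ET.tostring(dictionary, encoding="unicode"))
```

Matching on the URL rather than on the term keeps the update independent of spelling and case differences between the dictionary and the SPARQL output.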
- Welcome Sagar the new NIPGR intern!
- Introduction given to Sagar and a brief overview of his role in the project
- PMR gave a brief introduction to the project, showed the dictionaries, and explained the interrelationships and the minicorpora
- PMR demonstrated the software `ami_gui` and `ami_search` to Sagar
- Shweata gave a brief overview of the `Ethics Statement Project`
- Review of dictionaries and their SPARQL output (interns who were present described their projects)
- Radhu discussed the `eoplant` and `activity` dictionaries and the update of the SPARQL output of the `eoplant` dictionary
- Kanishka discussed the `invasiveplants` dictionary, and PMR suggested adding the GISD database
- Vasant discussed the image display of the `plantpart` dictionary
- PMR suggested the following points for the dictionary and SPARQL output
- SPARQL output is in .XML format
- The root element is Dictionary, which must have a title. It contains several entry elements, each of which has a large number of attributes.
- Synonyms are child elements under entry
- Sagar needs to collect a list of all intern dictionaries and indicate which require updating from SPARQL
- All SPARQL output names should be of the form: `sparql_d(d).xml`
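The dictionary structure described in the points above (a titled root element, entry elements carrying attributes, synonyms as children of an entry) can be illustrated with a short sketch. The lowercase tag names `dictionary`, `entry` and `synonym`, the attribute name `term`, and the sample values are assumptions for illustration. The project's real dictionaries carry many more attributes.

```python
import xml.etree.ElementTree as ET


def make_dictionary(title, entries):
    """Build a dictionary element: titled root, entry children, synonym grandchildren."""
    root = ET.Element("dictionary", {"title": title})
    for item in entries:
        # Attributes live on the entry element itself.
        entry = ET.SubElement(root, "entry", {"term": item["term"]})
        # Synonyms are child elements of the entry, not attributes.
        for synonym in item.get("synonyms", []):
            ET.SubElement(entry, "synonym").text = synonym
    return root


# Hypothetical sample content.
root = make_dictionary(
    "eoplant",
    [{"term": "lantana camara", "synonyms": ["example synonym"]}],
)
print(ET.tostring(root, encoding="unicode"))
```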
- Review of `pygetpapers` by Ayush
- He added the `prototype code` at https://github.com/ayush4921/funlilrepo/blob/main/test.py
- Ayush discussed the issue with supplementary files and added a check for zero-size supplementary files
- Review of `ami_gui.py` by PMR
- PMR discussed how to extract images from PDF with the help of Selenium
- Creation of (exact) multiword search, demonstrated with the dictionaries `country`, `eoplant` and `organization`
- Creation of `sparql2amidict.py`, including display options based on `ami_gui`, by PMR
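One common way to implement exact multiword search is a single regular expression with word boundaries, so that a term only matches as a whole phrase. The sketch below is not the `ami_gui` code. The function names, the case-insensitive default and the sample terms are assumptions for illustration.

```python
import re


def compile_terms(terms, ignore_case=True):
    """Compile dictionary terms into one whole-phrase pattern."""
    # Longest terms first, so a multiword term wins over a shorter term inside it.
    ordered = sorted({t for t in terms if t.split()}, key=len, reverse=True)
    # \s+ between words lets a term match across line breaks or repeated spaces.
    alternatives = [r"\s+".join(re.escape(w) for w in t.split()) for t in ordered]
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", flags)


def find_terms(text, pattern):
    """Return matched phrases in order of appearance, with whitespace normalised."""
    return [" ".join(m.group(0).split()) for m in pattern.finditer(text)]


pattern = compile_terms(["United States", "United Kingdom", "India"])
hits = find_terms("Samples came from the United\nStates and from India.", pattern)
print(hits)
```

The `\b` anchors assume terms start and end with word characters. Terms that begin or end with punctuation would need different boundary handling.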
- Discussion about today's meeting agenda
- PMR demonstrated the progress of `ami_gui` with the dictionaries `eoPlant` and `organization`, and explained multiword term searching and getting images from the paper
- Allocation of work to Sagar: dictionary manager
- Report on dictionaries: Sagar created a wiki table and presented it (https://github.com/petermr/CEVOpen/wiki/Intern-Dictionaries). The tasks include:
- checking the title of the dictionary is the same as the filename
- for each entry:
- checking that Wikipedia links are present
- checking Wikidata links
- checking that the term is a useful noun or phrase
- Much of this can be done automatically
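Some of the checks listed above lend themselves to automation. The sketch below covers the title/filename check and the presence of links on each entry. The attribute names `term` and `wikipediaURL`, the lowercase tag names, and the sample data are assumptions for illustration. Whether a term is a useful noun or phrase still needs human judgement and is not attempted here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_dictionary(root, filename):
    """Return a list of human-readable problems found in a dictionary element."""
    problems = []
    title = root.get("title")
    # The dictionary title should equal the filename without its extension.
    if title != Path(filename).stem:
        problems.append(f"title {title!r} does not match filename {filename!r}")
    for entry in root.iter("entry"):
        term = entry.get("term", "<no term>")
        for attribute in ("wikipediaURL", "wikidataURL"):
            if not entry.get(attribute):
                problems.append(f"{term}: missing {attribute}")
    return problems


# Hypothetical sample: wrong title, and an entry without a Wikipedia link.
SAMPLE = """<dictionary title="plant">
  <entry term="lantana camara" wikidataURL="http://example.org/entity/PLANT_A"/>
</dictionary>"""

problems = check_dictionary(ET.fromstring(SAMPLE), "eoPlant.xml")
for problem in problems:
    print(problem)
```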
- Review of the `eoPlant` and `activity` dictionaries, with the following changes discussed:
- rename the dictionary from `plant` to `eoPlant`
- add a minicorpus of 1000 papers for the `activity` dictionary
- Shweata showed how she and Ayush were able to extract phrases from Ethics Statements using `SpaCy`, and also discussed the problems of the organization dictionary, i.e. the rendering issue
- Welcome new intern - Bhavini
- Introduction given to Bhavini and a brief overview of their role in the project
- Brief introduction by each intern and getting started
- Shweata gave an introduction to the project, showed the dictionaries, and explained the interrelationships and the minicorpora
- Shweata gave a brief overview of the code of conduct to Bhavini
- Each intern presented their work by screen sharing and explained its working status
- `ami-gui` review by PMR
- search strategy, term extraction (Rake)
- Ayush explained `pygetpapers` to Bhavini and also discussed the suppdata and images issue with PMR
- Feedback from ethics project by Shweata
- Shweata and PMR discussed the issue with publishing scholarly articles
- PMR demonstrated `ami_gui` with the `eoPlant` and `organization` dictionaries
- Sagar presented the list of dictionaries that need automatic updating by SPARQL files https://github.com/petermr/CEVOpen/wiki/Intern-Dictionaries
- PMR updated the SPARQL update tool. At present, it is a test: `SearchDictionary.test_update_in_repo()`
- With the help of the SPARQL update tool, PMR updated features such as `image_link` and `taxon` in the `eoPlant` dictionary
- Every intern should make sure to give information on dictionaries and their issues to Sagar
- Gita ma'am and PMR discussed the guidance for interns' theses/reports https://github.com/petermr/CEVOpen/wiki/THESIS-strategy#update-2021-05-24
- Sagar presented the current update of the interns' dictionaries and their SPARQL output to Gita ma'am
- Gita ma'am suggested points such as dictionary links and table strategy to Sagar, and also suggested that NIPGR interns make a small video clip on their work
- PMR demonstrated the latest version of `ami_gui` to Gita ma'am and showed how we can extract the image for each particular plant species
- Process the updating of:
- `eoPlant` @Radhu Ladani
- `activity` @Radhu Ladani
- `Invasive` @Kanishka
- `Compounds` @Talha Hasan
- `Plant Parts` @VASANT KUMAR
- `Plant Genus` @Shweata Hegde
- `Organization` @Shweata Hegde
- Welcome new intern - Chaitanya
- Introduction is given to the new intern and a brief overview of his role in the project
- Brief introduction by each intern and getting started
- Shweata introduced the project framework and working management and role of each intern https://github.com/petermr/CEVOpen/wiki/Interns-and-Roles
- Thesis discussion and clarification with Gita ma'am with Kanishka and Radhu
- PMR and Gita ma'am discussed the new intern's project topic related to Kanishka's Project Invasive species
- Sagar presented his data collection work for managing the interns' dictionaries https://github.com/petermr/CEVOpen/wiki/Intern-Dictionaries and discussed the issue with duplicate entries in the dictionary
- PMR demonstrated the software to Chaitanya and gave a brief overview of how it works
- We also discussed duplicate values in the dictionary and false-positive and false-negative analysis
- Radhu presented the `ami search` results of `activity_corpus` with the dictionaries `activity`, `eoPlant`, and `plant_compound`, and also described the problem with case-sensitive frequency results
- PMR suggested building a tool that works the same as traditional `ami`, so we can solve the case-sensitivity issue in the data frequency table
- PMR discussed open literature search https://openknowledgemaps.org/ with the invasive species
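For the case-sensitivity problem in the frequency table noted above, one simple approach is to fold all case variants of a term into a single key before counting. This is only a sketch of that idea, not the tool PMR proposed. The function name and the sample counts are hypothetical.

```python
from collections import Counter


def merge_case_variants(counts):
    """Combine the counts of terms that differ only in letter case."""
    merged = Counter()
    for term, n in counts.items():
        # casefold() is a more aggressive lower() intended for caseless matching.
        merged[term.casefold()] += n
    return merged


# Hypothetical raw frequencies as a case-sensitive search might report them.
raw = Counter({"Antibacterial": 3, "antibacterial": 5, "ANTIBACTERIAL": 1, "antifungal": 2})
merged = merge_case_variants(raw)
print(merged.most_common())
```

Folding is lossy: where case carries meaning (for example, gene symbols versus ordinary words), the variants should be kept apart.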
- Vasant gave an update on the gene list and discussed the direction of creating a dictionary from it
- Gita ma'am discussed Sagar's project-management work; Sagar needs to start work on a separate project.
- Sagar discussed his idea of a secondary metabolite dictionary with PMR and Gita ma'am
- Gita ma'am suggested that Sagar work with Radhu's and Kanishka's dictionaries to maintain them properly and to find the missing synonyms and Wikipedia page URLs of entries in the dictionaries
- PMR and Gita ma'am discuss the process of assigning roles for new interns.
- Current update of `ami_gui` demonstrated by PMR to Gita ma'am, including extraction of values from a specific section of the paper for the data table
- Radhu and Shweata presented their dictionaries by screen sharing; Talha and Vasant also gave updates to Gita ma'am
- PMR discussed inter-annotator agreement for `ami_gui`
- Using `ami_gui`, PMR introduced the classification of values on the basis of false-positive and false-negative criteria
- PMR also introduced the GoldStandard in Abstracts of oil26 https://github.com/petermr/CEVOpen/wiki/GoldStandard-in-Abstracts-of-oil26
- Every intern should come up with a data file for the GoldStandard in Abstracts of oil26
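Once a gold-standard data file exists, false positives and false negatives can be counted by comparing the terms a search found with the terms annotators marked. The sketch below assumes both are available as plain collections of strings. The function name and the sample terms are hypothetical and are not taken from the oil26 data.

```python
def score_against_gold(found, gold):
    """Compare found terms with gold-standard terms for one abstract."""
    found, gold = set(found), set(gold)
    true_positives = found & gold
    false_positives = found - gold   # reported by the search, but not in the gold standard
    false_negatives = gold - found   # in the gold standard, but missed by the search
    precision = len(true_positives) / len(found) if found else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return {
        "false_positives": sorted(false_positives),
        "false_negatives": sorted(false_negatives),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical example terms.
scores = score_against_gold(
    found=["thymol", "limonene", "oil"],
    gold=["thymol", "carvacrol", "limonene"],
)
print(scores)
```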
- Welcome new intern Vishmaya
- Shweata introduced the project framework and working management and role of each intern https://github.com/petermr/CEVOpen/wiki/Interns-and-Roles
- Radhu and Kanishka explained their work and dictionaries to Vishmaya
- PMR explained the refactoring of `ami-gui` into `pyami`, and the narratives/sub-workflows through the modules with the proposed functionality
- PMR also gave guidance for the gold standard of the abstracts of `oil26` https://github.com/petermr/CEVOpen/wiki/GoldStandard:-Abstracts-of-oil26
- Chaitanya and PMR discussed developments in `ML.classification (TfIdfTokenizer)`
- Every intern should come up with a set of rules for their miniproject and dictionaries
January interns (Radhu, Kanishka, Talha, Vasant) presented their research work and talked about their theses. Radhu's stint came to an end as she submitted her thesis and video presentation. Kanishka's report showed improvements; GY and PMR made a few suggestions to improve it further (such as mentioning that corpus size and query heavily affect the conclusions we can draw from our corpus). Vasant and Talha have one month to improve their dictionaries and deal with scientific problems (e.g. dealing with isomers). June interns (Bhavini, Chaitanya) presented their research plans after completing one month in the project. PMR assigned priorities to both Bhavini and Chaitanya. Chaitanya will work on the ethics statement project alongside Shweata. Ayush talked about improvements in pygetpapers and how it is now more conformant. The usage of pylint was discussed in brief. PMR demonstrated pyami usage on the command line and the usage of symbolic names using config files. Sagar gave an update on the current status of the plant gene dictionary. The meeting concluded after Shweata demonstrated the usage of k-means clustering in her ethics statement project to derive features and classify sentences into clusters.