Thematic analytics #4505
Comments
Implementation options range from regular static PDF summaries, to an on-portal thematic section, to hosted portals. Key elements: 1) numbers, of all kinds, 2) a map, 3) taxonomy, e.g. a pie or box chart plus a tree (if dynamic, like metrics).
@dschigel, for any type of statistics, someone would probably have to identify and tag all the relevant datasets.
Yes. Before BoR is fixed, this will have to be done at the level of i) dataset (tagging), ii) extension use and sometimes iii) publisher. Example: everything from the MGnify publisher should contribute to the DNA analytics, so we don't need to bother with dataset tagging, and the extension was not used - confirmed by @thomasstjerne
I would expect that for this kind of statistics, we probably want everything to be accessible in one query. We wouldn't want something like "this publisher and those three datasets and these three records", we probably want something like "all the datasets with tag A". Maybe when we have categories for datasets (gbif/registry#247), some things will be easier. But in any case, someone will have to identify the datasets that you want to make metrics on. That probably will be the most time consuming.
Not yet, but this can and should be done once we hear from INF.
Would soil, freshwater etc. be dataset-based metrics, or taxon-based?
Needs thinking and testing, but the quick answer is dataset-based (in plural), with some cases where tagging by theme can be sped up, verified or automated by detecting use of a known extension or origin from certain publishers.
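To make the three levels discussed above (dataset tag, extension use, publisher) concrete, here is a minimal sketch of how theme membership could be resolved into a single dataset list. It is only an illustration under stated assumptions: the `Dataset` and `ThemeRule` classes, the tag `"dna"`, the extension name `"dna-extension"` and the dataset keys are all hypothetical placeholders, not registry fields or real identifiers. Only the MGnify publisher example comes from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical, simplified view of a dataset record."""
    key: str
    publisher: str
    tags: set = field(default_factory=set)        # theme tags set by hand
    extensions: set = field(default_factory=set)  # extensions used in the data


@dataclass
class ThemeRule:
    """Hypothetical rule: a dataset belongs to the theme if ANY criterion matches."""
    tag: str
    publishers: set = field(default_factory=set)
    extensions: set = field(default_factory=set)


def matches(dataset: Dataset, rule: ThemeRule) -> bool:
    """True if the dataset is tagged, comes from a listed publisher,
    or uses one of the listed extensions."""
    return (
        rule.tag in dataset.tags
        or dataset.publisher in rule.publishers
        or bool(dataset.extensions & rule.extensions)
    )


def datasets_for_theme(datasets, rule):
    """Return the keys of all datasets that count towards the theme."""
    return [d.key for d in datasets if matches(d, rule)]


# Placeholder rule: the tag and extension names are assumptions for illustration.
DNA_RULE = ThemeRule(tag="dna", publishers={"MGnify"}, extensions={"dna-extension"})

EXAMPLE_DATASETS = [
    Dataset("a", "MGnify"),                                    # matched by publisher
    Dataset("b", "Some Museum", tags={"dna"}),                 # matched by tag
    Dataset("c", "Some Museum", extensions={"dna-extension"}), # matched by extension
    Dataset("d", "Some Museum"),                               # not matched
]

if __name__ == "__main__":
    print(datasets_for_theme(EXAMPLE_DATASETS, DNA_RULE))
```

Resolving all three criteria into one list of dataset keys per theme would keep the downstream metrics close to the "all the datasets with tag A" style of query, even while tagging is incomplete.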
For DNA, this is currently something like:
Can we test this with DNA, to start with? The WP2023 says: Projected outcomes for 2027: Knowledge gaps are reduced by consolidating data coverage across the thematic areas of relevance. How do we know we did? By doing regular analytics, the before and after, theme by theme. Would the country analytics code x criteria offered by @thomasstjerne do the trick?
Maybe we can think about this with a country report at hand as a model: https://analytics-files.gbif-uat.org/country/DK/GBIF_CountryReport_DK.pdf - we can even plan over a printout which elements are applicable for thematic analytics and which are not. Even if analytics as visualisation will not be ready soon, a capture of the January state of data per theme is necessary. DNA is the most compact and straightforward to try; once we know it works, we can proceed to tag soil, health and freshwater datasets (2023 priorities), then eventually all the rest.
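The "before and after, theme by theme" idea could be captured with something as small as the sketch below: store one snapshot of counts per theme, then compute the difference against a later snapshot. The function name `theme_delta`, the metric names and all numbers are made-up placeholders for illustration, not real GBIF figures or an existing analytics API.

```python
def theme_delta(before, after):
    """Compare two snapshots of the form {theme: {metric: count}}.

    Returns {theme: {metric: after - before}}; a theme or metric missing
    from one snapshot is treated as 0 in that snapshot.
    """
    delta = {}
    for theme in sorted(set(before) | set(after)):
        old = before.get(theme, {})
        new = after.get(theme, {})
        delta[theme] = {
            metric: new.get(metric, 0) - old.get(metric, 0)
            for metric in sorted(set(old) | set(new))
        }
    return delta


# Placeholder snapshots: the values are invented purely to show the shape.
SNAPSHOT_JANUARY = {"dna": {"datasets": 10, "occurrences": 1000}}
SNAPSHOT_LATER = {
    "dna": {"datasets": 12, "occurrences": 1500},
    "soil": {"datasets": 3, "occurrences": 200},
}

if __name__ == "__main__":
    print(theme_delta(SNAPSHOT_JANUARY, SNAPSHOT_LATER))
```

Even before any visualisation exists, keeping the January snapshot in a plain structure like this would allow the comparison to be made later.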
Thematic analytics
As discussed with @tobiasgf and @thomasstjerne, it would be nice to have regular analytics, similar to country reports, for the key thematic segments in GBIF, especially DNA, soil, freshwater, etc. @kingenloff we will need this for health, too.
Github user: @dschigel
System: Chrome 108.0.0 / Windows 10.0.0
Referer: https://www.gbif.org/health
Window size: width 1847 - height 913
System health at time of feedback: OPERATIONAL