Full rendering - first glance #5

Closed
16 of 18 tasks
mslw opened this issue Apr 12, 2023 · 11 comments

@mslw
Collaborator

mslw commented Apr 12, 2023

This is an unordered listing of nitpicks from the group chat:

  • funding rendering of superdataset is suboptimal (e.g. lacks grant identifier & description)
  • italics tags in several publications (e.g. on CAPRIN1) and <scp> tags in C03 -- IIRC they came from Crossref, probably can be stripped by the metadata translator
  • Capitalization issue in author names (see e.g. A04, INF, Z01) -- can be fixed in the web scraping script
  • A01 has "Open Position" as last author -- can be fixed in the web scraping script
  • No project name on a project subdataset landing page -- can be fixed in the web scraping script to include project (code)name in the title (1)
  • All SFB project datasets must also have the SFB funding statement -- if possible it would be best to add that to cff, otherwise we would need to use studyminimeta in combination or instead of cff, tricky...
  • The superdataset itself has Keywords, but the individual projects do not - I suspect this is what makes the keyword search currently dysfunctional?
  • What Keywords would be supposed to go into the project-subds keyword section? Would it be possible and sensible to omit the section if there are no keywords? -- for the first part, sfb project pages that were scraped have no keywords, so they would be up to our imagination and manual curation
  • if it is easily possible, having a visual indicator for (the number of) subdatasets or a subdataset (in the subdataset listing of a dataset) would be a nice feature for avoiding the click-to-disappoint experience ;-)
  • Mangle GIN URLs a bit more #3
  • Along these lines: when I click on https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/972860f9-75b9-4ecc-b546-99e1d6aad5f9/098bc74ecb94586948991fa05bec12f73ec99f8b I see "There are no subdatasets listed for the current dataset", rather than the useful bit of information (publications) that this dataset has
  • What usage do we expect for "Export metadata"? I cannot come up with one right away, and if that is symptomatic, I think the button should be less prominent
  • I get a 404 when I click the "i" button on the top-right
  • I clicked on "export metadata" for https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/f1a7ead6-a448-4c29-aad5-921e59db6aba/9461caf6458e09fa69879438f6984b1d9ad4ffe9, and something seems to be odd with certain parts of the metadata extraction (2)
  • There is a Twitter button on the top-right. I think it is inappropriate for the SFB data catalog to serve as a follower aggregator for DataLad. If the SFB has a social media presence, maybe the Twitter icon could link to that instead of DataLad?
  • code-base link could similarly link to the sfb-catalog sources? / it makes sense to point to the catalog docs on the top-right, but having two links and one being the source code of the generator seems a bit much for a concrete deployment like this
  • No journal for the publication in https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/12c3086b-71dc-45c1-9169-1feb0bea759a/7aa3a891f1e89f3845271fa9e9af48dd67da503d - maybe because it's a preprint?
  • Publications are duplicated #4

(1) quoting @mih: Conceptually we cannot propagate the subdataset name in the superdataset to that page (the page is the same regardless of how many different names this datasets has in different superdatasets). So the choice needs to be made elsewhere, but it must be made. Likely the title metadata source should be different or a composite

(2) "key_source_map": {"name": ["datacite_gin"], "description": ["datacite_gin"], "license": ["datacite_gin"], "authors": ["datacite_gin"], "keywords": ["datacite_gin"], "type": ["metalad_core"], "dataset_id": ["metalad_core"], "dataset_version": ["metalad_core"], "url": ["metalad_core"]}, "sources": [{"source_name": "datacite_gin", "source_version": "0.0.1", "source_parameter": {}, "source_time": 1681234618.8610961, "agent_email": "[email protected]", "agent_name": "Micha\u0142 Szczepanik"}
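
For illustration, the tag stripping mentioned in the list (italics and <scp> tags in publication titles) could look roughly like the sketch below. This is not the metadata translator's actual code: the function name and the example title are hypothetical, and it assumes the titles only contain simple inline tags.

```python
import re

# Matches an opening or closing inline tag such as <i>, </i>, <scp>, </scp>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def strip_inline_markup(title: str) -> str:
    """Remove inline markup tags from a title and collapse leftover whitespace."""
    without_tags = TAG_PATTERN.sub("", title)
    return " ".join(without_tags.split())


# Hypothetical title, only for demonstration.
print(strip_inline_markup("Role of <i>CAPRIN1</i> in <scp>RNA</scp> granules"))
```

A regex like this would also eat text such as "a<b and c>d", so a real translator may prefer an allow-list of known tags.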

@jsheunis
Collaborator

Some comments from my side, apologies if this duplicates information stated elsewhere:

funding rendering of superdataset is suboptimal (e.g. lacks grant identifier & description)

Easily fixed in metadata, but perhaps the translator can also extract the information in a smarter way?

No project name on a project subdataset landing page -- can be fixed in the web scraping script to include project (code)name in the title (1)

An additional step could be to include the project name as a keyword in the cff file.

The superdataset itself has Keywords, but the individual projects do not - I suspect this is what makes the keyword search currently dysfunctional?

For context, any dataset in the catalog can have keywords that were provided via metadata, and if datasets containing keywords are listed as subdatasets of a particular dataset, all of these keywords spanning subdatasets together will constitute the keyword search space.
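
As a rough sketch of that idea (not the catalog's actual implementation), the search space can be thought of as the union of keywords over all listed subdatasets. The record layout, the function name, and the lower-casing are assumptions made for the example.

```python
from typing import Iterable, Mapping


def keyword_search_space(subdatasets: Iterable[Mapping]) -> list[str]:
    """Collect the distinct keywords found across a dataset's subdatasets."""
    seen = set()
    for subdataset in subdatasets:
        # A subdataset without keywords simply contributes nothing.
        for keyword in subdataset.get("keywords") or []:
            seen.add(keyword.strip().lower())
    seen.discard("")
    return sorted(seen)


# Hypothetical subdataset records, only for demonstration.
subdatasets = [
    {"name": "A01", "keywords": ["Motor control", "stroke"]},
    {"name": "A02"},
    {"name": "A03", "keywords": ["stroke", "MRI"]},
]
print(keyword_search_space(subdatasets))
```

With no keywords in any subdataset the result is an empty list, which is the case the question about hiding the search field is about.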

What Keywords would be supposed to go into the project-subds keyword section? Would it be possible and sensible to omit the section if there are no keywords? -- for the first part, sfb project pages that were scraped have no keywords, so they would be up to our imagination and manual curation

@adswa I am not sure whether it would be sensible to omit the keyword search if there aren't any subdataset keywords. This is a general UX challenge: figuring out what to hide or display based on the availability of content. Hiding it makes for an unpredictable UX (the search field is sometimes there, sometimes not), and showing it in the absence of keywords could be confusing. I lean towards the former, but can be convinced otherwise. W.r.t. the SFB catalog in particular, I think we should manually curate some keywords; it would make the catalog better to interact with.

if it is easily possible, having a visual indicator for (the number of) subdatasets or a subdataset (in the subdataset listing of a dataset) would be a nice feature for avoiding the click-to-disappoint experience ;-)

@mih For clarity, does this mean displaying the number of subdatasets of a _sub_dataset? Certainly possible. Relevant catalog issue: datalad/datalad-catalog#280

Along these lines: when I click on https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/972860f9-75b9-4ecc-b546-99e1d6aad5f9/098bc74ecb94586948991fa05bec12f73ec99f8b I see "There are no subdatasets listed for the current dataset", rather than the useful bit of information (publications) that this dataset has

@mih It's easy to show the publications if other tabs are empty, but I'm not sure what would be the best for a consistent UX. We could of course do it differently for SFB versus catalog. Relevant recent discussion about the same topic here: datalad/datalad-catalog#266

What usage do we expect for "Export metadata"? I cannot come up with one right away, and if that is symptomatic, I think the button should be less prominent

Yeah I don't think users would use this feature a lot, I agree it can be made less prominent. Catalog issue: datalad/datalad-catalog#281

I get a 404 when I click the "i" button on the top-right

Thanks for catching. This needs to default to something standard, or the button should be hidden, if "about" content is not provided during catalog generation. Relevant catalog issue: datalad/datalad-catalog#270

I clicked on "export metadata" for https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/f1a7ead6-a448-4c29-aad5-921e59db6aba/9461caf6458e09fa69879438f6984b1d9ad4ffe9, and something seems to be odd with certain parts of the metadata extraction (2)

@adswa I did the same and find nothing wrong with the metadata. To clarify, this is catalog metadata, not metalad-extracted metadata, and it therefore adheres to its own schema which could explain the unexpected "something seems to be odd". Or does this comment refer to something else?

There is a Twitter button on the top-right. I think it is inappropriate for the SFB data catalog to serve as a follower aggregator for DataLad. If the SFB has a social media presence, maybe the Twitter icon could link to that instead of DataLad?
code-base link could similarly link to the sfb-catalog sources? / it makes sense to point to the catalog docs on the top-right, but having two links and one being the source code of the generator seems a bit much for a concrete deployment like this

Agreed. Catalog issue: datalad/datalad-catalog#282

@mslw
Collaborator Author

mslw commented Apr 12, 2023

funding rendering of superdataset is suboptimal (e.g. lacks grant identifier & description)

Easily fixed in metadata, but perhaps the translator can also extract the information in a smarter way?

I wonder how best to do it without breaking the paradigm. Probably in the translator, indeed. We use studyminimeta as the source for the funding information, and there it's just text, no fields. I opened an issue in wackyextras; I would be hesitant to add the same logic to the catalog's translator. mslw/datalad-wackyextra#2

@mslw
Collaborator Author

mslw commented Apr 12, 2023

All SFB project datasets must also have the SFB funding statement -- if possible it would be best to add that to cff, otherwise we would need to use studyminimeta in combination or instead of cff, tricky...

CFF valid keys do not include anything that would correspond to funding.

CFF has a references key, which stores an array of objects. Each object's fields include type, and reference.type is an enum for which "grant" is valid. But using it to store funding information feels like an abuse of the format (there are no funding-specific keys either).
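
To make the shape of that workaround concrete, here is a minimal sketch of a references entry with type "grant", written as a Python dict and dumped as JSON (a real CFF file is YAML). The helper name and all values are hypothetical, and the use of `title` and `authors` is an assumption about what a reference entry carries; the CFF schema should be checked before relying on it.

```python
import json


def grant_reference(grant_title, funder_name):
    """Build a CFF-style reference entry that (ab)uses type "grant" for funding."""
    return {
        "type": "grant",
        "title": grant_title,
        # No funding-specific key is used here; the funder is squeezed into `authors`.
        "authors": [{"name": funder_name}],
    }


# Hypothetical values, only for demonstration.
cff_fragment = {
    "references": [
        grant_reference("Example grant title", "Example Funder"),
    ]
}
print(json.dumps(cff_fragment, indent=2))
```

The grant identifier and description have no obvious home in such an entry, which is part of why this feels like a stretch.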

So the only "dataset metadata file" format we have for grants is studyminimeta. But we decided to use CFF for project superdatasets because it's a wider standard. We could additionally drop a studyminimeta file into all project datasets (in the current catalog, funding info is merged, and all things we care about have priorities).

A studyminimeta file (if I understand correctly) has to have at least one keyword. So through this we would also give everyone a keyword (probably "motor control"), which may in fact be good in light of the comments about keywords made earlier.

At this stage I wonder if I should worry about "breaking paradigm" (paradigm being that all metadata in the catalog needs to come from datasets through extraction and translation). I could, after all, for every project update add one more metadata item, that would contain funding information (metadata source: catalog curation)...

@jsheunis
Collaborator

I think breaking paradigm is fine for this specific goal. We're keeping track of everything, so that information will feed back into the process and inform updates.

@adswa

adswa commented Apr 12, 2023

For context, any dataset in the catalog can have keywords that were provided via metadata, and if datasets containing keywords are listed as subdatasets of a particular dataset, all of these keywords spanning subdatasets together will constitute the keyword search space.

ah, I guess it just showed me "no metadata tags found" because there are so few. Thx :)

@adswa

adswa commented Apr 12, 2023

@adswa I did the same and find nothing wrong with the metadata. To clarify, this is catalog metadata, not metalad-extracted metadata, and it therefore adheres to its own schema which could explain the unexpected "something seems to be odd". Or does this comment refer to something else?

The metadata I was referring to is in footnote 2 of the original post:

(2) "key_source_map": {"name": ["datacite_gin"], "description": ["datacite_gin"], "license": ["datacite_gin"], "authors": ["datacite_gin"], "keywords": ["datacite_gin"], "type": ["metalad_core"], "dataset_id": ["metalad_core"], "dataset_version": ["metalad_core"], "url": ["metalad_core"]}, "sources": [{"source_name": "datacite_gin", "source_version": "0.0.1", "source_parameter": {}, "source_time": 1681234618.8610961, "agent_email": "[email protected]", "agent_name": "Micha\u0142 Szczepanik"}

specifically all those "datacite_gin" values struck me as odd - e.g., "license" = "datacite_gin"?

Can you make sense of it?

@mslw
Collaborator Author

mslw commented Apr 12, 2023

specifically all those "datacite_gin" values struck me as odd - e.g., "license" = "datacite_gin"?

Can you make sense of it?

That's still under the "key_source_map" key, so those are source names rather than field values. To enable setting source priority for a given field (source names in preferred order, in the catalog config), the metadata source is stored in the catalog. If that were the only metadata, I would worry though ;)
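
A minimal sketch of how per-field source priority and a resulting key_source_map could fit together is shown below. It is not the catalog's code: the function names, the shape of the priority config, and the example values are all hypothetical; only the source names are taken from the export quoted above.

```python
def resolve_field(field, candidates, priority):
    """Return (value, source) from the highest-priority source that offers `field`."""
    for source in priority:
        record = candidates.get(source, {})
        if field in record and record[field] not in (None, "", []):
            return record[field], source
    return None, None


def merge_metadata(candidates, priorities, default_priority):
    """Merge per-source records and remember which source each field came from."""
    merged = {}
    key_source_map = {}
    fields = {name for record in candidates.values() for name in record}
    for field in sorted(fields):
        order = priorities.get(field, default_priority)
        value, source = resolve_field(field, candidates, order)
        if source is not None:
            merged[field] = value
            key_source_map[field] = [source]
    return merged, key_source_map


# Hypothetical records from two metadata sources.
candidates = {
    "datacite_gin": {"name": "Project A01", "license": "CC-BY-4.0"},
    "metalad_core": {"name": "a01", "url": "https://example.org/a01"},
}
# Hypothetical config: prefer datacite_gin for the name, metalad_core otherwise.
priorities = {"name": ["datacite_gin", "metalad_core"]}
default_priority = ["metalad_core", "datacite_gin"]

merged, key_source_map = merge_metadata(candidates, priorities, default_priority)
print(merged)
print(key_source_map)
```

Seen this way, an entry like "license": ["datacite_gin"] only records provenance; the license value itself lives elsewhere in the exported metadata.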

@adswa

adswa commented Apr 12, 2023

Oh man, thanks for clarifying :) 🤦

@jsheunis
Collaborator

@mih @mslw Updated comment re:

if it is easily possible, having a visual indicator for (the number of) subdatasets or a subdataset (in the subdataset listing of a dataset) would be a nice feature for avoiding the click-to-disappoint experience ;-)

datalad/datalad-catalog#280 (comment)

@jsheunis
Collaborator

Along these lines: when I click on https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/972860f9-75b9-4ecc-b546-99e1d6aad5f9/098bc74ecb94586948991fa05bec12f73ec99f8b I see "There are no subdatasets listed for the current dataset", rather than the useful bit of information (publications) that this dataset has

@mih It's easy to show the publications if other tabs are empty, but I'm not sure what would be the best for a consistent UX. We could of course do it differently for SFB versus catalog. Relevant recent discussion about the same topic here: datalad/datalad-catalog#266

@mslw
Collaborator Author

mslw commented Sep 27, 2023

I went through the list and ticked the boxes that no longer apply (although I didn't try to pinpoint the specific changes that fixed them).

Of note, (not) displaying the empty subdatasets page has seen changes that were ultimately reverted in the catalog, so the comment above is still valid.

I am closing this issue as resolved.

@mslw mslw closed this as completed Sep 27, 2023

3 participants