Full rendering - first glance #5

mslw · 2023-04-12T08:42:49Z

This is an unordered listing of nitpicks from the group chat:

(1) quoting @mih: Conceptually we cannot propagate the subdataset name in the superdataset to that page (the page is the same regardless of how many different names this dataset has in different superdatasets). So the choice needs to be made elsewhere, but it must be made. Likely the title metadata source should be different or a composite

(2) "key_source_map": {"name": ["datacite_gin"], "description": ["datacite_gin"], "license": ["datacite_gin"], "authors": ["datacite_gin"], "keywords": ["datacite_gin"], "type": ["metalad_core"], "dataset_id": ["metalad_core"], "dataset_version": ["metalad_core"], "url": ["metalad_core"]}, "sources": [{"source_name": "datacite_gin", "source_version": "0.0.1", "source_parameter": {}, "source_time": 1681234618.8610961, "agent_email": "[email protected]", "agent_name": "Micha\u0142 Szczepanik"}

jsheunis · 2023-04-12T15:19:20Z

Some comments from my side, apologies if this duplicates information stated elsewhere:

funding rendering of superdataset is suboptimal (e.g. lacks grant identifier & description)

Easily fixed in metadata, but perhaps the translator can also extract the information in a smarter way?

No project name on a project subdataset landing page -- can be fixed in the web scraping script to include project (code)name in the title (1)

An additional step could be to include the project name as a keyword in the cff file.

The superdataset itself has Keywords, but the individual projects do not - I suspect this is what makes the keyword search currently dysfunctional?

For context, any dataset in the catalog can have keywords that were provided via metadata, and if datasets containing keywords are listed as subdatasets of a particular dataset, all of these keywords spanning subdatasets together will constitute the keyword search space.
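As a rough illustration of the behaviour described above, the keyword search space can be thought of as the union of the keywords of all listed subdatasets. The sketch below is not the catalog's actual code; the function name and the dict layout (`subdatasets`, `keywords`) are assumptions made for the example.

```python
def keyword_search_space(dataset):
    """Collect the distinct keywords of all subdatasets of `dataset`.

    `dataset` is a hypothetical dict with a "subdatasets" list; each
    subdataset may or may not carry a "keywords" list.
    """
    keywords = set()
    for sub in dataset.get("subdatasets", []):
        keywords.update(sub.get("keywords", []))
    return sorted(keywords)


# Hypothetical superdataset: one subdataset has keywords, one has none.
superds = {
    "keywords": ["motor control"],
    "subdatasets": [
        {"name": "project-a", "keywords": ["fmri", "stroke"]},
        {"name": "project-b"},
    ],
}

search_space = keyword_search_space(superds)
```

Under this reading, a superdataset whose subdatasets carry no keywords ends up with an empty search space, even if the superdataset itself has keywords.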

What Keywords would be supposed to go into the project-subds keyword section? Would it be possible and sensible to omit the section if there are no keywords? -- for the first part, sfb project pages that were scraped have no keywords, so they would be up to our imagination and manual curation

@adswa I am not sure whether it would be sensible to omit the keyword search if there aren't any subdataset keywords. This is a general UX challenge: figuring out what to hide or display based on the availability of content. Hiding it makes for an unpredictable UX (the search field is sometimes there, sometimes not), and showing it in the absence of keywords could be confusing. I lean towards the former, but can be convinced otherwise. W.r.t. the SFB catalog in particular, I think we should manually curate some keywords; it would make the catalog better to interact with.

if it is easily possible, having a visual indicator for (the number of) subdatasets or a subdataset (in the subdataset listing of a dataset) would be a nice feature for avoiding the click-to-disappoint experience ;-)

@mih For clarity, does this mean displaying the number of subdatasets of a _sub_dataset? Certainly possible. Relevant catalog issue: datalad/datalad-catalog#280

Along these lines: when I click on https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/972860f9-75b9-4ecc-b546-99e1d6aad5f9/098bc74ecb94586948991fa05bec12f73ec99f8b I see "There are no subdatasets listed for the current dataset", rather than the useful bit of information (publications) that this dataset has

@mih It's easy to show the publications if other tabs are empty, but I'm not sure what would be the best for a consistent UX. We could of course do it differently for SFB versus catalog. Relevant recent discussion about the same topic here: datalad/datalad-catalog#266

What usage do we expect for "Export metadata"? I cannot come up with one right away, and if that is symptomatic, I think the button should be less prominent

Yeah I don't think users would use this feature a lot, I agree it can be made less prominent. Catalog issue: datalad/datalad-catalog#281

I get a 404 when I click the "i" button on the top-right

Thanks for catching. This needs to default to something standard, or the button should be hidden, if "about" content is not provided during catalog generation. Relevant catalog issue: datalad/datalad-catalog#270

I clicked on "export metadata" for https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/f1a7ead6-a448-4c29-aad5-921e59db6aba/9461caf6458e09fa69879438f6984b1d9ad4ffe9, and something seems to be odd with certain parts of the metadata extraction (2)

@adswa I did the same and find nothing wrong with the metadata. To clarify, this is catalog metadata, not metalad-extracted metadata, and it therefore adheres to its own schema which could explain the unexpected "something seems to be odd". Or does this comment refer to something else?

There is a Twitter button on the top-right. I think it is inappropriate for the SFB data catalog to serve as a follower aggregator for DataLad. If the SFB has a social media presence, maybe the Twitter icon could link to that instead of DataLad?
code-base link could similarly link to the sfb-catalog sources? / it makes sense to point to the catalog docs on the top-right, but having two links and one being the source code of the generator seems a bit much for a concrete deployment like this

Agreed. Catalog issue: datalad/datalad-catalog#282

mslw · 2023-04-12T16:10:35Z

funding rendering of superdataset is suboptimal (e.g. lacks grant identifier & description)

Easily fixed in metadata, but perhaps the translator can also extract the information in a smarter way?

I wonder how to do it best without breaking the paradigm. Probably in the translator indeed. We use studyminimeta as the source for the funding information, and there it's just text, no fields. I opened an issue in wackyextras - I would be hesitant to add the same logic to the catalog's translator. mslw/datalad-wackyextra#2

mslw · 2023-04-12T16:30:29Z

All SFB project datasets must also have the SFB funding statement -- if possible it would be best to add that to cff, otherwise we would need to use studyminimeta in combination with or instead of cff, tricky...

CFF valid keys do not include anything that would correspond to funding.

CFF has a references key, which stores an array of objects. Each object's fields include type, and reference.type is an enum for which "grant" is valid. But using it to store funding information feels like abuse of the format (there are no funding-specific keys either).
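To make that workaround concrete, here is a minimal sketch of what such an entry might look like, written as a Python dict mirroring the YAML structure. The title and funder name are placeholders, not real grant data, and the helper name is made up for the example. As far as I know, `type`, `title` and `authors` are the fields CFF requires on a reference, but this should be checked against the CFF schema.

```python
def grant_reference(title, funder):
    """Build a CFF-style `references` entry of type "grant" (illustrative only)."""
    return {
        "type": "grant",                 # valid value of the reference.type enum
        "title": title,                  # free text; no dedicated grant-ID key
        "authors": [{"name": funder}],   # funder squeezed into an entity "author"
    }


# Placeholder values only; nothing here is real funding metadata.
cff_fragment = {
    "references": [
        grant_reference("PLACEHOLDER grant title", "PLACEHOLDER funder"),
    ]
}
```

The sketch shows why this feels like abuse: the funder has to go into `authors` and the grant identifier into free text.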

So the only "dataset metadata file" format we have for grants is studyminimeta. But we decided to use CFF for project superdatasets because it's a wider standard. We could additionally drop a studyminimeta file into all project datasets (in the current catalog, funding info is merged, and all things we care about have priorities).

A studyminimeta file (if I understand correctly) has to have at least one keyword. So through this we would also give everyone a keyword (probably "motor control"). Which may in fact be good in light of the comments about keywords made earlier.

At this stage I wonder if I should worry about "breaking paradigm" (paradigm being that all metadata in the catalog needs to come from datasets through extraction and translation). I could, after all, for every project update add one more metadata item, that would contain funding information (metadata source: catalog curation)...

jsheunis · 2023-04-12T16:36:36Z

I think breaking paradigm is fine for this specific goal. We're keeping track of everything, so that information will feed back into the process and inform updates.

adswa · 2023-04-12T16:39:55Z

For context, any dataset in the catalog can have keywords that were provided via metadata, and if datasets containing keywords are listed as subdatasets of a particular dataset, all of these keywords spanning subdatasets together will constitute the keyword search space.

ah, I guess it just showed me "no metadata tags found" because there are so few. Thx :)

adswa · 2023-04-12T16:41:34Z

@adswa I did the same and find nothing wrong with the metadata. To clarify, this is catalog metadata, not metalad-extracted metadata, and it therefore adheres to its own schema which could explain the unexpected "something seems to be odd". Or does this comment refer to something else?

The metadata I was referring to is in footnote 2 of the original post:

(2) "key_source_map": {"name": ["datacite_gin"], "description": ["datacite_gin"], "license": ["datacite_gin"], "authors": ["datacite_gin"], "keywords": ["datacite_gin"], "type": ["metalad_core"], "dataset_id": ["metalad_core"], "dataset_version": ["metalad_core"], "url": ["metalad_core"]}, "sources": [{"source_name": "datacite_gin", "source_version": "0.0.1", "source_parameter": {}, "source_time": 1681234618.8610961, "agent_email": "[email protected]", "agent_name": "Micha\u0142 Szczepanik"}

specifically all those "datacite_gin" values struck me as odd - e.g., "license" = "datacite_gin"?

Can you make sense of it?

mslw · 2023-04-12T18:13:19Z

specifically all those "datacite_gin" values struck me as odd - e.g., "license" = "datacite_gin"?

Can you make sense of it?

That's still under "key_source_map" key. To enable setting source priority for a given field (source names in preferred order, in catalog config), the metadata source is stored in the catalog. If that is the only metadata, I would worry though ;)
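A minimal sketch of the idea, assuming a simplified data layout: for each field, take the value from the first source in the configured priority list that provides one, and record which source won. The function name, the `records` and `priority` structures, and the placeholder values are all hypothetical; this is not the catalog's actual implementation.

```python
def resolve_fields(records, priority):
    """Pick each field's value from the highest-priority source that has it.

    records:  {source_name: {field: value}}
    priority: {field: [source names in preferred order]}
    Returns (merged metadata, key_source_map).
    """
    merged, key_source_map = {}, {}
    for field, sources in priority.items():
        for source in sources:
            value = records.get(source, {}).get(field)
            if value is not None:
                merged[field] = value
                key_source_map[field] = [source]  # remember where it came from
                break
    return merged, key_source_map


# Placeholder metadata from two sources, named as in the export above.
records = {
    "datacite_gin": {"name": "Example dataset", "license": "PLACEHOLDER-LICENSE"},
    "metalad_core": {"name": "fallback name", "dataset_id": "PLACEHOLDER-ID"},
}
priority = {
    "name": ["datacite_gin", "metalad_core"],
    "license": ["datacite_gin"],
    "dataset_id": ["metalad_core"],
}

merged, key_source_map = resolve_fields(records, priority)
```

Here `key_source_map["license"]` holds the source name, `["datacite_gin"]`, while the license value itself lives in `merged["license"]`.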

adswa · 2023-04-12T18:37:28Z

Oh man, thanks for clarifying :) 🤦

jsheunis · 2023-04-13T14:17:56Z

@mih @mslw Updated comment re:

if it is easily possible, having a visual indicator for (the number of) subdatasets or a subdataset (in the subdataset listing of a dataset) would be a nice feature for avoiding the click-to-disappoint experience ;-)

datalad/datalad-catalog#280 (comment)

jsheunis · 2023-04-14T06:32:07Z

Along these lines: when I click on https://psychoinformatics-de.github.io/sfb1451-projects-catalog/#/dataset/972860f9-75b9-4ecc-b546-99e1d6aad5f9/098bc74ecb94586948991fa05bec12f73ec99f8b I see "There are no subdatasets listed for the current dataset", rather than the useful bit of information (publications) that this dataset has

@mih It's easy to show the publications if other tabs are empty, but I'm not sure what would be the best for a consistent UX. We could of course do it differently for SFB versus catalog. Relevant recent discussion about the same topic here: datalad/datalad-catalog#266

mslw · 2023-09-27T10:23:19Z

I went through the list and ticked the boxes that no longer apply (although I didn't try to pinpoint the specific changes that resolved them).

Of note, (not) displaying the empty subdatasets page has seen changes that were ultimately reverted in the catalog, so the comment above is still valid.

I am closing this issue as resolved.

jsheunis mentioned this issue Apr 13, 2023

ENH+NF: add javascript customization options via config datalad/datalad-catalog#283

Merged

mslw mentioned this issue Apr 13, 2023

ENH: javascript tweaks #6

Merged

mslw closed this as completed Sep 27, 2023
