Add Schema.org or Croissant metadata to header of Dataset view page #350

ekraffmiller · 2024-03-19T22:22:54Z

Currently the JSF Dataset page has schema.org info embedded in the header, which in the future may be replaced with Croissant. The SPA version of the page has to replicate this.
Here is what it looks like in the JSF Header:

<script type="application/ld+json">{"@context":"http://schema.org","@type":"Dataset","@id":"https://doi.org/10.5072/FK2/SCYB0O","identifier":"https://doi.org/10.5072/FK2/SCYB0O","name":"Testing embargo","creator":[{"@type":"Person","givenName":"Guillermo","familyName":"Portas","name":"Portas, Guillermo"}],"author":[{"@type":"Person","givenName":"Guillermo","familyName":"Portas","name":"Portas, Guillermo"}],"datePublished":"2024-03-14","dateModified":"2024-03-14","version":"1","description":"test","keywords":["Business and Management"],"license":"http://creativecommons.org/publicdomain/zero/1.0","includedInDataCatalog":{"@type":"DataCatalog","name":"Root","url":"https://beta.dataverse.org"},"publisher":{"@type":"Organization","name":"Root"},"provider":{"@type":"Organization","name":"Root"},"distribution":[{"@type":"DataDownload","name":"dataverse_files (2).zip","encodingFormat":"application/zip","contentSize":4540,"contentUrl":"https://beta.dataverse.org/api/access/datafile/26133"},{"@type":"DataDownload","name":"FilesIT.java","encodingFormat":"text/x-java-source","contentSize":154657,"contentUrl":"https://beta.dataverse.org/api/access/datafile/26132"}]}</script>
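
If the SPA wants to sanity-check a payload before embedding it, a small type guard is one option. This is only a sketch: `SchemaOrgDataset` and `isSchemaOrgDataset` are hypothetical names, the interface covers just a few of the fields visible in the example above, and a Croissant payload may well need different checks.

```typescript
// Minimal shape of the JSON-LD payload, limited to a few fields seen in the example.
// Hypothetical type, not part of any Dataverse client library.
interface SchemaOrgDataset {
  '@type': 'Dataset'
  '@id': string
  name: string
}

// Hypothetical type guard: checks only the fields listed in the interface.
function isSchemaOrgDataset(value: unknown): value is SchemaOrgDataset {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    v['@type'] === 'Dataset' &&
    typeof v['@id'] === 'string' &&
    typeof v['name'] === 'string'
  )
}
```

The guard deliberately ignores fields such as `distribution`, so it says nothing about how large the payload is.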

The Dataverse API serves this through the dataset export endpoint. For Schema.org:
https://beta.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/SCYB0O
And for Croissant format:
https://beta.dataverse.org/api/datasets/export?exporter=croissant&persistentId=doi:10.5072/FK2/SCYB0O
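
The two URLs differ only in the `exporter` parameter, so the SPA could build them with one helper. A minimal sketch, assuming the installation is served from the root of its host; `buildExportUrl` and the `Exporter` type are hypothetical names:

```typescript
// Exporter names taken from the two URLs above.
type Exporter = 'schema.org' | 'croissant'

// Builds the export endpoint URL for a dataset.
// Assumption: `baseUrl` is the installation root, e.g. https://beta.dataverse.org
function buildExportUrl(baseUrl: string, persistentId: string, exporter: Exporter): string {
  const url = new URL('/api/datasets/export', baseUrl)
  // searchParams percent-encodes the values (":" and "/" in the DOI included).
  url.searchParams.set('exporter', exporter)
  url.searchParams.set('persistentId', persistentId)
  return url.toString()
}
```

Calling `buildExportUrl('https://beta.dataverse.org', 'doi:10.5072/FK2/SCYB0O', 'croissant')` yields the Croissant URL with the DOI percent-encoded, which decodes to the same query values as the hand-written URL.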

To test rich results, use Google's Rich Results Test.


pdurbin · 2024-05-07T14:35:22Z

One concern we have is what to do when the schema.org or croissant files get large, such as 7 MB for a dataset with 25k files. These issues are related:

Also, in JSF we show the schema.org version unless the croissant jar file is present:

Allow optional Croissant exporter to replace JSON-LD <head> content dataverse#10382

I wrote some docs about this in an (open) pull request:

g-saracca · 2024-05-22T19:00:20Z

For a quick proof of concept, it would be ideal to simply insert the expected script (hardcoded, with type="application/ld+json") into the head of the single index.html that serves the SPA.
This could be done from the home (Collection) page, in a useEffect that runs only once, to simulate how the script would really be inserted into the head once the SPA JavaScript has loaded, and to confirm through Google's Rich Results Test whether the script is detected.

As a second step, once we know the script is detected, we should read the persistentId from the URL of the Dataset page, fetch the endpoint mentioned above with that persistentId, and insert the result into a script of type "application/ld+json" in the head of the HTML.
When the user navigates away from the page, the cleanup function returned by the useEffect, which runs when this component/page is unmounted, should delete the script. (This applies only when we are not on a mobile device, which for now could be detected very simply through the screen width.)

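useEffect(() => {
  let script = null
  let cancelled = false
  // fetchToLoadScript() is assumed to return a Promise resolving to the JSON-LD string
  fetchToLoadScript().then((jsonLd) => {
    if (cancelled) return
    // Insert the script into the head of the document
    script = document.createElement('script')
    script.type = 'application/ld+json'
    script.textContent = jsonLd
    document.head.appendChild(script)
  })

  return () => {
    // Remove the script from the head of the document on unmount
    cancelled = true
    script?.remove()
  };
}, []);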
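
Reading the persistentId from the Dataset page URL might look like the sketch below. It assumes the SPA carries the identifier in a `persistentId` query parameter, which this thread does not confirm; `getPersistentId` is a hypothetical helper name.

```typescript
// Assumption: the dataset page URL carries the identifier as ?persistentId=...
function getPersistentId(pageUrl: string): string | null {
  const id = new URL(pageUrl).searchParams.get('persistentId')
  // Treat a missing or blank parameter as "not a dataset page".
  return id !== null && id.trim() !== '' ? id : null
}
```

Inside the effect this would be called as `getPersistentId(window.location.href)`, skipping the fetch entirely when it returns `null`.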

ekraffmiller · 2024-06-11T16:51:58Z

beta.dataverse.org has been updated with a robots.txt that allows all crawlers, so https://beta.dataverse.org is now being crawled successfully, but individual dataset pages are not being indexed by Google. See this page for the Rich Results test: https://search.google.com/test/rich-results/result?id=XS1bhHFD7CEtXP5vHMIxog. Putting it back in This Sprint for further investigation, since it's a lower priority for Q2.

g-saracca · 2024-07-03T14:33:13Z

Moving it to the backlog due to a problem with the server configuration for the SPA redirection.
Currently, navigating directly to an SPA URL other than the main /spa/ returns the index.html document, but with a 404 status.
This is because the web.xml located in the frontend repo under deployments/payara/ treats URLs that don't correspond to an actual file or folder as an error page and returns index.html with a 404 Not Found status, which makes the page not crawlable.

  <error-page>
    <error-code>404</error-code>
    <location>/index.html</location>
  </error-page>

This problem must be solved in order to return to this issue.
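
To verify a fix later, the status codes of a few deep links could be probed with a small script. The sketch below only illustrates the check and is not an existing tool: `probeDeepLinks`, `isCrawlableStatus` and the injected `fetchStatus` function are hypothetical names, and treating any final 2xx status as crawlable is a simplifying assumption.

```typescript
// Result of probing one URL.
interface ProbeResult {
  url: string
  status: number
  crawlable: boolean
}

// Simplifying assumption: a final 2xx status is crawlable; the 404 described above is not.
function isCrawlableStatus(status: number): boolean {
  return status >= 200 && status < 300
}

// The fetcher is injected so the sketch runs without network access.
// In a browser or Node 18+ it could be: (url) => fetch(url)
type StatusFetcher = (url: string) => Promise<{ status: number }>

// Probes each URL in turn and records whether its status looks crawlable.
async function probeDeepLinks(urls: string[], fetchStatus: StatusFetcher): Promise<ProbeResult[]> {
  const results: ProbeResult[] = []
  for (const url of urls) {
    const { status } = await fetchStatus(url)
    results.push({ url, status, crawlable: isCrawlableStatus(status) })
  }
  return results
}
```

With the current web.xml, a deep link would be reported with status 404 and `crawlable: false` even though the body is the SPA's index.html.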

cmbz · 2024-07-10T22:06:33Z

2024/07/10

Removing the On Hold status and moving back to SPA classification

ekraffmiller added pm.GREI-d-2.7.1 NIH, yr2, aim7, task1: R&D UI modules for creating datasets and supporting publishing workflows pm.GREI-d-2.7.2 NIH, yr2, aim7, task2: Implement UI modules for creating datasets and publishing workflows labels Mar 19, 2024

ekraffmiller added this to IQSS Dataverse Project Mar 19, 2024

ekraffmiller removed this from IQSS Dataverse Project Mar 19, 2024

ekraffmiller added this to IQSS Dataverse Project Mar 19, 2024

ekraffmiller added the SPA: Dataset page (View) label Mar 19, 2024

pdurbin mentioned this issue May 7, 2024

Project: Kaggle (Croissant) IQSS/dataverse-pm#163 (Open, 12 tasks)

g-saracca added the Size: 10 label (a percentage of a sprint; 7 hours) May 22, 2024

GPortas moved this to SPRINT READY in IQSS Dataverse Project May 22, 2024

g-saracca moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project May 22, 2024

ekraffmiller self-assigned this May 23, 2024

ekraffmiller moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project May 23, 2024

This was referenced May 24, 2024

render Json-LD in Dataset Page #412 (Draft)

Customize robots.txt in dockerized Dataverse IQSS/dataverse#10593 (Open)

ekraffmiller moved this from In Progress 💻 to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jun 11, 2024

ekraffmiller removed their assignment Jun 11, 2024

cmbz added the FY24 Sprint 26 label Jun 20, 2024

GPortas moved this from This Sprint 🏃‍♀️ 🏃 to SPRINT READY in IQSS Dataverse Project Jun 20, 2024

GPortas moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jun 20, 2024

cmbz added the GREI Re-arch label (GREI re-architecture-related) Jun 20, 2024

g-saracca moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Jul 2, 2024

g-saracca self-assigned this Jul 2, 2024

g-saracca moved this from In Progress 💻 to SPRINT READY in IQSS Dataverse Project Jul 3, 2024

g-saracca moved this from SPRINT READY to Waiting ⌛ in IQSS Dataverse Project Jul 3, 2024

g-saracca added the Waiting label Jul 3, 2024

g-saracca moved this from Waiting ⌛ to SPRINT READY in IQSS Dataverse Project Jul 3, 2024

g-saracca moved this from SPRINT READY to Waiting ⌛ in IQSS Dataverse Project Jul 3, 2024

g-saracca removed their assignment Jul 3, 2024

cmbz removed the status in IQSS Dataverse Project Jul 10, 2024
