refactor(theme): use JSON-LD instead of microdata for blog structured data #9669

Merged Feb 15, 2024 (23 commits)

Conversation

johnnyreilly
Contributor

@johnnyreilly johnnyreilly commented Dec 26, 2023

Pre-flight checklist

Motivation

I originally contributed Structured Data support for blog posts back in 2021: #5322

@lex111 subsequently submitted a PR to migrate the approach to use microdata instead: #5355

I had reservations which I voiced at the time, but left it at that. Since then I've had something of a baptism of fire in the world of SEO, and I've been working with some excellent folk in the SEO industry to improve my own ranking. A suggestion that comes up repeatedly is to use JSON-LD instead of microdata, as that is what Google prefers: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data#supported-formats

In general, Google recommends using JSON-LD for structured data if your site's setup allows it, as it's the easiest solution for website owners to implement and maintain at scale (in other words, less prone to user errors).

I raised #9274 to discuss this and received some good feedback.

I've now implemented JSON-LD support for the blog; both individual posts and the blog listing page. With this change in place, it's now possible to separately configure the Structured Data through swizzling the two new components:

  • BlogListPage/StructuredData
  • BlogPostPage/StructuredData

From @Josh-Cena:

Swizzability does seem desirable. I also wonder if there are cases in the wild where people swizzle blog component and inadvertently broke microdata. This sounds reasonable to me.

The default behaviour for these components is to produce JSON-LD structured data that aligns with Schema.org and Google's Rich Results guidelines.

Let's talk for a moment about each of these components.

BlogListPage/StructuredData

This component is responsible for generating the Structured Data for the blog list page. It renders JSON-LD structured data that aligns with the https://schema.org/Blog schema. (Please note the examples at the bottom of the page which this implementation aligns with.)
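To make the shape of the output concrete, here is a minimal sketch of how a https://schema.org/Blog object could be assembled for a listing page. It is not the code from this PR: the `BlogListItem` type, the `buildBlogJsonLd` function, and the example.com values are all hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of a post as seen by the listing page (illustration only).
interface BlogListItem {
  title: string;
  permalink: string; // site-relative, e.g. "/blog/first-post"
  date: string; // ISO 8601 date string
}

// Assumed helper, not part of Docusaurus: builds a schema.org Blog object.
function buildBlogJsonLd(
  siteUrl: string,
  blogPath: string,
  blogTitle: string,
  blogDescription: string,
  posts: BlogListItem[],
) {
  const base = siteUrl.replace(/\/$/, ''); // drop a trailing slash if present
  const blogUrl = `${base}${blogPath}`;
  return {
    '@context': 'https://schema.org',
    '@type': 'Blog',
    '@id': blogUrl,
    mainEntityOfPage: blogUrl,
    headline: blogTitle,
    description: blogDescription,
    // Each listed post becomes a nested BlogPosting entry.
    blogPost: posts.map((post) => ({
      '@type': 'BlogPosting',
      '@id': `${base}${post.permalink}`,
      url: `${base}${post.permalink}`,
      headline: post.title,
      datePublished: post.date,
    })),
  };
}

const exampleBlog = buildBlogJsonLd(
  'https://example.com/',
  '/blog',
  'Example Blog',
  'Posts about an example project',
  [
    {title: 'First post', permalink: '/blog/first-post', date: '2024-01-01'},
    {title: 'Second post', permalink: '/blog/second-post', date: '2024-02-01'},
  ],
);

console.log(JSON.stringify(exampleBlog, null, 2));
```

The serialized result is what would end up inside the `<script type="application/ld+json">` element on the listing page.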

BlogPostPage/StructuredData

This component is responsible for generating the Structured Data for the blog post page. It renders JSON-LD structured data that aligns with the https://schema.org/BlogPosting schema. (Please note the examples at the bottom of the page which this implementation aligns with.)

The BlogPosting schema is one of the structured data types that Google explicitly supports for Rich Results: https://developers.google.com/search/docs/appearance/structured-data/article#structured-data-type-definitions

All the Google-supported properties are included in the Structured Data generated by this component apart from dateModified which is optional. A number of other properties documented in the BlogPosting schema are included as well.
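As a rough illustration of the properties mentioned above, the sketch below builds a BlogPosting object carrying the Google-supported fields other than dateModified (headline, image, datePublished, author). It is an assumption-laden example rather than the PR's implementation: `PostMetadata`, `buildBlogPostingJsonLd`, and the sample values are hypothetical.

```typescript
// Hypothetical post metadata (illustration only, not a Docusaurus type).
interface PostMetadata {
  title: string;
  permalink: string; // site-relative, e.g. "/blog/hello"
  date: string; // ISO 8601 date string
  description?: string;
  image?: string; // site-relative image path
  authors: {name: string; url?: string}[];
}

// Assumed helper: builds a schema.org BlogPosting object for a single post.
function buildBlogPostingJsonLd(siteUrl: string, post: PostMetadata) {
  const base = siteUrl.replace(/\/$/, ''); // drop a trailing slash if present
  const url = `${base}${post.permalink}`;
  return {
    '@context': 'https://schema.org',
    '@type': 'BlogPosting',
    '@id': url,
    mainEntityOfPage: url,
    url,
    headline: post.title,
    datePublished: post.date,
    // undefined values are omitted by JSON.stringify, so optional
    // properties simply disappear from the serialized output.
    description: post.description,
    image: post.image ? `${base}${post.image}` : undefined,
    author: post.authors.map((author) => ({
      '@type': 'Person',
      name: author.name,
      url: author.url,
    })),
  };
}

const examplePosting = buildBlogPostingJsonLd('https://example.com', {
  title: 'Hello world',
  permalink: '/blog/hello',
  date: '2024-02-15',
  image: '/img/hello.png',
  authors: [{name: 'Jane Doe', url: 'https://example.com/jane'}],
});

console.log(JSON.stringify(examplePosting, null, 2));
```

Leaving dateModified out mirrors the description above, which notes that the property is optional.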

Test Plan

I will use the pull request preview on this PR to demonstrate that the Structured Data is generated as expected. I will also use the Structured Data Testing Tools to verify that the Structured Data is valid:

Expect screenshots to be added to this PR.

Test links

Deploy preview: https://deploy-preview-9669--docusaurus-2.netlify.app/

BlogListPage/StructuredData

If we go to the test preview of the /blog page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog

We can validate with schema.org that the Blog structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog

image

BlogPostPage/StructuredData

If we go to the test preview of the /blog/releases/2.4/ page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog/releases/2.4/

We can validate with schema.org that the BlogPosting structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog%2Freleases%2F2.4%2F

image

And we can also test this type with the Rich Results tool: https://search.google.com/test/rich-results

image

You can also see this in the Ahrefs Chrome extension: https://chromewebstore.google.com/detail/ahrefs-seo-toolbar-on-pag/hgmoccdbjhknikckedaaebbpdeebhiei?pli=1

image

Related issues/PRs

#9274

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Dec 26, 2023

netlify bot commented Dec 26, 2023

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit e0da5cf
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/658a901701f7a80008a486f9
😎 Deploy Preview https://deploy-preview-9669--docusaurus-2.netlify.app

netlify bot commented Dec 26, 2023

[V2]

Name Link
🔨 Latest commit 96073e8
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/65ce26f897d19b0008565473
😎 Deploy Preview https://deploy-preview-9669--docusaurus-2.netlify.app

github-actions bot commented Dec 26, 2023

⚡️ Lighthouse report for the deploy preview of this PR

| URL | Performance | Accessibility | Best Practices | SEO | PWA |
| --- | --- | --- | --- | --- | --- |
| / | 🟠 66 | 🟢 98 | 🟢 96 | 🟢 100 | 🟠 88 |
| /docs/installation | 🟢 90 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 |
| /docs/category/getting-started | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 |
| /blog | 🟠 71 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 |
| /blog/preparing-your-site-for-docusaurus-v3 | 🟠 66 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 |
| /blog/tags/release | 🟠 70 | 🟢 100 | 🟢 100 | 🟠 80 | 🟠 88 |
| /blog/tags | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 |

@johnnyreilly
Contributor Author

Hi @Josh-Cena and @slorber!

I was wondering if there were any thoughts about this PR? There have been no comments on it, so I'm not sure if you're aware it is here. I've been checking back every week or so for a while, but there appears to be no activity.

It's possible you're not interested in the PR - if so, would you be able to let me know and I'll close it for tidiness' sake?

Collaborator

@Josh-Cena Josh-Cena left a comment

Sorry! I'm indeed aware of this. However my repo access isn't renewed so there isn't much I can do. If you've tested it yourself and it works, I'm personally happy to try it out and improve it where necessary.

@johnnyreilly
Contributor Author

Yeah this PR was a Christmas project for me - I think it's a really good piece of work actually! (Of course I'm biased 😀)

I think it puts the structured data story of Docusaurus in a really great place: it offers a really good default JSON-LD implementation, plus the freedom for users to straightforwardly control the structured data produced through the magic of swizzling. (In fact, if they wanted to, they could easily use the same mechanism to stop producing structured data.)

If you've tested it yourself and it works, I'm personally happy to try it out and improve it where necessary

I have indeed and I'm happy to take feedback to improve it as necessary.

Collaborator

@slorber slorber left a comment

Thanks, that seems reasonable to use the solution recommended by Google 👍

Review:

  • I'd like to get rid of the 2 meta attributes you added
  • We can probably reduce code duplication

@johnnyreilly
Contributor Author

Thanks for the review @slorber - useful points, will address them soon!

@johnnyreilly johnnyreilly requested a review from slorber February 9, 2024 10:21
Collaborator

@slorber slorber left a comment

Some additional changes requested and a few questions

If we merge this, should this be considered as a breaking change? 🤷‍♂️

packages/docusaurus-plugin-content-blog/src/index.ts (outdated)
Comment on lines 16 to 18
// We're using dangerouslySetInnerHTML because we want to avoid React
// transforming quotes into &quot; which upsets parsers.
// The entire contents is a stringified JSON object so it is safe
Collaborator

can you explain how to reproduce that problem here?

Was the code we documented before affected by any issue?

        <script type="application/ld+json">
          {JSON.stringify({
            '@context': 'https://schema.org/',
            '@type': 'Organization',
            name: 'Meta Open Source',
            url: 'https://opensource.fb.com/',
            logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
          })}
        </script>

Can you show side-by-side examples in a repro, before/after, rendering differently in practice? And explain how it upsets parsers?

Contributor Author

@johnnyreilly johnnyreilly Feb 10, 2024

Was the code we documented before affected by any issue?

Yes.

So this was a curious one. The issue surfaces in the Google Search Console, and relates to the unsuccessful parsing of the JSON when it is rendered directly as a child of the <script type="application/ld+json"> element. The Google Search Console sends a notification asking you to fix this issue:

Parsing error: Missing '}' or object member name.

image

This happens because, without the dangerouslySetInnerHTML approach, the " characters in the JSON-LD are rendered as &quot;, which is not valid JSON. So you get something like this:

<script type="application/ld+json">
  {
    &quot;@context&quot;: &quot;https://schema.org/&quot;,
    &quot;@type&quot;: &quot;Organization&quot;,
    &quot;name&quot;: &quot;Meta Open Source&quot;,
    &quot;url&quot;: &quot;https://opensource.fb.com/&quot;,
    &quot;logo&quot;: &quot;https://opensource.fb.com/img/logos/Meta-Open-Source.svg&quot;
  }
</script>

Rather than:

<script type="application/ld+json">
  {
    "@context": "https://schema.org/",
    "@type": "Organization",
    "name": "Meta Open Source",
    "url": "https://opensource.fb.com/",
    "logo": "https://opensource.fb.com/img/logos/Meta-Open-Source.svg"
  }
</script>

Curiously, Google will sometimes parse the &quot; style successfully. But more often it won't (TBH I'm surprised it ever succeeds). When I migrated to the dangerouslySetInnerHTML approach instead it always parsed successfully and this fixed the issue being logged in the Google Search Console:

image

For reference, this is when I implemented the fix on my own site: https://github.com/johnnyreilly/blog.johnnyreilly.com/pull/664/files#diff-c2bd2d1e0092d85d7acaff15ce9223d0202ef706c2497f7500b1a24db9bc0366
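The parsing failure described above can be reproduced without a browser. The sketch below is only an approximation: `escapeHtmlText` is a hypothetical stand-in for the escaping applied to text children during rendering, and `isValidJson` is a helper introduced for this example. It shows that the same payload stops being valid JSON once its quotes become &quot;.

```typescript
// Simplified stand-in for HTML text escaping (assumption: these four
// characters are enough to demonstrate the problem).
function escapeHtmlText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Returns true when the text parses as JSON, false otherwise.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const structuredData = {
  '@context': 'https://schema.org/',
  '@type': 'Organization',
  name: 'Meta Open Source',
  url: 'https://opensource.fb.com/',
};

// What ends up in the page when the string is injected verbatim.
const rawJson = JSON.stringify(structuredData);
// What ends up in the page when the string is escaped as text content.
const escapedJson = escapeHtmlText(rawJson);

console.log(isValidJson(rawJson)); // true
console.log(isValidJson(escapedJson)); // false
```

A strict JSON parser reading the escaped markup sees `{&quot;@context&quot;...` and fails at the first `&`, which is consistent with the "Missing '}' or object member name" error reported by the Search Console.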

website/docs/seo.mdx (outdated)
@slorber slorber added the pr: polish This PR adds a very minor behavior improvement that users will enjoy. label Feb 10, 2024
@slorber slorber changed the title feat: JSON-LD structured data implementation for blog refactor(theme): use JSON-LD instead of microdata for blog structured data Feb 10, 2024
@johnnyreilly
Contributor Author

johnnyreilly commented Feb 10, 2024

If we merge this, should this be considered as a breaking change? 🤷‍♂️

No - I can't think of any reason why it would be

Some additional changes requested and a few questions

Cool - I've addressed these. See my responses above!

Successfully merging this pull request may close these issues.

Proposal: migrate blog structured data back to JSON-LD
5 participants