Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby-plugin-sitemap): omit assetPrefix from page URLs #32107

Closed
wants to merge 1 commit into from
Closed

fix(gatsby-plugin-sitemap): omit assetPrefix from page URLs #32107

wants to merge 1 commit into from

Conversation

nonAlgebraic
Copy link
Contributor

@nonAlgebraic nonAlgebraic commented Jun 24, 2021

Description

The sitemap plugin uses an internal function prefixPath to append a site's base URL, as well as its path prefix if one is set, to the URL of each serialized page in the sitemap. The path prefix is retrieved from the pathPrefix string available via the arguments to the onPostBuild hook. This string is a concatenation of the assetPrefix and pathPrefix configuration options. Currently, the sitemap plugin naively uses this string in generating sitemap page URLs, when in fact only the pathPrefix string should be part of a page URL in the sitemap and the assetPrefix should not.

This PR replaces the use of pathPrefix with basePath, which - unlike pathPrefix - does not include the value of assetPrefix.

Notes

This solution was suggested by @antdking on a previous version of this PR.

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jun 24, 2021
@wardpeet wardpeet changed the title (gatsby-plugin-sitemap) omit assetPrefix from page URLs fix(gatsby-plugin-sitemap): omit assetPrefix from page URLs Jun 25, 2021
Copy link
Contributor

@moonmeister moonmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the work to fix this. Below are a couple small nits and one we definately need to resolve. Also, I can't find any documentation on this basePath that you're using, can you point me to that? Thanks!

export function prefixPath({ url, siteUrl, pathPrefix = `` }) {
return new URL(pathPrefix + url, siteUrl).toString()

export function prefixPath({ url, siteUrl, basePath }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may still need the default string here incase basePath isn't set.

Suggested change
export function prefixPath({ url, siteUrl, basePath }) {
export function prefixPath({ url, siteUrl, basePath = `` }) {

@@ -13,15 +13,16 @@ export const withoutTrailingSlash = path =>
/**
* @name prefixPath
*
* Properly handles prefixing relative path with site domain, Gatsby pathPrefix and AssetPrefix
* Properly handles prefixing relative path with site domain and Gatsby basePath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still say path Prefix as that's what all the docs refer to it as.

@@ -83,7 +83,7 @@ exports.onPostBuild = async (
}

const sitemapWritePath = path.join(`public`, output)
const sitemapPublicPath = path.posix.join(pathPrefix, output)
const sitemapPublicPath = path.posix.join(basePath, output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be fixed but not in this way.

When using path and Asset prefixes this is rendering: https://mysite.com/cdn.example.com/foo/sitemap/sitemap-0.xml

This is wrong but I do believe it should still have the asset prefix and read: https://cdn.example.com/foo/sitemap/sitemap-0.xml

According to https://www.gatsbyjs.com/docs/how-to/previews-deploys-hosting/asset-prefix/ everything that's not HTML should be available on the assetPrefix path. That would include sitemaps.

There is DEFINITELY an issue as the URL is mangled, but I believe we need to keep the assetPrefix.

Do you agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized the thing that's probably mangaling this is the sitemap writer. It's expecting a path and getting an FQDN. short of fixing this upstream or re writing that implementation this may not be easy to fix. One option would be just to not use asset prefix anywhere. Meaning you current fix on this is fine but we'd need to mirror that change in gatsby-serve.js. I'd be interested if the core team has any thoughts on this.

@GriffinJohnston
Copy link

GriffinJohnston commented Sep 7, 2021

Just adding some additional color here. I have a site that is using assetPrefix to serve assets with Cloudfront. Because of this, the sitemap is only available at xxx.cloudfront.net, which prevents me from submitting the sitemap in Google Search Console since you can't use cross-domain sitemaps with GSC. That's a significant problem. At minimum the option should be available to ignore assetPrefix.

@ravindra-euphorika
Copy link

Yes, totally agree with @GriffinJohnston , there should be at least some way to ignore assetPrefix. as we are also going through same issue.

@KatiWinkler6612
Copy link

I would like to second both @GriffinJohnston and @ravindra-euphorika. I am experiencing the same issue and it's causing a slew of problems with our site and our optimization goals.

@moonmeister
Copy link
Contributor

moonmeister commented Nov 1, 2021

Happy to help get this across the line but @nonAlgebraic did not respond to my review. If someone wants to open a PR with this fix and address my review. Please do.

@LekoArts LekoArts removed the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Nov 19, 2021
@LekoArts
Copy link
Contributor

Sadly I have to close this PR as the author deleted their branch (thus we can no longer push to it) and didn't answer yet. Please put up a new PR with these changes here + the suggested changes from @moonmeister - thanks!

@donaldboulton
Copy link

For now remove sitemap plugin after copying build of public /sitemap/sitemap-0.xml, then make the .xml pretty.

add <lastmod>2021-11-05T01:11:47.980Z</lastmod> to each link with your modified dates

as

<url> <loc>https://gatsbystarterbasicinstructions.gtsb.io/</loc> <lastmod>2021-11-15T01:11:47.980Z</lastmod> <changefreq>daily</changefreq> <priority>0.7</priority> </url>

Rename it to sitemap.xml and put it in your static folder.

Then link it somewhere in the head...

<Helmet> <link rel="sitemap" type="application/xml" title="Sitemap Index" href="/static/sitemap.xml" /> </Helmet>

I put in layout with

import { Helmet } from 'react-helmet'

if you really want proper indexing then build a sitemap-index and individual sitemap pages for pages, posts, tags and categories.

Example at Bibwoe

Adding the links for each sitemap in the head of your site as above.

Do not forget the sitemap.xsl

Then tell Google to index each sitemap not just sitemap.xml

Or someone could build a plugin to do the above. Sitemaps the correct way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants