fix(gatsby-plugin-sitemap): omit assetPrefix from page URLs #32107
Thanks for doing the work to fix this. Below are a couple of small nits and one issue we definitely need to resolve. Also, I can't find any documentation on the basePath that you're using; can you point me to that? Thanks!
-export function prefixPath({ url, siteUrl, pathPrefix = `` }) {
-  return new URL(pathPrefix + url, siteUrl).toString()
+export function prefixPath({ url, siteUrl, basePath }) {
You may still need the default string here in case basePath isn't set.
-export function prefixPath({ url, siteUrl, basePath }) {
+export function prefixPath({ url, siteUrl, basePath = `` }) {
@@ -13,15 +13,16 @@ export const withoutTrailingSlash = path =>
 /**
  * @name prefixPath
  *
- * Properly handles prefixing relative path with site domain, Gatsby pathPrefix and AssetPrefix
+ * Properly handles prefixing relative path with site domain and Gatsby basePath
This should still say pathPrefix, as that's what all the docs refer to it as.
@@ -83,7 +83,7 @@ exports.onPostBuild = async (
 }

 const sitemapWritePath = path.join(`public`, output)
-const sitemapPublicPath = path.posix.join(pathPrefix, output)
+const sitemapPublicPath = path.posix.join(basePath, output)
I think this needs to be fixed, but not in this way.
When using both path and asset prefixes, this renders: https://mysite.com/cdn.example.com/foo/sitemap/sitemap-0.xml
This is wrong, but I do believe it should still have the asset prefix and read: https://cdn.example.com/foo/sitemap/sitemap-0.xml
According to https://www.gatsbyjs.com/docs/how-to/previews-deploys-hosting/asset-prefix/ everything that's not HTML should be available on the assetPrefix path. That would include sitemaps.
There is DEFINITELY an issue, as the URL is mangled, but I believe we need to keep the assetPrefix.
Do you agree?
I just realized the thing that's probably mangling this is the sitemap writer. It's expecting a path and getting an FQDN. Short of fixing this upstream or rewriting that implementation, this may not be easy to fix. One option would be to just not use the asset prefix anywhere, meaning your current fix here is fine, but we'd need to mirror that change in gatsby-serve.js. I'd be interested if the core team has any thoughts on this.
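To illustrate the "expecting a path and getting an FQDN" point, here is a minimal sketch of what Node's path.posix.join does when its first argument is a full URL rather than a path. The values of output and combinedPrefix are assumptions chosen to match the URLs quoted in this thread; this only shows the join step and does not reproduce the sitemap writer itself.

```javascript
const path = require(`path`)

// Assumed values for illustration only.
const output = `/sitemap`
const combinedPrefix = `https://cdn.example.com/foo` // assetPrefix + pathPrefix

// path.posix.join normalizes its result, so the `//` after the scheme
// collapses to a single `/` and the value is no longer a valid URL.
const joinedWithAssetPrefix = path.posix.join(combinedPrefix, output)

// With a plain path prefix the join behaves as intended.
const joinedWithPathOnly = path.posix.join(`/foo`, output)

console.log(joinedWithAssetPrefix) // https:/cdn.example.com/foo/sitemap
console.log(joinedWithPathOnly) // /foo/sitemap
```

A value like the first one looks like a relative path to anything downstream that resolves it against the site URL, which may be how the host name ends up inside the path.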
Just adding some additional color here. I have a site that is using assetPrefix to serve assets with CloudFront. Because of this, the sitemap is only available at xxx.cloudfront.net, which prevents me from submitting the sitemap in Google Search Console, since you can't use cross-domain sitemaps with GSC. That's a significant problem. At minimum, the option to ignore assetPrefix should be available.
Yes, totally agree with @GriffinJohnston, there should be at least some way to ignore assetPrefix, as we are also going through the same issue.
I would like to second both @GriffinJohnston and @ravindra-euphorika. I am experiencing the same issue and it's causing a slew of problems with our site and our optimization goals.
Happy to help get this across the line, but @nonAlgebraic did not respond to my review. If someone wants to open a PR with this fix and address my review, please do.
Sadly I have to close this PR, as the author deleted their branch (thus we can no longer push to it) and hasn't answered yet. Please put up a new PR with these changes plus the suggested changes from @moonmeister. Thanks!
For now, copy public/sitemap/sitemap-0.xml from a build, remove the sitemap plugin, then make the .xml pretty. Add as
Rename it to sitemap.xml and put it in your static folder. Then link it somewhere in the head...
I put in layout with
If you really want proper indexing, then build a sitemap-index and individual sitemap pages for pages, posts, tags and categories. Example at Bibwoe. Add the links for each sitemap in the head of your site as above. Do not forget the sitemap.xsl. Then tell Google to index each sitemap, not just sitemap.xml. Or someone could build a plugin to do the above. Sitemaps the correct way!
Description
The sitemap plugin uses an internal function prefixPath to prepend a site's base URL, as well as its path prefix if one is set, to the URL of each serialized page in the sitemap. The path prefix is retrieved from the pathPrefix string available via the arguments to the onPostBuild hook. This string is a concatenation of the assetPrefix and pathPrefix configuration options. Currently, the sitemap plugin naively uses this string in generating sitemap page URLs, when in fact only the pathPrefix string should be part of a page URL in the sitemap and the assetPrefix should not.
This PR replaces the use of pathPrefix with basePath, which - unlike pathPrefix - does not include the value of assetPrefix.
Notes
This solution was suggested by @antdking on a previous version of this PR.
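As a rough illustration of the behaviour described above, the sketch below reuses the pre-change prefixPath helper from the diff and calls it with two prefixes. The sample site URL, page path, and prefix values are assumptions made up for this example, not taken from any real configuration.

```javascript
// Pre-change helper as it appears in the diff for this PR.
function prefixPath({ url, siteUrl, pathPrefix = `` }) {
  return new URL(pathPrefix + url, siteUrl).toString()
}

// Assumed values for illustration only.
const siteUrl = `https://mysite.com`
const combinedPrefix = `https://cdn.example.com/foo` // assetPrefix + pathPrefix
const pathOnlyPrefix = `/foo` // the path prefix alone

// The combined prefix makes the first URL argument absolute,
// so siteUrl is ignored and the page URL points at the asset host.
const withAssetPrefix = prefixPath({
  url: `/blog/`,
  siteUrl,
  pathPrefix: combinedPrefix,
})

// With only the path prefix, the page URL stays on the site's own domain.
const withoutAssetPrefix = prefixPath({
  url: `/blog/`,
  siteUrl,
  pathPrefix: pathOnlyPrefix,
})

console.log(withAssetPrefix) // https://cdn.example.com/foo/blog/
console.log(withoutAssetPrefix) // https://mysite.com/foo/blog/
```

The second call corresponds to what passing basePath is meant to achieve: page URLs in the sitemap resolve against siteUrl instead of the asset host.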