gatsby-plugin-sitemap always generates sitemap-index even on low page count #31282

Closed
ichik opened this issue May 6, 2021 · 14 comments · Fixed by #32113
Labels
good first issue Issue that doesn't require previous experience with Gatsby help wanted Issue with a clear description that the community can help with. topic: plugins Related to plugin system, themes & catch-all for plugins that don't have a label type: documentation An issue or pull request for improving or updating Gatsby's documentation

Comments

@ichik

ichik commented May 6, 2021

Description

According to the documentation, gatsby-plugin-sitemap should only generate a sitemap index when there are more entries than the entryLimit option allows:

entryLimit (number = 45000) Number of entries per sitemap file, a sitemap index and multiple sitemaps are created if you have more entries.

With recent updates (4.0) this is no longer the case. sitemap-index.xml is always generated, which is odd for a small number of entries. This is a different problem from the one currently being solved in #31167, but its existence seems related to recent activity.

Steps to reproduce

Branch with reproducible case: https://github.com/ichik/ichik.xyz/tree/sitemap-bug

Expected result

Link to sitemap.xml generated

Actual result

Link to sitemap-index.xml generated

Environment


  System:
    OS: macOS 11.3.1
    CPU: (4) x64 Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
    Shell: 5.8 - /bin/zsh
  Binaries:
    Node: 14.16.1 - ~/.nvm/versions/node/v14.16.1/bin/node
    npm: 6.14.13 - ~/.nvm/versions/node/v14.16.1/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  Browsers:
    Chrome: 90.0.4430.93
    Firefox: 88.0
    Safari: 14.1
  npmPackages:
    gatsby: ^3.4.1 => 3.4.1 
    gatsby-plugin-humans-txt: ^1.1.4 => 1.1.4 
    gatsby-plugin-image: ^1.4.0 => 1.4.0 
    gatsby-plugin-manifest: ^3.4.0 => 3.4.0 
    gatsby-plugin-mdx: ^2.4.0 => 2.4.0 
    gatsby-plugin-netlify: ^3.4.0 => 3.4.0 
    gatsby-plugin-offline: ^4.4.0 => 4.4.0 
    gatsby-plugin-react-helmet: ^4.4.0 => 4.4.0 
    gatsby-plugin-robots-txt: ^1.5.6 => 1.5.6 
    gatsby-plugin-sharp: ^3.4.1 => 3.4.1 
    gatsby-plugin-sitemap: ^4.0.0 => 4.0.0 
    gatsby-plugin-styled-components: ^4.4.0 => 4.4.0 
    gatsby-plugin-typescript: ^3.4.0 => 3.4.0 
    gatsby-remark-images: ^5.1.0 => 5.1.0 
    gatsby-remark-unwrap-images: ^1.0.2 => 1.0.2 
    gatsby-source-filesystem: ^3.4.0 => 3.4.0 
    gatsby-transformer-sharp: ^3.4.0 => 3.4.0 
@ichik ichik added the type: bug An issue or pull request relating to a bug in Gatsby label May 6, 2021
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label May 6, 2021
@LekoArts LekoArts added topic: sitemap and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels May 6, 2021
@LekoArts
Contributor

LekoArts commented May 6, 2021

I don't know if this was done on purpose @moonmeister?

We now use simpleSitemapAndIndex:

  return simpleSitemapAndIndex({
    hostname: siteUrl,
    destinationDir: sitemapPath,
    sourceData: serializedPages,
    limit: entryLimit,
    gzip: false,
  })
}

According to https://github.com/ekalinin/sitemap.js/blob/4141469ca90e53f9f15b5107a25d16591175984f/README.md#create-sitemap-and-index-files-from-one-large-list:

If you know you are definitely going to have more than 50,000 urls in your sitemap, you can use this slightly more complex interface to create a new sitemap every 45,000 entries and add that file to a sitemap index.

So the behavior changed, and it's currently working as intended. The docs might be outdated, though, if entryLimit only affects the splitting and not the creation of the index itself.
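
The distinction between splitting and index creation can be sketched as follows. This is an illustration of the behavior described in this thread, not the sitemap.js implementation: planSitemapFiles is a hypothetical helper, the file names follow those shown elsewhere in this issue, and the zero-entry case is not modeled.

```javascript
// Illustration only: models which files end up on disk according to this
// thread, not how sitemap.js works internally.
function planSitemapFiles(entryCount, entryLimit = 45000) {
  // entryLimit only decides how many sitemap-N.xml chunks are written...
  const chunkCount = Math.ceil(entryCount / entryLimit)
  const sitemaps = Array.from({ length: chunkCount }, (_, i) => `sitemap-${i}.xml`)
  // ...while the index file is part of the result regardless of the count.
  return { index: "sitemap-index.xml", sitemaps }
}

console.log(planSitemapFiles(12)) // one chunk plus the index
console.log(planSitemapFiles(100000)) // three chunks plus the index
```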

@moonmeister
Contributor

Yeah the index file always exists. Good catch, a PR would be much appreciated to clarify this language.

@moonmeister moonmeister added type: documentation An issue or pull request for improving or updating Gatsby's documentation good first issue Issue that doesn't require previous experience with Gatsby and removed type: bug An issue or pull request relating to a bug in Gatsby labels May 6, 2021
@ichik
Author

ichik commented May 6, 2021

I think it's overkill to always generate an index just in case you might have over 50,000 entries. For a lot of Gatsby projects out there, the previous behavior, which resulted in a single sitemap.xml, looked far more sensible.

@moonmeister
Contributor

Yeah, it may be a little "cleaner", but at the end of the day I couldn't think why it really mattered. The bots don't care.

The reason it changed was because the default behavior of the library we use changed. I wasn't going to spend hours rewriting code to fix it.

If this is important to you, feel free to do the work and submit a PR.

@abheist

abheist commented May 20, 2021

One more bug, though: the generated index file contains wrong URLs for the individual sitemap files:
https://abheist.com/sitemap/sitemap-index.xml:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://abheist.com/sitemap-0.xml</loc>
    </sitemap>
</sitemapindex>

The loc link should be https://abheist.com/sitemap/sitemap-0.xml; the sitemap directory is missing.
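
A minimal sketch of how the expected loc value is composed from its parts. sitemapLoc is a hypothetical helper written for illustration; it is not the plugin's code, and the "/sitemap" output directory is assumed from the URLs above.

```javascript
// Hypothetical helper: join a site URL, an output directory and a file name
// without dropping any path segment.
function sitemapLoc(siteUrl, outputDir, fileName) {
  return [siteUrl, outputDir, fileName]
    .map((part) => part.replace(/^\/+|\/+$/g, "")) // trim leading/trailing slashes
    .filter(Boolean) // skip empty parts, e.g. an output directory of "/"
    .join("/")
}

console.log(sitemapLoc("https://abheist.com", "/sitemap", "sitemap-0.xml"))
// https://abheist.com/sitemap/sitemap-0.xml
```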

@moonmeister
Contributor

This has been reported and fixed in the next release.

@abheist

abheist commented May 20, 2021

This has been reported and fixed in the next release.

Perfect, thanks!

@LekoArts LekoArts added topic: plugins Related to plugin system, themes & catch-all for plugins that don't have a label help wanted Issue with a clear description that the community can help with. and removed topic: sitemap labels May 28, 2021
@SomiDivian

What's the news?

@prichey
Contributor

prichey commented Jun 23, 2021

Related to this, the sitemap-index.xml file is also mangling my siteUrl. In this example, it's set to https://mysite.com/foo.

sitemap-index.xml:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://mysite.com/sitemap-0.xml</loc>
    </sitemap>
</sitemapindex>

The 4. release seems to have lots of issues, so I'm going to try downgrading until these are worked out.
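
For readers wondering how a path prefix such as /foo can disappear: standard URL resolution drops it in two common cases, sketched below with the built-in URL class. Whether the plugin or sitemap.js builds the loc value this way is not established in this thread; this only illustrates one well-known mechanism.

```javascript
const siteUrl = "https://mysite.com/foo"

// A root-relative path replaces the whole path of the base URL.
console.log(new URL("/sitemap-0.xml", siteUrl).href)
// https://mysite.com/sitemap-0.xml

// A relative path replaces the last segment when the base has no trailing slash.
console.log(new URL("sitemap-0.xml", siteUrl).href)
// https://mysite.com/sitemap-0.xml

// With a trailing slash on the base, the prefix is kept.
console.log(new URL("sitemap-0.xml", siteUrl + "/").href)
// https://mysite.com/foo/sitemap-0.xml
```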

@moonmeister
Contributor

@prichey Please open a separate issue for this with a reproduction or a more detailed example of your configuration, and describe what is happening vs. what you expect to happen.

@prichey
Contributor

prichey commented Jun 23, 2021

@moonmeister #32080

@justsilencia

Yeah, it may be a little "cleaner", but at the end of the day I couldn't think why it really mattered. The bots don't care.

The reason it changed was because the default behavior of the library we use changed. I wasn't going to spend hours rewriting code to fix it.

If this is important to you, feel free to do the work and submit a PR.

You mention that "the bots don't care." However, after doing some research, it seems that having the sitemap in a subfolder will only allow Google to scan pages in that subfolder. Here's a reference to this:

Stack Overflow answer with a link to the Google docs

Could you please explain if this is incorrect?
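
The concern above matches the location rule in the sitemaps.org protocol, under which a sitemap file may list only URLs that start with its own directory. A minimal sketch of that rule follows; isInSitemapScope is a hypothetical helper, example.com is a placeholder, and the sketch ignores any exceptions a search engine may apply, so it does not settle how Google actually treats these files.

```javascript
// Sketch of the sitemaps.org location rule: a sitemap covers URLs on the
// same origin whose path starts with the sitemap's own directory.
function isInSitemapScope(sitemapUrl, pageUrl) {
  const sitemap = new URL(sitemapUrl)
  const page = new URL(pageUrl)
  // Directory part of the sitemap path, including the trailing slash.
  const dir = sitemap.pathname.slice(0, sitemap.pathname.lastIndexOf("/") + 1)
  return sitemap.origin === page.origin && page.pathname.startsWith(dir)
}

// Sitemap in /sitemap/: a page under /blog/ is outside its directory.
console.log(isInSitemapScope("https://example.com/sitemap/sitemap-0.xml", "https://example.com/blog/post"))
// false

// Sitemap at the site root: every page on the origin is in scope.
console.log(isInSitemapScope("https://example.com/sitemap.xml", "https://example.com/blog/post"))
// true
```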

@Pandanaax

I have multiple domains and worked with Gatsby 2, but since I moved to Gatsby 4 my sitemap looks like this:

https://inte.****.fr/https://inte.****.it/intimo-premaman
But I need only the .it URL in this example.

Could you help me, please?

@moonmeister
Contributor

I have multiple domains and worked with Gatsby 2, but since I moved to Gatsby 4 my sitemap looks like this:

https://inte.****.fr/https://inte.****.it/intimo-premaman
But I need only the .it URL in this example.

Could you help me, please?

Hi @Pandanaax !

Sorry to hear you're running into an issue. To help us best begin debugging the underlying cause, it is incredibly helpful if you open a new issue. Please include a minimal reproduction and a copy of any relevant code/config or a link to the repository.

Thanks for using Gatsby! 💜

8 participants