Empty sitemap if a taxonomy is removed #14981

Closed
1 task done
benjaminrancourt opened this issue Jun 18, 2022 · 3 comments · Fixed by #15571
Labels
bug [triage] something behaving unexpectedly Hacktoberfest Issues suitable for hacktoberfest participants help wanted [triage] Ideal issues for contributors to help with

Comments

@benjaminrancourt
Contributor

Issue Summary

When a taxonomy is removed from the routes.yaml file, its pages are no longer generated, but its sitemap still is. That sitemap is empty, and some tools, such as Google Search Console, warn users that their sitemap is wrong because it's empty.

Example from my website, where I've disabled the authors taxonomy:

<tr>
  <td>
    <a href="https://www.benjaminrancourt.ca/sitemap-authors.xml">
      https://www.benjaminrancourt.ca/sitemap-authors.xml
    </a>
  </td>
  <td>1970-01-01 00:00</td>
</tr>

The sitemap can be read, but it contains errors | Empty sitemap


I tried to find a way to fix this in Ghost source code, but I couldn't run a working setup on my machine (I really should install Linux... 🙈).

However, I'll paste the notes I took while looking into this bug, in the hope that they will at least help you.

core/frontend/services/sitemap/manager.js

    // [Ben] The index sitemap is generated here
    getIndexXml() {
        return this.index.getXml();
    }

    createIndexGenerator(options) {
        // [Ben] Solution 1: If some taxonomies are disabled, can we remove them from the options below?
        return new IndexMapGenerator({
            types: {
                pages: this.pages,
                posts: this.posts,
                authors: this.authors,
                tags: this.tags
            },
            maxPerPage: options.maxPerPage
        });
    }
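
A minimal sketch of what Solution 1 could look like, in plain JavaScript. The `pickEnabledTypes` helper and the `enabledTypes` list are hypothetical; how the manager would actually learn which taxonomies are still routed is not covered by these notes.

```javascript
// Hypothetical helper: keep only the generators whose resource type is still routed.
// `enabledTypes` is an assumed input, not an existing Ghost option.
function pickEnabledTypes(allTypes, enabledTypes) {
    return Object.fromEntries(
        Object.entries(allTypes).filter(([name]) => enabledTypes.includes(name))
    );
}

// Stand-ins for this.pages, this.posts, this.authors and this.tags.
const allTypes = {
    pages: {name: 'pages'},
    posts: {name: 'posts'},
    authors: {name: 'authors'},
    tags: {name: 'tags'}
};

// With the authors taxonomy removed from the routes file:
const types = pickEnabledTypes(allTypes, ['pages', 'posts', 'tags']);
console.log(Object.keys(types));
```

The filtered object could then be passed as `types` to the index generator, so a disabled taxonomy never reaches the index at all.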

core/frontend/services/sitemap/index-generator.js

    generateSiteMapUrlElements() {
        // [Ben] We iterate over each resource type here
        return _.map(this.types, (resourceType) => {
            // `|| 1` = even if there are no items we still have an empty sitemap file
            const noOfPages = Math.ceil(Object.keys(resourceType.nodeLookup).length / this.maxPerPage) || 1;
            const pages = [];

            for (let i = 0; i < noOfPages; i++) {
                const page = i === 0 ? '' : `-${i + 1}`;
                const url = urlUtils.urlFor({relativeUrl: '/sitemap-' + resourceType.name + page + '.xml'}, true);
                const lastModified = resourceType.lastModified;

                // [Ben] Solution 2: For disabled taxonomies, I suspect their lastModified property is undefined.
                // Therefore, maybe we could not push this resource sitemap if it's the case?

                pages.push({
                    sitemap: [
                        {loc: url},
                        {lastmod: moment(lastModified).toISOString()}
                    ]
                });
            }

            return pages;
        }).flat();
    }
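
Solution 2 could be sketched as follows. This is a simplified stand-in written without lodash, moment or urlUtils, and it checks for an empty `nodeLookup` rather than testing `lastModified`; the `baseUrl` parameter and the sample data are assumptions made for illustration only.

```javascript
// Sketch only: plain-JavaScript stand-in for the generator method above.
// `baseUrl` and the shape of `types` are assumptions for illustration.
function generateSiteMapUrlElements(types, maxPerPage, baseUrl) {
    return Object.values(types).flatMap((resourceType) => {
        const count = Object.keys(resourceType.nodeLookup).length;

        // Skip resource types without any URLs instead of forcing one empty page
        if (count === 0) {
            return [];
        }

        const noOfPages = Math.ceil(count / maxPerPage);
        const pages = [];

        for (let i = 0; i < noOfPages; i++) {
            const page = i === 0 ? '' : `-${i + 1}`;
            pages.push({
                sitemap: [
                    {loc: `${baseUrl}/sitemap-${resourceType.name}${page}.xml`},
                    {lastmod: new Date(resourceType.lastModified).toISOString()}
                ]
            });
        }

        return pages;
    });
}

const elements = generateSiteMapUrlElements({
    posts: {name: 'posts', nodeLookup: {a: {}, b: {}, c: {}}, lastModified: 1655510400000},
    authors: {name: 'authors', nodeLookup: {}, lastModified: 0}
}, 2, 'https://example.com');

// Only the two posts pages are listed; the empty authors type is left out.
console.log(elements.map(element => element.sitemap[0].loc));
```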

Thanks to the Ghost team!

Steps to Reproduce

  1. Disable a taxonomy like tags or authors (https://ghost.org/docs/themes/routing/#removing-taxonomies)
  2. Go to /sitemap.xml
  3. The disabled taxonomies will have an empty sitemap with a Last Modified date of 1970-01-01 00:00
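
The 1970-01-01 date in step 3 is the Unix epoch, which is what a timestamp of 0 formats to. The check below uses only the built-in `Date`. It is an assumption, not something confirmed from the Ghost source, that the generator's `lastModified` is a numeric 0 for a disabled taxonomy.

```javascript
// A timestamp of 0 is the Unix epoch.
const epoch = new Date(0).toISOString();
console.log(epoch); // 1970-01-01T00:00:00.000Z

// An undefined timestamp does not produce the epoch with the built-in Date.
const fromUndefined = new Date(undefined);
console.log(Number.isNaN(fromUndefined.getTime())); // true
```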

Ghost Version

5.2.3

Node.js Version

v16.15.0

How did you install Ghost?

Docker

Database type

MySQL 8

Browser & OS version

Google Chrome | Windows 10

Relevant log / error output

No response

Code of Conduct

  • I agree to be friendly and polite to people in this repository
@github-actions github-actions bot added the needs:triage [triage] this needs to be triaged by the Ghost team label Jun 18, 2022
@github-actions
Contributor

This issue is currently awaiting triage from @ErisDS. We're having a busy time right now, but we'll update this issue ASAP. If you have any more information to help us triage faster please leave us some comments. Thank you for understanding 🙂

@ErisDS ErisDS added the bug [triage] something behaving unexpectedly label Jul 26, 2022
@github-actions github-actions bot removed the needs:triage [triage] this needs to be triaged by the Ghost team label Jul 26, 2022
@ErisDS
Member

ErisDS commented Jul 26, 2022

Hey there, thank you so much for the detailed bug report.

That does look like something that shouldn't happen! A PR to fix this issue would be very welcome 🙂

@ErisDS ErisDS added the help wanted [triage] Ideal issues for contributors to help with label Jul 26, 2022
@ErisDS ErisDS added the Hacktoberfest Issues suitable for hacktoberfest participants label Aug 15, 2022
@jbenezech
Contributor

@ErisDS I'm submitting a PR for this one. However:

  1. Generating a taxonomy entry in the index sitemap even when it doesn't have any urls was done on purpose in this PR https://github.com/TryGhost/Ghost/pull/13698/files#diff-47edb0d155714257c72a2993b7480f660de663942214ed29002e4715ffcd1e2eR34
  2. Generating an empty sitemap for that taxonomy was also done on purpose in the same PR https://github.com/TryGhost/Ghost/pull/13698/files#diff-67aa668df4b26f67dd0cd14fa0f956472eca6b5dc6944566675b2f4ebdd71473R42 . But this implementation results in an invalid xml file (empty content)

I would think Google reports an error because the file is not valid XML, rather than because it has no entries. But I would guess other SEO tools might report at least a warning for an empty sitemap. Moreover, I do not really see the point of generating an empty sitemap.
So, unless there was a proper reason for empty (but valid) sitemaps, I'm going for no entry in the index sitemap and a 404 on the taxonomy sitemap.
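
A rough sketch of the "404 on the taxonomy sitemap" half of that approach. `respondWithSitemap` and its return shape are hypothetical and are not Ghost's routing code; `nodeLookup` is assumed here to map ids to URL strings, and XML escaping is omitted for brevity.

```javascript
// Hypothetical stand-in for a per-type sitemap handler; not Ghost's actual routing code.
function respondWithSitemap(generator) {
    const urls = Object.values(generator.nodeLookup);

    // No URLs for this resource type: answer 404 instead of serving an empty file
    if (urls.length === 0) {
        return {status: 404, body: null};
    }

    const entries = urls.map(url => `<url><loc>${url}</loc></url>`).join('');

    return {
        status: 200,
        body: '<?xml version="1.0" encoding="UTF-8"?>' +
            `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</urlset>`
    };
}

const emptyAuthors = respondWithSitemap({name: 'authors', nodeLookup: {}});
const tags = respondWithSitemap({name: 'tags', nodeLookup: {1: 'https://example.com/tag/news/'}});
console.log(emptyAuthors.status, tags.status);
```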

jbenezech added a commit to jbenezech/Ghost that referenced this issue Oct 8, 2022
Closes TryGhost#14981
- Taxonomy-specific sitemaps were invalid xml when there was no data
- These invalid empty sitemaps were referenced in the index sitemap causing SEO tools to report errors
jbenezech added a commit to jbenezech/Ghost that referenced this issue Oct 10, 2022
Closes TryGhost#14981
- Taxonomy-specific sitemaps were invalid xml when there was no data
- These invalid empty sitemaps were referenced in the index sitemap causing SEO tools to report errors
ErisDS pushed a commit that referenced this issue Oct 12, 2022
closes: #14981

- Taxonomy-specific sitemaps were invalid xml when there was no data
- These invalid empty sitemaps were referenced in the index sitemap causing SEO tools to report errors
moosoul pushed a commit to stark-tech-space/ksnews-Ghost that referenced this issue Oct 12, 2022
closes: TryGhost#14981

- Taxonomy-specific sitemaps were invalid xml when there was no data
- These invalid empty sitemaps were referenced in the index sitemap causing SEO tools to report errors
sam-lord pushed a commit that referenced this issue Oct 17, 2022
closes: #14981

- Taxonomy-specific sitemaps were invalid xml when there was no data
- These invalid empty sitemaps were referenced in the index sitemap causing SEO tools to report errors