feat: improved sitemap #3579

Merged
merged 8 commits into from
Jun 16, 2022
Changes from 4 commits
19 changes: 19 additions & 0 deletions .changeset/popular-cherries-float.md
@@ -0,0 +1,19 @@
---
'@astrojs/sitemap': minor
---

# Key features

- Split a large sitemap into multiple sitemaps by a custom entry limit.
- Add sitemap-specific attributes such as `changefreq`, `lastmod` and `priority`.
- Customize the final output with a JS function.
- Localization support.
- Automatically adds a link to the sitemap in the `<head>` section of generated pages.
- Reliability: all config options are validated.

## Important changes

The integration always generates at least two files instead of one:

- `sitemap-index.xml` - index file;
- `sitemap-{i}.xml` - actual sitemap.
1 change: 1 addition & 0 deletions examples/integrations-playground/astro.config.mjs
@@ -9,5 +9,6 @@ import solid from '@astrojs/solid-js';

// https://astro.build/config
export default defineConfig({
  site: 'https://example.com',
  integrations: [lit(), react(), tailwind(), turbolinks(), partytown(), sitemap(), solid()],
});
207 changes: 206 additions & 1 deletion packages/integrations/sitemap/README.md
@@ -64,7 +64,41 @@ export default {
}
```

Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your sitemap under `dist/sitemap.xml`!
Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your _sitemap_ under `dist/sitemap-index.xml` and `dist/sitemap-0.xml`!

Generated sitemap content for a two-page website:

**sitemap-index.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://stargazers.club/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>
```

**sitemap-0.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://stargazers.club/</loc>
  </url>
  <url>
    <loc>https://stargazers.club/second-page/</loc>
  </url>
</urlset>
```

Every page generated during the build will contain a link to the sitemap in its `<head>` section:

```html
<link rel="sitemap" type="application/xml" href="/sitemap-index.xml">
```

You can also check our [Astro Integration Documentation][astro-integration] for more on integrations.

@@ -111,5 +145,176 @@ export default {
}
```

### entryLimit

The maximum number of entries per sitemap file, as a non-negative `Number`. The default value is 45000. If your site has more entries, the integration creates multiple sitemap files and lists them all in the sitemap index. See the explanation on [Google](https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps).

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      entryLimit: 10000,
    }),
  ],
}
```
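
As a rough model of what `entryLimit` implies, the sketch below splits a list of URLs into chunks and derives file names following the `sitemap-{i}.xml` pattern. `chunkEntries` and `sitemapFileNames` are hypothetical helpers written for this example, not the integration's code. The sketch also assumes a positive integer limit and that at least one sitemap file is always written.

```typescript
// Hypothetical helper: split entries into groups of at most `entryLimit` items.
function chunkEntries<T>(entries: T[], entryLimit: number): T[][] {
  if (!Number.isInteger(entryLimit) || entryLimit < 1) {
    throw new RangeError('entryLimit must be a positive integer in this sketch');
  }
  const result: T[][] = [];
  for (let i = 0; i < entries.length; i += entryLimit) {
    result.push(entries.slice(i, i + entryLimit));
  }
  return result;
}

// Hypothetical helper: file names following the `sitemap-{i}.xml` pattern.
// Assumption: at least one sitemap file exists even for an empty site.
function sitemapFileNames(entryCount: number, entryLimit: number): string[] {
  const fileCount = Math.max(1, Math.ceil(entryCount / entryLimit));
  return Array.from({ length: fileCount }, (_, i) => `sitemap-${i}.xml`);
}

const urls = Array.from({ length: 25 }, (_, i) => `https://stargazers.club/page-${i}/`);
const chunks = chunkEntries(urls, 10);
console.log(chunks.map((chunk) => chunk.length)); // chunk sizes: 10, 10 and 5
console.log(sitemapFileNames(urls.length, 10));
```

With 25 entries and a limit of 10, the sketch yields three sitemap files, each of which would be referenced from `sitemap-index.xml`.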

### createLinkInHead

`Boolean`, default is `true`. When enabled, the integration adds a link to the sitemap in the `<head>` section of generated pages.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      // do not add a link to the sitemap in <head>
      createLinkInHead: false,
    }),
  ],
}
```
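
To make the effect of `createLinkInHead` concrete, here is a minimal sketch that inserts the sitemap link just before `</head>` using a plain string search. `addSitemapLink` is a hypothetical helper for illustration only. The integration may well parse the HTML properly instead (this PR adds `node-html-parser` as a dependency), so do not read this as its implementation.

```typescript
// The link tag documented in this README.
const SITEMAP_LINK = '<link rel="sitemap" type="application/xml" href="/sitemap-index.xml">';

// Hypothetical helper: insert the sitemap link right before `</head>`.
function addSitemapLink(html: string, createLinkInHead = true): string {
  if (!createLinkInHead) return html; // option disabled: leave the page untouched
  if (html.includes('rel="sitemap"')) return html; // avoid adding a second link
  const headEnd = html.search(/<\/head>/i);
  if (headEnd === -1) return html; // no <head> section to extend
  return html.slice(0, headEnd) + SITEMAP_LINK + html.slice(headEnd);
}

const page = '<html><head><title>Home</title></head><body></body></html>';
console.log(addSitemapLink(page));
console.log(addSitemapLink(page, false) === page);
```

The helper is idempotent: running it twice on the same page still leaves a single link.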

### changefreq, lastmod, priority

`changefreq` - How frequently the page is likely to change. Available values: `always` \| `hourly` \| `daily` \| `weekly` \| `monthly` \| `yearly` \| `never`.

`priority` - The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0.

`lastmod` - The date the page was last modified.

`changefreq` and `priority` are ignored by Google.

See the detailed explanation of sitemap-specific options on [sitemaps.org](https://www.sitemaps.org/protocol.html).

:exclamation: This integration uses the `astro:build:done` hook, which exposes only the paths of generated pages. With the present version of Astro, the integration therefore cannot analyze a page's source, frontmatter etc. It can add the `changefreq`, `lastmod` and `priority` attributes only to all pages at once or to none.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      changefreq: 'weekly',
      priority: 0.7,
      lastmod: new Date('2022-05-28'),
    }),
  ],
}
```
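
The "batch" behavior can be pictured as follows: the same three values are copied onto every entry. This is a simplified sketch with hypothetical names (`BatchOptions`, `Entry`, `applyBatchOptions`). It mirrors how `generate-sitemap.ts` in this PR converts `lastmod` to an ISO string, but it is not that function.

```typescript
// Hypothetical shapes for this sketch only.
interface BatchOptions {
  changefreq?: string;
  priority?: number;
  lastmod?: Date;
}

interface Entry {
  url: string;
  changefreq?: string;
  priority?: number;
  lastmod?: string;
}

// Copy the same options onto every page URL.
function applyBatchOptions(pageUrls: string[], opts: BatchOptions): Entry[] {
  const lastmod = opts.lastmod?.toISOString(); // one timestamp shared by all entries
  return pageUrls.map((url) => ({
    url,
    changefreq: opts.changefreq,
    priority: opts.priority,
    lastmod,
  }));
}

const entries = applyBatchOptions(
  ['https://stargazers.club/', 'https://stargazers.club/second-page/'],
  { changefreq: 'weekly', priority: 0.7, lastmod: new Date('2022-05-28') }
);
console.log(entries);
```

Every entry ends up with identical `changefreq`, `priority` and `lastmod` values. Per-page differences need the `serialize` option.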

### serialize

A sync or async function called for each sitemap entry just before it is written to disk.

It receives a `SitemapItem` object as its parameter. The object consists of `url` (required, the absolute URL of the page) and the optional `changefreq`, `lastmod`, `priority` and `links` properties.

The optional `links` property contains a list of `LinkItem` objects for the alternate pages, including the parent page.
The `LinkItem` type has two required fields: `url` (the fully qualified URL of the version of this page for the specified language) and `lang` (a supported language code targeted by this version of the page).

The `serialize` function should return a `SitemapItem`, modified or not.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      serialize(item) {
        if (/special-page/.test(item.url)) {
          item.changefreq = 'daily';
          item.lastmod = new Date();
          item.priority = 0.9;
        }
        return item;
      },
    }),
  ],
}
```
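
Because `serialize` may be sync or async, a caller has to handle both. The sketch below shows one way to do that: awaiting the result works for plain values and promises alike. `SitemapItemSketch`, `Serialize`, `serializeAll` and `markSpecialPages` are hypothetical names for this example, and the item type is a trimmed-down assumption rather than the real `SitemapItem`.

```typescript
// Trimmed-down, hypothetical item shape for this sketch.
interface SitemapItemSketch {
  url: string;
  changefreq?: string;
  lastmod?: Date | string;
  priority?: number;
}

type Serialize = (item: SitemapItemSketch) => SitemapItemSketch | Promise<SitemapItemSketch>;

// Run the user callback for each item, in order.
async function serializeAll(
  items: SitemapItemSketch[],
  serialize: Serialize
): Promise<SitemapItemSketch[]> {
  const result: SitemapItemSketch[] = [];
  for (const item of items) {
    // `await` accepts both a plain value and a promise.
    result.push(await serialize(item));
  }
  return result;
}

// A sync callback that returns a modified copy for matching pages.
const markSpecialPages: Serialize = (item) => {
  if (/special-page/.test(item.url)) {
    return { ...item, changefreq: 'daily', priority: 0.9 };
  }
  return item;
};

serializeAll(
  [{ url: 'https://stargazers.club/' }, { url: 'https://stargazers.club/special-page/' }],
  markSpecialPages
).then((items) => console.log(items));
```

Returning a copy, as `markSpecialPages` does, and mutating the item in place, as in the config example, both satisfy the "return a `SitemapItem`" contract.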

### i18n

To localize the sitemap, pass the `i18n` option to the integration. The integration then checks the generated page paths for the presence of locale keys.

The `i18n` object has two required properties:

- `defaultLocale`: `String`. Its value must be one of the `locales` keys.
- `locales`: `Record<String, String>` of key/value pairs. The key is used to look for a locale part in a page path. The value is a language attribute; only letters of the English alphabet and hyphens are allowed. See more about the language attribute on [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang).

Read more about localization in Google's [Advanced SEO](https://developers.google.com/search/docs/advanced/crawling/localized-versions#all-method-guidelines) guide.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      i18n: {
        defaultLocale: 'en', // All URLs that don't contain `es` or `fr` after `https://stargazers.club/` will be treated as the default locale, i.e. `en`
        locales: {
          en: 'en-US', // The `defaultLocale` value must be present in the `locales` keys
          es: 'es-ES',
          fr: 'fr-CA',
        },
      },
    }),
  ],
};
```

The sitemap content will be:

```xml
...
  <url>
    <loc>https://stargazers.club/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/es/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/fr/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/es/second-page/</loc>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/second-page/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/second-page/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/second-page/"/>
  </url>
...
```
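
The grouping behind this output can be sketched as follows: strip the site URL, treat a leading locale key as the locale, and collect all pages that share the remaining path. `splitLocale` and `groupAlternates` are hypothetical helpers. The integration has its own `parseUrl` utility whose exact rules are not shown here, so the locale detection below is an assumption based on the comments in the config example.

```typescript
interface ParsedPage {
  locale: string;
  path: string;
}

interface Alternate {
  url: string;
  lang: string;
}

// Hypothetical stand-in for the integration's URL parsing.
function splitLocale(
  pageUrl: string,
  siteUrl: string,
  defaultLocale: string,
  localeKeys: string[]
): ParsedPage | undefined {
  if (!pageUrl.startsWith(siteUrl)) return undefined;
  const rest = pageUrl.slice(siteUrl.length).replace(/^\/+/, '');
  const [first = '', ...others] = rest.split('/');
  // Assumption: only non-default locale keys are recognized as a path prefix.
  if (first !== defaultLocale && localeKeys.includes(first)) {
    return { locale: first, path: others.join('/') };
  }
  return { locale: defaultLocale, path: rest };
}

// Group pages by their locale-independent path.
function groupAlternates(
  pageUrls: string[],
  siteUrl: string,
  defaultLocale: string,
  locales: Record<string, string>
): Map<string, Alternate[]> {
  const groups = new Map<string, Alternate[]>();
  for (const pageUrl of pageUrls) {
    const parsed = splitLocale(pageUrl, siteUrl, defaultLocale, Object.keys(locales));
    if (!parsed) continue;
    const lang = locales[parsed.locale];
    if (lang === undefined) continue;
    const group = groups.get(parsed.path) ?? [];
    group.push({ url: pageUrl, lang });
    groups.set(parsed.path, group);
  }
  return groups;
}

const locales = { en: 'en-US', es: 'es-ES', fr: 'fr-CA' };
const groups = groupAlternates(
  [
    'https://stargazers.club/',
    'https://stargazers.club/es/',
    'https://stargazers.club/fr/',
    'https://stargazers.club/es/second-page/',
    'https://stargazers.club/second-page/',
  ],
  'https://stargazers.club',
  'en',
  locales
);
groups.forEach((alternates, path) => console.log(JSON.stringify(path), alternates));
```

Grouping through a `Map` is a choice made for this sketch: it visits each URL once, whereas filtering the full URL list for every page costs quadratic time on large sites.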

[astro-integration]: https://docs.astro.build/en/guides/integrations-guide/
[astro-ui-frameworks]: https://docs.astro.build/en/core-concepts/framework-components/#using-framework-components
12 changes: 10 additions & 2 deletions packages/integrations/sitemap/package.json
@@ -13,20 +13,28 @@
},
"keywords": [
"astro-component",
"seo"
"seo",
"sitemap"
],
"bugs": "https://github.com/withastro/astro/issues",
"homepage": "https://astro.build",
"exports": {
".": "./dist/index.js",
"./package.json": "./package.json"
},
"files": [
"dist"
],
"scripts": {
"build": "astro-scripts build \"src/**/*.ts\" && tsc",
"build:ci": "astro-scripts build \"src/**/*.ts\"",
"dev": "astro-scripts dev \"src/**/*.ts\""
},
"dependencies": {},
"dependencies": {
"node-html-parser": "^5.3.3",
"sitemap": "^7.1.1",
"zod": "^3.17.3"
},
"devDependencies": {
"astro": "workspace:*",
"astro-scripts": "workspace:*"
1 change: 1 addition & 0 deletions packages/integrations/sitemap/src/constants.ts
@@ -0,0 +1 @@
export const changefreqValues = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'] as const;
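
Declaring the list `as const` lets TypeScript derive a union type from it, so the same array can drive both runtime checks and static types. The sketch below shows that pattern. `ChangeFreq` and `isChangeFreq` are hypothetical names for illustration, and the PR may validate the value differently (it adds `zod` as a dependency).

```typescript
const changefreqValues = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'] as const;

// Union of the literal values: 'always' | 'hourly' | ... | 'never'
type ChangeFreq = (typeof changefreqValues)[number];

// Hypothetical type guard built on the same list.
function isChangeFreq(value: unknown): value is ChangeFreq {
  return typeof value === 'string' && (changefreqValues as readonly string[]).includes(value);
}

const candidate: unknown = 'weekly';
if (isChangeFreq(candidate)) {
  // Inside this branch `candidate` is narrowed to `ChangeFreq`.
  const narrowed: ChangeFreq = candidate;
  console.log(`valid changefreq: ${narrowed}`);
}
```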
55 changes: 55 additions & 0 deletions packages/integrations/sitemap/src/generate-sitemap.ts
@@ -0,0 +1,55 @@
import type { SitemapItemLoose } from 'sitemap';

import type { SitemapOptions } from './index';
import { parseUrl } from './utils/parse-url';

const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

/** Build the list of sitemap entries (not the XML itself) from a set of page URLs */
export function generateSitemap(pages: string[], finalSiteUrl: string, opts: SitemapOptions) {
  const { changefreq, priority: prioritySrc, lastmod: lastmodSrc, i18n } = opts || {};
  // TODO: find way to respect <link rel="canonical"> URLs here
  const urls = [...pages].filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url));
  urls.sort((a, b) => a.localeCompare(b, 'en', { numeric: true })); // sort alphabetically so sitemap is same each time

  const lastmod = lastmodSrc?.toISOString();
  const priority = typeof prioritySrc === 'number' ? prioritySrc : undefined;

  const { locales, defaultLocale } = i18n || {};
  const localeCodes = Object.keys(locales || {});

  const getPath = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.path;
  };
  const getLocale = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.locale;
  };

  const urlData = urls.map((url) => {
    let links;
    if (defaultLocale && locales) {
      const currentPath = getPath(url);
      if (currentPath) {
        const filtered = urls.filter((subUrl) => getPath(subUrl) === currentPath);
        if (filtered.length > 1) {
          links = filtered.map((subUrl) => ({
            url: subUrl,
            lang: locales[getLocale(subUrl)!],
          }));
        }
      }
    }

    return {
      url,
      links,
      lastmod,
      priority,
      changefreq, // : changefreq as EnumChangefreq,
    } as SitemapItemLoose;
  });

  return urlData;
}
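
The first two steps of `generateSitemap` (dropping status-code pages, then sorting) can be tried in isolation. The sketch below reuses the regular expression and the comparator from the file above. `prepareUrls` is a hypothetical wrapper added only for this example.

```typescript
// Same pattern as in generate-sitemap.ts: a path ending in exactly three digits.
const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

// Hypothetical wrapper around the filter and sort steps.
function prepareUrls(pages: string[]): string[] {
  return pages
    .filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url))
    // `numeric: true` compares digit runs as numbers, so page-2 sorts before page-10.
    .sort((a, b) => a.localeCompare(b, 'en', { numeric: true }));
}

const prepared = prepareUrls([
  'https://stargazers.club/page-10/',
  'https://stargazers.club/404/',
  'https://stargazers.club/page-2/',
  'https://stargazers.club/500',
  'https://stargazers.club/blog/123/',
]);
console.log(prepared);
```

Note that the pattern matches any URL whose last path segment is three digits, so a regular page such as `/blog/123/` is excluded along with `/404/` and `/500`.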