feat: improved sitemap (#3579)
* feat: extended sitemap functionality

* docs: del samples

* docs: readme

* feat: new sitemap

* feat: createLinkInHead removed

* docs: updated changeset text

* refactor: 'zod' function() instead of self made refine()

* Revert "refactor: 'zod' function() instead of self made refine()"

This reverts commit 036bac7.

undo function()
alextim authored Jun 16, 2022
1 parent 44ba4e1 commit 1031c06
Showing 15 changed files with 607 additions and 76 deletions.
18 changes: 18 additions & 0 deletions .changeset/popular-cherries-float.md
@@ -0,0 +1,18 @@
---
'@astrojs/sitemap': minor
---

# Key features

- Split up your large sitemap into multiple sitemaps by custom limit.
- Ability to add sitemap specific attributes such as `lastmod` etc.
- Final output customization via JS function.
- Localization support.
- Reliability: all config options are validated.

## Important changes

The integration always generates at least two files instead of one:

- `sitemap-index.xml` - index file;
- `sitemap-{i}.xml` - actual sitemap.
1 change: 1 addition & 0 deletions examples/integrations-playground/astro.config.mjs
@@ -9,5 +9,6 @@ import solid from '@astrojs/solid-js';

// https://astro.build/config
export default defineConfig({
  site: 'https://example.com',
  integrations: [lit(), react(), tailwind(), turbolinks(), partytown(), sitemap(), solid()],
});
183 changes: 182 additions & 1 deletion packages/integrations/sitemap/README.md
@@ -64,7 +64,35 @@ export default {
}
```

Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your sitemap under `dist/sitemap.xml`!
Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your _sitemap_ under `dist/sitemap-index.xml` and `dist/sitemap-0.xml`!

Generated sitemap content for a two-page website:

**sitemap-index.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://stargazers.club/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>
```

**sitemap-0.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://stargazers.club/</loc>
  </url>
  <url>
    <loc>https://stargazers.club/second-page/</loc>
  </url>
</urlset>
```

You can also check our [Astro Integration Documentation][astro-integration] for more on integrations.

@@ -111,5 +139,158 @@ export default {
}
```

### entryLimit

The maximum number of entries per sitemap file, given as a non-negative `Number`. The default value is 45000. If your site has more entries than this limit, the integration splits them across multiple sitemap files listed in the sitemap index. See the explanation of large sitemaps on [Google](https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps).

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      entryLimit: 10000,
    }),
  ],
}
```
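
To make the splitting concrete, here is a minimal sketch of how a list of URLs could be chunked by `entryLimit`. The `splitIntoSitemaps` helper is hypothetical and is not the integration's implementation (the integration depends on the `sitemap` package for writing files); unlike the option described above, this sketch also rejects a limit of `0`.

```js
// Hypothetical helper, not part of @astrojs/sitemap: splits a list of URLs
// into chunks of at most `entryLimit` entries, one chunk per sitemap file.
function splitIntoSitemaps(urls, entryLimit = 45000) {
  if (!Number.isInteger(entryLimit) || entryLimit < 1) {
    throw new RangeError('entryLimit must be a positive integer in this sketch');
  }
  const files = [];
  for (let start = 0; start < urls.length; start += entryLimit) {
    files.push({
      // File names follow the `sitemap-{i}.xml` pattern, starting at 0.
      name: `sitemap-${files.length}.xml`,
      urls: urls.slice(start, start + entryLimit),
    });
  }
  return files;
}

// 25,000 URLs with a limit of 10,000 yield three files.
const demoUrls = Array.from({ length: 25000 }, (_, i) => `https://stargazers.club/page-${i}/`);
const demoFiles = splitIntoSitemaps(demoUrls, 10000);
console.log(demoFiles.map((file) => `${file.name}: ${file.urls.length}`));
// → [ 'sitemap-0.xml: 10000', 'sitemap-1.xml: 10000', 'sitemap-2.xml: 5000' ]
```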

### changefreq, lastmod, priority

`changefreq` - How frequently the page is likely to change. Available values: `always` \| `hourly` \| `daily` \| `weekly` \| `monthly` \| `yearly` \| `never`.

`priority` - The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0.

`lastmod` - The date the page was last modified.

`changefreq` and `priority` are ignored by Google.

See a detailed explanation of sitemap-specific options on [sitemaps.org](https://www.sitemaps.org/protocol.html).

:exclamation: This integration uses the `astro:build:done` hook, which exposes only the generated page paths. With the present version of Astro, the integration therefore cannot analyze a page's source, frontmatter, etc. As a result, these options add the `changefreq`, `lastmod` and `priority` attributes to every page in a batch, or not at all.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      changefreq: 'weekly',
      priority: 0.7,
      lastmod: new Date('2022-02-24'),
    }),
  ],
}
```
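
The batch behaviour can be sketched roughly as follows. `applyBatchAttributes` and `CHANGEFREQ_VALUES` are hypothetical names used for illustration, and the hand-written checks stand in for the integration's own option validation (which relies on `zod`); only the allowed values and the 0.0 to 1.0 range come from the text above.

```js
// Values listed above for `changefreq`.
const CHANGEFREQ_VALUES = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

// Hypothetical helper: checks the options, then stamps the same values on every page.
function applyBatchAttributes(pages, { changefreq, priority, lastmod } = {}) {
  if (changefreq !== undefined && !CHANGEFREQ_VALUES.includes(changefreq)) {
    throw new TypeError(`Unknown changefreq: ${changefreq}`);
  }
  const priorityOk = typeof priority === 'number' && priority >= 0 && priority <= 1;
  if (priority !== undefined && !priorityOk) {
    throw new RangeError('priority must be a number between 0.0 and 1.0');
  }
  return pages.map((url) => ({
    url,
    changefreq,
    priority,
    // Dates are written as ISO 8601 strings.
    lastmod: lastmod ? lastmod.toISOString() : undefined,
  }));
}

const batchDemo = applyBatchAttributes(
  ['https://stargazers.club/', 'https://stargazers.club/second-page/'],
  { changefreq: 'weekly', priority: 0.7, lastmod: new Date('2022-02-24') }
);
console.log(batchDemo[1]);
// → { url: 'https://stargazers.club/second-page/', changefreq: 'weekly',
//     priority: 0.7, lastmod: '2022-02-24T00:00:00.000Z' }
```

Every entry receives identical values, which is why per-page differences need the `serialize` option instead.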

### serialize

A sync or async function called for each sitemap entry just before it is written to disk.

It receives a `SitemapItem` object as its parameter, which consists of `url` (required, the absolute page URL) and the optional `changefreq`, `lastmod`, `priority` and `links` properties.

The optional `links` property contains a list of `LinkItem` alternate pages, including the parent page.
The `LinkItem` type has two required fields: `url` (the fully-qualified URL for the version of this page for the specified language) and `hreflang` (a supported language code targeted by this version of the page).

The `serialize` function should return a `SitemapItem`, modified or not.

The example below shows how to add sitemap-specific properties to individual pages.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      serialize(item) {
        if (/your-special-page/.test(item.url)) {
          item.changefreq = 'daily';
          item.lastmod = new Date();
          item.priority = 0.9;
        }
        return item;
      },
    }),
  ],
}
```
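
Because `serialize` may be sync or async, a caller has to handle both cases. The sketch below shows one way this could be done; `applySerialize` is a hypothetical helper, not the integration's actual code.

```js
// Hypothetical helper: runs a sync or async `serialize` callback over every item.
async function applySerialize(items, serialize) {
  const result = [];
  for (const item of items) {
    // `await` accepts both plain return values and promises.
    result.push(await serialize(item));
  }
  return result;
}

const serializeDemoItems = [
  { url: 'https://stargazers.club/' },
  { url: 'https://stargazers.club/your-special-page/' },
];

// A sync callback; an `async` function would work the same way.
applySerialize(serializeDemoItems, (item) => {
  if (/your-special-page/.test(item.url)) {
    return { ...item, changefreq: 'daily', priority: 0.9 };
  }
  return item;
}).then((items) => console.log(items));
```

Awaiting each call in sequence keeps the output order identical to the input order, at the cost of not running async callbacks in parallel.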

### i18n

To localize a sitemap, pass the `i18n` option to the integration config. The integration checks the generated page paths for the presence of locale keys.

The `i18n` object has two required properties:

- `defaultLocale`: `String`. Its value must exist as one of the `locales` keys.
- `locales`: `Record<String, String>`, key/value pairs. The key is used to look for a locale part in a page path. The value is a language attribute; only the English alphabet and hyphens are allowed. See more about the language attribute on [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang).

Read more about localization in Google's [Advanced SEO](https://developers.google.com/search/docs/advanced/crawling/localized-versions#all-method-guidelines) guide.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      i18n: {
        defaultLocale: 'en', // All URLs that don't contain `es` or `fr` after `https://stargazers.club/` will be treated as the default locale, i.e. `en`
        locales: {
          en: 'en-US', // The `defaultLocale` value must be present in the `locales` keys
          es: 'es-ES',
          fr: 'fr-CA',
        },
      },
    }),
  ],
};
```

The sitemap content will be:

```xml
...
<url>
  <loc>https://stargazers.club/</loc>
  <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
  <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
  <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
</url>
<url>
  <loc>https://stargazers.club/es/</loc>
  <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
  <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
  <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
</url>
<url>
  <loc>https://stargazers.club/fr/</loc>
  <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
  <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
  <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
</url>
<url>
  <loc>https://stargazers.club/es/second-page/</loc>
  <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/second-page/"/>
  <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/second-page/"/>
  <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/second-page/"/>
</url>
...
```
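
The grouping behind this output can be sketched as follows: pages that share the same path once the locale segment is removed are treated as alternates of each other. Both `splitLocale` and `buildAlternates` are hypothetical helpers written for illustration, and the sketch assumes the site is served from the domain root; the integration's own URL parsing may differ.

```js
// Hypothetical stand-in for the integration's URL parsing: separates the
// locale segment (if any) from the rest of the path.
function splitLocale(url, defaultLocale, localeCodes) {
  const rest = new URL(url).pathname.slice(1); // e.g. 'es/second-page/'
  const [first, ...others] = rest.split('/');
  if (localeCodes.includes(first)) {
    return { locale: first, path: others.join('/') };
  }
  // No locale key in the path: treat the page as the default locale.
  return { locale: defaultLocale, path: rest };
}

// Hypothetical helper: attaches alternate links to pages that exist in several locales.
function buildAlternates(urls, { defaultLocale, locales }) {
  const localeCodes = Object.keys(locales);
  const parsed = urls.map((url) => ({ url, ...splitLocale(url, defaultLocale, localeCodes) }));
  return parsed.map(({ url, path }) => {
    const siblings = parsed.filter((other) => other.path === path);
    const links =
      siblings.length > 1
        ? siblings.map((sibling) => ({ url: sibling.url, lang: locales[sibling.locale] }))
        : undefined;
    return { url, links };
  });
}

const i18nDemo = buildAlternates(
  ['https://stargazers.club/', 'https://stargazers.club/es/', 'https://stargazers.club/about/'],
  { defaultLocale: 'en', locales: { en: 'en-US', es: 'es-ES', fr: 'fr-CA' } }
);
console.log(JSON.stringify(i18nDemo, null, 2));
```

In this sketch a page without any translated sibling, such as `/about/`, gets no `links` at all.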

[astro-integration]: https://docs.astro.build/en/guides/integrations-guide/
[astro-ui-frameworks]: https://docs.astro.build/en/core-concepts/framework-components/#using-framework-components
11 changes: 9 additions & 2 deletions packages/integrations/sitemap/package.json
@@ -13,20 +13,27 @@
  },
  "keywords": [
    "astro-component",
    "seo"
    "seo",
    "sitemap"
  ],
  "bugs": "https://github.com/withastro/astro/issues",
  "homepage": "https://astro.build",
  "exports": {
    ".": "./dist/index.js",
    "./package.json": "./package.json"
  },
  "files": [
    "dist"
  ],
  "scripts": {
    "build": "astro-scripts build \"src/**/*.ts\" && tsc",
    "build:ci": "astro-scripts build \"src/**/*.ts\"",
    "dev": "astro-scripts dev \"src/**/*.ts\""
  },
  "dependencies": {},
  "dependencies": {
    "sitemap": "^7.1.1",
    "zod": "^3.17.3"
  },
  "devDependencies": {
    "astro": "workspace:*",
    "astro-scripts": "workspace:*"
5 changes: 5 additions & 0 deletions packages/integrations/sitemap/src/config-defaults.ts
@@ -0,0 +1,5 @@
import type { SitemapOptions } from './index';

export const SITEMAP_CONFIG_DEFAULTS: SitemapOptions & any = {
  entryLimit: 45000,
};
9 changes: 9 additions & 0 deletions packages/integrations/sitemap/src/constants.ts
@@ -0,0 +1,9 @@
export const changefreqValues = [
  'always',
  'hourly',
  'daily',
  'weekly',
  'monthly',
  'yearly',
  'never',
] as const;
55 changes: 55 additions & 0 deletions packages/integrations/sitemap/src/generate-sitemap.ts
@@ -0,0 +1,55 @@
import { SitemapItemLoose } from 'sitemap';

import type { SitemapOptions } from './index';
import { parseUrl } from './utils/parse-url';

const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

/** Construct sitemap.xml given a set of URLs */
export function generateSitemap(pages: string[], finalSiteUrl: string, opts: SitemapOptions) {
  const { changefreq, priority: prioritySrc, lastmod: lastmodSrc, i18n } = opts || {};
  // TODO: find way to respect <link rel="canonical"> URLs here
  const urls = [...pages].filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url));
  urls.sort((a, b) => a.localeCompare(b, 'en', { numeric: true })); // sort alphabetically so sitemap is same each time

  const lastmod = lastmodSrc?.toISOString();
  const priority = typeof prioritySrc === 'number' ? prioritySrc : undefined;

  const { locales, defaultLocale } = i18n || {};
  const localeCodes = Object.keys(locales || {});

  const getPath = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.path;
  };
  const getLocale = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.locale;
  };

  const urlData = urls.map((url) => {
    let links;
    if (defaultLocale && locales) {
      const currentPath = getPath(url);
      if (currentPath) {
        const filtered = urls.filter((subUrl) => getPath(subUrl) === currentPath);
        if (filtered.length > 1) {
          links = filtered.map((subUrl) => ({
            url: subUrl,
            lang: locales[getLocale(subUrl)!],
          }));
        }
      }
    }

    return {
      url,
      links,
      lastmod,
      priority,
      changefreq, // : changefreq as EnumChangefreq,
    } as SitemapItemLoose;
  });

  return urlData;
}
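
As a quick illustration of the filtering and sorting at the top of `generateSitemap`, the sketch below isolates those two steps in plain JavaScript. The `filterAndSort` wrapper is a hypothetical name; the regular expression and the `localeCompare` call are copied from the function above. Note that the pattern drops any URL whose last path segment is exactly three digits, not only `404` or `500`.

```js
const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

// Hypothetical wrapper around the first two steps of `generateSitemap`.
function filterAndSort(pages) {
  // Drop status-code pages such as /404/ or /500.
  const urls = [...pages].filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url));
  // Numeric collation sorts `page-2` before `page-10`.
  urls.sort((a, b) => a.localeCompare(b, 'en', { numeric: true }));
  return urls;
}

const sortDemo = filterAndSort([
  'https://example.com/page-10/',
  'https://example.com/404/',
  'https://example.com/page-2/',
  'https://example.com/',
]);
console.log(sortDemo);
// → [ 'https://example.com/', 'https://example.com/page-2/', 'https://example.com/page-10/' ]
```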