From 6ec2607c5cd47b7efa8ba5931210a87dcfe73bf4 Mon Sep 17 00:00:00 2001
From: Chris Kirk
Date: Fri, 22 Jun 2018 09:57:30 -0400
Subject: [PATCH] Adds sitemap docs (#78)

---
 README.md | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/README.md b/README.md
index 9ad3102..17c8c3d 100644
--- a/README.md
+++ b/README.md
@@ -66,3 +66,89 @@ Once instantiated Amphora Search will begin managing it's own internal indices,
 ## Contributing
 
 Want a feature or find a bug? Create an issue or a PR and someone will get on it.
+
+## Sitemaps
+
+If the `sitemaps` configuration option is set to `true`, Amphora Search will automatically stream basic text and XML sitemaps at `/_sitemaps/sitemap.txt` and `/_sitemaps/sitemap.xml` using data from the `pages` index.
+
+By default these endpoints will show pages first published in the current year; you can specify a different year with the `year` query parameter, e.g. `?year=2017`.
+
+### Custom Sitemap
+
+If you would like to customize your sitemaps, add a mapping and handler for a `sitemap-entries` index to your Clay installation. Amphora Search will automatically use this index to populate the sitemaps. For the XML sitemap, Amphora Search simply converts the Elasticsearch docs to XML using `xmljs`, so a mapping like this:
+
+```
+_doc:
+  properties:
+    loc:
+      type: keyword
+    site:
+      type: keyword
+    lastmod:
+      type: date
+```
+
+...might result in a document like this:
+
+```
+{
+  "site": "foo",
+  "loc": "http://bar.com/some-article",
+  "lastmod": "2018-03-15T14:15:04.726Z"
+}
+```
+
+...which would produce a sitemap entry like this:
+
+```
+<url>
+  <loc>
+    http://bar.com/some-article
+  </loc>
+  <lastmod>2018-03-15T14:15:04.726Z</lastmod>
+</url>
+```
+
+Note that the `site` mapping property is reserved for the slug of the site that the entry should appear in and doesn't appear in the resulting XML.
+
+Currently, there is no endpoint for listing all the sitemaps from each year.
+
+### News Sitemap
+
+If you create a mapping and handler for a `news-sitemap-entries` index, Amphora Search will also stream news sitemap entries at `/_sitemaps/news.xml` using data from that index. Per Google's guidelines, it will only stream articles from the last two days.
+
+Example mapping:
+
+```
+_doc:
+  properties:
+    site:
+      type: keyword
+    loc:
+      type: keyword
+    lastmod:
+      type: date
+    news:news:
+      type: object
+      dynamic: false
+      properties:
+        news:publication:
+          type: object
+          properties:
+            news:name:
+              type: keyword
+            news:language:
+              type: keyword
+        news:publication_date:
+          type: date
+        news:title:
+          type: keyword
+        news:language:
+          type: keyword
+        news:keywords:
+          type: keyword
+        news:tags:
+          type: keyword
+```
+
+Again, the `site` mapping property is reserved for the slug of the site that the entry should appear in and does not appear in the resulting XML.
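
The document-to-XML step these docs describe can be sketched roughly as follows. This is a Python illustration only, not Amphora Search's implementation (which, per the README text, uses `xmljs`). The helper name `doc_to_url_entry` is hypothetical, and the sketch assumes a flat document whose values are plain strings.

```python
import xml.etree.ElementTree as ET


def doc_to_url_entry(doc: dict) -> str:
    """Render a flat index document as a sitemap <url> element.

    Hypothetical helper for illustration; assumes string-like values.
    """
    url = ET.Element("url")
    for key, value in doc.items():
        if key == "site":
            # `site` only selects which site's sitemap the entry belongs to,
            # so it is left out of the generated XML.
            continue
        ET.SubElement(url, key).text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(url, encoding="unicode")


doc = {
    "site": "foo",
    "loc": "http://bar.com/some-article",
    "lastmod": "2018-03-15T14:15:04.726Z",
}
print(doc_to_url_entry(doc))
```

Every field other than `site` becomes a child element named after its key, which is why the mapping's property names should match the sitemap tag names you want in the output.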