Automate updates to guidance search sitemap #3804
Comments
Knowledge sharing
Sitemaps are a way to highlight pages and content that are being skipped for whatever reason. Our search.gov data is boosted by data we feed it through sitemap files, though those sitemaps don't actually go to our website; they're only fed into search.gov for its purposes.
Sitemaps don't include information about files other than the URL and, optionally, a date, change frequency, and/or importance. There are no titles, descriptions, keywords, etc.
/sitemap.xml is the only sitemap file name/path that matters: it's the standard location for a sitemap, if one exists. Sitemaps can link to other sitemaps. My original goal was to have /sitemap.xml only include links to other sitemap files, and those sitemaps would be named/divided however made most sense to us human reviewers.
Discovery (@dorothyyeager @kathycarothers @djgarr @zsmith-fec)
For Guidance,
For other PDFs
Would there be a use for a kind of admin interface to update data that gets included in sitemaps? e.g. URL-date pairs for resources that live outside Wagtail
If we're going to automate some sitemaps, we could assign a priority to some pages that are already known but whose importance we'd like to emphasize.
Possible complications
Right now, sitemap_xml and sitemap_html are used for search.gov's data. Would search.gov scream if we renamed those files or made it so they're not, say, saved on our local drives, but generated by the website itself? @patphongs ?
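To illustrate the "sitemaps can link to other sitemaps" idea, here is a minimal sketch of a sitemap index built with Python's standard library. It is not the project's implementation: the function name `build_sitemap_index` and the child sitemap URLs are hypothetical, chosen only to show the shape of a /sitemap.xml that does nothing but point to other sitemap files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_sitemap_urls):
    """Return a sitemap index (as a string) that only links to other sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in child_sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical child sitemap names and host, for illustration only.
index_xml = build_sitemap_index([
    "https://www.example.gov/sitemap-guidance-html.xml",
    "https://www.example.gov/sitemap-guidance-pdf.xml",
])
print(index_xml)
```

The child files could then be named or divided however is easiest for reviewers, since only the index needs to live at the standard path.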
Answers in bold for @rfultz
For other PDFs
Would there be a use for a kind of admin interface to update data that gets included in sitemaps? e.g. URL-date pairs for resources that live outside Wagtail
If we're going to automate some sitemaps, we could assign a priority to some pages that are already known but whose importance we'd like to emphasize.
Summary
What we are after:
As a user, I want to make sure the latest version of a guidance document appears in the results for the sitemap. As a content team member, I find it hard to remember to update the sitemap manually each time a revised Form or Guide PDF is uploaded.
Background: Right now the guidance docs have two sitemaps: one for HTML files and one for PDF files. Whenever a file is altered, we need to manually edit the code for the sitemap and then re-upload the code into Wagtail. Since this happens only occasionally rather than on a regular basis, we risk the content team forgetting this step whenever it uploads a new or replacement PDF or edits an HTML page that is included in the guidance search.
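As a rough sketch of what automating this could look like, the snippet below generates a sitemap from URL-date pairs instead of hand-edited XML. Everything here is an assumption for illustration: the function `build_urlset`, the example URLs, and the idea that the pairs would come from document records (or from an admin-maintained list for resources outside Wagtail). It uses only the standard library and the element names from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a sitemap from (url, last_modified, priority) tuples.

    last_modified is a datetime.date; priority is a float from 0.0 to 1.0,
    or None to leave the optional <priority> element out.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified, priority in entries:
        node = ET.SubElement(root, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = last_modified.isoformat()
        if priority is not None:
            ET.SubElement(node, "priority").text = f"{priority:.1f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical URL-date pairs; in practice these would be read from
# wherever upload or edit dates are recorded, not typed in by hand.
urlset_xml = build_urlset([
    ("https://www.example.gov/resources/guide.pdf", date(2020, 6, 1), 0.8),
    ("https://www.example.gov/help/guidance-page/", date(2020, 5, 15), None),
])
print(urlset_xml)
```

If the dates are pulled from the same place the files are uploaded or edited, the lastmod values stay current without anyone having to remember a separate sitemap step.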
Related issues
#3793 - Update guidance sitemap for updated date of one of the documents (example of what the content team has to remember to do each time)
How-tos: https://docs.google.com/document/d/1hfvlYhGVNF0Km5vrAYtJak2cdAYLYQPbiPUfLxbRN1E/edit#
Completion criteria
Tech steps or considerations (optional)
List any considerations the tech team should know. Additionally, any specific tech steps can be included here.