Non-official URLs have been indexed by search engines #4237

Closed
1 task
Tracked by #154 ...
rfultz opened this issue Dec 3, 2020 · 1 comment

Comments

@rfultz
Contributor

rfultz commented Dec 3, 2020

Summary

Our cloud.gov URLs have been indexed by Google. They shouldn't be.

  • Seeing listings for unofficial-looking URLs can lower confidence in us and our information
  • It dilutes our SEO—we're basically competing with ourselves with identical content
  • As those cloud.gov URLs change, those pages will 404

Possible solutions

  • Use Search Console to tell Google to remove those results and Webmaster Tools for Bing
    • Good: would be a fairly quick result
    • Drawbacks: This doesn't affect other search engines and I don't know how long the removal would stick—a month? a year?
  • Adjust robots.txt file based on hostname
    • Good: pretty quick, would work with every search engine
    • Drawbacks: Not every search engine honors the robots.txt file, but that's the case with all of these options. Could be complicated.
  • Change "noindex" meta tags based on host name
    • Good: Pretty solid solution
    • Drawbacks: I'm not sure how complicated it may be for Django, etc., to adjust page content based on current hostname when the environment is still 'prod'
  • Do a redirect from *.cloud.gov/* to www.fec.gov/* if the agent is a search engine or social media crawler
    • Good: Seems pretty solid, except…
    • Drawbacks: Not sure it's possible. We'd need to maintain the list of search engine (and social media) user agents.
  • Add canonical tags (<link rel="canonical">)
    • Good: may be simple to implement, would address social media shares, too
    • Bad: ?
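
The hostname-based robots.txt and noindex options above could be sketched roughly as follows. This is a minimal sketch, not the fec.gov implementation: it assumes www.fec.gov is the only hostname that should be indexed, all function names are hypothetical, and fec-example.app.cloud.gov is a made-up hostname. In Django, the host would presumably come from request.get_host(), with the results returned from a robots.txt view or set as an X-Robots-Tag response header.

```python
from typing import Optional

# Assumption: this is the only hostname search engines should index.
OFFICIAL_HOSTS = {"www.fec.gov"}


def normalize_host(host: str) -> str:
    """Lowercase a Host header value and drop any ":port" suffix."""
    return host.strip().lower().split(":", 1)[0]


def is_official_host(host: str) -> bool:
    """True when the request arrived on a hostname we want indexed."""
    return normalize_host(host) in OFFICIAL_HOSTS


def robots_txt_for(host: str) -> str:
    """Body of robots.txt: allow everything on official hosts, nothing elsewhere."""
    if is_official_host(host):
        return "User-agent: *\nDisallow:\n"
    return "User-agent: *\nDisallow: /\n"


def robots_header_for(host: str) -> Optional[str]:
    """Value for an X-Robots-Tag response header, or None to send no header."""
    return None if is_official_host(host) else "noindex, nofollow"


if __name__ == "__main__":
    # Hypothetical hostname, used only for illustration.
    for host in ("www.fec.gov", "fec-example.app.cloud.gov"):
        print(host, repr(robots_txt_for(host)), robots_header_for(host))
```

One caveat worth checking before combining the two: Google documents that a crawler has to be able to fetch a page to see its noindex rule, so a blanket Disallow on the cloud.gov hostnames could keep already-indexed URLs from dropping out. It may make sense to pick one mechanism per hostname rather than both.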

Considerations

  • Are there any files that we want to stay available or indexed at the cloud.gov url?
  • Do we need to update the og: and twitter: tags, too? I'm pretty sure they both honor the robots.txt file
  • After we've updated tags, I'd like to tell Google and Bing to re-crawl the site
  • Check our other cloud.gov domains to see where else we should apply this
  • How would each approach affect sharing content through social media? I wouldn't want to dump those clicks into a 404. This item feels like it could be a separate ticket: how should we handle people clicking a link to a cloud.gov page from a non-cloud.gov page? (I wouldn't want to send someone to www if they're intentionally working inside fec-*.cloud.gov.)
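
For the canonical-tag and crawler-redirect ideas, and the og: question above, here is a rough sketch of how the www.fec.gov equivalent of a cloud.gov URL might be built. Everything here is an assumption for illustration: the function names are hypothetical, fec-example.app.cloud.gov is a made-up hostname, it presumes paths are identical on both hosts, and the crawler token list is deliberately short and incomplete.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumptions: the official site is served over https at www.fec.gov.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.fec.gov"

# Illustrative, incomplete list of lowercase User-Agent substrings.
CRAWLER_TOKENS = ("googlebot", "bingbot", "facebookexternalhit", "twitterbot")


def canonical_url(request_url: str) -> str:
    """Same path and query on the official host; the fragment is dropped."""
    parts = urlsplit(request_url)
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )


def canonical_link_tag(request_url: str) -> str:
    """<link rel="canonical"> element for the page <head>."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


def og_url_tag(request_url: str) -> str:
    """Open Graph og:url element pointing at the official URL."""
    content = escape(canonical_url(request_url), quote=True)
    return f'<meta property="og:url" content="{content}">'


def redirect_target(request_url: str, user_agent: str) -> Optional[str]:
    """Official URL to redirect to, or None to serve the page as usual.

    Only known crawlers on non-official hosts are redirected, so people
    working inside a cloud.gov hostname on purpose stay where they are.
    """
    if urlsplit(request_url).hostname == CANONICAL_HOST:
        return None
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return canonical_url(request_url)
    return None


if __name__ == "__main__":
    url = "https://fec-example.app.cloud.gov/data/?cycle=2020"  # hypothetical
    print(canonical_link_tag(url))
    print(og_url_tag(url))
    print(redirect_target(url, "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Because the path and query are carried over unchanged, a shared or indexed cloud.gov link would land on the matching www page rather than a 404, as long as that page exists at the same path.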

Screenshot

image.png

Completion criteria:

  • Old production routes, stage, and dev routes are now removed from search engines
@rfultz
Contributor Author

rfultz commented Jan 26, 2021

After research and a few emails, we need to wait for the changes and requests to take effect. I edited #4338 to include following up on this ticket. Research and tracking are at https://docs.google.com/spreadsheets/d/1UfawtDX7M6CNmdv_PeiFTr5Nar_VVtFdCAeyJOSWDO0/edit?usp=sharing
