[Issue #2841] Make the robots.txt dynamic so that it can mutate per environment #3379
Conversation
Force-pushed from add7af3 to 917576d
Confirmed this builds the robots.txt dynamically locally and has the right cache-control headers so that it should be stored in the CDN (i.e., it isn't re-rendered for each bot).
frontend/src/app/robots.ts (Outdated)

    export default function robots(): MetadataRoute.Robots {
      return {
        rules: [
          process.env.AWS_ENVIRONMENT === "dev" // switching this temporarily to ensure that the variable is being set at AWS runtime as expected, will make it "prod" after confirming in Dev
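The hunk above is truncated; for illustration, the full rule might read something like this (a sketch, assuming the disallow-all behavior for non-matching environments described in the PR summary; the exact rule objects are placeholders):

    import type { MetadataRoute } from "next";

    export default function robots(): MetadataRoute.Robots {
      return {
        rules: [
          process.env.AWS_ENVIRONMENT === "dev" // temporary check, see the note above
            ? { userAgent: "*", allow: "/" } // placeholder rules for the matched environment
            : { userAgent: "*", disallow: "/" }, // ban crawling everywhere else
        ],
      };
    }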
not a big deal, but all the other env vars being used by the Next app are sourced out of https://github.com/HHS/simpler-grants-gov/blob/main/frontend/src/constants/environments.ts. In part this is to avoid the potentially expensive access to process.env any more than necessary, but Next seems to not have much of a problem with that, so this is more of an organization thing than anything else.
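For reference, the cached pattern in environments.ts looks roughly like this (a simplified sketch; the real file reads and exports many more variables):

    // frontend/src/constants/environments.ts (simplified sketch)
    // Destructure process.env once at module load so every later lookup
    // hits a plain cached object instead of the process.env getter.
    const { AWS_ENVIRONMENT = "" } = process.env;

    export const environment: { [key: string]: string } = {
      AWS_ENVIRONMENT,
    };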
Done
EDIT: made this comment without seeing the update.
process.env is defined on the C side and exposed as a getter to the runtime. Every access of process.env needs to drop down to C to get the value out; that is the performance penalty.
Yeah, not sure what the penalty is here in real life, but we should use the same pattern as the cached versions in environments.ts.
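In other words, robots.ts would import the cached constant rather than touching process.env on every access (a sketch; the import path is assumed from the repo layout):

    import { environment } from "src/constants/environments";

    // before: process.env.AWS_ENVIRONMENT === "dev" (native getter on each access)
    // after:  a plain property read from the module-level cache
    const matchesEnv = environment.AWS_ENVIRONMENT === "dev";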
@@ -7,6 +7,8 @@ locals {
      NEW_RELIC_ENABLED = "true"
      # see https://github.com/newrelic/node-newrelic?tab=readme-ov-file#setup
      NODE_OPTIONS = "-r newrelic"
+     # expose the current AWS Env to the FE Next Node Server at Runtime
+     AWS_ENVIRONMENT = var.environment
Is there a reason to call this AWS_ENVIRONMENT vs ENVIRONMENT? We would want to be able to test this locally, so the AWS_ prefix feels odd.
My thought, though not strongly held, was that it really only matters in AWS; you don't have to specify it locally because no robots will ever hit us there, as opposed to other env vars where stuff wouldn't work without specifying it even when running locally. I was also trying to suss out a name that wouldn't get confused with other reasons we might specify an environment, like NODE_ENV, or other concepts of Production like our "Production" build that we run in every AWS environment but not in laptop environments, etc. But I'm happy to drop the AWS_.
Done
@@ -31,4 +32,5 @@ export const environment: { [key: string]: string } = {
      FEATURE_OPPORTUNITY_OFF,
      FEATURE_SEARCH_OFF,
      NEXT_BUILD,
+     AWS_ENVIRONMENT,
Above we are using NODE_ENV to determine the legacy host; we could use the actual env instead of NODE_ENV for consistency, since we are introducing it here.
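For illustration, the suggestion amounts to something like this (hypothetical sketch; the actual host values and variable names in environments.ts may differ):

    import { environment } from "src/constants/environments";

    // Hypothetical: derive the legacy grants.gov host from the deployed
    // environment rather than NODE_ENV (host URLs are placeholders).
    export const LEGACY_HOST =
      environment.AWS_ENVIRONMENT === "prod"
        ? "https://www.grants.gov"
        : "https://test.grants.gov";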
Done
Force-pushed from 1f2d3c3 to 3a279df
🎉
Summary
Fixes #2841
Time to review: 5 mins
Changes proposed
We need to make the robots.txt dynamic per environment so that we can ban crawling of the lower environments but still support nuanced rules for upper environments.
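For instance, a lower environment would serve a blanket disallow (illustrative output only; the production rules are still to be finalized):

    User-agent: *
    Disallow: /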
Context for reviewers
As a follow-up change, we will also make the Sitemap dynamic, since we don't want to point to Sitemaps outside of Production. Our Sitemap isn't currently ready to be used, so it is not included for now.
Additional information
Tested this locally and confirmed that we do have a dynamic robots.txt that reacts to changes to the environment variable at runtime, not at build time. Not sure if this will all hook together as expected when it runs in AWS, so the check tests for "dev" temporarily until we prove it is all likely to work in Prod.
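One way to spot-check that behavior locally (a sketch; assumes a production build of the Next app and the default start script listening on port 3000):

    # start the built app with different env values and confirm the served
    # robots.txt changes at runtime rather than being baked in at build time
    AWS_ENVIRONMENT=dev npm start
    curl -i http://localhost:3000/robots.txt   # inspect the body and Cache-Control header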