🚚 Convert Kube2IAM-based Airflow pod to using IRSA, and capture process #4319

Closed
jhpyke opened this issue May 13, 2024 · 5 comments · Fixed by #4316

jhpyke commented May 13, 2024

Describe the bug.

The current Kube2IAM deployment does not support IMDSv2. New nodes now default to requiring IMDSv2, so Kube2IAM fails on any newly created node. Instead, we should convert our pods to use IRSA (IAM Roles for Service Accounts) so that they can run successfully under the new standard.

To Reproduce

  1. Launch a new node instance in Airflow with IMDSv2 required (i.e. not optional)
  2. Attempt to schedule a pod
  3. Observe in the pod logs that Kube2IAM fails to supply the role
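
To confirm step 1, i.e. that a node really enforces IMDSv2, one option is to inspect its instance metadata options. The sketch below is a minimal, hypothetical helper that assumes the response shape returned by boto3's EC2 describe_instances call (Reservations, then Instances, then MetadataOptions.HttpTokens, where "required" means IMDSv2 is mandatory). The function name and sample data are illustrative only.

```python
from typing import Any


def imdsv2_required_instances(response: dict[str, Any]) -> list[str]:
    """Return IDs of instances whose metadata options mandate IMDSv2.

    `response` is assumed to have the shape of boto3's
    ec2.describe_instances() output. HttpTokens == "required" means
    metadata requests without a session token (IMDSv1) are rejected,
    which is the node configuration described in this issue.
    """
    required = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tokens = instance.get("MetadataOptions", {}).get("HttpTokens")
            if tokens == "required":
                required.append(instance["InstanceId"])
    return required


if __name__ == "__main__":
    # Illustrative sample only; real data would come from
    # boto3.client("ec2").describe_instances(...).
    sample = {
        "Reservations": [
            {"Instances": [
                {"InstanceId": "i-old", "MetadataOptions": {"HttpTokens": "optional"}},
                {"InstanceId": "i-new", "MetadataOptions": {"HttpTokens": "required"}},
            ]}
        ]
    }
    print(imdsv2_required_instances(sample))  # ['i-new']
```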

Expected Behaviour

User pods should receive their roles correctly with IMDSv2 enabled.

Additional context

AWS's guide on setting up IRSA for an EKS cluster can be found here. We already have an OIDC provider for each cluster, so we should create a service account that one of our pods can use to assume a role, and prove that a DAG can be written that successfully picks it up.
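
For reference, IRSA has two halves: an IAM role whose trust policy lets the cluster's OIDC provider federate one specific Kubernetes service account via sts:AssumeRoleWithWebIdentity, and a ServiceAccount annotated with eks.amazonaws.com/role-arn. The minimal Python sketch below builds both documents following the pattern in AWS's IRSA guide. It only shows their shape, not how this team creates them. The account ID, issuer, namespace, service account and role names are hypothetical placeholders.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy letting one service account assume the role via IRSA."""
    # The provider ARN and condition keys use the issuer without "https://".
    issuer = oidc_issuer.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Kubernetes ServiceAccount annotated with the IAM role it should assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder values, not real project resources.
    issuer = "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
    print(json.dumps(irsa_trust_policy("111122223333", issuer, "airflow", "example-dags"), indent=2))
    print(json.dumps(service_account_manifest(
        "example-dags", "airflow", "arn:aws:iam::111122223333:role/example-dags"), indent=2))
```

When a pod runs under this service account, the EKS pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, and AWS SDKs exchange the projected token through AssumeRoleWithWebIdentity. The pod's role therefore no longer depends on the instance metadata service.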

Acceptance Criteria

  • A service account should be created that can assume the role used by our 'example' DAGs in dev.
  • examples.use_high_memory and examples.daily_test should be able to get roles using IRSA alone.
  • Any modifications required to the current role policies are known.
  • A writeup on what would be required to extend this process to all DAGs should be produced.

Out of Scope

  • Prod (for this ticket)
  • The work on the airflow repo required to change how the roles are created.
jhpyke added the bug label May 13, 2024
jhpyke moved this from 👀 TODO to 🚀 In Progress in Analytical Platform May 13, 2024
jhpyke self-assigned this May 13, 2024

jhpyke commented May 13, 2024

Progress as of sign-off:

  • PR created with the correct chart for release in dev.
  • The Terraform plan shows changes to node groups that need to be backported into code before deployment, to avoid disrupting other environments.
  • The updated chart adds support for IMDSv2, but because of the unexpected changes in the plan I have not yet been able to test the deployment.

Next Steps:

  • Backport changes into Terraform until the plan shows only changes to the Helm chart.
  • Import the existing release to ensure its settings are retained correctly (note: there is no import block for Helm charts, so this will temporarily block the stack).
  • Back up the values in the existing chart: use aws-vault to log into data-prod, run aws eks --region eu-west-1 update-kubeconfig --name airflow-<env> (substituting the correct environment) to authenticate with the cluster, then run helm -n kube2iam-system get all kube2iam to get the current config.
  • Deploy the updated chart.
  • Run examples.use_high_memory_node. If it fails the first time, this may be because the pod was scheduled before Kube2IAM was ready; as long as it passes on a second try, Kube2IAM is behaving normally. If pods fail to schedule, use kubectl logs -n kube2iam-system <name of kube2iam pod> to view the logs and diagnose what is not working.
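
The backup step above can also be scripted. This is a minimal Python sketch, assuming the shell already holds data-prod credentials (for example because it was started through aws-vault) and that aws and helm are on the PATH. It runs the same two commands listed in the backup step and saves the Helm output to a file; the function names and output path are hypothetical.

```python
import subprocess


def backup_commands(env: str, region: str = "eu-west-1") -> list[list[str]]:
    """Build the commands used to back up the current kube2iam release."""
    return [
        # Point kubectl/helm at the airflow-<env> cluster.
        ["aws", "eks", "--region", region, "update-kubeconfig", "--name", f"airflow-{env}"],
        # Dump everything Helm holds for the existing release (values, manifest, notes).
        ["helm", "-n", "kube2iam-system", "get", "all", "kube2iam"],
    ]


def run_backup(env: str, out_path: str) -> None:
    """Run the commands, writing the Helm output to out_path.

    Assumes AWS credentials are already present in the environment.
    """
    kubeconfig_cmd, helm_cmd = backup_commands(env)
    subprocess.run(kubeconfig_cmd, check=True)
    with open(out_path, "w", encoding="utf-8") as fh:
        subprocess.run(helm_cmd, check=True, stdout=fh, text=True)


if __name__ == "__main__":
    # Hypothetical usage; substitute the correct environment.
    run_backup("dev", "kube2iam-backup-dev.txt")
```

Keeping command construction separate from execution makes it easy to review exactly what will run against each environment before touching the cluster.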

github-project-automation bot moved this from 🚀 In Progress to 🎉 Done in Analytical Platform May 16, 2024
julialawrence moved this from 🎉 Done to 🚀 In Progress in Analytical Platform May 22, 2024
jhpyke changed the title 🐞 Kube2IAM needs IMDSv2 support via Chart Update to 🐞 Convert Kube2IAM-based Airflow pod to using IRSA, and capture process Jun 3, 2024
Emterry mentioned this issue Jun 6, 2024

AntFMoJ commented Jun 10, 2024

The airflow-monitoring DAG has been successfully migrated from Kube2IAM to IRSA in Airflow dev. The next step is to migrate the rest of the DAGs in dev.

jacobwoffenden added the story label and removed the bug label Jun 10, 2024
jacobwoffenden changed the title 🐞 Convert Kube2IAM-based Airflow pod to using IRSA, and capture process to 🚚 Convert Kube2IAM-based Airflow pod to using IRSA, and capture process Jun 10, 2024
jacobwoffenden self-assigned this Jun 13, 2024

jacobwoffenden commented

APC OIDC added to APDP

AntFMoJ commented Jun 27, 2024

IRSA has been successfully tested in multiple DAGs in Airflow Development, including a DAG owned by the CJS dashboard team. A test on a data engineering DAG in Airflow Development will be performed by COP 28/06.

An update to mojap-airflow-tools is required to allow IRSA to work in Airflow Production.

I have begun writing up instructions, which will be added to the Airflow user guidance once they have been reviewed by Airflow users.

jhpyke commented Jul 1, 2024

To do: update the validation script to ensure that the environment variable matches the environment folder used in the airflow repo.
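
A minimal sketch of the kind of check this to-do describes. It assumes a hypothetical layout in which each DAG's config sits under environments/<env>/ in the airflow repo and declares its environment in a separate value; the folder name, function names, and how the declared value is read are all assumptions, not the real validation script.

```python
from pathlib import PurePosixPath


def env_matches_folder(config_path: str, declared_env: str,
                       root: str = "environments") -> bool:
    """Return True if declared_env equals the folder directly under `root`.

    Hypothetical layout: <root>/<env>/.../<file>.
    """
    parts = PurePosixPath(config_path).parts
    try:
        idx = parts.index(root)
    except ValueError:
        return False  # path is not under the environments folder at all
    if idx + 1 >= len(parts) - 1:
        return False  # need at least <root>/<env>/<file>
    return parts[idx + 1] == declared_env


def find_mismatches(configs: dict[str, str]) -> list[str]:
    """Given {config_path: declared_env}, return the paths that disagree."""
    return sorted(p for p, env in configs.items() if not env_matches_folder(p, env))


if __name__ == "__main__":
    # Hypothetical example data.
    print(find_mismatches({
        "environments/dev/example/roles.yaml": "dev",
        "environments/prod/example/roles.yaml": "dev",
    }))  # ['environments/prod/example/roles.yaml']
```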
