Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ci): calculate parallel jobs based on infrastructure needs #1475

Merged
merged 5 commits into from
Aug 24, 2022

Conversation

heitorlessa
Copy link
Contributor

Issue number: #1435

Summary

This addresses the unbalance work split across CPU Cores to coordinate CloudFormation stacks seen on: https://github.com/awslabs/aws-lambda-powertools-python/actions/runs/2917670742

Successful test after the change: https://github.com/awslabs/aws-lambda-powertools-python/actions/runs/2919001216

Background

The issue was that we automatically detected the number of CPU Cores and distributed jobs across them. It worked fine when we had a single interpreter when running locally. When we introduced a matrix CI with multiple interpreters, we had a mismatch of job queue per core - here's the scenario:

  • Pytest-xdist detects 2 Core available in the GitHub Actions Runner container
  • Worker 1 and Worker 2 are spawned for interpreter Python 3.7, 3.8, and 3.9
  • Worker 1 receives Lambda Layer Stack infra job to deploy and block any extra work on Worker 1 and 2 to proceed
  • Lambda Layer Stack completes and additional jobs in the queue are released for Worker 1 and Worker 2
  • Worker 1 completes all of its jobs (Logger, Metrics) for Python 3.7 and believe it's fine to scale down
  • However..... Worker 1 receives a new job (Tracer stack) as soon as it tries to delete Lambda Layer Stack (mistaken by test session being complete)

This causes the last jobs to fail for each additional interpreter because Layer stacks gets deleted and they can't complete creating Lambda Function, since the Layer no longer exists.

Changes

Please provide a summary of what's being changed

I created parallel_run_e2e.py to calculate how many infrastructure jobs must be scheduled per worker regardless of how many CPU Cores a machine has. It uses our infrastructure.py convention - 1 (root e2e).

I've also included tox.ini should we ever run into a similar situation with multiple interpreters and need to reproduce it locally. I didn't add dependencies but added notes on how to run if anyone ever needs it.

User experience

Please share what the user experience looks like before and after this change

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@heitorlessa heitorlessa requested a review from a team as a code owner August 24, 2022 13:30
@heitorlessa heitorlessa requested review from leandrodamascena and removed request for a team August 24, 2022 13:30
@boring-cyborg boring-cyborg bot added github-actions Pull requests that update Github_actions code internal Maintenance changes labels Aug 24, 2022
@pull-request-size pull-request-size bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 24, 2022
@github-actions github-actions bot added the bug Something isn't working label Aug 24, 2022
@heitorlessa heitorlessa requested review from rubenfonseca and removed request for leandrodamascena August 24, 2022 13:31
@heitorlessa heitorlessa merged commit 48b504f into develop Aug 24, 2022
@heitorlessa heitorlessa deleted the fix/parallel-lock-gh-actions branch August 24, 2022 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working github-actions Pull requests that update Github_actions code internal Maintenance changes size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants