This repository has been archived by the owner on Jul 9, 2024. It is now read-only.

[FEAT] Distribution Strategy (WIP) #10

Open
ronnetzer opened this issue Jan 17, 2022 · 0 comments
Labels
enhancement New feature or request
Milestone


ronnetzer commented Jan 17, 2022

I'm submitting a...


[ ] Regression (a behavior that used to work and stopped working in a new release)
[ ] Bug report  
[ ] Performance issue
[x] Feature request
[ ] Documentation issue or request
[ ] Support request
[ ] Other... Please describe:

Current behavior

Currently, there's no logic behind the distribution of projects into groups. Combined with the fact that each project in a group can also run the target with its dependencies, each group will probably have some overlapping projects.
There are 2 main issues with the current distribution:

  1. Projects with similar dependencies are not grouped together.
  2. Groups always run in parallel, which means they can't reuse a shared group's cache.
    Unfortunately, it's impossible in GH to dynamically define needs in a matrix, and it's also impossible to dynamically create jobs outside of a matrix/parallel jobs.

Expected behavior

When projects are distributed into buckets, they should first be grouped by dependencies:

  1. The first group should be projects that don't have any deps and are used by projects in more than one bucket. This group should run first so the rest of the groups will already have them built from cache.
  2. The next group is projects that depend on any of the projects from the previous group.
  3. The last group is projects that depend on everything and that nothing depends on.
  4. In order to use the cache of the previous bucket, a bucket should declare it in its needs, so the next bucket will run only after the previous one has finished (hence sacrificing some of the parallelization).
  5. Projects should be ordered by deps, which means each project comes after its deps.
  6. Parallel groups should be further split into buckets while making sure not to split a project from its deps.
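
The grouping in points 1-3 and the ordering in point 5 could be sketched roughly as below. This is only an illustration under assumptions: the function name `dependency_levels` and the shape of the `deps` mapping (project name to the set of projects it depends on) are hypothetical and not part of the action, and the "used by more than one bucket" condition from point 1 is not modelled.

```python
def dependency_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group projects into levels: level 0 has no deps, and every later
    level depends only on projects from earlier levels."""
    # Track the unmet deps of each project, ignoring deps outside the graph.
    remaining = {p: {d for d in ds if d in deps} for p, ds in deps.items()}
    levels: list[list[str]] = []
    while remaining:
        # Projects whose deps are all satisfied can run in this level.
        ready = sorted(p for p, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for p in ready:
            del remaining[p]
        for ds in remaining.values():
            ds.difference_update(ready)
    return levels


# Hypothetical project graph, for illustration only.
example = {
    "utils": set(),
    "ui": {"utils"},
    "api": {"utils"},
    "app": {"ui", "api"},
}
levels = dependency_levels(example)
print(levels)  # [['utils'], ['api', 'ui'], ['app']]
```

Each level would map to a group that needs the previous one, which is exactly the part that cannot be expressed dynamically in a GH matrix.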

It should be checked whether there's a way to know beforehand if this optimization will really make the CI shorter, and not apply it if not (e.g. for a small project).
*Also check if there's a way to remove the needs if the cache already exists.

Unfortunately, all of the above is impossible in GH due to its architecture. It's impossible to dynamically define needs for a single job in the matrix, and it's also impossible to dynamically create/update a job outside of a matrix/parallel type of jobs.

When projects are distributed into jobs, we should try to "squeeze" all of the dependencies of a top-level project (a project that nothing depends on and that depends on most of the other projects in the graph) into that job. The goal is to avoid, as much as possible, building overlapping top & mid-level projects across multiple jobs. Some overlap is unavoidable: in most cases a large monorepo will have some base projects that are used by most of the top/mid-level projects, and it will be impossible not to build them across multiple jobs.
In order to mark a project as a 'top-level' project, its list of dependencies needs to be larger than some threshold, which can be calculated as total projects / some arbitrary number.
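
A minimal sketch of that check, assuming the threshold is `total projects / divisor` and that the list being measured is the transitive set of dependencies (the issue does not say whether direct or transitive deps are meant). The names `transitive_deps`, `top_level_projects` and `divisor` are hypothetical.

```python
def transitive_deps(project: str, deps: dict[str, set[str]]) -> set[str]:
    """Collect every project reachable from `project` through its deps."""
    seen: set[str] = set()
    stack = list(deps.get(project, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(deps.get(dep, ()))
    return seen


def top_level_projects(deps: dict[str, set[str]], divisor: float = 2) -> list[str]:
    """Projects that nothing depends on and whose dependency count
    exceeds total projects / divisor (assumed threshold formula)."""
    threshold = len(deps) / divisor
    depended_on = set().union(*deps.values())
    return sorted(
        p for p in deps
        if p not in depended_on and len(transitive_deps(p, deps)) > threshold
    )


# Hypothetical project graph, for illustration only.
example = {
    "utils": set(),
    "ui": {"utils"},
    "api": {"utils"},
    "app": {"ui", "api"},
    "docs": set(),
}
print(top_level_projects(example))  # ['app'] (3 deps > 5 / 2)
```

Here `docs` is not top-level even though nothing depends on it, because it has no deps of its own.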

  • The user should be able to set the threshold or the divisor.
  • If there are unused projects (projects without deps that nothing depends on), they'll be distributed evenly between the jobs.
  • The number of projects in a job is calculated as affected projects / maxDistribution. It should be possible to add more projects to a job even if it's above the max (up to a certain limit) in order to fully add all the needed deps of a top-level project.
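
One possible reading of the strategy above, as a rough sketch rather than the action's implementation. The `distribute` function, the `overflow` factor used for the "certain limit", and the choice to always pick the least-loaded job are all assumptions made for illustration. The list of top-level projects is taken as an input.

```python
import math


def distribute(deps, top_level, max_distribution, overflow=1.5):
    """Assign projects to jobs, keeping each top-level project together
    with as many of its deps as the (assumed) hard limit allows."""
    target = math.ceil(len(deps) / max_distribution)  # affected / maxDistribution
    hard_limit = math.ceil(target * overflow)         # hypothetical upper cap
    jobs = [[] for _ in range(max_distribution)]
    assigned = set()

    def closure(project):
        # All transitive deps of `project`.
        seen, stack = set(), list(deps.get(project, ()))
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.add(dep)
                stack.extend(deps.get(dep, ()))
        return seen

    # 1. Squeeze each top-level project and its unassigned deps into one job.
    for top in top_level:
        group = [top] + sorted(closure(top) - assigned)
        job = min(jobs, key=len)
        take = group[: max(hard_limit - len(job), 0)]
        job.extend(take)
        assigned.update(take)

    # 2. Spread everything else (including unused projects) evenly.
    for project in sorted(set(deps) - assigned):
        min(jobs, key=len).append(project)
    return jobs


# Hypothetical project graph, for illustration only.
example = {
    "utils": set(), "ui": {"utils"}, "api": {"utils"},
    "app": {"ui", "api"}, "docs": set(), "tools": set(),
}
jobs = distribute(example, top_level=["app"], max_distribution=2)
print(jobs)  # [['app', 'api', 'ui', 'utils'], ['docs', 'tools']]
```

In this example the target size is 3, but the first job is allowed to grow to 4 so that `app` stays with all of its deps, while the two unused projects fill the second job.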

This should replace the default behavior as it can't make it any worse 🤷‍♂️ (currently the distribution is random)

@ronnetzer ronnetzer changed the title [FEAT] Matrix optimizations (WIP) [FEAT] Distribution Strategy (WIP) Jan 25, 2022
@ronnetzer ronnetzer added the enhancement New feature or request label Jan 30, 2022
@ronnetzer ronnetzer added this to the v3 milestone Feb 6, 2022
@ronnetzer ronnetzer modified the milestones: v3, v2 Feb 13, 2022