Should PhET transition to a monorepo? #1242
Comments
To be frank: a single repo for everything is not at all practical. I can't recommend against it strongly enough. Based on past experience working with diverse organizations (startups to Fortune 500 companies), here's what I see: PhET is a growing project, experiencing growing pains. Modularity becomes MORE important, not LESS important, as a project grows, and moving to a mono repo is a giant step toward LESS modularity.
That said, before putting everything in one big repo, I'd consider consolidating common code into a smaller number of repos (possibly even 1 repo), while continuing to keep sims and other "products" in separate repos. See how that goes, and whether it addresses the issues that PhET is having.
If there are still issues, experiment with using versioned framework repos instead of always working in master. That's the typical approach used by organizations that have many separate products supported by a common framework. There are definitely costs; merges are probably the biggest cost/hassle. It will feel inconvenient for devs who are used to working in master. And it requires robust project management and scheduling (which PhET currently lacks) to roll out new versions of framework libraries. These costs are the trade-offs of keeping a large/growing development team running smoothly.
Other cons of a mono repo:
To clarify what a monorepo might look like, I was picturing a structure like this:
To manage issues + milestones, we would still have GitHub repos for each common code repo and sim repo, like "axon", "scenery", "energy-skate-park", etc. But they would have no code. The monorepo would be a solution to the problems "how can we easily push/pull multiple repos at once?" and "how can we create cross-cutting feature branches easily?" and "how can we prevent CT from running on a test where it only pulled half of the pushes?"
We currently have 27 common code repos. It's difficult for me to imagine releasing each of those as a separate versioned product with our team scale, management and priorities. Is it one of our long-term goals to release each of those independently under semantic versioning?
I don't see how bugs/lint errors/type errors/etc. in one sim could impact an unrelated sim in the monorepo structure described above.
I agree with many of the points in the preceding comment; I just wanted to jot down that polyrepos have triggered a number of pain points. Going to a monorepo is not win/win, but I wanted to start tracking the discussion and the pros and cons. And maybe we just need to keep building better tooling (like a tool that makes it easy to manage feature branches across N repos).
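As a rough illustration of the kind of tooling mentioned above, here is a minimal Python sketch that builds the git commands for creating one feature branch in N sibling checkouts. The repo list, the branch name, and the assumption that all repos are checked out side by side under one root directory are hypothetical; only `git -C <path> checkout -b <branch>` and `git -C <path> push -u origin <branch>` are standard git invocations.

```python
from typing import List


def feature_branch_commands(root: str, repos: List[str], branch: str) -> List[List[str]]:
    """Build (but do not run) the git commands that create and publish
    the same feature branch in every listed repo checkout."""
    commands: List[List[str]] = []
    for repo in repos:
        path = f"{root}/{repo}"  # assumes sibling checkouts under one root
        # Create the branch locally in this checkout.
        commands.append(["git", "-C", path, "checkout", "-b", branch])
        # Publish the branch and set upstream tracking.
        commands.append(["git", "-C", path, "push", "-u", "origin", branch])
    return commands


if __name__ == "__main__":
    # Hypothetical repo list and branch name, for illustration only.
    for cmd in feature_branch_commands(".", ["axon", "scenery"], "my-feature"):
        print(" ".join(cmd))
```

Each command list could be handed to `subprocess.run`. A tool like this only reduces the manual steps; it does not make pushes atomic across repos.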
I was interested enough in this idea after a push failed for me last night on 1 of 2 repos and caused CT to fail all night. I read https://www.perforce.com/blog/vcs/what-monorepo because I remembered that Google famously uses one with 86 terabytes of data. Reading this article, I am not convinced it is right for our project. It highly recommends against it when using
It also feels like a step in the wrong direction for open source. If we get the POSE grant, I feel like having a mono repo would be a step in the wrong direction for creating reusable libraries that could be called an open source environment. Can you imagine the "README" for a sim saying "to run example sim locally, please download the entirety of all simulations"?
How would PhET-iO work? It seems like the majority of examples are based in fully proprietary codebases where everyone having access to all code is acceptable. I don't see a solution where PhET-iO can exist in a mono repo, so aren't we immediately saying duo-repo: one for the open source stuff, and one for the private stuff? I read #1242 (comment), but can you actually have a private section of a repository? I don't think you can, at least not with git/GitHub. (I read https://24ways.org/2013/keeping-parts-of-your-codebase-private-on-github/ and it seems unwieldy to host 2 remotes or have private code in branches.)
In general, over the last 20 minutes I have convinced myself that it isn't worth the time or energy personally to investigate this further. Sometimes with these larger issues it is harder to know how much discussion is enough to come to a consensus. @samreid I'm happy to be convinced, let me know if you want to discuss further.
Re questions in #1242 (comment) ...
I'm not advocating versioning 27 repos. I'm suggesting that, as a scalable alternative to a mono repo, common-code repos could be combined into a manageable number of repos that could be versioned. PhET-iO repos could be combined into 1 private repo and versioned. And sims could remain in their own repos, versioned as they currently are.
I have the entire mono repo checked out. I want to run lint or tsc for my sim and its dependencies. If you create new build tools that know how to do that (for only my sim and its dependencies) then great. If not, then I'll have to lint/tsc everything, and I'm going to see lint and tsc errors in code that's unrelated to my sim.
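To make "only my sim and its dependencies" concrete, here is a minimal Python sketch that computes the set of directories a scoped lint or tsc run would need to cover. The dependency map `DEPS` and the function name are hypothetical and do not describe PhET's actual build tools; the sketch only shows that the scoping is a transitive-closure computation.

```python
from typing import Dict, List, Set


def transitive_dependencies(repo: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the repo plus everything it depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = [repo]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


# Hypothetical dependency map, for illustration only.
DEPS: Dict[str, List[str]] = {
    "energy-skate-park": ["scenery", "axon"],
    "scenery": ["axon"],
    "axon": [],
    "circuit-construction-kit": ["scenery"],
}

if __name__ == "__main__":
    print(sorted(transitive_dependencies("energy-skate-park", DEPS)))
```

With this map, a run scoped to `energy-skate-park` would not touch `circuit-construction-kit`, so errors in that unrelated sim would not be reported.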
My responses will be somewhat brainstormy and I'm not strongly advocating this, just thinking it through. Mainly I'm trying to ask "can we get atomic pushes and pulls without causing too many other problems"?
The article also says things like "Using a mono repository is a good idea for many companies." and lists many advantages of the monorepo.
Agreed!
Yes, but I can also imagine that it would be easier for third parties to clone one repo instead of two dozen.
I think we would have one 100% repo with everything, including phet-io, then mirror it using filter-branch. But this would make contributions from 3rd parties difficult or impossible?
I agree this is probably not going to be in our best interest. But it seems good to understand (a) what the costs of the multi-repos are, and (b) why that is preferable to the alternative.
That makes sense and may work well for the POSE grant.
Our existing tools will do that nicely. We won't put code for circuit construction kit and geometric optics in the same directory or anything.
In phetsims/rosetta#283 (comment), I wrote up some notes on Yarn that might be helpful here: Overview
Killer Features of Yarn
Yarn V1
Yarn V2
This has been a good discussion, and I agree we should not transition to a monorepo. There are side issues related to other levels of consolidation, and other issues about dealing with the atomicity of commits, and versioning common code repos together. Closing.
Today Quick CT showed a false positive error because it pulled code between 2 consecutive pushes. If we did all development within one repo, commits/pushes/pulls would be truly atomic and this kind of problem wouldn't happen. We have discussed monorepos every now and then in developer meetings, but I thought it would be nice to have an issue to track that discussion and to enumerate the pros and cons.
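The race described above can be sketched in a few lines of Python. In this illustration, one logical change is pushed as separate commits to several repos, and a pulled snapshot counts as consistent only if it contains all of a change's pushes or none of them. The `Change` class, the function name, and the timestamps are hypothetical; this is not how CT is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Change:
    """One logical change, pushed as separate commits to several repos."""
    # Maps repo name -> time (in seconds) at which its push landed.
    push_times: Dict[str, float]


def snapshot_is_consistent(changes: List[Change], pull_time: float) -> bool:
    """True if a pull at pull_time sees each change entirely or not at all."""
    for change in changes:
        landed = [t <= pull_time for t in change.push_times.values()]
        if any(landed) and not all(landed):
            return False  # only part of this change was pulled
    return True


if __name__ == "__main__":
    change = Change({"axon": 10.0, "scenery": 12.0})
    print(snapshot_is_consistent([change], 11.0))  # pull between the two pushes
    print(snapshot_is_consistent([change], 13.0))  # pull after both pushes
```

In a single repo, one push would carry the whole change, so the partially pulled case could not arise; that is the atomicity argument made above.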
Dev meeting: March 17, 2022
@jonathanolson: It’s hard to do the feature branch method with as many repos as we have
@samreid: Other companies have gotten around this by having all code under one big mono repo
@chrisklus: Google does this
@pixelzoom: Never worked somewhere that uses one repo for everything, but how about trimming down our repos by grouping common code, phet-io, etc.?
Pros of a Mono Repo
Cons of a Mono Repo
Also, please be aware that there are many articles about "monorepo vs polyrepo" with many good points.
I don't think we should take any action on this at the moment, but it may be good if @jonathanolson wants to chime in to round out the pros/cons. If we ever undertake this, it would be of epic-level proportions.