Repos, repos, repos #2446
Aegir to the rescue
I also see this as an opportunity to start using labels instead. IMHO, one single repo with issues, which we can filter through labels, is more efficient to maintain.
I can see that both ways, and it's been a sticking point for adoption in the past, but I do notice that …
On that point: you can pass an http-api-client today as an argument: https://github.com/ipfs/js-ipfsd-ctl/#ipfsfactory---const-f--ipfsfactorycreateoptions. So it should just be a question of upgrading the tests to do it.
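For illustration, a minimal sketch of what that might look like - the `IpfsClient` option name and the promise-based `spawn()` are assumptions based on the linked README section, and the exact API shape has varied between ipfsd-ctl versions:

```js
// Sketch only: inject the HTTP client into the factory instead of
// relying on the copy bundled with ipfsd-ctl (option names assumed)
const IPFSFactory = require('ipfsd-ctl')
const ipfsHttpClient = require('ipfs-http-client')

async function main () {
  const f = IPFSFactory.create({
    type: 'js',
    IpfsClient: ipfsHttpClient // the caller supplies the client implementation
  })

  const ipfsd = await f.spawn()
  console.log(await ipfsd.api.id()) // talks to the daemon via the injected client
  await ipfsd.stop()
}

main()
```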
N.B. 'you can pass' is not the same as 'require an implementation to be passed'. Sometimes you've got to be cruel to be kind.
How does it differ for the goal you described?
We'd remove …
@achingbrain what about the 260 repos using it -- https://github.com/ipfs/js-ipfsd-ctl/network/dependents?package_id=UGFja2FnZS0xNDQ1MDU1Mw%3D%3D -- plus the ones in the future? Is it worth adding one more step for them in trade for us having an angry test script shouting so that we don't forget a good practice? I'm sure we can do this in a better way.
Most of those repos seem to use …

This is (as I see it) the least contentious part of this conversation, shall we take it to a dedicated issue?
An example of a contributor who is confused by our module setup: ipfs-inactive/js-ipfs-http-client#1034. If this were one repo with one PR-per-change, he would find it easier to contribute.
An example of the problem: right now there are changes merged into … Other PRs to … One solution is to have the deps in your …
@achingbrain that seems to be a lapse. A js-ipfs in-process node inside the browser has no HTTP API and therefore needs no IPFSApi. As you can see, it doesn't even get used: https://github.com/ipfs/js-ipfsd-ctl/blob/master/src/factory-in-proc.js#L87
I suppose we could spin this round and ask if anyone thinks it actually helps developer productivity and new contributor onboarding to have to submit/review/update (potentially) three or four PRs to fix an issue or add a new feature?
@achingbrain on your study, make sure not to fall into the trap of running a univariate analysis. Avoid that by including the multiple requirements and goals, why we have the set-up we have today, how that affects the multiple types of users and contributors, and how important each is. Note that I don't suggest this with the intent of adding a block on your path, quite the contrary - more of a stair.

This is a discussion that has been going on for years and it was never carefully reviewed beyond some venting here and there. Personally, the option to get the http-client and js-ipfs in the same repo is growing on me, specifically because then the interface tests are all in one place. That said, I don't think lerna is the way to go.
Which alternatives to lerna do you think are the way to go?
As a relative newcomer and advocate for the community, just confirming that this feels like a really big hurdle to contributions. I wasn't involved in the history that got us to this state, so I'll leave it to others to define and evaluate the trade-offs, but a huge +1 from me from the perspective of supporting new contributors.
Another sub-con of "Tools that assume one-module-per-repo will need fixing" that I have stumbled upon recently is that you are not able to consume forked/in-process packages using … It might not be such a problem as the proposed packages to be merged are "on top of the chain", yet still it is a potential inconvenience.
As an example of the complexity we have wrought, here's a graph of (maybe) all the PRs required to land UnixFSv1.5 metadata (the ones with #? after haven't had PRs opened yet but will need them). That's adding two fields to the UnixFS protobuf. The arrow shows a 'depends on' relationship for each PR.
Don't feel shy about coalescing all the modules indicated by the yellow arrows - that is largely what was proposed by #2222. Everyone is in support of the changes highlighted by the yellow arrows :) The pink boxes were just a way for me to highlight that there might be some circular dependencies there that are not required.
Modules like @textile/ipfs-lite and ipfs-only-hash use the importer and/or exporter. I like to think there will be more experiments one day that are based on our stack - these would be harder to produce if everything is part of the main ipfs module (though not if we go down the proposed Lerna mono-repo route, since everything will still be published as a separate module).

Merging the importer, exporter and unixfs modules makes a lot of sense though - these are low-churn modules that are related in functionality and tend to change at the same time.

Folding mfs into core is also a good idea, but we should go all the way and include the utils, the rpc client and interface tests too, as these are high-churn modules which also typically change at the same time; even after collapsing our tree as suggested, we still can't deliver one feature with one PR.

At our session on this at lab week everyone agreed that this is a good idea. I'd love to hear from @alanshaw though.
Great, dooooo eeet :D
Awesome, 🚢
To be reviewed/revisited. This should not block the other coalescing from happening, though. Can we agree to do the coalescing where we have agreement and evaluate the rest afterwards?

PS: When merging/coalescing, use the same technique that @jacobheun used for js-libp2p so that commit history is preserved.
I would prefer to permanently solve this problem that haunts us every single day, or else we are just delaying a thing that needs to happen. It's exactly like async/await: we finally did it, everyone is super happy and ipfs is better for it. We just need to agree on which packages we want to publish standalone, merge the others and put everything in one repo so we can finally make just one PR for a changeset.
I feel that if we do do this then it needs to be a lerna mono-repo (or similar), i.e. multiple npm packages in one repo. I think module isolation is the best way to ensure developers build and test their work independently from other aspects of the system. Lumping everything in one repo breaks the isolation and gives developers rope to build highly connected, tightly coupled and interdependent code that is a nightmare to maintain and adapt over time.

We have been bitten again and again by changesets spanning multiple repos that become blocked by releases of partial changesets. At worst this completely blocks all work and at best it creates additional work for both the contributor and maintainer - i.e. unblocking requires the contributor to backport their changes to an earlier release and have the maintainer review and release those changes. Not only that, but the process to land a changeset requires a lot of admin work: we have to open and orchestrate releases of multiple PRs in repos maintained by different people and temporarily depend on git dependencies to prove CI passes upstream. All this negatively affects our pace of development and for me this is one of the biggest problems I think we could solve by doing this.

You could argue that better communication between contributor and maintainer on when to release a PR would improve this situation, but for multiple maintainers across timezones we can't always guarantee merges and releases will happen in a short enough time to not cause blockages elsewhere. We also can't really assume new contributors will do this correctly, and it doesn't solve the admin problem of having to open many PRs and thread them all together.

Caveat: my assertions on the benefits are not backed by experience. I've never used or contributed to a monorepo before... and so this is part of the reason why I've taken so long to respond on this. Another part is that my gut feeling is that we shouldn't do this. I have concerns around discoverability (only one entry point for many modules), PR size (I'd rather review multiple small PRs), interspersed commit history, the fact that the OSS community in the majority expects one repo per module, and a few other minor things... but ATM I don't think that these are enough reasons to not do this and, given that the rest of the team backs it already, I feel like we should just try it and be sure to retrospect on it.

That said, I think we need to work out some details before we can initiate this. I've made a small list: …
I think another option would be to request a new feature from Github that allows you to link PRs together (using branch name?). I don't know how useful this could be in practice but I think it should be something to consider.
@alanshaw's questions about repo collapsing are really good. I spent a long time straddling the monorepo world (mozilla-central) and many external/adjacent projects in module or sub-project repos, so I have a few thoughts about pros/cons, have seen how they played out, and have felt the pain of both ways 😅

In my experience, having a single repo for a large project really helps reduce the cognitive, workflow and coordination overhead we're trying to address here. But it's not free - it also brings new challenges ;) Things like group norms are easier to be consistent about, whole-group activity is highly visible by default (a new and different signal-to-noise challenge though!), and discoverability is much easier. And none of that prevents shipping modules separately a la Lerna.
What type of discoverability? As a consumer of individual modules, looking on Github, discoverability will be harder. As a consumer of modules on NPM, things like Lerna might close the gap. As a core contributor I think discoverability is far better in collapsed repos. I went from monorepo to module-per-repo, and visibility and discoverability both feel soooo difficult now. It used to be that all the modules were in one place that I could navigate with local tooling. Now I'm always having to traverse boundaries that feel artificial, and that always results in overhead - network, tooling, workflow - to work across loads of repos.
This was a real problem for a long time in Gecko and Firefox development. As a "module peer" (kinda like "lead maintainer" - a code reviewer across the equivalent of loads and loads of repos), it was a nightmare. However, the advent of the six-week cadenced release process made the mega-patch style of development untenable and forced a change. Reviewers started demanding patches be broken up into smaller pieces and stacked, landed iteratively even if it meant things like interspersed commit histories and other trade-offs (landed but commented out, disabled tests, etc etc). Now that's the default for large changes, and the developer productivity tooling supports it via patch queues per issue, and other features. It definitely takes intentional practice from reviewers, and communication of those practices to people submitting PRs.
No silver bullet. It can be a big downside to collapsing repos, depending on what the history is used for. I'd begin by identifying exactly why we find the history desirable for this project in particular, and then exploring mitigations and alternate solutions for those needs. Previously I've seen things like globally searchable history + commit message requirements help with history tracing, and also strong use of feature flags and/or required backout-ability (iow, transferring the contiguous part of a change from the commit history to the filesystem), which also helps enforce better in-repo modularization and avoid the tight coupling that module-per-repo is so great at defending against.
We should challenge this assumption. Repo-per-module is a fairly new phenomenon and doesn't seem at all settled as a best practice for all projects. It's super useful for some types of JS projects, but even that OSS sub-community is bumping up against the limitations of it as time goes by, as we're seeing here in our case. I wonder if Github has stats around this...
Given that …
I don't think we need more than we have right now. Green CI & peer review conquers …
Lerna's release tools are pretty good - it's clever enough to work out which modules depend on which, and therefore which need new releases for a given changeset, so we'll hopefully be able to sunset some of AEgir's functionality. We might need to extend or wrap it to do the automated contributors bit, off the top of my head. For testing/linting etc, we just use npm scripts that invoke aegir, the same as today.
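As a concrete sketch of that last point - a per-package manifest whose scripts delegate to aegir, as they do today (the package name is illustrative):

```json
{
  "name": "ipfs-http-client",
  "scripts": {
    "lint": "aegir lint",
    "build": "aegir build",
    "test": "aegir test"
  }
}
```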
After collapsing things like …
In short, in the source repo you …
Per module or system component? Most people just open an issue on …
I believe so - it should be possible given that everything is under the …
Given a change that touches core, the http client and the interface test suite the total CI time will not change, because you have to run the tests of all those modules to get a changeset through anyway. It might feel like it takes longer but that's just because you are running them all together instead of individually over a period of days or weeks 😁
We can empty the repos, put a README file in describing the new location of the files and archive the repo. Over time no-one will look at them any more.
Not that I can think of. At least nothing that outweighs the gains we'll receive from reduced complexity and PR busywork.
I think this is key. We should continuously question our tools and practices and ask if they are making our lives harder or easier. Nothing is forever; we can always split the modules out again if it creates more problems than it solves.
Overall, +1. The only thing I'd like to see is a simple and clear explanation of why something would be in the monorepo and when it would not. There are always going to be external dependencies that won't be in the monorepo, some of which will probably still be in the IPFS org as their own repos, so providing clarity to everyone on why something is in or out would be super helpful.
The characterization that "the majority" expect this is totally fair. Of the more than a million modules in the npm ecosystem, more than 99% of them have a 1-to-1 relationship between the package and a repo. However, Vue.js and a couple of other large and notable projects use a monorepo, and as a result all of the most popular tooling that would assume one-package-per-repo has been updated to handle it. This wasn't the case a few years ago but it's better now. Monorepo is not typical or expected by any means, but anyone who has reached the point where a monorepo looks like a solution to their problems is at a size and complexity level where it's hard to argue that a monorepo layout is going to confuse things more than they already are.
Important qualifying words. NPM isn't yet the entire open source software universe, far from it. As Hamlet said: …
Yeah, +1 to this either way:
This is a JS project; JS developers (backend and frontend) use npm and the modules there 😁 Monorepos are quite common in Go but I don't know of any other ecosystems where they are the norm.
If you're looking for examples, Maven has had multi-module projects pretty much forever. Anyway, once … We might keep the markdown api docs in …
We have merged these repos, so I'm going to close this issue. We should validate the decision/see what we can improve next time we do a retro. |
Making changes in `js-ipfs` typically involves changing this repo, the `ipfs-http-client`, `interface-js-ipfs-core` and also `ipfsd-ctl` (because of its dependency on `ipfs-http-client`).

We have these as separate modules but they are really modular in name only due to the cross-dependencies indicated above.

This has had a number of consequences: … (`js-ipfs-utils` is helping with the utility methods but is also compounding the problem).

Since this project does not include the libp2p and IPLD codebases it should be a relatively thin layer on top of them, yet development is very complex.
We've started talking about re-integrating some core modules in #2222 but we should go further than that.
⚒️ Break the `ipfsd-ctl` / `ipfs-http-client` dependency

Possible solutions:

a. Make `ipfs-http-client` a peer dependency of `ipfsd-ctl`
b. Require an implementation to be passed in the same way as with in-proc `js-ipfs` daemons (see the sketch below)
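A hypothetical sketch of what option (b) could look like inside `ipfsd-ctl` - this is invented for illustration, not the current implementation, and the `IpfsClient` option name is an assumption:

```js
// Hypothetical internals for option (b): ipfsd-ctl no longer bundles
// ipfs-http-client, so the factory refuses to start without one
function createFactory (options = {}) {
  if (typeof options.IpfsClient !== 'function') {
    throw new Error(
      'No HTTP client given - pass an implementation ' +
      '(e.g. ipfs-http-client) via the IpfsClient option'
    )
  }

  // ...spawn daemons and construct ipfsd.api using options.IpfsClient
}
```

Option (a) achieves a similar end by declaring `ipfs-http-client` under `peerDependencies` in `ipfsd-ctl`'s package.json, making the consumer responsible for installing it.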
💍 One PR to rule them all
Combine the repos. One way to do this might be adding the client/ipfsd to core:
✅ Pro: Simple to use
✅ Pro: Works with existing unspecified tooling that assumes one-module-per-repo
❌ Con: If you only consume the client you'll end up installing lots of dependencies you don't need
❌ Con: Separate PRs to `interface-js-ipfs-core` still required unless you add that in too
❌ Con: All-or-nothing releases
Another way might be a lerna-style mono-repo a la babel (see the config sketch after this list):
✅ Pro: One change, one PR
✅ Pro: Existing dependent code does not break because module names do not change
✅ Pro: Still able to release individual modules independently
✅ Pro: Easier to make, review and release changes
✅ Pro: Easier to contribute to
✅ Pro: The tooling figures out cross-package dependencies, no more `npm link` required
❌ Con: In the worst case tests take as long as the slowest module
❌ Con: Cannot group GH issues by package
❌ Con: Tools that assume one-module-per-repo will need fixing
❌ Con: When adding new dependencies, it's easy to forget to add them to the right package due to dependency hoisting
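For concreteness, a minimal sketch of what the lerna config might look like, assuming independent versioning and a `packages/` directory (e.g. `packages/ipfs`, `packages/ipfs-http-client`, `packages/interface-ipfs-core`) - illustrative only, not a settled proposal:

```json
{
  "version": "independent",
  "npmClient": "npm",
  "packages": ["packages/*"],
  "command": {
    "version": {
      "conventionalCommits": true
    }
  }
}
```

With independent versioning, `lerna version` and `lerna publish` only bump and release the packages whose files (or dependencies) actually changed, which is what makes the per-module releases listed in the pros above possible.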
cc @ipfs/wg-js-core and @ipfs/repos-javascript