Use Yarn with Rosetta 2.0? #283

Closed
liammulh opened this issue Jun 23, 2022 · 15 comments

@liammulh
Member

liammulh commented Jun 23, 2022

John and I are wondering whether it would be a good idea to use Yarn with the new version of Rosetta, because it would be nice to separate the client-side code and the server-side code. This would entail a monorepo structure like:

rosetta/
    |_.git/
    |_client/
        |_package.json
    |_server/
        |_package.json

John and I are in agreement that it would be best to keep the code for the client and the server in one Git repository as opposed to splitting the code into two Git repositories.

It seems like the disadvantages of two separate repositories are myriad. One potential disadvantage of a monorepo for Rosetta is that integrating PhET's Gruntfile.js and working smoothly with AQUA would probably be difficult.

John and I would like to discuss this with the rest of the devs at the next dev meeting.

@liammulh
Member Author

If we do use Yarn, should we use version 1 or version 2?

@liammulh
Member Author

Re V1 vs. V2, according to the Yarn docs, you don't have to use the new "Zero Install"/"Plug'n'Play" scheme that gets rid of node_modules, i.e. you can still have node_modules and use V2.

@samreid
Member

samreid commented Jun 30, 2022

> John and I are in agreement that it would be best to keep the code for the client and the server in one Git repository as opposed to splitting the code into two Git repositories.

Yes, one repo sounds good. How does this relate to yarn?

@jessegreenberg
Contributor

jessegreenberg commented Jun 30, 2022

From 6/30/22 dev meeting -

It seems best for this to be discussed by a subgroup of @samreid, @jonathanolson, @liam-mulhall, and @jbphet, who have knowledge of Rosetta and Yarn.

@jonathanolson voiced that he believes yarn has several advantages for the PhET code base at large.
@zepumph agrees.

@liam-mulhall is going to move forward with Yarn v2 and it can be changed if the subgroup decides something else would be better.

@zepumph: Why do we need two package.jsons?
@liam-mulhall: There are technical reasons that make having two better/required.

@samreid
Member

samreid commented Jul 1, 2022

I'd like to better understand why we need 2 package.jsons, why yarn is better than npm, the costs (overhead) of adopting yarn in one place and whether we can jump over yarn and go straight to deno phetsims/chipper#1206

@liammulh
Member Author

liammulh commented Jul 8, 2022

> I'd like to better understand why we need 2 package.jsons

We don't necessarily need two package.jsons. That being said, I think having two package.jsons makes sense for two reasons:

  1. We can clearly delineate dependencies for client-side code and server-side code. This has the advantage of making future maintainers' lives a bit easier.
  2. We can use a meta-framework like Create React App (CRA) without needing to reconfigure it to support adding server-side code. CRA (and other meta-frameworks / auto-scaffolds) don't anticipate server-side code.

> why yarn is better than npm

The main advantage of Yarn for my use case is that it supports a "monorepo" directory structure, where you have two (or more) packages in one repository. Yarn makes it so that dependencies aren't duplicated. For example, I use the Axios package in the client-side code and the server-side code. If I were to use NPM, I would have two copies of Axios, one for the client and one for the server. With Yarn, there's just one copy of Axios that is used on both the client and the server.

> the costs (overhead) of adopting yarn in one place

The latest version of Yarn can be configured to behave similarly to NPM. That is, we can have the familiar package.json files and node_modules directory. Thus, I think the maintenance burden is not significant.

One cost of using Yarn is that AQUA will have to be configured to ask if a repository is an NPM repository or a Yarn repository before it pulls the most recent changes and runs ESLint.
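
To make that cost concrete, here is a minimal sketch of the kind of check a tool like AQUA might need. The function names `detectPackageManager` and `installCommand` are hypothetical and not part of AQUA; the only assumption about the package managers is that Yarn writes a `yarn.lock` file, while npm writes `package-lock.json`.

```javascript
// Hypothetical helpers (not AQUA code): guess a repo's package manager from its lockfile.
const fs = require('fs');
const path = require('path');

function detectPackageManager(repoDir) {
  // Yarn writes yarn.lock; if it is absent, assume the repo uses npm.
  if (fs.existsSync(path.join(repoDir, 'yarn.lock'))) {
    return 'yarn';
  }
  return 'npm';
}

// Returns the install command a script would run after pulling changes.
function installCommand(repoDir) {
  return detectPackageManager(repoDir) === 'yarn' ? 'yarn install' : 'npm install';
}
```

Every script that currently shells out to npm would need to route through a check like this, which is the maintenance overhead in question.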

> whether we can jump over yarn and go straight to deno phetsims/chipper#1206

Deno is appealing, but given that learning Deno would cost me more time, I would prefer to stick to Node.


Let me know if you have more questions, @samreid! :)

@samreid
Member

samreid commented Jul 8, 2022

Thanks for the elaboration. Just wanted to ask a few clarifying questions:

> One cost of using Yarn is that AQUA will have to be configured to ask if a repository is an NPM repository or a Yarn repository before it pulls the most recent changes and runs ESLint.

Does this mean each developer and test machine (CT, phettest, build server, etc) will also need to do the equivalent of yarn update instead of npm update whenever there are changes to the dependencies in this repo?

> We can clearly delineate dependencies for client-side code and server-side code. This has the advantage of making future maintainers' lives a bit easier.

Could this be nearly accomplished by having a package.json structure like so?

{
  "name": "rosetta",
  "version": "1.00",
  "devDependencies": {
    
    // Used in client and server
    "axios": "^7.15.0",
    
    // Used only in client
    "handlebars": "^7.15.0",
    
    // Used only in server
    "bun": "7.16.0"
  }
}

Of course this wouldn't be enforced the way yarn could enforce it, but maybe it is "good enough" if we decide that tooling change is too much of a hassle. (And we would need a way to put comments or equivalent in package.json files).

Just trying to understand all the ramifications before we move forward, thanks for discussing!

@liammulh
Member Author

liammulh commented Jul 8, 2022

> Does this mean each developer and test machine (CT, phettest, build server, etc) will also need to do the equivalent of yarn update instead of npm update whenever there are changes to the dependencies in this repo?

Ah, that's something I hadn't considered. Yes, I think any scripts that perform install, update, etc. commands would need to use the equivalent Yarn commands for this repo. If a developer who doesn't normally work on Rosetta needed to do something with it, they would need to use Yarn commands. The Yarn CLI is slightly different, but not so different that a short explanation in the README along with a list of common Yarn commands wouldn't suffice.

> Could this be nearly accomplished by having a package.json structure like so?

If we found a reasonable way of putting comments in JSON, I suppose we could do it this way.
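
One possible stand-in for comments, sketched under assumptions: since JSON has no comment syntax, the grouping could live in an extra top-level field that a small script checks. The field name `dependencyUsage` and the function `findUnannotated` are made up for illustration and are not interpreted by npm; the package names and versions are the placeholders from the example above.

```javascript
// Hypothetical sketch: "dependencyUsage" is an invented field, not an npm feature.
// Package names and versions are placeholders copied from the example above.
const manifest = {
  name: 'rosetta',
  version: '1.0.0',
  devDependencies: {
    axios: '^7.15.0',
    handlebars: '^7.15.0',
    bun: '7.16.0'
  },
  dependencyUsage: {
    axios: 'client+server',
    handlebars: 'client',
    bun: 'server'
  }
};

// Lists dependencies that have no usage annotation, so a lint step could flag them.
function findUnannotated(pkg) {
  const usage = pkg.dependencyUsage || {};
  return Object.keys(pkg.devDependencies || {}).filter((name) => !(name in usage));
}
```

This only documents the split; nothing stops client code from importing a server-only package, which is the enforcement gap mentioned above.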

> Just trying to understand all the ramifications before we move forward

This is actually sort of a blocking issue for me. Right now, I'm moving forward with using the latest version of Yarn with node_modules instead of the new "Plug'n'Play" scheme. I can't really make progress on Rosetta without a package manager! 😅

I wish I could use Deno, but I think it would involve some non-trivial changes to the already-mature Rosetta 2.0 codebase that would require (probably?) too much time for me to implement. (My goal is to finish Rosetta 2.0 well before the end of Q3.)

For example, I couldn't use Express in Deno; I'd have to use something like oak or Opine. I suppose I could use Node compatibility mode, but that seems like it might not be the best long-term solution.

I suppose I could fiddle with Deno for an hour or two to see how difficult the transition would be. What are your thoughts on this, @samreid?

> Just trying to understand all the ramifications before we move forward, thanks for discussing!

Of course! I'm happy to answer questions. Thanks for your feedback.

@zepumph
Member

zepumph commented Jul 8, 2022

Would it be easiest to just create a new repo that is specifically for the client? rosetta-client. Then we could continue to use npm and wouldn't need any more discussion about process/tech changes?

It also seems like it is most likely the cheapest option (depending on how much research and work has already gone into Yarn). Do whatever you think is best; I just thought I'd say this out loud.

@liammulh
Member Author

liammulh commented Jul 8, 2022

> Would it be easiest to just create a new repo that is specifically for the client?

Yes, it would definitely be easiest! However, I've discussed this with JB and SR, and we agreed that separate repos would add an unreasonable amount of maintenance burden.

> I just thought I'd say this out loud

Thanks, I appreciate your input! :)


Sam and I had a fairly in-depth discussion over Zoom where I clarified my motivation for using Yarn, and the plan is to move forward with Yarn. I'm going to do some reading about the new Plug'n'Play scheme available in version 2 before I decide between it and good old node_modules.

@liammulh changed the title from "Use Yarn with new version of Rosetta?" to "Use Yarn with Rosetta 2.0?" on Jul 8, 2022

@samreid
Member

samreid commented Jul 8, 2022

The critical parts of my zoom conversation with @liam-mulhall were:

  • We would like to reap the benefits of create-react-app, which can be run with npm, npx or yarn.
  • We would like to be able to have one repo contain both client code and server code.
  • We would like the artifact from create-react-app to not necessarily be the top-level directory (it should be nestable).
  • Repos should be able to contain more than one artifact from create-react-app.
  • We do not want the client dependencies to bleed into the create-react-app artifacts.
  • Create-react-app with deno looks immature at the moment.
  • @liam-mulhall will decide between npm, npx or yarn, and update the instructions in rosetta accordingly. Once that is done, we will check for other parts of our process (other devs, bayes, CT, phet-test, build process, deploy process, etc) that may need to be updated with the new steps.

@liammulh
Member Author

liammulh commented Jul 8, 2022

Okay, I did a pretty deep dive on Yarn. Here's what I found out:

Overview

  • Started in 2016 at Facebook as replacement for NPM.
  • Goal was to create more secure, stable, and efficient package manager.
  • Initially added features NPM didn't have.
  • NPM has since implemented some of these features.

Killer Features of Yarn

  • Generally much faster than NPM.
  • Much better support for de-duplicating packages in monorepos.

Yarn V1

  • Yarn V1 is more akin to NPM than Yarn V2. It uses a node_modules directory and one or more package.json files.

Yarn V2

  • By default, V2 abandons node_modules in favor of .yarn/cache.
    • Q: Why does it do this?
    • A: node_modules is huge, and it negatively impacts the performance of the package manager.
    • Q: How does Yarn V2 resolve dependencies?
    • A: It has a file called .pnp.cjs that contains two maps: one map links package names and versions to their location on the disk, and the other links package names and versions to their list of dependencies.
  • This new scheme allows for "Zero Install".
    • Q: What?
    • A: Configure PnP (Plug'n'Play) to resolve dependencies via the .yarn/cache directory rather than the node_modules directory, and check .yarn/cache into version control.
    • Q: Wait, isn't that the same thing as checking node_modules into version control?
    • A: No! To give you an idea, a node_modules folder of 135k uncompressed files (for a total of 1.2GB) gives a Yarn cache of 2k binary archives (for a total of 139MB). The .yarn/cache directory contains exactly one (compressed) file per package, as opposed to node_modules, which contains a very large number of files.
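
The two-map idea described for .pnp.cjs can be sketched roughly as follows. This is a simplified illustration only: the real .pnp.cjs file is generated by Yarn and is laid out differently, and every package name, version, and path below is a hypothetical placeholder.

```javascript
// Simplified illustration; NOT the actual .pnp.cjs format.
// Map 1: "name@version" -> location on disk (hypothetical archive paths).
const packageLocations = new Map([
  ['axios@1.0.0', './.yarn/cache/axios-1.0.0.zip'],
  ['follow-redirects@1.0.0', './.yarn/cache/follow-redirects-1.0.0.zip']
]);

// Map 2: "name@version" -> the dependencies that package declares.
const packageDependencies = new Map([
  ['client@0.0.0', new Map([['axios', '1.0.0']])],
  ['axios@1.0.0', new Map([['follow-redirects', '1.0.0']])]
]);

// Resolve a dependency name as seen from a given requesting package.
function resolve(requesterId, dependencyName) {
  const deps = packageDependencies.get(requesterId);
  if (!deps || !deps.has(dependencyName)) {
    // A package can only reach dependencies it declares.
    throw new Error(`${requesterId} does not declare ${dependencyName}`);
  }
  return packageLocations.get(`${dependencyName}@${deps.get(dependencyName)}`);
}
```

In this sketch, resolution is two lookups instead of a walk up nested node_modules directories, which is the gist of why the scheme can be faster.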

@liammulh
Member Author

liammulh commented Jul 8, 2022

In doing my deep dive on Yarn, I found out that NPM also supports "workspaces" (a feature Yarn is known for), which enable support for a directory structure like the following:

foo-project/
    |_.git/
    |_package.json
    |_workspace-a/
        |_package.json
    |_workspace-b/
        |_package.json

In the root package.json, you would see something along the lines of:

{
    "name": "foo-project",
    "private": true,
    "workspaces": ["workspace-a", "workspace-b"]
}

So I think this will allow me to achieve all of my goals. It allows me to separate the client dependencies and the server dependencies, it allows me to use Create React App (or some other auto-scaffold thing), and using NPM instead of Yarn means I don't have to update AQUA, for example, to handle a repo that uses Yarn.
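
Applied to Rosetta, the layout might look something like the sketch below. The workspace names `client` and `server` come from the directory structure proposed at the top of this issue; the package names, dependency lists, version ranges, and the helper `sharedDependencies` are assumptions for illustration. The helper simply reports which packages are declared by more than one workspace, which are the ones a workspace-aware install can usually place once in the root node_modules when the version ranges are compatible.

```javascript
// Hypothetical manifests for a Rosetta monorepo using npm workspaces.
const rootManifest = {
  name: 'rosetta',
  private: true,
  workspaces: ['client', 'server']
};

// Placeholder dependency lists and version ranges, for illustration only.
const workspaceManifests = {
  client: { name: 'rosetta-client', dependencies: { axios: '^1.0.0', react: '^18.0.0' } },
  server: { name: 'rosetta-server', dependencies: { axios: '^1.0.0', express: '^4.0.0' } }
};

// Returns the names of dependencies declared by more than one workspace.
function sharedDependencies(root, manifests) {
  const counts = new Map();
  for (const workspace of root.workspaces) {
    for (const dep of Object.keys(manifests[workspace].dependencies || {})) {
      counts.set(dep, (counts.get(dep) || 0) + 1);
    }
  }
  return [...counts].filter(([, n]) => n > 1).map(([dep]) => dep).sort();
}
```

With a layout like this, running `npm install` at the repository root (npm 7 or later) installs the dependencies of every workspace in one pass.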

@jbphet
Contributor

jbphet commented Jul 26, 2022

@liam-mulhall - I'm good with the decision to continue with NPM. Can this issue be closed?

@liammulh
Member Author

Yes. @samreid and I came to the conclusion that it wouldn't make sense to introduce Yarn as a package manager because it would require us to add code to repos like AQUA that interact with an active repo's package manager.
