Scoped gems proposal #40

Open
wants to merge 5 commits into
base: master

@mullermp mullermp commented Apr 5, 2022

rendered proposal

Hello RubyGems team, and all those that come across this.

In this PR, I have included a proposal for a feature called "scoped gems". In short, the proposal is to widen the gem naming specification to include a new character @ to group related gems together under a specific organization reserved suffix. The naming pattern follows gem_name@scope. On the first gem push (new record), if the gem is scoped (follows the pattern), the gem's scope will be validated to have been created by a user from an organization that reserved the scope. A scoped gem can be installed and required as normal gems are today.

For example, consider aws-sdk-s3, the S3 gem for AWS. If this gem were scoped, it could be published as s3@aws-sdk (or more generically, <service>@aws-sdk). This gem can only be created by a user in the AWS organization (which has reserved the aws-sdk scope). A user can install this gem with gem install s3@aws-sdk and require it with require 's3@aws-sdk'.
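As a rough illustration of the gem_name@scope pattern, here is a minimal Ruby sketch that splits a name into its gem and scope parts. The `ScopedName` struct, the `parse_gem_name` helper, and the allowed character set in the regular expression are assumptions made for illustration; none of them come from the RFC or from RubyGems.

```ruby
# Hypothetical value object: a gem name plus an optional scope.
ScopedName = Struct.new(:name, :scope) do
  def scoped?
    !scope.nil?
  end
end

# Assumed character set for each part (letters, digits, ".", "_", "-").
SCOPED_PATTERN = /\A(?<name>[A-Za-z0-9._-]+)(?:@(?<scope>[A-Za-z0-9._-]+))?\z/

# Hypothetical helper: split "gem_name@scope" into its two parts.
def parse_gem_name(full_name)
  match = SCOPED_PATTERN.match(full_name)
  raise ArgumentError, "invalid gem name: #{full_name.inspect}" unless match

  ScopedName.new(match[:name], match[:scope])
end

parsed = parse_gem_name("s3@aws-sdk")
puts "#{parsed.name} in scope #{parsed.scope}" # s3 in scope aws-sdk
puts parse_gem_name("aws-sdk-s3").scoped?       # false
```

Unscoped names pass through with a nil scope, so existing gem names would keep working under this reading.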

The main benefits of this feature are that organizations can publish their own groups of gems (i.e. multiple organizations can have a "configuration" gem), and organizations are able to reserve gem names (via the @scope suffix, similar to a reserved prefix). A developer can be reasonably sure that any new gem such as new-cool-feature@rails is an official Rails gem, that new-s3-service@aws-sdk is an official AWS SDK client, or even that socket@ruby is an official stdlib gem! This reservation system combats fake, similarly named gems that are branded as official and attempt to steal personal information.

Please leave any feedback and I would be happy to amend the approach/design.

@indirect
Member

indirect commented Apr 5, 2022

I feel like namespaces with the syntax @scope/gemname would be less surprising to users given that is the way NPM implements them. What are your feelings for/against that syntax?

(I also added a link to the rendered doc to the top of the PR description for easy access.)

@mullermp
Author

mullermp commented Apr 5, 2022

The / portion gave issues, especially around gem commands (because it assumes a file path), though I suppose that can be parsed correctly. Aside from that, I actually think we should NOT use the same format as it could be especially confusing when working with both languages (i.e. Rails website).

Edit (5/5) - I have updated the RFC to explain the technical challenges/limitations of using @scope/gemname. I would prefer to use @scope/gemname too, but the risk/reward is not desirable in my opinion.

@ioquatix

ioquatix commented Apr 5, 2022

If we can't get proper organisations, this might be a good first step.

I'd be a tiny bit concerned this works us into a position where supporting proper organisations is harder.

It's definitely concerning when you create a gem like async and then someone else clobbers your namespace by releasing a gem async-thing-you-care-about. Not sure if this proposal solves that problem?

@mullermp
Author

mullermp commented Apr 5, 2022

That is valid feedback. I would think that a user's scope can be easily transferred to an organization concept in the future assuming the scoped username shares the future organization name (i.e. s3@aws-sdk is scoped to the aws-sdk user today, and then scoped to the aws-sdk organization tomorrow - it doesn't change how that gem is installed/required). I have an open question related to migration, that makes use of an "alias" feature. If we go that route, then to support organizations, I suppose you could even alias a gem like s3@aws-sdk to s3@new-organization-for-aws and not require the changing of code for developers either.

Though what do you mean by "proper" organizations? I assume you mean some Rails modeling that groups a bunch of owners together under some entity. Since gems can have multiple owners (already an "organization" if you will), I thought this solution fit in nicely. This proposal can certainly have a concept of such an organization entity and use the same gem naming pattern.

Also, regarding the async example, I believe that same issue exists today regardless of gem name. My gem foo can have an Async module, and your gem async could also have the same module. I don't think we can realistically solve for that, if I'm understanding your concern.

@ioquatix

ioquatix commented Apr 5, 2022

Part of my concern is asserting that "async-foo" is official and "async-foob" is not.

@mullermp
Author

mullermp commented Apr 5, 2022

Ah. I want to be clear that async@foob wouldn't be more or less "official" than async@foo in that example. The scope signals to developers a trusted source/organization. Both foo and foob can be valid users with valid gems that solve use cases.

@mensfeld
Member

mensfeld commented Apr 5, 2022

At first glance, I like this idea. It does not break too many things and seems relatively simple to implement. My only worry is that it won't be compatible with the PURL format (https://github.com/package-url/purl-spec), where the namespace is defined before the name:

scheme:type/namespace/name@version?qualifiers#subpath

That could, however, be handled by the routing in RubyGems itself.

@mullermp
Author

mullermp commented Apr 5, 2022

Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/animation@12.3.1 it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%40aws-sdk@1.2.3.
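A minimal sketch of that escaping, assuming the scope stays inside the PURL name component and only the @ needs percent-encoding. The `gem_purl` helper is hypothetical; `URI.encode_www_form_component` is from Ruby's standard library and is used here only because it encodes @ as %40 while leaving letters, digits, and hyphens alone.

```ruby
require "uri"

# Hypothetical helper: build a PURL string for a (possibly scoped) gem.
def gem_purl(name, version)
  encoded = URI.encode_www_form_component(name) # "@" becomes "%40"
  "pkg:gem/#{encoded}@#{version}"
end

puts gem_purl("s3@aws-sdk", "1.2.3") # pkg:gem/s3%40aws-sdk@1.2.3
```

This does not map the scope onto PURL's separate namespace component, which is a different design choice.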

@mullermp
Author

mullermp commented Apr 7, 2022

Summon @simi @hsbt, I'd love to hear your feedback on this!

@simi
Member

simi commented Apr 7, 2022

I have been thinking about this for a long time. My biggest concern is the transition period. I was thinking about some kind of backward-compatible naming scheme (at least as a fallback). For example, if we decide to do namespacing using a "rails/activerecord" scheme, we can fall back for older RubyGems/Bundler by using "rails--activerecord" (or similar; according to the DB dump, a double dash is currently used in only a few gems, which we can yank).
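A minimal sketch of that fallback, assuming "/" as the new separator and "--" as the legacy one. Both helper names are hypothetical, and the mapping only round-trips if no other gem name contains a double dash.

```ruby
# Assumed legacy separator, taken from the "rails--activerecord" example.
LEGACY_SEPARATOR = "--"

# Hypothetical: "rails/activerecord" -> "rails--activerecord".
def to_legacy_name(scoped_name)
  scope, name = scoped_name.split("/", 2)
  return scoped_name if name.nil? # unscoped names pass through unchanged

  "#{scope}#{LEGACY_SEPARATOR}#{name}"
end

# Hypothetical: "rails--activerecord" -> "rails/activerecord".
def from_legacy_name(legacy_name)
  scope, name = legacy_name.split(LEGACY_SEPARATOR, 2)
  return legacy_name if name.nil?

  "#{scope}/#{name}"
end

puts to_legacy_name("rails/activerecord")    # rails--activerecord
puts from_legacy_name("rails--activerecord") # rails/activerecord
```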

@Fryguy

Fryguy commented Apr 7, 2022

Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

@mullermp
Author

mullermp commented Apr 7, 2022

I think "anticipated" squatting can be mitigated (simi mentioned yanking).

@mullermp
Author

mullermp commented Apr 7, 2022

@simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

@simi
Member

simi commented Apr 7, 2022

> Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

We can reject gems with -- at RubyGems.org and add it to RubyGems specification policy as a warning.

> @simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

No, we don't check anything in the code and I think it would be super complex to start doing that.

@deivid-rodriguez
Member

Hi!

I like this feature proposal, and I think there's one extra benefit that hasn't been mentioned: it makes "soft-forking" easier. Say Rails wants to temporarily fork the "mail" gem, to provide a better experience until mail gem owners can get to addressing some important issues. Right now, the only way is to come up with a new name, and it's not clear how to properly communicate that the fork is only something temporary, and not meant to completely sunset the forked gem. Releasing mail@rails and depending on it temporarily makes this intention more clear I believe.

My main concern is namespace squatting too. I'm not sure I understand the current proposal and how do we prevent it. Can rubygems.org users create "custom scopes" (different from their usernames)? If that's the case, what prevents any random user to create the @rails namespace? If not, then it seems aws-sdk would be already squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

Regarding old clients, is the scope--name notation meant so that the feature works as is in old clients? I'm not sure we should choose a weird naming scheme just to support old clients. I think Bundler with a Gemfile.lock file would handle this pretty well since it's able to trampoline to the version that created the lockfile.

Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

@mullermp
Author

mullermp commented Apr 7, 2022

@deivid-rodriguez Thanks for the feedback!

> My main concern is namespace squatting too. I'm not sure I understand the current proposal and how do we prevent it. Can rubygems.org users create "custom scopes" (different from their usernames)? If that's the case, what prevents any random user to create the @rails namespace? If not, then it seems aws-sdk would be already squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

I reserved aws-sdk user immediately prior to posting this RFC :D. I think rails user is also owned by the Rails team. I made the assumption that your username is your scope. I think we can certainly use an "organization" here although it requires more thought/design. An "organization" can simply be a group of users on Rubygems, who have access to 1 or more "scopes". Alternatively, the organization name can be the scope itself. I think ultimately there may need to be some initial enforcement.. some users will go and squat some names but we can root them out - I think we can reliably assume users aren't building new software on most of those published gems.
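A minimal sketch of the "username is your scope" rule on first push, with an optional owner table standing in for the organization idea. The `push_allowed?` helper, the `scope_owners` hash, and the usernames in the example are all hypothetical; rubygems.org has no such API.

```ruby
# Hypothetical first-push check for a gem named "gem_name@scope".
# scope_owners maps a scope to the usernames allowed to push into it;
# when a scope is not listed, the scope is assumed to equal one username.
def push_allowed?(gem_name, username, scope_owners = {})
  _name, scope = gem_name.split("@", 2)
  return true if scope.nil? # unscoped gems are not restricted by this rule

  scope_owners.fetch(scope, [scope]).include?(username)
end

puts push_allowed?("s3@aws-sdk", "aws-sdk")                          # true
puts push_allowed?("s3@aws-sdk", "someone-else")                     # false
puts push_allowed?("s3@aws-sdk", "alice", "aws-sdk" => %w[alice bob]) # true
```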

> Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

Yeah, I can see reasons to want it. I started with this approach first but it introduced some complications. Specifically, we'd have to handle these cases: No such file or directory @ rb_sysopen - @mullermp/hola-0.0.0.gem where the gemspec's name assumes a path. Even when fixed, when it's installed, it may also create another nested directory in your gem install location, and that may or may not play nicely with existing tooling? I went down a rabbit hole and decided it was more effort than it's worth, but I could be wrong.
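A small sketch of why a "/" in the gem name trips up file handling: Ruby's File APIs read the name as a nested path, so writing the built gem fails when the scope directory does not exist. The `try_write` helper is hypothetical and only mimics the kind of write a build step would perform.

```ruby
require "tmpdir"

gem_file = "@mullermp/hola-0.0.0.gem"

# The scope is interpreted as a directory component.
puts File.dirname(gem_file)  # @mullermp
puts File.basename(gem_file) # hola-0.0.0.gem

# Hypothetical stand-in for writing a built gem into a directory.
def try_write(dir, file_name)
  File.open(File.join(dir, file_name), "wb") { |f| f.write("") }
  :ok
rescue Errno::ENOENT
  :missing_directory
end

Dir.mktmpdir do |dir|
  puts try_write(dir, "hola-0.0.0.gem") # ok
  puts try_write(dir, gem_file)         # missing_directory
end
```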

> I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

I'm ok with duplication as it would certainly be the safest option. In practice, as a gem maintainer with 300+ gems, it's not as feasible and probably causes some customer confusion. I think a one-way one-level alias might make sense, but we'd have to handle gem install locations too, perhaps a symlink between s3@aws-sdk (real source) to aws-sdk-s3 (symlink folder). If it's handled by Rubygems via alias, it prevents a lot of duplicative work by maintainers.
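A minimal sketch of a one-way, one-level alias lookup, assuming aliases are a simple table from legacy name to scoped name. The `ALIASES` table and `resolve_gem_name` helper are hypothetical and say nothing about how install locations or symlinks would be handled.

```ruby
# Hypothetical alias table: legacy name => scoped name (one direction only).
ALIASES = { "aws-sdk-s3" => "s3@aws-sdk" }.freeze

# Resolve a requested name, refusing chains so aliases stay one level deep.
def resolve_gem_name(requested, aliases = ALIASES)
  target = aliases.fetch(requested, requested)
  if target != requested && aliases.key?(target)
    raise ArgumentError, "alias chain detected for #{requested}"
  end

  target
end

puts resolve_gem_name("aws-sdk-s3") # s3@aws-sdk
puts resolve_gem_name("s3@aws-sdk") # s3@aws-sdk
```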

@hsbt
Member

hsbt commented Apr 7, 2022

👋 I'm positive about adding this feature. But, like @indirect, I'm not sure that gem_name@scope is the best syntax.

We can choose:

  • gem_name@scope
  • scope/gem_name
  • @scope/gem_name
  • etc.

Has anyone summarized the scoped namespace features of other package managers?

@ioquatix

ioquatix commented Apr 8, 2022

I think one of these is the most reasonable:

scope/gem_name
@scope/gem_name

the latter seems to be the format used by npm IIRC, but I think the former can be slightly better (what's the reason for/motivation of @ character?)

@indirect
Member

indirect commented Apr 8, 2022

I think the motivation for the at-sign is to clearly distinguish the scope (@scope) from the package name (gem_name). It also makes it possible to talk about the scope separately from the package with the same name. For example, npm hosts a webpacker package in the @rails scope, named @rails/webpacker. Without the @, it's impossible to tell if rails means the scope or the top-level package rails.

@indirect
Member

indirect commented Apr 8, 2022

A tricky question about scoped packages: in Node, it is completely fine to have both @rails/mail and mail in a single project. They do not conflict.

In RubyGems, it would be (presumably) impossible to have both @rails/mail and mail in a single project, because they would both define the Mail constant. That means either a lot of weird errors if someone adds both packages, or a lot of extra work in RubyGems and Bundler to prevent that kind of conflict.

Can a dependency on mail be satisfied by @rails/mail? If not, it's probably impossible to have a Gemfile that resolves. If yes, that's even more work that needs to be done inside RubyGems/Bundler.
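A small sketch of the constant conflict, using two module definitions as stand-ins for two gems loaded into the same process. Nothing here is real gem code; it only shows that Ruby reopens the existing `Mail` module instead of keeping two separate ones.

```ruby
# Stand-in for code loaded from the unscoped "mail" gem.
module Mail
  def self.source
    "mail"
  end
end

# Stand-in for code loaded later from a scoped fork such as "@rails/mail".
# Ruby reopens the same module, so this definition replaces the one above.
module Mail
  def self.source
    "@rails/mail"
  end
end

puts Mail.source # @rails/mail
```

Whichever gem loads last wins, which is the kind of silent conflict RubyGems and Bundler would need to detect or prevent.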

@ioquatix

ioquatix commented Apr 8, 2022

@indirect if I'm understanding the original proposal:

# @rails/mail
module Rails::Mail
# mail
module Mail

Is that correct?

@mensfeld
Member

mensfeld commented Apr 8, 2022

> Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/animation@12.3.1 it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%40aws-sdk@1.2.3.

Yes, but my point was that the notion of "scope" is not part of purl because of namespaces. However, I understand the reasoning here and the need for backwards compatibility.

@hsbt

A few registries are slowly drifting towards purl: https://github.com/package-url/purl-spec

@indirect on top of that, for node you can have two versions of the same package in the same repo.

About the namespaces: would be good to estimate how many gems actually use their correct namespace vs patching other things or using completely different namespaces.

@deivid-rodriguez
Member

@mullermp I guess you're right, we would need to deal with squatting as just another type of possible abuse, and define clear rules about it.

@indirect I think in the particular case of mail, it would mostly work because most Rails users don't use mail directly, and Rails is also in control of the dependency and how it's used, so Rails would change their dependency on mail to @rails/mail and they could also choose to namespace the Mail in their scoped gem to stay on the safe side and avoid any conflicts with people using the mail gem directly.

Anyways, this small benefit is just something that came to my mind as a potential extra benefit, but it's not even mentioned in the RFC. I think this RFC only proposes a way to allow gem name collision, but does not impose anything on what top level module a given gem should define.

@simi
Member

simi commented Apr 8, 2022

Another question is how to resolve dependencies. Should we use the full gem identifier (including the namespace) everywhere, including Gemfile.lock and gemspec dependency specifications? I see a compatibility problem here again. We need to ensure the transition is as smooth as possible.

@deivid-rodriguez
Member

Yes, I think the new name should be used everywhere, otherwise it's impossible to differentiate differently scoped gems with the same name, no? I don't think there's much that can be done about compatibility, unfortunately, except for going down the route you suggested before: choosing a name scheme that's currently valid, just mostly unused. It would definitely make things smoother, although not so nice. I'm not fully sure which option is best.

To elaborate on what I said before about Bundler being able to "trampoline", the idea is that, say, support for the new naming scheme is added on Bundler 2.4. And someone upgrades to Bundler 2.4.0 to be able to bundle @rails/mail in their Gemfile. Then a Gemfile.lock file will be generated including the new incompatible naming scheme, and the BUNDLED WITH 2.4.0 marker. If someone bundles this Gemfile{.lock} file using Bundler 2.3.0, Bundler 2.3.0 will automatically detect that the application needs Bundler 2.4 and automatically upgrade itself, so things should just work. Unfortunately this feature is only very recent and versions older than 2.3.0 would still fail hard.
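A rough sketch of the version check behind that "trampoline" idea: read the BUNDLED WITH marker from a lockfile and compare it with the running version. The helper names are hypothetical and this is not Bundler's actual implementation; only `Gem::Version` is a real RubyGems class.

```ruby
require "rubygems" # provides Gem::Version

# Extract the version recorded under the "BUNDLED WITH" marker, if any.
def bundled_with(lockfile_text)
  match = lockfile_text.match(/^BUNDLED WITH\n\s+(\S+)/)
  match && Gem::Version.new(match[1])
end

# Hypothetical decision: is the lockfile newer than the running Bundler?
def needs_upgrade?(lockfile_text, running_version)
  locked = bundled_with(lockfile_text)
  !locked.nil? && locked > Gem::Version.new(running_version)
end

lockfile = <<~LOCK
  GEM
    specs:

  BUNDLED WITH
     2.4.0
LOCK

puts needs_upgrade?(lockfile, "2.3.0") # true
puts needs_upgrade?(lockfile, "2.4.0") # false
```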

@simi
Member

simi commented Apr 8, 2022

@deivid-rodriguez my idea was to support both the old and new naming schemes (at least for some time). RubyGems.org could just check the client version (or we could add a request header that asks for the new scheme) and respond with the matching scheme.

gem 'rails--activerecord'
gem 'rails/activerecord' # PURL compatible if I understand it well (or any new scheme incompatible with old gem naming could be here)

Would work the same for some time. The latter one just will not be compatible with older RubyGems and Bundler.
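A minimal sketch of that server-side choice, assuming the decision is made purely on the client version. The cutoff version and the `name_for_client` helper are hypothetical placeholders, not real RubyGems.org behavior.

```ruby
require "rubygems" # provides Gem::Version

# Assumed cutoff: first client version that understands "scope/name".
# The number is a placeholder chosen for illustration only.
SCOPED_NAMES_SINCE = Gem::Version.new("3.4.0")

# Hypothetical: pick the name form to send back to a given client.
def name_for_client(scope, name, client_version)
  if Gem::Version.new(client_version) >= SCOPED_NAMES_SINCE
    "#{scope}/#{name}"
  else
    "#{scope}--#{name}"
  end
end

puts name_for_client("rails", "activerecord", "3.4.1") # rails/activerecord
puts name_for_client("rails", "activerecord", "3.2.0") # rails--activerecord
```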

I can ask about PURL and the plans of other package repository maintainers at the next OSSF meeting.

@deivid-rodriguez
Member

Mmmm... I think we should provide a migration mechanism so that gem authors can opt in to the new naming scheme, while still being compatible with the old naming scheme. So even if the activerecord gem starts using rails/activerecord naming, plain activerecord in Gemfiles should still work (and by work I mean it should also pick up newer releases using the new scheme). Once users are ready (they are using up to date clients) they can start using the new scheme. And once Rails chooses to do so, it can stop providing releases with legacy naming.

I'm not really sure how we can make the above work, but if we could do it, what would be the purpose of rails--activerecord?


A scoped gem is any gem that is named following the pattern `gem@scope`, where “gem” is the name of the gem, and “scope” is an organization's reserved gem scope. Gem scopes are globally unique across all organizations.

As an example, consider a scoped gem defined as `[email protected]`. Note that both the gemspec’s `name` and the required file `lib/[email protected]` follows the `gem@scope` pattern:
Per my comment about "weird"... I just thought that: [email protected] looked a lot like an email, especially if gemspec were a TLD (seems that it isn't, nor is .rb).

I'm not saying the gem@scope pattern isn't good (and other symbols have other problems), but just something to be aware of. 🤔

@cyclotron3k

cyclotron3k commented May 6, 2022

When I download a gem named qnap-download_station, I can usually guess that it's going to provide a module or class called Qnap::DownloadStation. I know it's not guaranteed, but it's a common practice, and it's definitely the recommended naming convention.

Adherence to this naming convention, combined with the unique constraint on gem names helps prevent/reduce namespace collisions.

So if config@foo-company, config@bar-widgets, config@acme all provide a Config class, things are going to get messy very quickly. It now becomes incumbent upon the gem developer to manually check for anyone else using that namespace?
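A minimal sketch of that naming convention and the collision it implies, assuming the scope is simply ignored when deriving the constant. The `conventional_constant` helper is hypothetical; the dash and underscore rules follow the common gem naming convention described above.

```ruby
# Conventional mapping: "-" separates namespaces, "_" separates words.
# Assumption: anything after "@" (the scope) plays no part in the constant.
def conventional_constant(gem_name)
  base = gem_name.split("@", 2).first
  base.split("-").map { |part|
    part.split("_").map(&:capitalize).join
  }.join("::")
end

puts conventional_constant("qnap-download_station") # Qnap::DownloadStation

scoped = %w[config@foo-company config@bar-widgets config@acme]
puts scoped.map { |name| conventional_constant(name) }.uniq.inspect # ["Config"]
```

All three scoped gems map to the same top-level constant, which is the collision being described.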

Wouldn't it make more sense to put the owner organisation in the Gem's metadata (a new, dedicated field, not the actual metadata field perhaps?) - treat it as a first-class concept instead of overloading the gem name?

@djberg96

djberg96 commented May 6, 2022

I'm late to the conversation, but couldn't we add an organization attribute to Gem::Specification? And then modify gem install to allow an organization attribute?

I'm really not following the purpose of the '@' for scoping, and why the org name wouldn't be enough scoping.

I'm also trying to remember what Perl did for package management. I thought they had some way to scope them already, but my memory is fuzzy.
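A minimal sketch of carrying the organization alongside the gem name instead of inside it. There is no organization attribute on `Gem::Specification` today, so this uses the existing `metadata` hash as a stand-in; the "organization" key is a hypothetical choice that RubyGems does not interpret.

```ruby
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name    = "aws-sdk-s3"
  s.version = "1.2.3"
  s.summary = "Example gem used only for illustration"
  s.authors = ["Example Author"]
  # Hypothetical key: stands in for a dedicated organization attribute.
  s.metadata = { "organization" => "aws-sdk" }
end

puts "#{spec.name} belongs to #{spec.metadata["organization"]}"
```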

@ioquatix

ioquatix commented May 6, 2022

How would I structure:

socketry/async -> async.gemspec -> async@socketry? async@async?
socketry/async-http -> async-http.gemspec -> http@async? async-http@socketry?
socketry/db -> db.gemspec -> db@socketry? db@async?
socketry/db-postgres -> db-postgres.gemspec -> db-postgres@socketry? db-postgres@async? postgres@db?

It would be good to understand what is best practice to see if the proposed model fits real world use cases (or what real world use cases fit the proposed model).

@halostatue

I believe that the proposal would have you create (to pick one of them) [email protected] (although I would prefer [email protected] as I do prefer scope@gem to gem@scope, despite the clear explanation provided). I would likely do core@mime-types (mime-types@core) and data@mime-types (mime-types@data) for the mime-types gems…although I’m also not entirely sure that I would stop publishing just mime-types.

@djberg96 I think we should add an organization (or, if you prefer, scope) field to the gemfile, but I think that there is value in making the organization/scope part of the scoping capabilities at the command-line and in Bundler without getting excessively verbose.

I would prefer to do any one of the following

$ gem install @mime-types/core
$ gem install mime-types@core
$ gem install core@mime-types

Over:

$ gem install mime-types --organization mime-types
$ gem install core --organization mime-types

Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.

In general, I think that this is a cogent RFC, but it feels incomplete to me, as I think that it needs to have a clearer indication that scoped packages (either personal @halostatue/diff-lcs or organizational @mime-types/core, to use NPM-style notation) are going to end up making the transition into the gemspec and rubygems.org (and other implementations like geminabox) fairly seamlessly.

I’m not sure we should wait for perfect before we act on this, if we act on this. I think that there are several issues here of which the scoping of gem names is only one. This gets into:

  • name scoping (obviously)
  • fighting typo squatting (partially)
  • (partially) enabling "verified" packages (by the use of organizations)

In some ways, a lot of this (except name scoping) could be fixed if we had a better way of implementing signed gems (and verifying those signatures), rather than using namespaces and Rubygems.org as pure sources of truth. But, as someone who did sign gems early on and found the process excruciating (and no one verified them anyway), I don’t think that’s going to be a solution to any of these problems.

Not quite sure how to move forward on this. It’s a good RFC. I’m not sure it is good enough, but I also don’t know that waiting for a better RFC is going to give us anything this decade.

@cyclotron3k

> Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.

But if organisation was part of the gem metadata instead of the gem name, then there would be no ambiguity in the gem name (due to the uniqueness constraint), and therefore specifying the organisation would be unnecessary, no?

@zarqman

zarqman commented May 6, 2022

I really like the idea of adding scopes.

Like many others, I just find that the gem_name@scope naming feels backwards. I think it's because we're very used to the order of less-specific/more-specific. Some examples:

Ruby modules: Module::Class
File system directories: parent/child
GitHub organizations/scopes: github.com/org/project
npm: @scope/package
PHP: scope/package
Java: com.scope.name

Go back in Rubygems' own history to the original gemcutter and we had username-gemname too.

Consider also that Ruby gems themselves often live in GitHub, where they might be named github.com/scope/gem_name.

Or look at the already discussed example of aws-sdk-s3, which exposes the Aws::S3 namespace. While Aws::S3 doesn't exactly match aws-sdk-s3, it'd be even more unnatural if the naming is reversed to s3@aws or s3@aws-sdk. Likewise for @ioquatix's example that if async-http becomes http@async, that's reversed from the included module Async::HTTP.

For those who care about code aesthetics and are prone to aligning related lines (including myself):

gem 'async/async'  # feels normal
gem 'async/http'
gem 'async/websocket'
# vs
gem     'async@async'  # awkward!
gem      'http@async'
gem 'websocket@async'

I'd strongly prefer to see the Ruby ecosystem choose a path that's consistent with everyone else and go with scope/name. Or scope.name. Or even scope@name or scope--name. Or anything that puts the scope first.

I'll also suggest that consistency with other languages and ecosystems keeps things more accessible to new Ruby developers. If we invert the common ordering, then it's one more thing that blog posts and books will have to address. It adds mental overhead.

@ghost

This comment was marked as abuse.

@djberg96

djberg96 commented May 6, 2022

I agree that this won't prevent typo squatting since, as @andy-tycho says, you could just as well typo-squat an organization, unless we add some sort of auth hook, which is outside the scope of the project.

Where an org/scope attribute would come in handy would be for plugins that could hook into it as they see fit. Any sort of auth could be handled by the plugin author instead of us, and then users could choose to use that plugin or not.

It's not a huge deal to me either way, and it can potentially be handled by metadata of course, but it's more likely to be used IMO if it's a Specification attribute.

@Fryguy

Fryguy commented May 6, 2022

With respect to gem@scope vs scope@gem, they are both awkward in different ways and I think we are arguing the order without also pointing out that the @ symbol itself is part of the awkwardness. For me, when I "read" the @ symbol, I say "at", which implies the thing following it is a larger container (e.g. location, domain, scope, etc). So gem@scope makes much more sense than scope@gem when only considering the @. That being said the argument made by @zarqman of bigger->smaller being more consistent in Ruby is a really good one...if we go with bigger->smaller, then I think @ is the wrong symbol choice. Unfortunately, I'm not sure what a good symbol choice would be considering that the other expected choices of /, : have downsides as @halostatue mentioned.

@halostatue

> > Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.
>
> But if organisation was part of the gem metadata instead of the gem name, then there would be no ambiguity in the gem name (due to the uniqueness constraint), and therefore specifying the organisation would be unnecessary, no?

I’m not sure what you’re saying here, @cyclotron3k. I think that there is potential value in having scoping be part of a name to expand the "universe" of available package names. The points raised about how namespacing works in Ruby (e.g., when you use mime-types, you can generally expect that the top-level namespace is MIME, Mime, or MimeTypes (it’s MIME::Types) and that @mime-types/core might offer either MIME::Types::Core or just Core — but I think that this is a social problem to be handled).

While I don’t think that Rubygems should be enforcing any names inside the gems (after all, there can be value in publishing a gem that monkey patches another gem to fix a bug in the original gem that hasn’t been updated…), I think that we (the Ruby community) could consider how scoped gems should expose their namespaces.

I know that I haven’t really been part of any of the previous discussions on this topic (mostly because I haven’t been aware of them), but I think that they’re fascinating. @mullermp, thank you for writing this RFC and getting the discussion going to a wider viewpoint.

Even though this is really a proposed Rubygems RFC, I wonder if this might not be something that should be raised on ruby-core. It’s not part of MRI implementation as such, but Rubygems and Bundler are so deeply part of the overall Ruby development experience at this point, it probably needs additional eyes on it.

@qrush
Member

qrush commented May 6, 2022

Hi all-

I'd like to leave an idea here that perhaps should be a separate RFC: if/when rubygems.org starts to add scoped gems support, it feels like that should be an annual subscription service whose revenue goes directly to supporting the costs of running/maintaining rubygems.org. RubyCentral and many sponsors (see the footer on rubygems.org) cover these costs now, but this feels like a great way for the community to actively buy in and support the central piece of the ecosystem we rely on, especially with a feature that is geared towards organizations that have the ability to pay for services they use.

I don't know what the right amount to charge is - maybe a community survey should be held to determine that $ amount. Alternatives to gem scoping would still work fine: dashes in gem names, or a company hosting their own private gem server, so I am hoping that will be seen as a reasonable workaround if an organization cannot afford the charge or refuses to pay.

❤️

@zarqman

zarqman commented May 6, 2022

Under Alternative 3, the original proposal hints at "implementation and legacy complications" arising from the use of / as the scope/name separator.

While I like the aesthetic of /, I do think that these complications are substantial and worth outlining more extensively.

Let's use async-http as our example. Further, let's assume that it would become async/http or @async/http. (I will use ~ to indicate the root directory/git repo of the gem itself.)

async-http presently:

  1. Assumes the gemspec is located at ~/async-http.gemspec.

  2. Installs:
    GEM_HOME/cache/async-http-1.2.3.gem
    GEM_HOME/gems/async-http-1.2.3/
    GEM_HOME/specifications/async-http-1.2.3.gemspec
    GEM_HOME/extensions/x86_64/3.0.0/async-http-1.2.3/ (if applicable)

  3. Inside Ruby:
    Adds GEM_HOME/gems/async-http-1.2.3/lib to $:.
    require 'async-http' looks for GEM_HOME/gems/async-http-1.2.3/lib/async-http.rb by convention.

  4. gem build outputs ~/pkg/async-http-1.2.3.gem
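The install locations in step 2 can be sketched as a function of the gem's name and version. The `install_paths` helper is hypothetical and covers only the cache, gems, and specifications entries; "GEM_HOME" is used as a literal placeholder string rather than a real directory.

```ruby
# Hypothetical helper: derive install paths from "name-version".
def install_paths(gem_home, name, version)
  full_name = "#{name}-#{version}"
  {
    cache: File.join(gem_home, "cache", "#{full_name}.gem"),
    gems: File.join(gem_home, "gems", full_name),
    specification: File.join(gem_home, "specifications", "#{full_name}.gemspec"),
  }
end

install_paths("GEM_HOME", "async-http", "1.2.3").each_value { |path| puts path }
# GEM_HOME/cache/async-http-1.2.3.gem
# GEM_HOME/gems/async-http-1.2.3
# GEM_HOME/specifications/async-http-1.2.3.gemspec
```

Passing a name that contains "/" to the same function yields nested directories, since the separator is simply carried into each path.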

Now, comparing to async/http and creating directories for each scope (comparable to node_modules/):

  1. Should the .gemspec now be stored in ~/http.gemspec or ~/async/http.gemspec?
    If the latter, gem build would no longer be able to just search for *.gemspec, but would also have to include */*.gemspec.

    How do either of these choices work with Gemfile commands like:
    gem 'async/http', github: 'async/http'
    gem 'async/http', path: '/some/arbitrary/path'

    It would seem that :git or :github would need to parse the scope from the .gemspec during install and then deliberately install into gems/scope/gemname-x.y.z. This might create a chicken-and-egg problem of needing to know gem.name before checking out the tree, but needing to check out the tree before reading the .gemspec (I'm unsure how Rubygems handles this now).

    However, for :path, it seems likely that the scope has to be somewhat internally discarded as the express path has already been provided. See (3) below for implications of this.

  2. Now installs:
    GEM_HOME/cache/async/http-1.2.3.gem
    GEM_HOME/gems/async/http-1.2.3/
    GEM_HOME/specifications/async/http-1.2.3.gemspec
    GEM_HOME/extensions/x86_64/3.0.0/async/http-1.2.3/ (if applicable)

  3. $: now contains GEM_HOME/gems/async/http-1.2.3/lib.
    This makes require 'async/http' ambiguous.
    Should it be looking for async/http.rb inside the async/http gem? Or should it be looking for async/http.rb (yes, identical) inside the async gem? What happens when both exist?

    Perhaps require is modified to recognize scopes by prefixing the scope with @ (same reason npm does it, I believe):
    require 'async/http' doesn't have @ and so filters $: to only look at unscoped paths for ~/lib/async/http.rb.
    require '@async/http' parses out the @async, filters $: for only GEM_HOME/gems/async/*, then treats it like require 'http' and looks for ~/lib/http.rb.

    But, as noted in (1), use of gem '@async/http', path: '...' won't reliably have scope/ as part of the pathname in $:, so filtering on $: won't work. This suggests that $: itself would need to be reworked, possibly as a hash: {'@async'=>[..], nil=>[..unscoped paths..]}. This creates backward compatibility concerns.

  4. Do gem build and related commands output pkg/async/http-1.2.3.gem now? This seemingly changes how rubygems.org handles uploads, file storage, routes, etc. (as already noted in Alternative 3.)
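
To make the ambiguity in (3) concrete, here is a minimal sketch. `resolve` is a simplified stand-in for how `require` scans `$:` (first match wins), not the real RubyGems/Ruby implementation. `scoped_resolve` and the hash-shaped load path are hypothetical, following the `{'@async'=>[..], nil=>[..]}` idea above. The `async-1.0.0` version and the files placed in each gem are assumptions for illustration only.

```ruby
require "tmpdir"
require "fileutils"

# Simplified stand-in for `require` scanning $: -- the first match wins.
def resolve(feature, load_path)
  load_path.each do |dir|
    candidate = File.join(dir, "#{feature}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

# Hypothetical scope-aware lookup over a hash-shaped load path:
#   { "@async" => [scoped lib dirs], nil => [unscoped lib dirs] }
def scoped_resolve(feature, scoped_load_path)
  scope, rest = feature.start_with?("@") ? feature.split("/", 2) : [nil, feature]
  resolve(rest, scoped_load_path.fetch(scope, []))
end

# Builds both layouts in a throwaway GEM_HOME and reports what each lookup finds.
def demo_require_ambiguity
  Dir.mktmpdir do |gem_home|
    unscoped_lib = File.join(gem_home, "gems", "async-1.0.0", "lib")          # the `async` gem
    scoped_lib   = File.join(gem_home, "gems", "async", "http-1.2.3", "lib")  # the `async/http` gem

    FileUtils.mkdir_p(File.join(unscoped_lib, "async"))
    FileUtils.touch(File.join(unscoped_lib, "async", "http.rb"))
    FileUtils.mkdir_p(File.join(scoped_lib, "async"))
    FileUtils.touch(File.join(scoped_lib, "async", "http.rb"))
    FileUtils.touch(File.join(scoped_lib, "http.rb"))

    rel = ->(path) { path&.delete_prefix("#{gem_home}/") }
    hashed = { "@async" => [scoped_lib], nil => [unscoped_lib] }

    {
      # Flat $: the winner depends only on load-path order.
      flat_unscoped_first: rel.call(resolve("async/http", [unscoped_lib, scoped_lib])),
      flat_scoped_first:   rel.call(resolve("async/http", [scoped_lib, unscoped_lib])),
      # Hash-shaped $: the "@" prefix selects which bucket is searched.
      hashed_plain:        rel.call(scoped_resolve("async/http", hashed)),
      hashed_scoped:       rel.call(scoped_resolve("@async/http", hashed)),
    }
  end
end
```

With a flat `$:`, the same `require 'async/http'` resolves to a different gem depending only on ordering. The hash-shaped variant removes that ambiguity, but only by changing what `$:` is, which is the backward compatibility concern raised above.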

Alternative 3 also mentions escaping /. I suggest that while this preserves a flat namespace, it makes a mess trying to ensure every tool gets the escaping/unescaping correct.

Further, it still doesn't resolve ambiguity with require 'async/http'. Should the / be escaped or not? If escaped, Ruby looks in $: for http.rb and hopefully finds it at GEM_HOME/gems/async%2Fhttp-1.2.3/lib. If not escaped, Ruby looks for async/http.rb and perhaps finds it in GEM_HOME/gems/async-1.2.3/lib.
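
A rough sketch of the escaping concern, assuming percent-encoding via Ruby's standard `URI` module. The helper names `escaped_gem_dir` and `gem_name_from_dir` are hypothetical and are not part of RubyGems; the point is only that every tool has to apply the same round trip.

```ruby
require "uri"

# Hypothetical helper: flat, escaped directory name for a gem whose name contains "/".
def escaped_gem_dir(name, version)
  "#{URI.encode_www_form_component(name)}-#{version}"
end

# Hypothetical inverse: recover the gem name from an escaped directory name.
def gem_name_from_dir(dir_name)
  escaped_name = dir_name.sub(/-\d+(\.\d+)*\z/, "") # strip the trailing "-x.y.z"
  URI.decode_www_form_component(escaped_name)
end

# A tool that escapes keeps the flat namespace...
def escaped_install_path
  File.join("gems", escaped_gem_dir("async/http", "1.2.3"))
end

# ...while a tool that forgets to escape silently creates a nested path instead.
def unescaped_install_path
  File.join("gems", "async/http-1.2.3")
end
```

Here `escaped_install_path` yields `gems/async%2Fhttp-1.2.3` while `unescaped_install_path` yields `gems/async/http-1.2.3`, so two tools that disagree about escaping would look for the same gem in different places.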

That's a lot of change and seems to me like it would require bumping Rubygems to 4.x as it's pointing towards breaking backward compatibility. If it requires changing the behavior of $:, then it's also a major change within Ruby itself.

@zarqman

zarqman commented May 6, 2022

I agree with @halostatue that scopes should be part of the name, not a separate scope field on the spec. If there are two gems with the same name, but different scopes, a separate field creates some of the same ambiguities as organizing scoped gems into subdirectories. The only way to avoid this is to join scope+name together in nearly all usage, and if that's the case, it seems better to treat them as one from the start.

Potential places for naming collisions (with . as the example separator):
Gem.loaded_specs['async.http'] = ...
~/async.http.gemspec
GEM_HOME/cache/async.http-1.2.3.gem
GEM_HOME/gems/async.http-1.2.3/
GEM_HOME/specifications/async.http-1.2.3.gemspec
GEM_HOME/extensions/x86_64/3.0.0/async.http-1.2.3/ (if applicable)
Gem entrypoint: lib/async.http.rb

In all cases, if async. is missing, async.http cannot be differentiated from faraday.http, etc. since they'd all become simply http.
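
A small sketch of that collision, using a plain Hash as a stand-in for a name-keyed registry such as `Gem.loaded_specs`. `DemoSpec`, its `scope` field, and the version numbers are hypothetical; nothing here touches the real RubyGems data structures.

```ruby
# Hypothetical stand-in for a gemspec that keeps scope and name as separate fields.
DemoSpec = Struct.new(:scope, :name, :version) do
  # Scope joined into the name, with "." as the example separator.
  def full_name
    "#{scope}.#{name}"
  end
end

DEMO_SPECS = [
  DemoSpec.new("async", "http", "1.2.3"),
  DemoSpec.new("faraday", "http", "2.0.0"),
].freeze

# Builds a name-keyed registry; a later spec with the same key replaces the earlier one.
def index_specs(specs, &key)
  specs.to_h { |spec| [key.call(spec), spec] }
end

by_full_name = index_specs(DEMO_SPECS, &:full_name) # keys: "async.http", "faraday.http"
by_bare_name = index_specs(DEMO_SPECS, &:name)      # single key "http"; async's entry is lost
```

Keyed by the joined name, both gems survive. Keyed by the bare name, only one `http` entry remains, which is why the scope ends up having to travel with the name nearly everywhere.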

@andrewhavens

I think scoped gems is an interesting idea, but I'm not sure if there is a way to support it without changing the Ruby language itself. In other languages, you can import different packages with the same name and scope them within the file you are working in. In Ruby, gems are essentially globally namespaced.

The problem that was raised in this proposal about wanting to use a forked version is already achieved through the use of bundler:

# Gemfile
gem 'mail', github: 'rails/mail'

This makes it clear that we are using a forked version of a gem. Thus all gems that have a dependency on mail will be forced to use this specific fork.

If this definition were pushed down to the gemspec level, this would make things very complicated, and even dangerous. Let's say Rails wants to depend on a forked version of a gem using something like gem.dependency '@rails/mail', but what happens when another gem also has a dependency on the same gem? Does Rails get to decide that it has priority simply because it specified a specific username/org? This would open up the possibility of a gem specifying a malicious version as a dependency that takes precedence over the normal version.

So, I think this is already achieved in a reasonable way using Bundler. Might be nice to have an easy way to be able to download a gem from GitHub without having to clone it. Like gem install @rails/mail but that seems like a separate issue. I agree though that @username/gemname should be the format since it is the most intuitive.

@ioquatix

ioquatix commented May 6, 2022

I think the value as a gem maintainer I see in scopes is two things:

  • I want to prevent people from releasing gems into the namespace I document to my users as being official, e.g. I want to have a namespace like socketry and prevent other users from releasing socketry/async-hax
  • I want to retain control over gem names within my own namespace, to avoid running into issues where I create async-x async-y and then finally find someone 10 years ago released async-z and won't give it up and now I have to release async-z2 or async-less-good-than-z.

I've run into both of the above problems. Both of them are about predictability and risk management.

The biggest problem I see is:

  • Top level gems are more valuable but it feels to me to be a bit of a graveyard at times with lots of good gem names essentially squatted e.g. I want to use this gem X but someone last released a gem 5-10 years ago and won't give up the name, OR as a user I'm trying to find a gem for X but find 10 gems but most/all of them are stale/old/unmaintained - how do I find the relevant one (as suggested earlier things like Ruby Toolbox help a lot here).
  • Nested gems like socketry/async don't have a clear relationship to top level names like async, i.e. what's the migration process? Do I use both? Release both? How does gem install async work when I have socketry/async, async and someone else might have created socketey/async (hax).

My feeling is, organisations or scopes should not change the name resolution process we already have, but instead provide a better more secure way for users to procure gems.

If we are thinking big picture, I'd suggest:

  • Every user has a username specific scope, e.g. ioquatix -> ~ioquatix/my-gem-name -> my/gem/name.rb. Everyone gets this.
  • Non-username scopes, e.g. organisations / scopes, e.g. @socketry/async -> async.rb. Should be paid or gifted to major open source groups e.g. rails, rack, puma, socketry, etc.
  • Top level gem namespace exists but we should not encourage its use by default, i.e. new rails projects should pull in explicit and deliberate gem sources.
  • People can specify a resolution order in gem files, e.g.
source "https://rubygems.org/@rack" do
  gem "rack"
end

source "https://rubygems.org/@rails" do
  gem "rails" # -> depends on "rack" which is satisfied only by the current listed sources, e.g. [@rack, @rails]
end

source "https://rubygems.org" # general global index

gem "rando-whatever" # can pull in from [@rack, @rails, global index] in that order.

The good thing about this model is it allows you to fork a gem (as rails did with mail) and plug it in as a named dependency without breaking dependency resolution (because you'd need a different name to push it to rubygems.org).

This design doesn't require any changes to name handling and I don't think we should change the name handling because it will break every system that depends on name-based dependency resolution etc.

Based on my above suggestions, it would not be possible to install both ~ioquatix/async and @socketry/async and that's by design because it's super confusing and I don't think scopes should be involved in final name resolution, but they are more of a feature of how to organise dependency management and gem fetch/installation.
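
The ordered fallback described above could be sketched roughly as follows. This is not how Bundler resolves sources today; `DEMO_SOURCES`, the contents of each index, and `resolve_source` are all assumptions made for illustration, and the sketch models only the "first listed source that has the name wins" rule.

```ruby
# Hypothetical in-memory stand-ins for the index behind each source, in Gemfile order.
DEMO_SOURCES = [
  ["https://rubygems.org/@rack",  %w[rack]],
  ["https://rubygems.org/@rails", %w[rails mail]],
  ["https://rubygems.org",        %w[rack rails mail rando-whatever]],
].freeze

# Returns the URL of the first listed source whose index contains the gem name,
# or nil when no source provides it.
def resolve_source(gem_name, sources = DEMO_SOURCES)
  match = sources.find { |_url, names| names.include?(gem_name) }
  match&.first
end

resolve_source("mail")           # the @rails source, because it is listed before the global index
resolve_source("rando-whatever") # falls through to the global index
```

Because the gem name itself never changes, each name still resolves to exactly one gem, which matches the point that two scopes cannot both supply `async` in the same bundle.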

@ghost

This comment was marked as abuse.

@indirect

This comment was marked as off-topic.

@indirect
Member

indirect commented May 7, 2022

I don't think we should provide a scope mechanism that allows differently-named gems to provide the same global Ruby constants. I survived GitHub's original gem server, and I still have scars from trying to use an app whose gems depended on both tenderlove-nokogiri and nokogiri, which both claimed the constant Nokogiri. Bundler also can't help in that situation, because the gems have different names, different versions, and different dependency trees. In my opinion, this RFC needs a clear solution to that problem to move forward.

To me, @ioquatix's proposal to treat orgs as additional gem sources sounds like the most likely to work under the constraints we have today. For example, gems that depend on mail will continue to work whether mail comes from the global source or the @rails source, and Bundler can ensure there is only one gem named mail claiming the Mail constant.
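
For readers who did not live through it, the constant clash can be sketched like this. The gem names and the `ParserDemo` constant are hypothetical stand-ins (deliberately not the real Nokogiri), and evaluating source strings only approximates what `require` does with each gem's entry file.

```ruby
# Stand-ins for the entry files of two differently named gems that claim one constant.
DEMO_GEM_SOURCES = {
  "parser-demo" => <<~RUBY,
    module ParserDemo
      def self.provider
        "parser-demo"
      end
    end
  RUBY
  "fork-parser-demo" => <<~RUBY,
    module ParserDemo
      def self.provider
        "fork-parser-demo"
      end
    end
  RUBY
}.freeze

# "Requires" each gem in order, then reports which gem's code now answers for the constant.
def load_demo_gems(*names)
  names.each { |name| eval(DEMO_GEM_SOURCES.fetch(name), TOPLEVEL_BINDING) }
  ParserDemo.provider
end
```

Whichever gem loads last silently reopens the module and wins, and since the two gems have different names, a resolver working only on names has no way to notice that they conflict.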

@bkuhlmann

Hey everyone. 👋

After reading through the RFC and this discussion, I want to add some thoughts/observations in hopes that this enriches the discussion (although I might be somewhat counter to André's concerns above -- maybe because I'm missing context to earlier days with GitHub's original gem server):

Gem Specification Scopes

In terms of gem specifications, I want to focus specifically on the use of scope -- as described in the RFC. I'd like to emphasize the importance of this within the gem specification as a new field:

Gem::Specification.new do |spec|
  spec.scope = "dry"  # This is important for many reasons which I'll highlight shortly.
  spec.name = "monads"
  # Truncated for brevity.
end

As the author and maintainer of Gemsmith -- a gem for building gems -- this would allow organizations and individual contributors to configure this information once via Gemsmith's XDG configuration. This equates to being able to build a gem as follows:

# Uses global scope as exists today or pulls scope from XDG configuration (if configured).
# This is a nice productivity boost when building multiple gems within the same scope.
gemsmith --build demo

# Uses custom scope which overrides any XDG configured local or global scope.
# Definitely tedious when creating multiple gems within the same scope -- if not using an XDG configuration -- but handy for one-time overrides.
gemsmith --build monads --scope dry

This also means that Gemsmith -- and Bundler -- wouldn't have to add special logic for parsing a gem name -- at creation -- by splitting dry@monads into dry (gem scope) and monads (gem name). Even better, we improve the developer experience for creating new gems by not forcing someone to have to type this:

gemsmith --build dry@monads

At (@) Symbol Avoidance

Building upon what I've demonstrated above, I'd like to push for avoiding the use of the at symbol (@) within the gem name, package, and URL altogether for the following reasons:

  • Use of <gem>@<scope> is backwards, awkward, and not intuitive which many have pointed out already.
  • Use of <scope>@<gem> is better but -- as Dan pointed out earlier -- feels more like an email address which confuses me as well.
  • As Maciej mentioned earlier, use of @ is not entirely compatible with the Package URL Specification and would best be reserved for version information.
  • Use of @ in the pathname also feels awkward and non-intuitive to me. Example: $HOME/<truncated>/3.1.2/lib/ruby/gems/3.1.0/gems/dry@monads-1.4.0.
  • Use of @ in the URL doesn't make sense either. Example: https://rubygems.org/gems/dry@monads

I'd like to suggest an alternative: keep scope information in the gem specification. Then both Bundler and RubyGems would be able to do the following:

Paths (gem installation and management)

# Global scope as exists today.
$HOME/<truncated>/3.1.2/lib/ruby/gems/3.1.0/gems/dry-monads-1.4.0

# Scoped as being proposed.
# NOTE: `@` is removed in favor of using `dry` as a scoped directory structure.
$HOME/<truncated>/3.1.2/lib/ruby/gems/3.1.0/gems/dry/monads-1.4.0

⚠️ There are definitely complications with this approach that I'm glossing over as Thomas has detailed here but I think they are surmountable.

URLs (gem lookup)

# Global scope as exists today.
https://rubygems.org/gems/dry-monads

# Scoped as being proposed.
# NOTE: `@` is removed in favor of using `dry` as a scoped directory structure.
https://rubygems.org/gems/dry/monads

ℹ️ In all of the above use cases -- and as emphasized in the RFC -- the gem namespace would remain the same regardless of using global or specialized scope. Example:

module Dry
  module Monads
  end
end

The only difference is how Bundler finds and resolves the gem locally (i.e. either using the scope if defined or falling back to global if not) and how RubyGems lists the gem in the URL (which also depends upon the gem specification).
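
As a rough sketch of that derivation: `ScopedSpec` below is a hypothetical stand-in, since `Gem::Specification` has no `scope` attribute today, and the helpers only show how a path and URL could be built from scope and name, falling back to the global layout when no scope is set. The `<truncated>` portion of the install path is left out.

```ruby
# Hypothetical stand-in for a gemspec carrying an optional `scope` field.
ScopedSpec = Struct.new(:name, :version, :scope, keyword_init: true)

# Install directory relative to the gem home: nested under the scope when one is set.
def install_dir(spec)
  File.join(*["gems", spec.scope, "#{spec.name}-#{spec.version}"].compact)
end

# Lookup URL: the scope becomes an extra path segment when one is set.
def gem_url(spec)
  ["https://rubygems.org/gems", spec.scope, spec.name].compact.join("/")
end

global = ScopedSpec.new(name: "dry-monads", version: "1.4.0")
scoped = ScopedSpec.new(name: "monads", version: "1.4.0", scope: "dry")

install_dir(global) # "gems/dry-monads-1.4.0"
install_dir(scoped) # "gems/dry/monads-1.4.0"
gem_url(global)     # "https://rubygems.org/gems/dry-monads"
gem_url(scoped)     # "https://rubygems.org/gems/dry/monads"
```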

Graceful Degradation, Soft Forking, and Migration

So far everything I've been proposing allows for graceful degradation, soft forking, and gem transition support. By this, I mean gems can exist as they are today with support for scoped coexistence while falling back to the existing and established format. To summarize:

Gem Specification

# Valid
Gem::Specification.new do |spec|
  spec.name = "dry-monads"
  # Truncated for brevity.
end

# Valid
Gem::Specification.new do |spec|
  spec.scope = "dry"
  spec.name = "monads"
  # Truncated for brevity.
end

Paths

# Valid (global)
$HOME/<truncated>/3.1.2/lib/ruby/gems/3.1.0/gems/dry-monads-1.4.0

# Valid (scoped)
$HOME/<truncated>/3.1.2/lib/ruby/gems/3.1.0/gems/dry/monads-1.4.0

⚠️ Keep in mind two formats of the same gem version would not be allowed. I'm only showing the same version for path comparison purposes.

URLs

# Valid (global)
https://rubygems.org/gems/dry-monads

# Valid (scoped)
https://rubygems.org/gems/dry/monads

All of this means that you can do the following:

  • Gracefully degrade to global scope if a custom scope isn't provided.
  • Allow gems to be soft forked by using my_scope/monads as a temporary quick fix while the main gem catches up.
  • Allow existing gems to migrate to the new scoped format by releasing a new version which adds the scope to their gemspec.

None of what I've written above addresses the name squatting problem, though. That is still a complication which has been mentioned in this discussion but probably warrants a different proposal.

@indirect
Member

indirect commented May 8, 2022

@bkuhlmann I think your suggestion is aligned on the end goals: scopes need a way to avoid global namespace conflicts. 👍🏻 I might have missed it, but I didn’t see anything in your post to address “scoped forks”, like the Rails org creating their own Mail gem that is an alternative/replacement for the global Mail gem. How would you handle that?

@ghost

This comment was marked as abuse.

@indirect

This comment was marked as off-topic.

@schmijos

schmijos commented May 9, 2022

I'm late to the conversation, but couldn't we add an organization attribute to Gem::Specification? And then modify gem install to allow an organization attribute?

I'm really not following the purpose of the '@' for scoping, and why the org name wouldn't be enough scoping.

@halostatue I agree with @djberg96 mainly for the reason that we already have got possibilities to "scope" gems. We can already do git, github, ref, branch and whatsoever.

@mullermp
Author

mullermp commented May 9, 2022

I must as well note that I am fairly surprised that such a discussion happens without @matz in sight. And @dhh. And @ko1. And @tenderlove, @hone, @amatsuda, @tmm1, @kobaltz, and all the other prominent locomotives and shapers of the Ruby world.

I would love for the "prominent locomotives and shapers of the Ruby world" to comment and contribute to this RFC!

@kobaltz

kobaltz commented May 9, 2022

I think that this could be a good change for the future of Ruby as a whole. My main concern is backwards compatibility with existing applications. Though, they're probably running an older version of Ruby and rubygems anyways, so it likely wouldn't matter as long as the API was backwards compatible.

Ultimately, what is the goal of scoped gems and what problem is it solving? Based on the conversations above, it seems clear that scoped gems provide "confidence" that a consumed gem is from a certain organization. If this is the goal, then sure, it is moving in the right direction.

However, there were other mentions of consuming potentially malicious gems and squatting on names. Sure, this would help combat the squatting on gem names as things are now scoped. However, I'd push back on the malicious gems bit. Someone could create a fake org scope like hotwire instead of hotwired and then publish something malicious there. At a glance, it may look legit. If this is a main reason for the scoped gems, I don't think it will solve the problem that it is aiming to solve. Although, if there is a requirement for an organization to be verified in order to gain access to the scoped gems, then we may be on track to having more legitimacy to the new convention.

As for the naming convention, I'll go with the flow. I don't have a preference on gem@scope or scope@gem or @scope/gem. gem@scope does give a more normal feel as it is similar to user@server.

@mullermp
Author

I want to thank everyone for providing their feedback and perspectives on this. I think the next step here is to consolidate/parse the feedback and determine what changes are needed. At a glance, the idea seems to be overwhelmingly positive, but the approach (naming and usage) is mixed. I understand that we can't please everyone. Hopefully we can strike a happy medium here.

@bkuhlmann

André: I didn’t see anything in your post to address “scoped forks”, like the Rails org creating their own Mail gem that is an alternative/replacement for the global Mail gem. How would you handle that?

Yeah, fair point. I don't address that very well and I'm not sure I have a good answer other than what I commented on earlier and what Samuel mentions in his comment (i.e. the Rails Mail gem example) without thinking through the directory and URL path design a bit more (as well as eliciting more feedback). I agree there are caveats to think through and address better. Something that would be a huge help is to see the RFC be brought up-to-date with the current discussion so, at a high level, everyone is back on the same page and can help progress the design even further.

Matt: Maybe you can update your RFC -- if you are not already in the process of doing this -- to detail the directory path design as mentioned in these comments and discussion? If your RFC was brought up-to-date with the current discussion then it'd be easier to iterate on this a bit more?

@ioquatix

Would it be helpful for me to write an RFC for the proposal I outlined too? Even if it's just as a counterpoint?
