Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export PATH_SEGMENT and other percent encoding AsciiSets #529

Closed
sdroege opened this issue Jul 24, 2019 · 9 comments
Closed

Export PATH_SEGMENT and other percent encoding AsciiSets #529

sdroege opened this issue Jul 24, 2019 · 9 comments

Comments

@sdroege
Copy link
Contributor

sdroege commented Jul 24, 2019

As these are defined by the URL spec, it would be convenient to have them available from either the url crate or the percent-encoding crate.

@SimonSapin
Copy link
Member

PATH_SEGMENT is private in the url crate on purpose. Previously there was a series of issues open in this tracker requesting adding sets to the percent_encoding crate for various use cases, although there was already the ability to define new sets for various use cases that I consider out of scope.

I hope that going the opposite direction and having a minimal amount of built-in sets will reinforce the point that defining your own is normal and expected. NON_ALHPANUMERIC is the (overly) conservative default for users who might insist they can’t be bothered.

@sdroege
Copy link
Contributor Author

sdroege commented Jul 24, 2019

I thought it might be useful to have pre-defined sets for the ones given in the URL spec. If you disagree that's also OK with me, I already have my own defined for the specific case I need :)

@SimonSapin
Copy link
Member

Yes it would be useful to have them. And it would be useful to have one for HTTP Content-Disposition. And one for other HTTP headers (it’s different apparently?). And one for MySQL. And…

So yes, closing since not doing that is on purpose :)

@sdroege
Copy link
Contributor Author

sdroege commented Jul 24, 2019

Yes but those HTTP specific ones would be in some HTTP crate, the MySQL ones in a MySQL crate, etc.. I don't mean to add it to percent-encoding but to url :) Anyway, works for me!

@dead10ck
Copy link

dead10ck commented Aug 3, 2019

@SimonSapin I definitely understand not adding every possible encoded set for every use case under the sun, but surely the few that were defined in 1.x are some of the most common. If they are defined internally anyway, why not export them? Why make every consumer repeat the same sets? I just don't understand what the maintainers or users gained from removing sets from the exports.

@SimonSapin
Copy link
Member

surely the few that were defined in 1.x are some of the most common

But are they really?

The fact that there is more than one set used only for the purpose of parsing URLs, and that the specification for them has changed at least once in the last few years, shows that deciding which characters to encode is not obvious and highly context-dependent. (Unless you want to be conservative and encode more than might be necessary.)

why not export them

To nudge users toward taking the time to consider what set they actually need.

@dead10ck
Copy link

dead10ck commented Aug 5, 2019

@SimonSapin

I understand it is context-dependent, but we are talking about parsing and encoding things for URLs, which is arguably one of the most common and essential tasks in any code base involving transmission of data over HTTP. Paths, query strings, etc. are going to be super common contexts, are they not?

I definitely agree that it is good to nudge users to think about what encode set they actually need for their context, but in my opinion, this was already accomplished just by the mere existence of distinct encode sets, and the fact that you must choose one to do an encoding at all.

@SimonSapin
Copy link
Member

If you’re manipulating an URL’s path, why would you need to call percent_encode yourself? Url::set_path already does. Similarly for other URL components. If you’re manipulating something that is not an URL, in what cases does this happen to need the exact same set as one of the URL components?

@dead10ck
Copy link

dead10ck commented Aug 5, 2019

@SimonSapin if you're asking what my use case is, it's for a little CLI utility that URL-encodes its input. Its intended use was primarily a convenient tool for bash scripts that need to construct a URL (or set of URLs) to make requests to. It need not actually construct full URL objects, but its ultimate goal is nonetheless for use within URLs. To upgrade to 2.0, I will have to duplicate all of those encode sets.

My use case is admitedly for a small personal project, but I imagine there are other use cases where people need the URL encode sets without actually constructing full URL objects.

Let me be clear: I'm not trying to pester you here. It's not the end of the world if I have to duplicate those encode sets. I'm just not sure I understand what the downside is to re-exporting them, whereas there was a clear downside to users to remove the exports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants