Export `PATH_SEGMENT` and other percent encoding `AsciiSet`s #529

sdroege · 2019-07-24T10:35:41Z

As these are defined by the URL spec, it would be convenient to have them available from either the url crate or the percent-encoding crate.

The text was updated successfully, but these errors were encountered:

SimonSapin · 2019-07-24T10:55:52Z

PATH_SEGMENT is private in the url crate on purpose. Previously there was a series of issues open in this tracker requesting adding sets to the percent_encoding crate for various use cases, although there was already the ability to define new sets for various use cases that I consider out of scope.

I hope that going the opposite direction and having a minimal amount of built-in sets will reinforce the point that defining your own is normal and expected. NON_ALHPANUMERIC is the (overly) conservative default for users who might insist they can’t be bothered.

sdroege · 2019-07-24T11:15:22Z

I thought it might be useful to have pre-defined sets for the ones given in the URL spec. If you disagree that's also OK with me, I already have my own defined for the specific case I need :)

SimonSapin · 2019-07-24T11:27:19Z

Yes it would be useful to have them. And it would be useful to have one for HTTP Content-Disposition. And one for other HTTP headers (it’s different apparently?). And one for MySQL. And…

So yes, closing since not doing that is on purpose :)

sdroege · 2019-07-24T11:43:31Z

Yes but those HTTP specific ones would be in some HTTP crate, the MySQL ones in a MySQL crate, etc.. I don't mean to add it to percent-encoding but to url :) Anyway, works for me!

dead10ck · 2019-08-03T17:45:25Z

@SimonSapin I definitely understand not adding every possible encoded set for every use case under the sun, but surely the few that were defined in 1.x are some of the most common. If they are defined internally anyway, why not export them? Why make every consumer repeat the same sets? I just don't understand what the maintainers or users gained from removing sets from the exports.

SimonSapin · 2019-08-05T15:01:48Z

surely the few that were defined in 1.x are some of the most common

But are they really?

The fact that there is more than one set used only for the purpose of parsing URLs, and that the specification for them has changed at least once in the last few years, shows that deciding which characters to encode is not obvious and highly context-dependent. (Unless you want to be conservative and encode more than might be necessary.)

why not export them

To nudge users toward taking the time to consider what set they actually need.

dead10ck · 2019-08-05T17:16:37Z

@SimonSapin

I understand it is context-dependent, but we are talking about parsing and encoding things for URLs, which is arguably one of the most common and essential tasks in any code base involving transmission of data over HTTP. Paths, query strings, etc. are going to be super common contexts, are they not?

I definitely agree that it is good to nudge users to think about what encode set they actually need for their context, but in my opinion, this was already accomplished just by the mere existence of distinct encode sets, and the fact that you must choose one to do an encoding at all.

SimonSapin · 2019-08-05T17:34:20Z

If you’re manipulating an URL’s path, why would you need to call percent_encode yourself? Url::set_path already does. Similarly for other URL components. If you’re manipulating something that is not an URL, in what cases does this happen to need the exact same set as one of the URL components?

dead10ck · 2019-08-05T23:10:52Z

@SimonSapin if you're asking what my use case is, it's for a little CLI utility that URL-encodes its input. Its intended use was primarily a convenient tool for bash scripts that need to construct a URL (or set of URLs) to make requests to. It need not actually construct full URL objects, but its ultimate goal is nonetheless for use within URLs. To upgrade to 2.0, I will have to duplicate all of those encode sets.

My use case is admitedly for a small personal project, but I imagine there are other use cases where people need the URL encode sets without actually constructing full URL objects.

Let me be clear: I'm not trying to pester you here. It's not the end of the world if I have to duplicate those encode sets. I'm just not sure I understand what the downside is to re-exporting them, whereas there was a clear downside to users to remove the exports.

SimonSapin closed this as completed Jul 24, 2019

benesch mentioned this issue Jul 30, 2019

v2.0 upgrade path #532

Closed

SimonSapin mentioned this issue Aug 26, 2019

percent-encoding: make sets be values of one type, instead of types that implement a trait #519

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export `PATH_SEGMENT` and other percent encoding `AsciiSet`s #529

Export `PATH_SEGMENT` and other percent encoding `AsciiSet`s #529

sdroege commented Jul 24, 2019

SimonSapin commented Jul 24, 2019

sdroege commented Jul 24, 2019

SimonSapin commented Jul 24, 2019

sdroege commented Jul 24, 2019

dead10ck commented Aug 3, 2019

SimonSapin commented Aug 5, 2019

dead10ck commented Aug 5, 2019

SimonSapin commented Aug 5, 2019

dead10ck commented Aug 5, 2019 •

edited

Loading

Export PATH_SEGMENT and other percent encoding AsciiSets #529

Export PATH_SEGMENT and other percent encoding AsciiSets #529

Comments

sdroege commented Jul 24, 2019

SimonSapin commented Jul 24, 2019

sdroege commented Jul 24, 2019

SimonSapin commented Jul 24, 2019

sdroege commented Jul 24, 2019

dead10ck commented Aug 3, 2019

SimonSapin commented Aug 5, 2019

dead10ck commented Aug 5, 2019

SimonSapin commented Aug 5, 2019

dead10ck commented Aug 5, 2019 • edited Loading

Export `PATH_SEGMENT` and other percent encoding `AsciiSet`s #529

Export `PATH_SEGMENT` and other percent encoding `AsciiSet`s #529

dead10ck commented Aug 5, 2019 •

edited

Loading