fix userinfo in case of some special characters #144

kastakhov · 2024-08-17T22:28:27Z

Fix for #143

add some UT for this case

oalders · 2024-08-20T01:03:58Z

lib/URI/_server.pm

-	}
+    if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) {
+        my $scheme = $1;
+        my $ui = $2 || '';


Could we have a more descriptive name for this variable?

I realize this variable was not your creation. :)

Sure thing, I have renamed to $userinfo.
I decided to rename in whole file as it might be a little confusing when we're using $ui or $userinfo.

oalders · 2024-08-20T15:50:02Z

lib/URI.pm

@@ -123,7 +123,7 @@ sub _fix_uric_escape_for_host_part {
    return;
  }

-  if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) {
+  if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) {


From a discussion with @genio on IRC:

It would be nice to avoid (.*@) or (.*:.*@) as it's greedy and could be prone to bugs.

We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64

After gathering the host and port part, ^([^@]+@) should be fine.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.

For my standpoint (.*:.*@) is pretty lazy, but okay, I agree that it could be better.

@oalders, what you think about ([^:]+:.*@) for matching user info?

This will match everything except : as username cannot contain colon, then match : as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @ as second separator.

By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#? symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d

For instance, this regex with username and password which contain all special characters in unescaped form.

Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.

Then it should be URL-encoded as %40. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

If the RE is too greedy, it might end up stopping with a @ inside of the path, like http://host/path/directory@foobar/file.

Hm.
Now I see, it'll match this path incorrectly.

Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d

If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.

oalders · 2024-08-20T15:50:56Z

Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful.

kastakhov · 2024-08-22T11:01:00Z

Hi, @oalders, @genio, any another suggestions for these changes or now it looks good for you?

SineSwiper · 2024-08-26T19:13:55Z

This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but...

The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential @ characters in the password ($new =~ s/@/%40/g), and at a maximum, just call uri_escape for both user/password sets.

oalders · 2024-08-26T19:15:16Z

Is there a good reason not to just call uri_escape?

genio · 2024-08-26T19:27:39Z

I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI:

#!/usr/bin/env perl

use v5.36;
use strict;
use warnings;

use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();

my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';

{
    # mojolicious way:
    my $username = Mojo::Util::url_escape($unescaped_u);
    my $password = Mojo::Util::url_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "Mojo: ", $foo;
    my $url = Mojo::URL->new($foo);
    # say "Mojo: ", $url->to_string();
    say "Mojo: ", $url->userinfo();
}

{
    # URI way
    my $username = URI::Escape::uri_escape($unescaped_u);
    my $password = URI::Escape::uri_escape($unescaped_p);
    my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
    # say "URI:  ", $foo;
    my $url = URI->new($foo);
    # say "URI:  ", $url->as_string();
    say "URI:  ", URI::Escape::uri_unescape($url->userinfo());
}

Yielding:

% perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI:  u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~

SineSwiper · 2024-08-26T21:06:57Z

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

SineSwiper · 2024-08-26T21:30:18Z

Well, disregard my user/password set problem. I realized that userinfo is running through URI::Escape::escape_char($1) and everything works correctly, as long as it didn't start off confused (ie: URI->new with reserved characters in the password).

kastakhov · 2024-08-26T21:48:37Z

While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, [^:]+:[^@]+@ should work to capture userinfo without escaping out to parts that it shouldn't.

Also, I'll just create a different issue for the user/password set problem.

Just in case, originally I faced with this issue here..

make _uric_escape regex more specific for userinfo part update UT

kastakhov · 2024-08-26T22:04:49Z

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃

genio · 2024-08-26T22:22:21Z

The seemingly relevant RFC pieces are linked here:
https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.1
https://datatracker.ietf.org/doc/html/rfc3986#section-7.3
https://datatracker.ietf.org/doc/html/rfc3986#section-7.5

I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using URI::Escape::uri_escape. I believe making changes to the behavior here could result in unexpected double percent encoding for some folks who are already doing it the way in the sample script above.

SineSwiper · 2024-08-27T14:08:40Z

I found Section 2.2 earlier, but didn't see where those declarations were used. Good find.

So, this means we should stay away from unnecessarily parsing gen-delims in the userinfo section. At the very least, .* is way too broad.

Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly.

Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL.

fix userinfo in case of some special characters

631eb5f

add some UT for this case

oalders reviewed Aug 20, 2024

View reviewed changes

rename $ui variable to $userinfo

ab3ff48

oalders reviewed Aug 20, 2024

View reviewed changes

make regex in _uric_escape more specific

6113394

revert changes in URI.pm

837315c

make _uric_escape regex more specific for userinfo part update UT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix userinfo in case of some special characters #144

fix userinfo in case of some special characters #144

kastakhov commented Aug 17, 2024

oalders Aug 20, 2024

oalders Aug 20, 2024

kastakhov Aug 20, 2024

oalders Aug 20, 2024

kastakhov Aug 20, 2024 •

edited

Loading

kastakhov Aug 20, 2024 •

edited

Loading

SineSwiper Aug 26, 2024

kastakhov Aug 26, 2024

kastakhov Aug 26, 2024 •

edited

Loading

SineSwiper Aug 27, 2024

oalders commented Aug 20, 2024

kastakhov commented Aug 22, 2024

SineSwiper commented Aug 26, 2024 •

edited

Loading

oalders commented Aug 26, 2024

genio commented Aug 26, 2024

SineSwiper commented Aug 26, 2024

SineSwiper commented Aug 26, 2024

kastakhov commented Aug 26, 2024 •

edited

Loading

kastakhov commented Aug 26, 2024

genio commented Aug 26, 2024

SineSwiper commented Aug 27, 2024 •

edited

Loading

fix userinfo in case of some special characters #144

Are you sure you want to change the base?

fix userinfo in case of some special characters #144

Conversation

kastakhov commented Aug 17, 2024

oalders Aug 20, 2024

Choose a reason for hiding this comment

oalders Aug 20, 2024

Choose a reason for hiding this comment

kastakhov Aug 20, 2024

Choose a reason for hiding this comment

oalders Aug 20, 2024

Choose a reason for hiding this comment

kastakhov Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

kastakhov Aug 20, 2024 • edited Loading

Choose a reason for hiding this comment

SineSwiper Aug 26, 2024

Choose a reason for hiding this comment

kastakhov Aug 26, 2024

Choose a reason for hiding this comment

kastakhov Aug 26, 2024 • edited Loading

Choose a reason for hiding this comment

SineSwiper Aug 27, 2024

Choose a reason for hiding this comment

oalders commented Aug 20, 2024

kastakhov commented Aug 22, 2024

SineSwiper commented Aug 26, 2024 • edited Loading

oalders commented Aug 26, 2024

genio commented Aug 26, 2024

SineSwiper commented Aug 26, 2024

SineSwiper commented Aug 26, 2024

kastakhov commented Aug 26, 2024 • edited Loading

kastakhov commented Aug 26, 2024

genio commented Aug 26, 2024

SineSwiper commented Aug 27, 2024 • edited Loading

kastakhov Aug 20, 2024 •

edited

Loading

kastakhov Aug 20, 2024 •

edited

Loading

kastakhov Aug 26, 2024 •

edited

Loading

SineSwiper commented Aug 26, 2024 •

edited

Loading

kastakhov commented Aug 26, 2024 •

edited

Loading

SineSwiper commented Aug 27, 2024 •

edited

Loading