-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix userinfo in case of some special characters #144
base: master
Are you sure you want to change the base?
Conversation
add some UT for this case
lib/URI/_server.pm
Outdated
} | ||
if ($str =~ m,^((?:$URI::scheme_re:)?)//(.*:.*@)?([^/?\#]*)(.*)$,os) { | ||
my $scheme = $1; | ||
my $ui = $2 || ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a more descriptive name for this variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this variable was not your creation. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, I have renamed to $userinfo
.
I decided to rename in whole file as it might be a little confusing when we're using $ui
or $userinfo
.
lib/URI.pm
Outdated
@@ -123,7 +123,7 @@ sub _fix_uric_escape_for_host_part { | |||
return; | |||
} | |||
|
|||
if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//([^/?\#]+)(.*)$}os) { | |||
if ($_[0] =~ m{^((?:$URI::scheme_re:)?)//((.*:.*@)?[^/?\#]+)(.*)$}os) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a discussion with @genio on IRC:
It would be nice to avoid (.*@)
or (.*:.*@)
as it's greedy and could be prone to bugs.
We could take inspiration from https://metacpan.org/dist/Mojolicious/source/lib/Mojo/URL.pm#L52-64
After gathering the host and port part, ^([^@]+@) should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but what if user wanna use '@' in password, using [^@]+@
will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.
like the similar approach is already used in LWP/Protocol/http.pm.
And I already mentioned what issue it might cause in my PR.
For my standpoint (.*:.*@)
is pretty lazy, but okay, I agree that it could be better.
@oalders, what you think about ([^:]+:.*@)
for matching user info?
This will match everything except :
as username cannot contain colon, then match :
as separator, match everything after first colon as password or nothing as according to rfc3986#section-3.2.1 it might be empty, and finally match @
as second separator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the regex which Mojo/URL.pm#L52-64 is used also cannot parse username/password if it contains any of /#?
symbols :)
I know, it's really rarely usecase, but unfortunately I faced with it already two times :d
For instance, this regex with username and password which contain all special characters in unescaped form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but what if user wanna use '@' in password, using [^@]+@ will cause that the everything before first '@' symbol will be matched and password parsed incorrectly.
Then it should be URL-encoded as %40
. I'm not sure if there's any way around that. Something has to serve as a stopping point. If the RE is too greedy, it might end up stopping with a @
inside of the path, like http://host/path/directory@foobar/file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the RE is too greedy, it might end up stopping with a
@
inside of the path, likehttp://host/path/directory@foobar/file
.
Hm.
Now I see, it'll match this path incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, actually I can revert this change as the most important part is in lib/URI/_server.pm#14.
There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are another issue, if password contain multiple '@' symbols before any of '/#?' symbol URI parsing fails again :d
If the password was inserted directly into the URL that way without percent-encoding, that's not really the URI module's fault. The password has to be encoded properly first. That's the point of escaping characters.
Thanks very much for this @kastakhov. If we could tweak the regex in both places as suggested, that would be helpful. |
This is very adjacent to this issue, and feel free to tell me to open a different issue/PR, but... The user/password methods allow bad data to pollute the URL (lines 45 and the similar line 22), because it is only doing a very light set of manual URL-escaping. It should, at a minimum escape potential |
Is there a good reason not to just call |
I still don't think this is a good idea, as you're supposed to URI-escape things. Here's an example with both Mojo and URI: #!/usr/bin/env perl
use v5.36;
use strict;
use warnings;
use Mojo::URL ();
use Mojo::Util ();
use URI ();
use URI::Escape ();
my $unescaped_u = 'u1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
my $unescaped_p = 'p1!"#$%&\'()*+,-./;<=>?@[\]^_`{|}~';
{
# mojolicious way:
my $username = Mojo::Util::url_escape($unescaped_u);
my $password = Mojo::Util::url_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "Mojo: ", $foo;
my $url = Mojo::URL->new($foo);
# say "Mojo: ", $url->to_string();
say "Mojo: ", $url->userinfo();
}
{
# URI way
my $username = URI::Escape::uri_escape($unescaped_u);
my $password = URI::Escape::uri_escape($unescaped_p);
my $foo = "http://${username}:${password}\@example.com/pub/a/2001/08/27/bjornstad.html";
# say "URI: ", $foo;
my $url = URI->new($foo);
# say "URI: ", $url->as_string();
say "URI: ", URI::Escape::uri_unescape($url->userinfo());
}
Yielding: % perl url.pl
Mojo: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
URI: u1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~:p1!"#$%&'()*+,-./;<=>?@[\]^_`{|}~
|
While I agree everybody should URI-escape the UN/PWs, it should still try to safely parse as much as it can. IMO, Also, I'll just create a different issue for the user/password set problem. |
Well, disregard my user/password set problem. I realized that |
Just in case, originally I faced with this issue here.. |
make _uric_escape regex more specific for userinfo part update UT
Now the only one issue still here.... when password contain '@' unescaped symbol before '/#?' symbols, user info part matches incorrectly. 🙃 |
The seemingly relevant RFC pieces are linked here: I think my cursory reading of these and understanding thus far leads me to believe we should leave things as-is and maybe provide documentation for how to provide complex authentication using |
I found Section 2.2 earlier, but didn't see where those declarations were used. Good find. So, this means we should stay away from unnecessarily parsing
Again, a delimiter has to exist unescaped. There's no such thing as a parser algorithm that can take in every ASCII or Unicode unescaped character as a sub-field of the whole string. At some point, we have to force the user to escape their data before it gets dropped into a URL. |
Fix for #143