[Uri] Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri #1061

Closed
g0dsCookie opened this issue Nov 15, 2018 · 15 comments · Fixed by #36429

@g0dsCookie

Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri

General

As described by the title, paths with Unicode/UTF-8 characters are incorrectly parsed/reported by System.Uri, resulting in an invalid path. For example, the path "/üri" results in a Uri like "file:///%C3%BCri/üri" (note the unescaped "/üri" at the end).

This also happens with other Unicode/UTF-8 characters such as £, §, etc. You can replace ü with any other Unicode/UTF-8 character in my example and see the same result, i.e. the path is doubled.

Expected Result

PathAndQuery = AbsolutePath = "/%C3%BCri"
AbsoluteUri = "file:///%C3%BCri"

Results collected using mono 5.14 on the same Linux machine.

Actual Result

PathAndQuery = AbsolutePath = "/%C3%BCri" // This seems to be correct
AbsoluteUri = "/%C3%BCri/%C3%BCri" // Note the additional "/%C3%BCri" at the end
_string = "/üri/üri" // Note the additional "/üri" at the end

Using the .NET Core version mentioned below.

System Information

$ dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.1.302
 Commit:    9048955601

Runtime Environment:
 OS Name:     gentoo
 OS Version:
 OS Platform: Linux
 RID:         gentoo-x64
 Base Path:   /opt/dotnet_core/sdk/2.1.302/

Host (useful for support):
  Version: 2.1.2
  Commit:  811c3ce6c0

Code to reproduce

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var uri = new Uri("/üri");
            Console.WriteLine(uri.ToString()); // file:///%C3%BCri/üri
        }
    }
}

Above code prints "file:///%C3%BCri/üri" while "file:///%C3%BCri" is expected.

Using "/üri/üri" in the Uri ctor results in a path like "file:///%C3%BCri/%C3%BCri/üri/üri".
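To compare the expected and actual values side by side, a small diagnostic along the following lines may help. This is only a sketch: `ExpectedFileUri` is a hypothetical helper, not part of System.Uri, and it assumes that percent-encoding each path segment with `Uri.EscapeDataString` is a reasonable stand-in for the expected result (it matches the "/üri" example, but it escapes more characters than a path strictly requires). `Uri.TryCreate` is used because the same string may not be accepted as an absolute URI on every platform.

```csharp
using System;
using System.Linq;

static class FileUriRepro
{
    // Hypothetical helper (an assumption for illustration, not a System.Uri API):
    // percent-encodes each segment of a Unix-style path and prefixes "file://".
    public static string ExpectedFileUri(string unixPath) =>
        "file://" + string.Join("/", unixPath.Split('/').Select(s => Uri.EscapeDataString(s)));

    static void Main()
    {
        const string path = "/\u00FCri"; // "/üri"

        if (!Uri.TryCreate(path, UriKind.Absolute, out var uri))
        {
            Console.WriteLine("Not accepted as an absolute URI on this platform.");
            return;
        }

        string expected = ExpectedFileUri(path);
        Console.WriteLine($"Expected AbsoluteUri: {expected}");
        Console.WriteLine($"Actual AbsoluteUri:   {uri.AbsoluteUri}");
        Console.WriteLine($"AbsolutePath:         {uri.AbsolutePath}");
        Console.WriteLine($"ToString():           {uri}");
        Console.WriteLine($"Matches expected:     {uri.AbsoluteUri == expected}");
    }
}
```

The last line reports whether the runtime in use produces the expected value; no particular output is claimed here, since it depends on platform and runtime version.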

@g0dsCookie
Author

Also, DOS-like and UNC paths are treated correctly on Windows.

c:\üri -> file:///c:/üri
//computer/üri -> file://computer/üri

@karelz
Member

karelz commented Nov 15, 2018

@g0dsCookie let's not mix multiple problems in the same issue. I don't think Windows paths are recognized in Uri on Linux. That is IMO by design - @rmkerr can confirm.

Regarding your original report: Can you please clarify (please edit the top post) what is the actual result on each platform and what is expected?

@g0dsCookie g0dsCookie changed the title Paths with umlauts incorrectly parsed/reported by System.Uri Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri Nov 16, 2018
@g0dsCookie
Author

> @g0dsCookie let's not mix multiple problems in the same issue. I don't think Windows paths are recognized in Uri on Linux. That is IMO by design - @rmkerr can confirm.

Actually paths like "C:\üri" and "\localhost\üri" are correctly parsed on Linux with .NET Core.

> Regarding your original report: Can you please clarify (please edit the top post) what is the actual result on each platform and what is expected?

I updated my original report. Hope it's clearer now. It seems like there's a problem when parsing UTF-8/Unicode characters which causes the path to be doubled for every UTF-8 character encountered.

@karelz karelz transferred this issue from dotnet/core Nov 16, 2018
@rmkerr
Contributor

rmkerr commented Nov 19, 2018

Thanks for the detailed report @g0dsCookie. This looks really interesting. I'm not going to be able to take a look at it immediately, but I think we should try to get this fixed for 3.0.

@caesar-chen caesar-chen changed the title Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri [Uri] Paths with Unicode/UTF-8 incorrectly parsed/reported by System.Uri Jan 8, 2019
@Anipik Anipik self-assigned this Apr 30, 2019
@wtgodbe
Member

wtgodbe commented May 1, 2019

@tarekgh does this look like something that might be related to libicu?

@tarekgh
Member

tarekgh commented May 1, 2019

@wtgodbe I cannot tell without looking :-) does this issue not repro on Windows?

@wtgodbe
Member

wtgodbe commented May 1, 2019

It doesn't repro on Windows, I get the same results as https://github.com/dotnet/corefx/issues/33557#issuecomment-439075084

@tarekgh
Member

tarekgh commented May 1, 2019

It looks like this is by design on Windows, as a Uri cannot start with a single '/' character.

https://github.com/dotnet/corefx/blob/master/src/System.Private.Uri/src/System/Uri.cs#L3737

On Linux, this can be a valid Uri according to https://github.com/dotnet/corefx/blob/master/src/System.Private.Uri/src/System/Uri.cs#L3663

I haven't looked at Linux yet to see why we return the result shown here.

@tarekgh
Member

tarekgh commented May 2, 2019

I have looked at the issue on Linux; the problem has nothing to do with ICU. Here is what happens:

The Uri code detects that it is running on Linux and that the string starts with '/', which means it could be a valid file path, so it stores the original value "/üri" in the internal uri._string. Later, the code calls the method ParseRemaining, which calls EscapeUnescapeIri.

https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L3394

EscapeUnescapeIri returns "/üri", and the code then concatenates this value to the original _string, so _string now stores "/üri/üri".

Later, the code tries to get the host name. It determines that the host name should be the first 4 characters, "/üri", and calls the EscapeString helper method to normalize this name, which returns "/%C3%BCri".

https://github.com/dotnet/corefx/blob/f5b57382cd5fef53cf09e5fd6b9b9812dfddb953/src/System.Private.Uri/src/System/Uri.cs#L2520

That makes the whole URI "file:///%C3%BCri/üri".

Let me know if I can help with anything more.
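The sequence described in this comment can be mimicked with a toy model. The sketch below is not the actual Uri.cs code: `EscapeNonAscii` and `Simulate` are hypothetical names, and the only assumptions are the ones stated above (the IRI step returns the path unchanged, the result is appended to the stored string, and only the leading "host" portion is escaped afterwards).

```csharp
using System;
using System.Text;

static class UriDoublingSketch
{
    // Hypothetical stand-in for the escaping step: percent-encodes every
    // non-ASCII character as its UTF-8 bytes and leaves ASCII untouched.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            if (b < 0x80) sb.Append((char)b);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Toy model of the described sequence; not the real parser.
    public static string Simulate(string original)
    {
        string stored = original;        // 1. implicit file path stored as-is
        string iriResult = original;     // 2. IRI step returns the path unchanged
        stored += iriResult;             // 3. result is appended instead of replacing

        // 4. only the first original.Length characters are treated as the
        //    "host" and escaped; the appended copy stays unescaped.
        string hostPart = stored.Substring(0, original.Length);
        string rest = stored.Substring(original.Length);
        return "file://" + EscapeNonAscii(hostPart) + rest;
    }

    static void Main()
    {
        // Simulate("/üri") returns "file:///%C3%BCri/üri"
        Console.WriteLine(Simulate("/\u00FCri"));
    }
}
```

Under the same assumptions, `Simulate("/üri/üri")` returns "file:///%C3%BCri/%C3%BCri/üri/üri", which is the value reported in the original post for that input.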

@karelz karelz assigned wtgodbe and unassigned Anipik May 22, 2019
@karelz
Member

karelz commented Jun 17, 2019

@wtgodbe please check that this is not a regression against 2.x

@Livven

Livven commented Nov 11, 2019

> It doesn't repro on Windows, I get the same results as #33557 (comment)

It repros for me on Windows: new Uri("/üri") throws a UriFormatException, but new Uri("file:///üri").ToString() results in file:///üri/üri.

As an aside, I must say I'm really surprised that new Uri behaves differently depending on platform. It's not like every URI you would ever handle has to be for the same platform.

@davidsh
Contributor

davidsh commented Nov 11, 2019

> As an aside, I must say I'm really surprised that new Uri behaves differently depending on platform. It's not like every URI you would ever handle has to be for the same platform.

File URIs are special because they handle OS-specific file path syntax. Ordinary schemes such as "http" behave consistently across platforms.

However, the behavior you see with UriFormatException vs. the repeated word pattern (üri) is something we will investigate since it seems like a bug.

@Livven

Livven commented Nov 11, 2019

I understand file path syntax can differ across OSes, but you might still want to handle Linux-style paths on Windows and vice versa. Especially as this breaks e.g. serialization/deserialization across platforms.

@MihaZupan MihaZupan transferred this issue from dotnet/corefx Dec 19, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Dec 19, 2019
@MihaZupan MihaZupan added bug and removed untriaged New issue has not been triaged by the area owner labels Dec 19, 2019
@karelz karelz added this to the 5.0 milestone Feb 20, 2020
@lewing
Member

lewing commented Apr 24, 2020

Just some more data points, because I'm running into this right now:

using System;

namespace UriTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // The same path built several ways: implicit file paths,
            // base + relative combinations, and pre-escaped file/http URIs.
            var uris = new [] {
                new Uri ("/Source/Test#our codedir/smile😟/Program.cs"),
                new Uri ("/Source/Test#our codedir/smile😟/Program.cs", UriKind.Absolute),
                new Uri (new Uri ("file://"), "/Source/Test#our codedir/smile😟/Program.cs"),
                new Uri (new Uri ("file://localhost"), "Source/Test#our codedir/smile😟/Program.cs"),
                new Uri (new Uri ("file://localhost"), "/Source/Test#our codedir/smile😟/Program.cs"),
                new Uri (new Uri ("file://localhost"), new Uri ("/Source/Test#our codedir/smile😟/Program.cs")),
                new Uri ("file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs"),
                new Uri ("file://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs"),
                new Uri ("http://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs"),
            };
            var i = 0;
            // Print each representation so the variants can be compared.
            foreach (var uri in uris) {
                Console.WriteLine ($"{i++} XXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
                Console.WriteLine ($"   AbsoluteUri: {uri.AbsoluteUri}");
                Console.WriteLine ($"  AbsolutePath: {uri.AbsolutePath}");
                Console.WriteLine ($"     LocalPath: {uri.LocalPath}");
                Console.WriteLine ($"    ToString(): {uri.ToString()}");
                Console.WriteLine ($"OriginalString: {uri.OriginalString}");
            }
        }
    }
}

Gives the output of:

dotnet run
0 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: /Source/Test#our codedir/smile😟/Program.cs
    ToString(): file://%2FSource%2FTest%23our codedir%2Fsmile😟%2FProgram.cs/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: /Source/Test#our codedir/smile😟/Program.cs
1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: /Source/Test#our codedir/smile😟/Program.cs
    ToString(): file://%2FSource%2FTest%23our codedir%2Fsmile😟%2FProgram.cs/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: /Source/Test#our codedir/smile😟/Program.cs
2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs/Source/Test#our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test
     LocalPath: /Source/Test
    ToString(): file:///Source/Test#our codedir/smile😟/Program.cs/Source/Test#our codedir/smile😟/Program.cs
OriginalString: file:///Source/Test#our codedir/smile😟/Program.cs
3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file://localhost/Source/Test#our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test
     LocalPath: \\localhost\Source\Test
    ToString(): file://localhost/Source/Test#our codedir/smile😟/Program.cs
OriginalString: file://localhost/Source/Test#our codedir/smile😟/Program.cs
4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file://localhost//Source/Test#our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: //Source/Test
     LocalPath: \\localhost\\Source\Test
    ToString(): file://localhost//Source/Test#our codedir/smile😟/Program.cs
OriginalString: file://localhost//Source/Test#our codedir/smile😟/Program.cs
5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: /Source/Test#our codedir/smile😟/Program.cs
    ToString(): file://%2FSource%2FTest%23our codedir%2Fsmile😟%2FProgram.cs/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: /Source/Test#our codedir/smile😟/Program.cs
6 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: /Source/Test#our codedir/smile😟/Program.cs
    ToString(): file://%2FSource%2FTest%23our codedir%2Fsmile😟%2FProgram.cs/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: file:///Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
7 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: file://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: \\localhost\Source\Test#our codedir\smile😟\Program.cs
    ToString(): file://localhost/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: file://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
8 XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   AbsoluteUri: http://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
  AbsolutePath: /Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs
     LocalPath: /Source/Test#our codedir/smile😟/Program.cs
    ToString(): http://localhost/Source/Test%23our codedir/smile😟/Program.cs
OriginalString: http://localhost/Source/Test%23our%20codedir/smile%F0%9F%98%9F/Program.cs

There doesn't appear to be any way to construct a valid file URI on Unix from which you can actually get the path back out.
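One way to state that requirement as a check is a round-trip test like the sketch below. `RoundTrips` is a hypothetical helper, not an existing API; it assumes "valid" means that the path survives both `LocalPath` and a re-parse of `AbsoluteUri`. Its result is expected to vary by platform and runtime version, so no output is claimed.

```csharp
using System;

static class FileUriRoundTrip
{
    // Hypothetical check (an assumption for illustration): does a path survive
    // path -> Uri -> AbsoluteUri -> Uri -> LocalPath unchanged?
    public static bool RoundTrips(string path)
    {
        // The string may not be accepted as an absolute URI on every platform.
        if (!Uri.TryCreate(path, UriKind.Absolute, out var uri) || !uri.IsFile)
            return false;

        if (uri.LocalPath != path)
            return false;

        // Re-parse the serialized form, as a consumer on the other side of a
        // serialization boundary would.
        return Uri.TryCreate(uri.AbsoluteUri, UriKind.Absolute, out var reparsed)
            && reparsed.LocalPath == path;
    }

    static void Main()
    {
        const string path = "/Source/Test#our codedir/smile\U0001F61F/Program.cs";
        Console.WriteLine($"Round-trips: {RoundTrips(path)}");
    }
}
```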

@karelz
Member

karelz commented May 6, 2020

@MihaZupan can you please take a look at this one?
