-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving URI to be compatible with Windows #517
Conversation
Remove old documents: dart-tutorial, programmingGuide
- Note that input path must be absolute path
mPath = mPath->substr(1, mPath->size() - 1); | ||
} | ||
#endif | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's a good idea to do this here. mPath
is supposed to be the "path" component of the URI as defined by the RFC standard. Putting this logic here makes that no longer true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Windows, a "path" component would include the first slash like /C:/foo/bar.txt
, and it wouldn't work if we pass it to some other functions that takes a raw path.
I couldn't think a better way to handle this problem. Do you have some idea on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to add a getter method that handles this conversion. Naming it getPath
would be confusing, since it sounds like it would just return mPath
. How about getFilesystemPath
?
The key issue is that the word "path" could refer to two different concepts: (1) "the path component of a URI" or (2) "a path in the local filesystem". The mPath
field on Uri
specifically refers to definition (1), but definition (2) is more useful if you are trying to access a local file. These are interchangable on Linux, but differ by the leading /
on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea we have two distinct functions.
Btw, don't we need to encapsulate other member variables as well? If we do so, we might need to consider whether the getter methods return UriComponent or their strings.
We could have separate getter methods to make it clear like:
UriComponent getSchemeComponent()
std::string getScheme()
or
UriComponent getScheme()
std::string getSchemeString()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to keep Uri
as close to a mutable struct as possible. Many ResourceRetreiver
s rewrite URIs to other types of URIs (e.g. resolve package://
URIs to file://
URIs), which is easier to implement if you have direct access to the URI components.
For example, a simple assignment like *old_uri.schema = "file"
would turn into a verbose (and error-prone) constructor call that looks like this:
Uri new_uri("file", old_uri.getAuthority(), old_uri.getPath(), old_uri.getQuery(), old_uri.getFragment()
We also can't replace UriComponent
with an std::string
. The RFC explicitly states that it's important to differentiate between "missing" components and "empty" components, because they may serialize to very different meanings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. I'll add some comment on Uri class about the rationale of public members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, to follow the convention throughout DART, we should probably use struct
instead of class
for Uri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I thought that too, but in general it would also make sense to keep the original value instead of set it to the default value on failure. For I actually don't have strong preference on this. If anyone don't have more strong idea on this I would make them to clear the contents on failure. |
I feel pretty strongly that when the user calls a function that is meant to alter the components of the Uri but that function fails to do so, it should clear all the Uri components. The reasoning here is that if a string fails to parse and the user fails to catch the failure, we would want that error to propagate forward, because if the Alternatively, we could have a second argument in each of the |
The parsing operation returns a success right? I would think it would be somewhat useful to not clear the Not strongly opposed, just don't see a strong reason to clear it if you already have a return value to indicate success. |
If there are reasonable use cases for not clearing, then I would strongly support the idea of having a boolean flag that defaults to having the Uri cleared. |
I am fine with leaving a Some parsing operations could fail with the |
If there is no option to leave the Uri in an unchanged state following a failed parse, then I agree with @mxgrey that it should be cleared. There's no real point in leaving it in an undefined state as far as I can tell. I'm also somewhat against adding a flag, in this case it's pretty code-bloaty and wouldn't actually make the code any easier to read, because you'd have to figure out what this flag was doing instead of just having an extra |
The only advantage to leaving it in an undefined state is to avoid the need for temporary variables, which is a valid performance consideration since we're using I think my preference would be having a flag in the |
I don't think it's worth adding a flag for this. Parsing a URI requires evaluating a regex and allocating string objects, which is already a quite expensive operation. Loading a resource from the URI, which typically shortly follows constructing it, is even more expensive. I'm in favor of always clearing the |
I don't prefer to have a function with undefined behavior. My preference is having a flag with options to clearing the content or keeping the original content. But if we don't want to have a flag then I would like to go with the option of clearing on failure. It seems a possible compromise is clearing the contents on failure without a flag, which meets our (minimum) preferences. I will change the functions so if we don't have further idea. I think we still can restore the original contents on failure using a temporary uri. |
The newline character '\n' is stored to a file differently depending on the platform. Especially, Windows uses two characters ('\r\n'), so the size of string and the size of file are not same, which makes the size comparison test fail. We can make this test platform independent later but I just quickly removed the newline character for now because it has nothing to do with the functionality that we want to test here.
/LTCG and /INCREMENTAL flags conflict each other. We use /LTCG for release build, and according to a post (http://blogs.msdn.com/b/vcblog/archive/2013/10/30/the-visual-c-linker-best-practices-developer-iteration.aspx) /INCREMENTAL is off by default for release mode. But it seems not (https://ci.appveyor.com/project/jslee02/dart/build/1893/job/eknf562lq4ybolyg).
bool Uri::fromStringOrPath(const std::string& _input) | ||
{ | ||
// Assume that any URI without a scheme is a path. | ||
static regex uriSchemeRegex(R"END(^(([^:/?#]+)://))END"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is not technically correct. URIs only have the //
delimiter if they have an "authority" component. Here are some example URIs that don't have an authority that I pulled directly out of RFC 3986:
mailto:[email protected]
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
I don't think it's possible to unambiguously differentiate between URIs and files on Windows, where :
is used to delimit the drive letter. So, begrudgingly, this may be the best we can do.
@psigen Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible solution I can think now is:
- For Unix systems, we use corrected regex (
R"END(^(([^:/?#]+):))END")
) so it really checks existence of scheme - For Windows, we use different regex that checks Windows style path as
R"END([a-zA-Z]:[/|\\])END"
. We might don't need scheme check here since we assume that incoming path is an absolute path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test to see if the regex works for URIs without authority component, and didn't work as expected. Then I tested with the above solution and it worked.
So now we assume a string without a scheme component is a path for Unix systems, and a string begin with Windows style path (e.g., C:/ or c:).
Sorry for rush but I'm planning to release 5.1 as soon as possible. We still have points to discuss for better implementation but it seems we can merge this since current implementation is at least passing all the tests. So if anyone have something to comment then please post them soon. |
I'm happy with the current state of the pull request. 😄 @psigen Anything to add? |
Improving URI to be compatible with Windows
This PR makes URI to work on Windows and resolve issues reported in #488 and #497.
Most changes are related to file URI (file://your/path) to make it compatible on Unix and Windows. Here is a summary of the changes:
Uri
.Uri::createFromPath(filename)
.Uri
can be created from file URI string (e.g., file:///foo/bar.txt and file:///C:/foo/bar.txt) or raw file path (e.g., /foo/bar.txt and C:/foo/bar.txt).You can find more details at the discussions in #497.
Note that this PR also makes the AppVeyor CI tests to pass on Windows so afterward PRs of this should pass AppVeyor CI tests as well.