Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving URI to be compatible with Windows #517

Merged
merged 36 commits into from
Oct 14, 2015
Merged

Improving URI to be compatible with Windows #517

merged 36 commits into from
Oct 14, 2015

Conversation

jslee02
Copy link
Member

@jslee02 jslee02 commented Oct 11, 2015

This PR makes URI to work on Windows and resolve issues reported in #488 and #497.

Most changes are related to file URI (file://your/path) to make it compatible on Unix and Windows. Here is a summary of the changes:

  • Every function that takes in a filename as a string (e.g., skel/urdf/sdf parsers) now takes Uri.
  • In order to pass filename to those functions as before, you can use Uri::createFromPath(filename).
  • A Uri can be created from file URI string (e.g., file:///foo/bar.txt and file:///C:/foo/bar.txt) or raw file path (e.g., /foo/bar.txt and C:/foo/bar.txt).
  • Every path should be absolute path.

You can find more details at the discussions in #497.

Note that this PR also makes the AppVeyor CI tests to pass on Windows so afterward PRs of this should pass AppVeyor CI tests as well.

Remove old documents: dart-tutorial, programmingGuide
- Note that input path must be absolute path
mPath = mPath->substr(1, mPath->size() - 1);
}
#endif
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's a good idea to do this here. mPath is supposed to be the "path" component of the URI as defined by the RFC standard. Putting this logic here makes that no longer true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows, a "path" component would include the first slash like /C:/foo/bar.txt, and it wouldn't work if we pass it to some other functions that takes a raw path.

I couldn't think a better way to handle this problem. Do you have some idea on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to add a getter method that handles this conversion. Naming it getPath would be confusing, since it sounds like it would just return mPath. How about getFilesystemPath?

The key issue is that the word "path" could refer to two different concepts: (1) "the path component of a URI" or (2) "a path in the local filesystem". The mPath field on Uri specifically refers to definition (1), but definition (2) is more useful if you are trying to access a local file. These are interchangable on Linux, but differ by the leading / on Windows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea we have two distinct functions.

Btw, don't we need to encapsulate other member variables as well? If we do so, we might need to consider whether the getter methods return UriComponent or their strings.

We could have separate getter methods to make it clear like:

UriComponent getSchemeComponent() 
std::string getScheme()

or

UriComponent getScheme() 
std::string getSchemeString()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to keep Uri as close to a mutable struct as possible. Many ResourceRetreivers rewrite URIs to other types of URIs (e.g. resolve package:// URIs to file:// URIs), which is easier to implement if you have direct access to the URI components.

For example, a simple assignment like *old_uri.schema = "file" would turn into a verbose (and error-prone) constructor call that looks like this:

Uri new_uri("file", old_uri.getAuthority(), old_uri.getPath(), old_uri.getQuery(), old_uri.getFragment()

We also can't replace UriComponent with an std::string. The RFC explicitly states that it's important to differentiate between "missing" components and "empty" components, because they may serialize to very different meanings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. I'll add some comment on Uri class about the rationale of public members.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to follow the convention throughout DART, we should probably use struct instead of class for Uri.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jslee02
Copy link
Member Author

jslee02 commented Oct 12, 2015

I thought that too, but in general it would also make sense to keep the original value instead of set it to the default value on failure. For Uri, empty component is the default.

I actually don't have strong preference on this. If anyone don't have more strong idea on this I would make them to clear the contents on failure.

@mxgrey
Copy link
Member

mxgrey commented Oct 12, 2015

I feel pretty strongly that when the user calls a function that is meant to alter the components of the Uri but that function fails to do so, it should clear all the Uri components. The reasoning here is that if a string fails to parse and the user fails to catch the failure, we would want that error to propagate forward, because if the Uri object was being recycled, then there's no telling what it might have been containing before the user attempted to change it, and it will be impossible for later code to recognize that the Uri it's being given is not actually the Uri that the user intended.

Alternatively, we could have a second argument in each of the Uri::from...(std::string) functions which would be a boolean for whether or not the Uri should be cleared upon failure. I would still want that argument to default to true (i.e. do clear upon failure) though.

@psigen
Copy link
Collaborator

psigen commented Oct 12, 2015

The parsing operation returns a success right?

I would think it would be somewhat useful to not clear the Uri on a parse failure for the case where one might want to implement "first-available" logic. You can simply run load commands until failure, and use whatever is in the URI. Might be useful for implementing local caches and so forth.

Not strongly opposed, just don't see a strong reason to clear it if you already have a return value to indicate success.

@mxgrey
Copy link
Member

mxgrey commented Oct 12, 2015

If there are reasonable use cases for not clearing, then I would strongly support the idea of having a boolean flag that defaults to having the Uri cleared.

@mkoval
Copy link
Collaborator

mkoval commented Oct 12, 2015

I am fine with leaving a Uri undefined after a failure. If you are against undefined behavior here, then I think clearing the Uri is a reasonable compromise.

Some parsing operations could fail with the Uri in a partially invalid state; i.e. some of the components are set and others are not. Restoring them to their original state would require introducing temporary variables.

@psigen
Copy link
Collaborator

psigen commented Oct 12, 2015

If there is no option to leave the Uri in an unchanged state following a failed parse, then I agree with @mxgrey that it should be cleared. There's no real point in leaving it in an undefined state as far as I can tell.

I'm also somewhat against adding a flag, in this case it's pretty code-bloaty and wouldn't actually make the code any easier to read, because you'd have to figure out what this flag was doing instead of just having an extra Uri object on the stack.

@mxgrey
Copy link
Member

mxgrey commented Oct 12, 2015

The only advantage to leaving it in an undefined state is to avoid the need for temporary variables, which is a valid performance consideration since we're using std::string which allocates to the heap.

I think my preference would be having a flag in the Uri::from... functions that will either (1) clear the Uri if the flag is true upon failure or (2) leave the Uri in an undefined state if the flag is false upon failure. And we would have the flag default to being true.

@mkoval
Copy link
Collaborator

mkoval commented Oct 12, 2015

I don't think it's worth adding a flag for this. Parsing a URI requires evaluating a regex and allocating string objects, which is already a quite expensive operation. Loading a resource from the URI, which typically shortly follows constructing it, is even more expensive.

I'm in favor of always clearing the Uri on failure.

@jslee02
Copy link
Member Author

jslee02 commented Oct 12, 2015

I don't prefer to have a function with undefined behavior. My preference is having a flag with options to clearing the content or keeping the original content. But if we don't want to have a flag then I would like to go with the option of clearing on failure.

It seems a possible compromise is clearing the contents on failure without a flag, which meets our (minimum) preferences. I will change the functions so if we don't have further idea.

I think we still can restore the original contents on failure using a temporary uri.

The newline character '\n' is stored to a file differently depending on the platform. Especially, Windows uses two characters ('\r\n'), so the size of string and the size of file are not same, which makes the size comparison test fail. We can make this test platform independent later but I just quickly removed the newline character for now because it has nothing to do with the functionality that we want to test here.
/LTCG and /INCREMENTAL flags conflict each other. We use /LTCG for release build, and according to  a post (http://blogs.msdn.com/b/vcblog/archive/2013/10/30/the-visual-c-linker-best-practices-developer-iteration.aspx) /INCREMENTAL is off by default for release mode. But it seems not (https://ci.appveyor.com/project/jslee02/dart/build/1893/job/eknf562lq4ybolyg).
bool Uri::fromStringOrPath(const std::string& _input)
{
// Assume that any URI without a scheme is a path.
static regex uriSchemeRegex(R"END(^(([^:/?#]+)://))END");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is not technically correct. URIs only have the // delimiter if they have an "authority" component. Here are some example URIs that don't have an authority that I pulled directly out of RFC 3986:

mailto:[email protected]
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
urn:oasis:names:specification:docbook:dtd:xml:4.1.2

I don't think it's possible to unambiguously differentiate between URIs and files on Windows, where : is used to delimit the drive letter. So, begrudgingly, this may be the best we can do.

@psigen Any ideas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible solution I can think now is:

  • For Unix systems, we use corrected regex (R"END(^(([^:/?#]+):))END")) so it really checks existence of scheme
  • For Windows, we use different regex that checks Windows style path as R"END([a-zA-Z]:[/|\\])END". We might don't need scheme check here since we assume that incoming path is an absolute path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test to see if the regex works for URIs without authority component, and didn't work as expected. Then I tested with the above solution and it worked.

So now we assume a string without a scheme component is a path for Unix systems, and a string begin with Windows style path (e.g., C:/ or c:).

@jslee02
Copy link
Member Author

jslee02 commented Oct 14, 2015

Sorry for rush but I'm planning to release 5.1 as soon as possible. We still have points to discuss for better implementation but it seems we can merge this since current implementation is at least passing all the tests. So if anyone have something to comment then please post them soon.

@mkoval
Copy link
Collaborator

mkoval commented Oct 14, 2015

I'm happy with the current state of the pull request. 😄

@psigen Anything to add?

jslee02 added a commit that referenced this pull request Oct 14, 2015
Improving URI to be compatible with Windows
@jslee02 jslee02 merged commit 293b8fa into master Oct 14, 2015
@jslee02 jslee02 deleted the portable_uri branch October 18, 2015 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants