Improving URI to be compatible with Windows #517

jslee02 · 2015-10-11T05:18:08Z

This PR makes Uri work on Windows and resolves the issues reported in #488 and #497.

Most changes relate to file URIs (file://your/path), making them compatible with both Unix and Windows. Here is a summary of the changes:

- Every function that takes a filename as a string (e.g., the skel/urdf/sdf parsers) now takes a Uri.
- To pass a filename to those functions as before, use Uri::createFromPath(filename).
- A Uri can be created from a file URI string (e.g., file:///foo/bar.txt and file:///C:/foo/bar.txt) or from a raw file path (e.g., /foo/bar.txt and C:/foo/bar.txt).
- Every path must be an absolute path.

More details can be found in the discussion in #497.

Note that this PR also makes the AppVeyor CI tests pass on Windows, so subsequent PRs should pass the AppVeyor CI tests as well.


mkoval · 2015-10-11T15:18:43Z

dart/common/Uri.cpp

+        mPath = mPath->substr(1, mPath->size() - 1);
+    }
+#endif
+  }


I don't know if it's a good idea to do this here. mPath is supposed to be the "path" component of the URI as defined by RFC 3986. Putting this logic here makes that no longer true.

On Windows, the "path" component includes the leading slash, as in /C:/foo/bar.txt, so it doesn't work if we pass it to other functions that take a raw filesystem path.

I couldn't think of a better way to handle this problem. Do you have any ideas?

I would prefer to add a getter method that handles this conversion. Naming it getPath would be confusing, since it sounds like it would just return mPath. How about getFilesystemPath?

The key issue is that the word "path" could refer to two different concepts: (1) "the path component of a URI" or (2) "a path in the local filesystem". The mPath field on Uri specifically refers to definition (1), but definition (2) is more useful if you are trying to access a local file. These are interchangeable on Linux, but differ by the leading / on Windows.
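To make the two notions of "path" concrete, here is a minimal sketch of the conversion a getFilesystemPath()-style getter might perform. It is written as a free function with an explicit isWindows parameter (the name uriPathToFilesystemPath and that parameter are hypothetical, chosen so the logic can be exercised on any OS); the snippet at the top of this thread uses `#ifdef _WIN32` instead. It assumes the path component has already been percent-decoded.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts the path *component* of a file URI
// (definition 1) into a local *filesystem* path (definition 2).
std::string uriPathToFilesystemPath(const std::string& uriPath, bool isWindows)
{
  // On Windows, "file:///C:/foo/bar.txt" has the path component
  // "/C:/foo/bar.txt"; the '/' before the drive letter must be dropped
  // so that "C:/foo/bar.txt" can be opened.
  if (isWindows && uriPath.size() >= 3 && uriPath[0] == '/'
      && std::isalpha(static_cast<unsigned char>(uriPath[1]))
      && uriPath[2] == ':')
  {
    return uriPath.substr(1);
  }

  // On Linux (and for paths without a drive letter) both notions coincide.
  return uriPath;
}
```

With isWindows = true, "/C:/foo/bar.txt" becomes "C:/foo/bar.txt"; with isWindows = false the input is returned unchanged, which is why the distinction never shows up on Linux.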

I like the idea of having two distinct functions.

Btw, don't we need to encapsulate the other member variables as well? If we do, we need to consider whether the getter methods return a UriComponent or a string.

We could have separate getter methods to make this explicit, like:

UriComponent getSchemeComponent()
std::string getScheme()

or

UriComponent getScheme()
std::string getSchemeString()

I was hoping to keep Uri as close to a mutable struct as possible. Many ResourceRetrievers rewrite URIs to other types of URIs (e.g. resolve package:// URIs to file:// URIs), which is easier to implement if you have direct access to the URI components.

For example, a simple assignment like *old_uri.schema = "file" would turn into a verbose (and error-prone) constructor call that looks like this:

Uri new_uri("file", old_uri.getAuthority(), old_uri.getPath(), old_uri.getQuery(), old_uri.getFragment());

We also can't replace UriComponent with an std::string. The RFC explicitly states that it's important to differentiate between "missing" components and "empty" components, because they may serialize to very different meanings.
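The missing-versus-empty distinction is easy to see in a small sketch. Here std::optional<std::string> (C++17) stands in for UriComponent; that substitution, the struct name MiniUri, and its toString() are assumptions for illustration, not DART's actual types. Recomposition follows RFC 3986, Section 5.3, and because the members are public, a retriever can rewrite a URI by plain assignment, which is the convenience argued for above.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for Uri: every component except the path may be
// absent, and "absent" is not the same as "present but empty".
struct MiniUri
{
  std::optional<std::string> scheme;
  std::optional<std::string> authority;
  std::string path;  // RFC 3986: the path is always present (possibly empty)
  std::optional<std::string> query;
  std::optional<std::string> fragment;

  // Component recomposition as described in RFC 3986, Section 5.3.
  std::string toString() const
  {
    std::string out;
    if (scheme)
      out += *scheme + ":";
    if (authority)
      out += "//" + *authority;  // an empty authority still emits "//"
    out += path;
    if (query)
      out += "?" + *query;  // an empty-but-present query still emits '?'
    if (fragment)
      out += "#" + *fragment;
    return out;
  }
};
```

For example, setting `query = ""` makes toString() emit a trailing '?' that is absent when query is std::nullopt, and assigning `scheme = "file"; authority = "";` is enough to turn a MiniUri into a file URI such as file:///foo/bar.txt.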

That makes sense. I'll add a comment to the Uri class explaining the rationale for public members.

Also, to follow the convention throughout DART, we should probably use struct instead of class for Uri.

jslee02 · 2015-10-12T02:41:27Z

I thought so too, but in general it would also make sense to keep the original value instead of resetting it to the default value on failure. For Uri, the default is an empty component.

I don't actually have a strong preference on this. If nobody has a stronger opinion, I will make these functions clear the contents on failure.

mxgrey · 2015-10-12T02:47:53Z

I feel pretty strongly that when the user calls a function that is meant to alter the components of the Uri and that function fails to do so, it should clear all the Uri components. The reasoning is that if a string fails to parse and the user fails to catch the failure, we want that error to propagate forward: if the Uri object was being recycled, there's no telling what it contained before the user attempted to change it, and later code would have no way to recognize that the Uri it is given is not actually the Uri the user intended.

Alternatively, we could have a second argument in each of the Uri::from...(std::string) functions which would be a boolean for whether or not the Uri should be cleared upon failure. I would still want that argument to default to true (i.e. do clear upon failure) though.

psigen · 2015-10-12T02:49:08Z

The parsing operation returns a success right?

I think it could be somewhat useful not to clear the Uri on a parse failure, for cases where one wants to implement "first-available" logic: you can simply run load commands until one fails and use whatever is in the Uri. This might be useful for implementing local caches and so forth.

I'm not strongly opposed; I just don't see a strong reason to clear it if you already have a return value that indicates success.

mxgrey · 2015-10-12T02:50:14Z

If there are reasonable use cases for not clearing, then I would strongly support the idea of having a boolean flag that defaults to having the Uri cleared.

mkoval · 2015-10-12T02:52:07Z

I am fine with leaving a Uri undefined after a failure. If you are against undefined behavior here, then I think clearing the Uri is a reasonable compromise.

Some parsing operations could fail with the Uri in a partially invalid state; i.e. some of the components are set and others are not. Restoring them to their original state would require introducing temporary variables.

psigen · 2015-10-12T02:54:12Z

If there is no option to leave the Uri in an unchanged state following a failed parse, then I agree with @mxgrey that it should be cleared. There's no real point in leaving it in an undefined state as far as I can tell.

I'm also somewhat against adding a flag. In this case it's pretty code-bloaty and wouldn't actually make the code any easier to read, because you'd have to figure out what the flag does instead of just having an extra Uri object on the stack.

mxgrey · 2015-10-12T02:56:44Z

The only advantage to leaving it in an undefined state is to avoid the need for temporary variables, which is a valid performance consideration since we're using std::string which allocates to the heap.

I think my preference would be having a flag in the Uri::from... functions that will either (1) clear the Uri if the flag is true upon failure or (2) leave the Uri in an undefined state if the flag is false upon failure. And we would have the flag default to being true.

mkoval · 2015-10-12T02:59:22Z

I don't think it's worth adding a flag for this. Parsing a URI requires evaluating a regex and allocating string objects, which is already quite expensive. Loading a resource from the URI, which typically follows shortly after constructing it, is even more expensive.

I'm in favor of always clearing the Uri on failure.

jslee02 · 2015-10-12T03:22:16Z

I'd prefer not to have a function with undefined behavior. My preference is a flag with options for clearing the content or keeping the original content. But if we don't want a flag, I would go with clearing on failure.

It seems a possible compromise is to clear the contents on failure without a flag, which meets everyone's minimum preferences. I will change the functions accordingly unless there are further ideas.

I think we can still restore the original contents on failure by using a temporary Uri.
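A minimal sketch of the temporary-object approach, under simplifying assumptions: components are parsed into a local object and only copied into `*this` once every check has passed, so `*this` is never left half-filled. On failure it clears, matching the consensus above; simply returning `false` without calling `clear()` would keep the original contents instead. The struct name CheckedUri, the omission of query and fragment, and the "a scheme is required" failure rule are hypothetical simplifications, not DART's actual parser.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical, simplified Uri (no query or fragment) used to illustrate
// parsing into a temporary object.
struct CheckedUri
{
  std::optional<std::string> scheme;
  std::optional<std::string> authority;
  std::string path;

  void clear() { *this = CheckedUri(); }

  // Returns true on success. On failure, every component is cleared.
  bool fromString(const std::string& input)
  {
    // Split into scheme / authority / path (query and fragment omitted).
    static const std::regex split(R"((([^:/?#]+):)?(//([^/?#]*))?([^?#]*))");
    // RFC 3986 scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    static const std::regex schemeGrammar(R"([A-Za-z][A-Za-z0-9+.-]*)");

    CheckedUri tmp;  // all writes go here; *this is untouched so far
    std::smatch m;
    bool ok = std::regex_match(input, m, split) && m[1].matched;
    if (ok)
    {
      tmp.scheme = m[2].str();
      if (m[3].matched)
        tmp.authority = m[4].str();
      tmp.path = m[5].str();
      // This check runs after tmp is partly filled; writing straight into
      // *this would leave a half-updated object if it failed.
      ok = std::regex_match(*tmp.scheme, schemeGrammar);
    }

    if (!ok)
    {
      clear();  // consensus in this thread; omit to keep the old contents
      return false;
    }

    *this = std::move(tmp);
    return true;
  }
};
```

For instance, "1abc:/x" passes the split but fails the scheme check after `tmp.scheme` has been set; because only `tmp` was modified, the object ends up fully cleared rather than holding a mix of old and new components.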

The newline character '\n' is written to a file differently depending on the platform. In particular, when a file is opened in text mode, Windows writes it as two characters ('\r\n'), so the size of the string and the size of the file differ, which makes the size comparison test fail. We can make this test platform-independent later, but for now I just removed the newline character, since it has nothing to do with the functionality we want to test here.
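For reference, the translation only happens for text-mode streams; opening the file in binary mode writes bytes unchanged on every platform. A sketch of a platform-independent variant of the size check, assuming the test controls how the file is written (the helper name writeBinaryAndGetSize is hypothetical):

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: writes `content` to `filename` in binary mode and
// returns the resulting file size in bytes, or -1 on error. Binary mode
// disables newline translation, so the size equals content.size() on
// Windows as well as on Unix.
std::streamoff writeBinaryAndGetSize(const std::string& filename,
                                     const std::string& content)
{
  {
    std::ofstream out(filename, std::ios::binary | std::ios::trunc);
    if (!out)
      return -1;
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
  }  // `out` is flushed and closed at the end of this scope

  // Open at the end (ate) so tellg() reports the file size.
  std::ifstream in(filename, std::ios::binary | std::ios::ate);
  if (!in)
    return -1;
  return in.tellg();
}
```

With this, "line1\nline2\n" yields a 12-byte file on both platforms, so the newline would not need to be removed from the test data.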

The /LTCG and /INCREMENTAL flags conflict with each other. We use /LTCG for release builds, and according to a post (http://blogs.msdn.com/b/vcblog/archive/2013/10/30/the-visual-c-linker-best-practices-developer-iteration.aspx) /INCREMENTAL is off by default in release mode. But that does not seem to be the case (https://ci.appveyor.com/project/jslee02/dart/build/1893/job/eknf562lq4ybolyg).

mkoval · 2015-10-12T16:23:59Z

dart/common/Uri.cpp

+bool Uri::fromStringOrPath(const std::string& _input)
+{
+  // Assume that any URI without a scheme is a path.
+  static regex uriSchemeRegex(R"END(^(([^:/?#]+)://))END");


This regex is not technically correct. URIs only have the // delimiter if they have an "authority" component. Here are some example URIs that don't have an authority that I pulled directly out of RFC 3986:

mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
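RFC 3986, Appendix B, gives a regular expression that splits any URI reference into its components. The sketch below applies it with std::regex (the names UriParts and splitUri are hypothetical) and shows that the authority-less examples above still have a scheme, even though no "//" follows it.

```cpp
#include <regex>
#include <string>

struct UriParts
{
  bool hasScheme = false;
  std::string scheme;
  bool hasAuthority = false;
  std::string authority;
  std::string path;
};

// Splits a URI reference with the regular expression from RFC 3986,
// Appendix B. Capture groups: 2 = scheme, 4 = authority, 5 = path,
// 7 = query, 9 = fragment (query and fragment are ignored here).
UriParts splitUri(const std::string& input)
{
  static const std::regex re(
      R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
  std::smatch m;
  UriParts parts;
  if (!std::regex_match(input, m, re))
    return parts;  // defensive: leave every field empty if nothing matched
  parts.hasScheme = m[1].matched;
  parts.scheme = m[2].str();
  parts.hasAuthority = m[3].matched;
  parts.authority = m[4].str();
  parts.path = m[5].str();
  return parts;
}
```

For example, "mailto:John.Doe@example.com" yields scheme "mailto", no authority, and path "John.Doe@example.com". The same regex also reports "C" as the scheme of C:/foo/bar.txt, which is the Windows ambiguity raised in the next comment.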

I don't think it's possible to unambiguously differentiate between URIs and files on Windows, where : is used to delimit the drive letter. So, begrudgingly, this may be the best we can do.

@psigen Any ideas?

One possible solution I can think of is:

- On Unix systems, we use the corrected regex (R"END(^(([^:/?#]+):))END") so that it actually checks for the existence of a scheme.

- On Windows, we use a different regex that checks for a Windows-style path, R"END([a-zA-Z]:[/|\\])END". We might not need a scheme check here, since we assume the incoming path is absolute.

I added a test to check whether the regex works for URIs without an authority component, and it didn't work as expected. Then I tested the solution above, and it worked.

So now we assume that a string without a scheme component is a path on Unix systems, and that a string beginning with a Windows-style path (e.g., C:/ or c:) is a path on Windows.
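A sketch of the classification described above, assuming inputs are either URIs or absolute paths. The function name classifyInput, the InputKind enum, and the isWindows parameter are hypothetical. One detail differs from the Windows regex quoted earlier: inside a bracket expression `|` is a literal character, so `[/|\\]` would also accept "C:|"; the sketch uses `[/\\]` and anchors the match at the start of the string.

```cpp
#include <regex>
#include <string>

enum class InputKind { Uri, Path };

// Decides whether `input` should be parsed as a URI or as an absolute path.
InputKind classifyInput(const std::string& input, bool isWindows)
{
  // Drive-letter prefix such as "C:/" or "c:\". Checked first on Windows,
  // because "C:" would otherwise look like a one-letter URI scheme.
  static const std::regex windowsPath(R"(^[a-zA-Z]:[/\\])");
  // Any leading "scheme:", with or without a following "//".
  static const std::regex scheme(R"(^([^:/?#]+):)");

  if (isWindows && std::regex_search(input, windowsPath))
    return InputKind::Path;
  if (std::regex_search(input, scheme))
    return InputKind::Uri;
  return InputKind::Path;  // no scheme: treat it as a filesystem path
}
```

Because the drive-letter check only runs when isWindows is true, classifyInput("C:/foo", false) returns InputKind::Uri, mirroring the per-platform split proposed above.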


jslee02 · 2015-10-14T01:34:57Z

Sorry for the rush, but I'm planning to release 5.1 as soon as possible. There are still points to discuss for a better implementation, but it seems we can merge this, since the current implementation at least passes all the tests. If anyone has comments, please post them soon.

mkoval · 2015-10-14T03:48:57Z

I'm happy with the current state of the pull request. 😄

@psigen Anything to add?


jslee02 added 20 commits October 9, 2015 03:33

Remove old documents: dart-tutorial, programmingGuide

d3648fb

Merge pull request #515 from dartsim/docs

6bf01b5

Remove old documents: dart-tutorial, programmingGuide

Fix bug in UriHelpers::getUri_InputIsPath_AppendsFileSchema test

28015cc

Change Uri::fromStringOrPath(~) to convert a path to a file URI

caabef0

- Note that input path must be absolute path

Fix bug in Uri::toString() and update Uri tests for Windows

8da05d1

Make ResourceRetriever to take Uri instead of std::string

f59420d

Make SkelParser to use Uri instead of raw file path

24e4b0e

Make urdf parser to use Uri instead of raw file path

d509dbe

Change behavior of Uri(string) to parse not only URI but also local path

2616b75

Update resource retriever tests for Windows paths

c731e99

Update tests to use Uri instead of raw path

e1dc055

Use path part instead of full file URI to open a file in urdf_world_parser.cpp

875de88

Make sdf parser to use Uri instead of raw file path

363d430

Add Uri::fromPath(~) and deprecate Uri::fromStringOrPath(~)

bd79ad3

Use Uri::createFromPath(filename) to pass file URI

968615e

More work on sdf parser to use Uri instead of raw file path

29dbdfc

Merge remote-tracking branch 'origin/master' into portable_uri

fccd529

Generate test file to avoid issue of inconsistency of newline over OSs

a1cc97b

More work on sdf parser to use Uri instead of raw file path

12c79ec

Revert previous commit and use \r\n to represent newline on Windows

d0a2a04

jslee02 added this to the DART 5.1.0 milestone Oct 11, 2015

jslee02 mentioned this pull request Oct 11, 2015

Enabling proper URI resolution for World URDF parsing #497

Merged

Lower verbosity of AppVeyor tests and enable parallel testing

a220ee1

mkoval reviewed Oct 11, 2015

Use Uri::getFilesystemPath() to get file path in urdf_world_parser.cpp

3390cac

jslee02 added 3 commits October 11, 2015 23:49

Clear Uri components when the URI object fails to parse URI

7352b35

mkoval reviewed Oct 12, 2015

jslee02 added 4 commits October 12, 2015 21:05

Fix typo

42515bd

Use Uri::createFromString() when we know the argument is an URI

ed369f5

Add Uri test to check fromStringOrPath can distinguish path and uri -- failing

c9b21ce

Fix fromStringOrPath to correctly distinguish (absolute) path and URI

78557fd

jslee02 added a commit that referenced this pull request Oct 14, 2015

Merge pull request #517 from dartsim/portable_uri

293b8fa

Improving URI to be compatible with Windows

jslee02 merged commit 293b8fa into master Oct 14, 2015

mkoval mentioned this pull request Oct 16, 2015

Unit tests are failing on Windows #488

Closed

jslee02 deleted the portable_uri branch October 18, 2015 11:39

mkoval mentioned this pull request Oct 19, 2015

ResourceRetriever and Resource methods are now const #532

Closed

jslee02 mentioned this pull request Oct 21, 2015

Remove const qualifier of ResourceRetriever methods #534

Merged
