Fixed isURL regex to take account file urls. #20435

SergioEstevao · 2020-02-25T13:24:53Z

Description

I updated the isURL regex for RN to take account file urls.
Added unit test to validate isURL when parsing file URLs.

How has this been tested?

Using unit tests

Screenshots

Types of changes

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.
I've included developer documentation if appropriate.
I've updated all React Native files affected by any refactorings/renamings in this PR.

Added unit test to validate isURL when parsing file URLs.

github-actions · 2020-02-25T13:29:36Z

Size Change: +846 B (0%)

Total Size: 866 kB

Filename	Size	Change
`build/a11y/index.js`	1.01 kB	+1 B
`build/annotations/index.js`	3.43 kB	-1 B
`build/autop/index.js`	2.58 kB	+1 B
`build/block-directory/index.js`	6.02 kB	+2 B (0%)
`build/block-editor/index.js`	104 kB	+665 B (0%)
`build/block-editor/style-rtl.css`	10.5 kB	+152 B (1%)
`build/block-editor/style.css`	10.5 kB	+153 B (1%)
`build/block-library/editor-rtl.css`	7.66 kB	-9 B (0%)
`build/block-library/editor.css`	7.66 kB	-8 B (0%)
`build/block-library/index.js`	116 kB	+217 B (0%)
`build/block-library/style-rtl.css`	7.49 kB	+2 B (0%)
`build/block-library/style.css`	7.5 kB	+3 B (0%)
`build/blocks/index.js`	57.6 kB	+3 B (0%)
`build/components/index.js`	191 kB	+817 B (0%)
`build/data/index.js`	8.22 kB	-4 B (0%)
`build/edit-post/index.js`	90.9 kB	+8 B (0%)
`build/edit-post/style-rtl.css`	8.54 kB	-136 B (1%)
`build/edit-post/style.css`	8.54 kB	-133 B (1%)
`build/edit-site/index.js`	4.63 kB	+38 B (0%)
`build/edit-site/style-rtl.css`	2.51 kB	-257 B (10%)	👏
`build/edit-site/style.css`	2.51 kB	-255 B (10%)	👏
`build/edit-widgets/index.js`	4.42 kB	+66 B (1%)
`build/edit-widgets/style-rtl.css`	2.59 kB	-207 B (7%)	✅
`build/edit-widgets/style.css`	2.58 kB	-207 B (8%)	✅
`build/editor/index.js`	44.6 kB	-2 B (0%)
`build/format-library/index.js`	7.6 kB	-1 B
`build/i18n/index.js`	3.48 kB	+38 B (1%)
`build/priority-queue/index.js`	780 B	-99 B (12%)	👏
`build/redux-routine/index.js`	2.84 kB	-2 B (0%)
`build/rich-text/index.js`	14.3 kB	-2 B (0%)
`build/url/index.js`	4 kB	+2 B (0%)
`build/warning/index.js`	1.14 kB	+1 B

ℹ️ View Unchanged

Filename	Size	Change
`build/api-fetch/index.js`	3.39 kB	0 B
`build/blob/index.js`	620 B	0 B
`build/block-directory/style-rtl.css`	760 B	0 B
`build/block-directory/style.css`	760 B	0 B
`build/block-library/theme-rtl.css`	669 B	0 B
`build/block-library/theme.css`	671 B	0 B
`build/block-serialization-default-parser/index.js`	1.65 kB	0 B
`build/block-serialization-spec-parser/index.js`	3.1 kB	0 B
`build/components/style-rtl.css`	15.5 kB	0 B
`build/components/style.css`	15.5 kB	0 B
`build/compose/index.js`	5.76 kB	0 B
`build/core-data/index.js`	10.5 kB	0 B
`build/data-controls/index.js`	1.03 kB	0 B
`build/date/index.js`	5.37 kB	0 B
`build/deprecated/index.js`	772 B	0 B
`build/dom-ready/index.js`	568 B	0 B
`build/dom/index.js`	3.06 kB	0 B
`build/editor/editor-styles-rtl.css`	325 B	0 B
`build/editor/editor-styles.css`	327 B	0 B
`build/editor/style-rtl.css`	4.01 kB	0 B
`build/editor/style.css`	4 kB	0 B
`build/element/index.js`	4.45 kB	0 B
`build/escape-html/index.js`	733 B	0 B
`build/format-library/style-rtl.css`	502 B	0 B
`build/format-library/style.css`	502 B	0 B
`build/hooks/index.js`	1.92 kB	0 B
`build/html-entities/index.js`	622 B	0 B
`build/is-shallow-equal/index.js`	710 B	0 B
`build/keyboard-shortcuts/index.js`	2.3 kB	0 B
`build/keycodes/index.js`	1.68 kB	0 B
`build/list-reusable-blocks/index.js`	2.99 kB	0 B
`build/list-reusable-blocks/style-rtl.css`	226 B	0 B
`build/list-reusable-blocks/style.css`	226 B	0 B
`build/media-utils/index.js`	4.85 kB	0 B
`build/notices/index.js`	1.57 kB	0 B
`build/nux/index.js`	3.02 kB	0 B
`build/nux/style-rtl.css`	616 B	0 B
`build/nux/style.css`	613 B	0 B
`build/plugins/index.js`	2.54 kB	0 B
`build/primitives/index.js`	1.49 kB	0 B
`build/server-side-render/index.js`	2.54 kB	0 B
`build/shortcode/index.js`	1.7 kB	0 B
`build/token-list/index.js`	1.27 kB	0 B
`build/viewport/index.js`	1.61 kB	0 B
`build/wordcount/index.js`	1.18 kB	0 B

_{compressed-size-action}

aduth · 2020-02-26T19:05:32Z

packages/url/src/is-url.native.js

@@ -1,4 +1,4 @@
-const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/i;
+const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/?\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/i;


In wondering whether the proposed revisions are in-fact technically valid, it eventually clicked with me that the extra slash is meant to represent the path name (e.g. file: is the protocol, then //, then /foo constituting a root-relative path). So in looking at how this is implemented, I wonder if this is something where we should be accounting for this as part of the pattern matching for the path, and not strictly as an optional extra slash following the //. The pattern is a bit unwieldy to untangle, so it's not immediately clear to me where this change would need to occur.

It's also a bit awkward that it's the second of three slashes which is made to be optional. Though I expect it works just the same.

Suggested change

const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/?\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/i;

const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/\/?)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/i;

Noting that the // is technically considered optional based on this pattern, I wonder if the following should be considered valid, as omitting the optional //, but still root-relative:

file:/foo

In the browser, this is considered valid, and evaluates the same as file:///foo:

new URL( 'file:/ok' ).pathname // "/ok"

Under this proposed implementation, I expect that would not be supported.

Possibly related contrary information for the above-given example:

If url’s scheme is "file", then:

If remaining does not start with "//", validation error.

https://url.spec.whatwg.org/#scheme-state

Noting that the // is technically considered optional based on this pattern, I wonder if the following should be considered valid, as omitting the optional //, but still root-relative:

At least based on relevant specification, it is optional:

If c is U+002F (/), then set state to authority state.

Otherwise, set state to path state, and decrease pointer by one.

https://url.spec.whatwg.org/#path-or-authority-state

The diagram here is a little easier to interpret, though technically not the specification we adhere to:

https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Generic_syntax

If the extra slash we're accounting for is relevant for the path, then the pattern should match it separately from the optional authority //.

Edit: Oof, this gets really confusing, when you consider that http: is a "special scheme", and thus it would be subject to special authority slashes state, and omitting // would be deemed invalid.

@aduth I did some changes to support the case you mention, have a look to see if you agree.

@SergioEstevao I'm not seeing any changes. Did you push them? GitHub was having an outage incident this morning, so possible that it was missed?

My previous comments were admittedly a bit of a winding brain dump. In short, it's not clear to me if we should consider file:/foo as "valid". It's tolerated by the URL constructor and thus would return true in the browser, so there's an argument in favor of consistency. But as I note in #20435 (comment), the specification is fairly clear that a double-slash should be required to be considered valid.

Did you come up with this regular expression or find it somewhere? I was trying to decipher it yesterday, and it felt to me like there's some redundancies around www., and user/pass matching. It made it hard for me to figure out how best to change it, but ultimately it felt like the problem is in-part because there's an expectation that one or more characters exist before the pathname can start (which is what the changes here are proposing is an incorrect assumption).

Notably, the occurrences of: [A-Za-z0-9\.\-]+

I've updated the expression that @etoledom used. I think he used the one for Javascript defined here: https://urlregex.com/

I'm tempted to replace it by the one defined for HTML 5:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

I'm tempted to replace it by the one defined for HTML 5:
^(([^:/?#]+):)?(//([^/?#]))?([^?#])(?([^#]))?(#(.))?

I've tried this and it's no good, I think the current version of the commit is the best one that matches all the situations.

aduth · 2020-02-28T14:31:47Z

The example of file:/foo has been troubling me. Digging into it further, I have the following observations:

Per specification, it is technically invalid, in that the parser should trigger a validation error (source).
- If url’s scheme is "file", then: If remaining does not start with "//", validation error.
However, it is still parseable, despite the validation error (source).
Notably, the parser does not fail when given this input (source).
According to specification, a valid URL string is one which "would not trigger a validation error or failure" (source).
Thus, our contract of isURL as conforming to the standard definition of a valid URL string is incorrect, as we only appear to test whether a parse would fail, and not whether it triggers a validation error.
Pragmatically speaking, the URL constructor on the web does not give us low-level access to whether a validation error occurred, and thus we may just want to be content with this.

Action steps:

Since the tests added in this pull request are in line with what we expect would hold true in an implementation of validating parser failure, it is fine as proposed.
- Aside: I still might like to see some further changes to the implementation, but those could be considered future refactorings.
Separately, we may want to update or clarify isURL documentation to clarify that we do not account for validation errors.

aduth · 2020-02-28T14:33:55Z

I'm interested to see your commit at 7ddfe49, because I was wondering about this yesterday as well, given that we have test cases which include a substring of a valid URL, but that we assert to yield false from the function:

gutenberg/packages/url/src/test/index.test.js

Line 52 in 5fde188

[ 'Go here: http://wordpress.org' ],

I assumed that it was the additional conditions we apply on the matched results that accounted for this:

gutenberg/packages/url/src/is-url.native.js

Line 21 in 5fde188

return match !== null && match.length >= 1 && match[ 0 ] === url;

Thus, I wonder: If we're changing the regular expression to check that the given string starts and ends as a URL pattern, is it now enough to just URL_REGEXP.test( url ) ? Or otherwise simplify this condition?

SergioEstevao · 2020-02-28T14:38:13Z

Thus, I wonder: If we're changing the regular expression, is it now enough to just URL_REGEXP.test( url ) ? Or otherwise simplify this condition?

@aduth Good point I updated the code to use the simplified condition.

aduth

Based on tests and assessment of spec-conformance at #20435 (comment), I think this looks fine. 👍

Fixed isURL regex to take account file urls.

40f5560

Added unit test to validate isURL when parsing file URLs.

SergioEstevao added [Type] Bug An existing feature does not function as intended Mobile App - i.e. Android or iOS Native mobile impl of the block editor. (Note: used in scripts, ping mobile folks to change) labels Feb 25, 2020

SergioEstevao added this to the Future milestone Feb 25, 2020

SergioEstevao requested a review from etoledom February 25, 2020 13:24

SergioEstevao requested a review from talldan as a code owner February 25, 2020 13:24

aduth reviewed Feb 26, 2020

View reviewed changes

SergioEstevao added 2 commits February 27, 2020 16:17

Change regex to support url with the format file:/

024149c

Check for start and end of line.

7ddfe49

Simplify test now that we test for start and end of line in regex.

f9be985

aduth approved these changes Feb 28, 2020

View reviewed changes

SergioEstevao merged commit 251bab4 into master Feb 28, 2020

SergioEstevao deleted the rnmobile/fix_isURL_for_file_urls branch February 28, 2020 14:59

aduth mentioned this pull request Feb 28, 2020

URL: Use test data from web-platform-tests for isURL spec conformance #20534

Closed

youknowriad modified the milestones: Future, Gutenberg 7.7 Mar 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed isURL regex to take account file urls. #20435

Fixed isURL regex to take account file urls. #20435

SergioEstevao commented Feb 25, 2020

github-actions bot commented Feb 25, 2020 •

edited

Loading

aduth Feb 26, 2020

aduth Feb 26, 2020

aduth Feb 26, 2020 •

edited

Loading

SergioEstevao Feb 27, 2020

aduth Feb 27, 2020

aduth Feb 27, 2020

aduth Feb 27, 2020

SergioEstevao Feb 27, 2020 •

edited

Loading

SergioEstevao Feb 28, 2020

SergioEstevao Feb 28, 2020

aduth commented Feb 28, 2020

aduth commented Feb 28, 2020 •

edited

Loading

SergioEstevao commented Feb 28, 2020

aduth left a comment

		@@ -1,4 +1,4 @@
		const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+\|(?:www\.\|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_])?\??(?:[\-\+=&;%@\.\w_])#?(?:[\.\!\/\\\w]*))?)/i;
		const URL_REGEXP = /((([A-Za-z]{3,9}:(?:\/\/?\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+\|(?:www\.\|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?::\d{2,5})?((?:\/[\+~%\/\.\w\-_])?\??(?:[\-\+=&;%@\.\w_])#?(?:[\.\!\/\\\w]*))?)/i;

Fixed isURL regex to take account file urls. #20435

Fixed isURL regex to take account file urls. #20435

Conversation

SergioEstevao commented Feb 25, 2020

Description

How has this been tested?

Screenshots

Types of changes

Checklist:

github-actions bot commented Feb 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aduth Feb 26, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SergioEstevao Feb 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aduth commented Feb 28, 2020

aduth commented Feb 28, 2020 • edited Loading

SergioEstevao commented Feb 28, 2020

aduth left a comment

Choose a reason for hiding this comment

github-actions bot commented Feb 25, 2020 •

edited

Loading

aduth Feb 26, 2020 •

edited

Loading

SergioEstevao Feb 27, 2020 •

edited

Loading

aduth commented Feb 28, 2020 •

edited

Loading