Diff Match Patch #215

theck13 · 2019-12-13T04:30:59Z

Fix

Update the diff_match_patch class with the Java changes from google/diff-match-patch#80.

Review

Only one developer is required to review these changes, but anyone can perform the review.

dmsnell · 2019-12-28T17:44:38Z

Simperium/src/main/java/name/fraser/neil/plaintext/diff_match_patch.java

+            throw new IllegalArgumentException(
+                    "Invalid number in diff_fromDelta: " + tokens[x + 2].substring(1), e);
+          }
+          tokens[x + 2] = tokens[x + 2].charAt(0) + String.valueOf(m + 1);


it's my understanding that @jleandroperez did not incorporate this into the iOS library, meaning that while I'm pretty confident in it, it's untested in practice

While testing these changes, the (null) part was not removed from a corrupted note with the high-surrogate (null) low-surrogate pattern. According to the comment above this block, this code should be attempting to repair notes with that corrupted content, right?

Here is how I tested it.

Create corrupted note using another platform.

Launch app and open corrupted note.

Edit corrupted note before or after corruption.

The ChannelTest.testRequestObjectForInvalidChange() test is failing when running ./gradlew connectedCheck on this branch. Interestingly, removing the block in question here makes the test pass.

@theck13 @dmsnell reason why I didn't add the (repair) part of the fix, is that:

Once a string is badly split, (null) is generated by the iOS SDK.

And we do get data loss (the (null) is sent over the wire).

Meaning that... once the issue shows up, it's irrecoverable. That is, as long as the badly split string is generated ios-side.

Now, that being said, I couldn't trigger any test case in which a malformed string could be patched up, other than unit tests. For that reason, for the time being, I've opted to change as little as possible in DMP (it's been quite stable for the past couple releases!)

That's fine @jleandroperez - the patch doesn't attempt to recover lost data - it attempts to undo the change vs. leave the unexpected (null) in there - something I'm willing to acknowledge is more of a "how do you prefer to proceed when things are broken" question than "this is the right choice"

that being said I'm surprised that tests failed and I'll be looking into the code to try and ifgure out what's up, also surprised that it didn't remove the (null)

my guess is that I got something wrong in the Java code but I do have at least one test confirming the behavior in the diff-match-patch library PR

Create corrupted note using another platform.
Launch app and open corrupted note.

After investigating why we had unexpected corruption on Android and pinpointing sqlite as the perpetrator, I believe that it makes sense that the reason we're not seeing the (null) issue fixed here is because we're not seeing the corrupted patch come in. This fix will only adjust patching which when applied would produce that bug and it appears like we're starting with a copy of the note from the server which won't present the buggy patch.

In other words, in those steps I don't think we're reproducing the (null) bug.

dmsnell · 2019-12-28T17:48:06Z

Simperium/src/main/java/name/fraser/neil/plaintext/diff_match_patch.java

+      }
+      throw new IllegalArgumentException();
+    }
+    return decoded.toString();


I'd definitely appreciate it if you could scrutinize this block @theck13. I'm making some assumptions about how URLDecoder() works and it's worth pointing out that this entirely replaces that behavior. In the JavaScript library we're only switching over to the custom parser once the native one fails with the Unicode error.

I ran the speedtest and couldn't find any significant performance impact but I wasn't sure what exceptional cases could exist here.

I'm making some assumptions about how URLDecoder() works

Did you look at the URLDecoder.decode(String s, String enc) source code?

In the JavaScript library we're only switching over to the custom parser once the native one fails with the Unicode error.

Do you mean something like URLDecoder.decode(String s, String enc) when you refer to "the native one" above? If so, we could do something similar by trying URLDecoder.decode(param, "UTF-8") first then reverting to decodeURI(param) if the try fails.

try { param = URLDecoder.decode(param, "UTF-8"); } catch (UnsupportedEncodingException e) { param = this.decodeURI(param); } catch (IllegalArgumentException e) { // Malformed URI sequence. throw new IllegalArgumentException( "Illegal escape in diff_fromDelta: " + param, e); }

I tried exploring using another decoder but didn't get anything working - it was mainly due to the combined work of percent-encoding decoding and UTF-8 decoding.

if that's working for you it'd make the patch better. we would want to try/catch this.decodeURI() as well because it can also throw upon malformed input

we would want to try/catch this.decodeURI() as well because it can also throw upon malformed input

I'm not sure I understand this part. Are you saying we should add UnsupportedEncodingException to the second catch since this.decodeURI() could throw that exception as well?

The way that snippet above is written, the param = this.decodeURI(param); statement will only be run if the param = URLDecoder.decode(param, "UTF-8"); throws an UnsupportedEncodingException. Isn't that what we want?

we have two levels of exceptions here and it's confusing.

we can percent-encode a surrogate half's code unit as if it were valid Unicode and the UTF-8 algorithm can convert that value into three percent-hex-digit-pairs.

\ud83c for example would turn into %ED%A0%BC even though what we think would be the corresponding code point, 0xD83C, is marked as entirely off-limits because it would cause confusion with surrogate pairs in UTF-16.

that being said, there's no fundamental issue in encoding that number as there is with encoding negative numbers or numbers bigger than 0x10000000

if we had \ud83c\udd70 as two UTF16 code units then they would represent \U0001F70 as the code point and that would encode into four percent-hex-digit-pairs %F0%9F%85%B0

so this is the kind of exception we want to handle here. we specifically don't want to fail just because someone encoded half a character. we can recover from that.

on the other hand, there are structural issues we can't deal with, such as if someone sends %123 or %F%D%C and there are other issues such as if they send an invalid byte sequence through the percent-hex-digit-pairs. for instance, if we got %30 instead of %F0 as the first pair in that encoding - %30%9F%85%B0 - then we wouldn't be able to decode it according to the algorithm because it would specify the wrong continuation and length bits.

does that help clarify what's happening? the main point is that this.decodeURI() can also fail for reasons that URLDecoder.decode() would fail which also have nothing to do with surrogates

dmsnell · 2020-01-15T03:17:50Z

@theck13 I've reconfirmed all the tests in google/diff-match-patch#80 and copied the entire Java file from there into here. also I rebased against develop to incorporate the "ignore the corrupted ghost" changes into this branch. if you think I shouldn't have done this we can git reset --hard f705baac94ec7e43976ebc8ef9650eb5d3a5f11f to where it was.

there are other updates from diff-match-patch including a couple changed data structures. if this is an "update the library" PR those are probably welcome, but you can make the call if you'd rather split them out; I'm happy to backup and manually apply the changes from google/diff-match-patch#80 specifically

dmsnell · 2020-01-16T00:29:22Z

Merging as discussed in Slack

theck13 added the Bug label Dec 13, 2019

theck13 requested a review from dmsnell December 13, 2019 04:30

theck13 force-pushed the diff-match-patch branch from 99922ed to f705baa Compare December 27, 2019 16:38

dmsnell reviewed Dec 28, 2019

View reviewed changes

theck13 and others added 3 commits January 14, 2020 20:11

Update diff match patch with decode URI and H(null)L pattern guard

a5eb28a

Update version from 0.9.3 to 0.9.4

91bd10e

Remove attempted H(null)L fix

977ace2

dmsnell force-pushed the diff-match-patch branch from f705baa to 977ace2 Compare January 15, 2020 03:15

dmsnell merged commit 89ecb33 into develop Jan 16, 2020

dmsnell deleted the diff-match-patch branch January 16, 2020 00:30

theck13 mentioned this pull request Jan 16, 2020

Diff Match Patch Automattic/simplenote-android#924

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diff Match Patch #215

Diff Match Patch #215

theck13 commented Dec 13, 2019 •

edited

Loading

dmsnell Dec 28, 2019

theck13 Jan 3, 2020

theck13 Jan 3, 2020

jleandroperez Jan 7, 2020

dmsnell Jan 7, 2020

dmsnell Jan 14, 2020

dmsnell Dec 28, 2019

theck13 Jan 3, 2020

dmsnell Jan 3, 2020

theck13 Jan 3, 2020

dmsnell Jan 3, 2020

dmsnell commented Jan 15, 2020

dmsnell commented Jan 16, 2020

Diff Match Patch #215

Diff Match Patch #215

Conversation

theck13 commented Dec 13, 2019 • edited Loading

Fix

Review

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmsnell commented Jan 15, 2020

dmsnell commented Jan 16, 2020

theck13 commented Dec 13, 2019 •

edited

Loading