-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diff Match Patch #215
Diff Match Patch #215
Conversation
99922ed
to
f705baa
Compare
throw new IllegalArgumentException( | ||
"Invalid number in diff_fromDelta: " + tokens[x + 2].substring(1), e); | ||
} | ||
tokens[x + 2] = tokens[x + 2].charAt(0) + String.valueOf(m + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's my understanding that @jleandroperez did not incorporate this into the iOS library, meaning that while I'm pretty confident in it, it's untested in practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing these changes, the (null)
part was not removed from a corrupted note with the high-surrogate (null) low-surrogate
pattern. According to the comment above this block, this code should be attempting to repair notes with that corrupted content, right?
Here is how I tested it.
- Create corrupted note using another platform.
- Launch app and open corrupted note.
- Edit corrupted note before or after corruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ChannelTest.testRequestObjectForInvalidChange()
test is failing when running ./gradlew connectedCheck
on this branch. Interestingly, removing the block in question here makes the test pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theck13 @dmsnell reason why I didn't add the (repair) part of the fix, is that:
- Once a string is badly split,
(null)
is generated by the iOS SDK. - And we do get data loss (the
(null)
is sent over the wire).
Meaning that... once the issue shows up, it's irrecoverable. That is, as long as the badly split string is generated ios-side.
Now, that being said, I couldn't trigger any test case in which a malformed string could be patched up, other than unit tests. For that reason, for the time being, I've opted to change as little as possible in DMP (it's been quite stable for the past couple releases!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine @jleandroperez - the patch doesn't attempt to recover lost data - it attempts to undo the change vs. leave the unexpected (null)
in there - something I'm willing to acknowledge is more of a "how do you prefer to proceed when things are broken" question than "this is the right choice"
that being said I'm surprised that tests failed and I'll be looking into the code to try and ifgure out what's up, also surprised that it didn't remove the (null)
my guess is that I got something wrong in the Java code but I do have at least one test confirming the behavior in the diff-match-patch
library PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create corrupted note using another platform.
Launch app and open corrupted note.
After investigating why we had unexpected corruption on Android and pinpointing sqlite
as the perpetrator, I believe that it makes sense that the reason we're not seeing the (null)
issue fixed here is because we're not seeing the corrupted patch come in. This fix will only adjust patching which when applied would produce that bug and it appears like we're starting with a copy of the note from the server which won't present the buggy patch.
In other words, in those steps I don't think we're reproducing the (null)
bug.
} | ||
throw new IllegalArgumentException(); | ||
} | ||
return decoded.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely appreciate it if you could scrutinize this block @theck13. I'm making some assumptions about how URLDecoder()
works and it's worth pointing out that this entirely replaces that behavior. In the JavaScript library we're only switching over to the custom parser once the native one fails with the Unicode error.
I ran the speedtest
and couldn't find any significant performance impact but I wasn't sure what exceptional cases could exist here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm making some assumptions about how
URLDecoder()
works
Did you look at the URLDecoder.decode(String s, String enc)
source code?
In the JavaScript library we're only switching over to the custom parser once the native one fails with the Unicode error.
Do you mean something like URLDecoder.decode(String s, String enc)
when you refer to "the native one" above? If so, we could do something similar by trying URLDecoder.decode(param, "UTF-8")
first then reverting to decodeURI(param)
if the try fails.
try {
param = URLDecoder.decode(param, "UTF-8");
} catch (UnsupportedEncodingException e) {
param = this.decodeURI(param);
} catch (IllegalArgumentException e) {
// Malformed URI sequence.
throw new IllegalArgumentException(
"Illegal escape in diff_fromDelta: " + param, e);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried exploring using another decoder but didn't get anything working - it was mainly due to the combined work of percent-encoding decoding and UTF-8 decoding.
if that's working for you it'd make the patch better. we would want to try/catch
this.decodeURI()
as well because it can also throw upon malformed input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would want to
try/catch
this.decodeURI()
as well because it can also throw upon malformed input
I'm not sure I understand this part. Are you saying we should add UnsupportedEncodingException
to the second catch
since this.decodeURI()
could throw that exception as well?
The way that snippet above is written, the param = this.decodeURI(param);
statement will only be run if the param = URLDecoder.decode(param, "UTF-8");
throws an UnsupportedEncodingException
. Isn't that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have two levels of exceptions here and it's confusing.
we can percent-encode a surrogate half's code unit as if it were valid Unicode and the UTF-8 algorithm can convert that value into three percent-hex-digit-pairs.
\ud83c
for example would turn into %ED%A0%BC
even though what we think would be the corresponding code point, 0xD83C
, is marked as entirely off-limits because it would cause confusion with surrogate pairs in UTF-16.
that being said, there's no fundamental issue in encoding that number as there is with encoding negative numbers or numbers bigger than 0x10000000
if we had \ud83c\udd70
as two UTF16 code units then they would represent \U0001F70
as the code point and that would encode into four percent-hex-digit-pairs %F0%9F%85%B0
so this is the kind of exception we want to handle here. we specifically don't want to fail just because someone encoded half a character. we can recover from that.
on the other hand, there are structural issues we can't deal with, such as if someone sends %123
or %F%D%C
and there are other issues such as if they send an invalid byte sequence through the percent-hex-digit-pairs. for instance, if we got %30
instead of %F0
as the first pair in that encoding - %30%9F%85%B0
- then we wouldn't be able to decode it according to the algorithm because it would specify the wrong continuation and length bits.
does that help clarify what's happening? the main point is that this.decodeURI()
can also fail for reasons that URLDecoder.decode()
would fail which also have nothing to do with surrogates
f705baa
to
977ace2
Compare
@theck13 I've reconfirmed all the tests in google/diff-match-patch#80 and copied the entire Java file from there into here. also I rebased against there are other updates from |
Merging as discussed in Slack |
Fix
Update the
diff_match_patch
class with the Java changes from google/diff-match-patch#80.Review
Only one developer is required to review these changes, but anyone can perform the review.