Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/ime iob fix #834

Merged
merged 4 commits into from
Oct 16, 2019
Merged

Fix/ime iob fix #834

merged 4 commits into from
Oct 16, 2019

Conversation

mzorz
Copy link
Contributor

@mzorz mzorz commented Jul 26, 2019

Fix

This PR aims at fixing or alleviating the crash being observed in wordpress-mobile/WordPress-Android#9904

Preamble

Long story short as per what I could observe: each time focus is changed, or when the input method is changed (for example, you change the keyboard or tap the microphone on the keyboard), this will create a new input method and the actively focused EditText will be called on its onCreateInputConnection. Here, the base implementation (same one as AztecText is using, as it's not overriding that method) will create a new instance and return that one.

By using an InputConnectionWrapper and plenty of logs I was able to identify the following situation:

2019-07-24 09:36:09.837 19641-19641/org.wordpress.aztec E/WordPress-EDITOR: ATICW beginBatchEdit
 2019-07-24 09:36:09.996 19641-19641/org.wordpress.aztec E/WordPress-EDITOR: ATICW constructor
 2019-07-24 09:36:10.058 19641-19641/org.wordpress.aztec W/IInputConnectionWrapper: endBatchEdit on inactive InputConnection
 2019-07-24 09:36:10.061 19641-19641/org.wordpress.aztec E/WordPress-EDITOR: ATICW constructor
 2019-07-24 09:36:10.063 19641-19641/org.wordpress.aztec W/IInputConnectionWrapper: beginBatchEdit on inactive InputConnection
 2019-07-24 09:36:10.063 19641-19641/org.wordpress.aztec W/IInputConnectionWrapper: getTextBeforeCursor on inactive InputConnection
 2019-07-24 09:36:10.068 19641-19659/org.wordpress.aztec D/EGL_emulation: eglMakeCurrent: 0xa6b04240: ver 3 0 (tinfo 0xa6b03280)
 2019-07-24 09:36:10.079 19641-19641/org.wordpress.aztec W/IInputConnectionWrapper: getTextAfterCursor on inactive InputConnection

Then we have this: a beginBatchEdit is followed by a constructor call. This means, a new InputConnection is going to be established, but also a batch edit has already started on the InputConnection that is being detached.
This was followed by many entries like this one:

2019-07-24 09:36:10.079 19641-19641/org.wordpress.aztec W/IInputConnectionWrapper: getTextAfterCursor on inactive InputConnection

At some point this can be followed by an IndexOutOfBounds crash, for example looking in the logs here:

2019-07-24 16:28:41.219 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW beginBatchEdit
2019-07-24 16:28:41.219 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: newCursorPosition: 1 - text: $ - key.length(): 1
2019-07-24 16:28:41.219 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: newCursorPosition: 1 - text: $
2019-07-24 16:28:41.219 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: previousSelectionStart: 718 - previousSelectionEnd: 718
2019-07-24 16:28:41.290 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW constructor
2019-07-24 16:28:41.358 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: currentSelectionStart: 718
2019-07-24 16:28:41.361 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: noPreviousSelection: true
2019-07-24 16:28:41.361 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: cursorDidNotMove: true
2019-07-24 16:28:41.361 12148-12148/org.wordpress.aztec E/WordPress-EDITOR: ATICW commitText: cursorMovedBackwardsOrAtBeginningOfInput: false
2019-07-24 16:28:41.361 12148-12148/org.wordpress.aztec W/IInputConnectionWrapper: endBatchEdit on inactive InputConnection

These logs were added by using an InputConnectionWrapper that didn't affect behavior (just added the logs).

This piece here shows an insertion of a $ character at position 718. When an insertion / addition happens, the new length should be added by one and the new cursor position should be also augmented, as it will stay to the right of the appended/inserted character. The problem we can see here in that log snippet is newCursorPosition should have moved the selection by one (1), but still previousSelectionStart and previousSelectionEnd is 718, and currentSelectionStart after moving should be 718+1=719, but instead it's still 718. See how things can go wrong there? On the other runs where no new constructor call is seen in between, things are added up correctly.

What this PR does

This PR just keeps a reference for the latest InputConnection created, as long as no new attributes for a new InputConnection to be created are sent (so we're working under the premise that there is no need to create a new InputConnection if the parameters passed are the same as per the current one).

Test

have been trying to test this using monkey https://developer.android.com/studio/test/monkey, you can test it the same way issuing a command like this one from Terminal:

adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000

To make conditions more convenient for this crash to happen:

  1. disable sourceView so there are no switches
    https://github.com/wordpress-mobile/AztecEditor-Android/blob/develop/app/src/main/kotlin/org/wordpress/aztec/demo/MainActivity.kt#L748
    (just comment out that line)
  2. disable suggestions so we don't hit the SuggestionPopUP crash first Crash: IndexOutOfBoundsException on SuggestionPopUpWindow #833
    android:inputType="textNoSuggestions" add this to the xml https://github.com/wordpress-mobile/AztecEditor-Android/blob/develop/app/src/main/res/layout-v17/activity_main.xml#L26
  3. clean wiped fresh Android 8.0 image emulator, clean your clipboard memory as well (issue echo -n '' | pbcopy on a terminal window), then run:
  4. run adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000

Alternatively, you can also try pinning the Aztec demo screen so no unneeded app switches happen, and in this case do not disable sourceView, as you'll want to have focus to alternate so the new InputConnection is created when switching between EditText:

  1. Run and pin the AztecDemo app (check this)
  2. Folow steps 2-4 above.

If you need to kill the monkey process running in the device/emulator, issue the following command:
adb shell ps | awk '/com.android.commands.monkey/ { system("adb shell kill " $2) }'

Make sure strings will be translated:

  • If there are new strings that have to be translated, I have added them to the client's strings.xml as a part of the integration PR.

return true
}

if (ed1.actionId == ed2.actionId
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: Perhaps some comments here can be helpful to provide context on this comparison

aztec/src/main/kotlin/org/wordpress/aztec/AztecText.kt Outdated Show resolved Hide resolved
@@ -41,7 +41,9 @@ class CssUnderlinePlugin : ISpanPostprocessor, ISpanPreprocessor {
if (hiddenSpan.TAG == SPAN_TAG) {
val parentStyle = hiddenSpan.attributes.getValue(CssStyleFormatter.STYLE_ATTRIBUTE)
val childStyle = calypsoUnderlineSpan.attributes.getValue(CssStyleFormatter.STYLE_ATTRIBUTE)
hiddenSpan.attributes.setValue(CssStyleFormatter.STYLE_ATTRIBUTE, CssStyleFormatter.mergeStyleAttributes(parentStyle, childStyle))
if (parentStyle != null && childStyle != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the InputConnection issue, or solely part of #832 and just bundled with this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's correct it's part of #832 but I needed to keep it within this patch so my monkey runs wouldn't be capped by finding the issue fixed in #832 first and then stopping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the PR has been reverted, the guard here also got reverted. So, #831 got opened again but I think it makes sense to re-address it on its own now. Can you wrangle that @mkevins ?

Copy link
Contributor

@mkevins mkevins Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this particular commit was included in both #832, and #834, and was accidentally reverted in the revert of #834, so we need to revert the revert. Does that match your understanding @hypest ? I've created this branch here: https://github.com/wordpress-mobile/AztecEditor-Android/compare/issue/831-css-underline-illegal-state-revert-revert, but so far I've not been successful in reproducing the issue on develop using the monkeys (adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 1000000). Do we have any documented steps to manually reproduce the crash?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we need to revert the revert. Does that match your understanding @hypest ?

Heh, I wouldn't just call it "revert the revert" anymore as it's not clear if it would bring back other things besides the guard but, judging from the branch you shared I'd say that yeah, re-instating the guard is the goal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any documented steps to manually reproduce the crash?

From what I recall, there weren't steps to reliably reproduce the crash, not sure if @mzorz has a better recollection on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 ok been looking into this, this is how it goes: the reverted code made this issue reappear #831, and #832 has the fix we need to re-apply so, basically adding that check back will prevent the crash from happening (that means it is my understanding @mkevins 's branch has the correct code to re-apply the condition check 👍 ).
Also back then, I wasn't able to reproduce manually, but only when running the monkeys.
I'd say let's create a PR from @mkevins branch and let's apply it. Thanks for looking into this @hypest and @mkevins

@mkevins
Copy link
Contributor

mkevins commented Aug 2, 2019

Great work in this investigation @mzorz 🎉

Nice work on logging those details! Very clever way of "peeking under the hood". I found it useful to redirect stderr as well:
adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000 | monkey-log.txt 2>&1

Also, at one point, I had closed the emulator while the monkey was running, forgetting that the monkey process is actually running on the emulator, not the host. I was reminded of this the next time I launched that particular image. 😆

I left a few comments in the code. I had a difficult time reproducing the specific IOB crash on Android 9.0, but the overall concept makes sense to me, as well as the workaround. I only wonder why the current behavior doesn't wreak more havoc in other scenarios. I.e. what is it about this app that surfaces this underlying Android behavior?

As you mentioned earlier, the seed does not guarantee a deterministic outcome. I had some crashes both with and without this PR applied on API 27, but I could not reproduce any of mentioned crashes with or without the PR on API 29 ⃰.

E.g. this was Pixel 2 emulator with API 27, seed 24004:

// CRASH: org.wordpress.aztec (pid 2020)
// Short Msg: java.lang.ArrayIndexOutOfBoundsException
// Long Msg: java.lang.ArrayIndexOutOfBoundsException: length=19; index=-1
// Build Label: google/sdk_gphone_x86/generic_x86:8.1.0/OSM1.180201.007/4586646:user/release-keys
// Build Changelist: 4586646
// Build Time: 1517878968000
// java.lang.ArrayIndexOutOfBoundsException: length=19; index=-1
// 	at android.text.DynamicLayout.getBlockIndex(DynamicLayout.java:648)
// 	at android.widget.Editor.drawHardwareAccelerated(Editor.java:1703)
// 	at android.widget.Editor.onDraw(Editor.java:1672)
// 	at android.widget.TextView.onDraw(TextView.java:6882)
// 	at android.view.View.draw(View.java:19192)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18142)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.draw(View.java:19195)
// 	at android.widget.ScrollView.draw(ScrollView.java:1739)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18142)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18133)
// 	at android.view.View.draw(View.java:18920)
// 	at android.view.ViewGroup.drawChild(ViewGroup.java:4236)
// 	at android.view.ViewGroup.dispatchDraw(ViewGroup.java:4022)
// 	at android.view.View.draw(View.java:19195)
// 	at com.android.internal.policy.DecorView.draw(DecorView.java:788)
// 	at android.view.View.updateDisplayListIfDirty(View.java:18142)
// 	at android.view.ThreadedRenderer.updateViewTreeDisplayList(ThreadedRenderer.java:669)
// 	at android.view.ThreadedRenderer.updateRootDisplayList(ThreadedRenderer.java:675)
// 	at android.view.ThreadedRenderer.draw(ThreadedRenderer.java:783)
// 	at android.view.ViewRootImpl.draw(ViewRootImpl.java:2992)
// 	at android.view.ViewRootImpl.performDraw(ViewRootImpl.java:2806)
// 	at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2359)
// 	at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1392)
// 	at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:6752)
// 	at android.view.Choreographer$CallbackRecord.run(Choreographer.java:911)
// 	at android.view.Choreographer.doCallbacks(Choreographer.java:723)
// 	at android.view.Choreographer.doFrame(Choreographer.java:658)
// 	at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:897)
// 	at android.os.Handler.handleCallback(Handler.java:790)
// 	at android.os.Handler.dispatchMessage(Handler.java:99)
// 	at android.os.Looper.loop(Looper.java:164)
// 	at android.app.ActivityThread.main(ActivityThread.java:6494)
// 	at java.lang.reflect.Method.invoke(Native Method)
// 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
// 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:807)
// 

And this was the same run, with the PR applied:

// CRASH: org.wordpress.aztec (pid 4908)
// Short Msg: java.lang.IndexOutOfBoundsException
// Long Msg: java.lang.IndexOutOfBoundsException: setSpan (569 ... 597) ends beyond length 569
// Build Label: google/sdk_gphone_x86/generic_x86:8.1.0/OSM1.180201.007/4586646:user/release-keys
// Build Changelist: 4586646
// Build Time: 1517878968000
// java.lang.IndexOutOfBoundsException: setSpan (569 ... 597) ends beyond length 569
//  at android.text.SpannableStringBuilder.checkRange(SpannableStringBuilder.java:1309)
//  at android.text.SpannableStringBuilder.setSpan(SpannableStringBuilder.java:680)
//  at android.text.SpannableStringBuilder.setSpan(SpannableStringBuilder.java:672)
//  at android.widget.Editor$SuggestionsPopupWindow.updateSuggestions(Editor.java:3719)
//  at android.widget.Editor$SuggestionsPopupWindow.show(Editor.java:3591)
//  at android.widget.Editor.replace(Editor.java:386)
//  at android.widget.Editor$3.run(Editor.java:2231)
//  at android.os.Handler.handleCallback(Handler.java:790)
//  at android.os.Handler.dispatchMessage(Handler.java:99)
//  at android.os.Looper.loop(Looper.java:164)
//  at android.app.ActivityThread.main(ActivityThread.java:6494)
//  at java.lang.reflect.Method.invoke(Native Method)
//  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:438)
//  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:807)

⃰I did get an IOB crash at one point, on API 29, without the PR, but realized later the trace was on stderr, so did not get saved to the log. Since then, I have not been able to observe the crash on API 29, neither with nor without the PR applied.

Since this touches some "deeper" parts, it may be helpful to have an extra set of eyes on this one.

@mzorz
Copy link
Contributor Author

mzorz commented Aug 2, 2019

Nice work on logging those details! Very clever way of "peeking under the hood". I found it useful to redirect stderr as well:
adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000 | monkey-log.txt 2>&1

Thank you! Nice one there too

Also, at one point, I had closed the emulator while the monkey was running, forgetting that the monkey process is actually running on the emulator, not the host. I was reminded of this the next time I launched that particular image. 😆

Lol - forgot to add this bit of info, we talked over DM but for reference of others reading this here, you can kill the monkey process running on the device by issuing this adb command:
adb shell ps | awk '/com.android.commands.monkey/ { system("adb shell kill " $2) }'.

I left a few comments in the code. I had a difficult time reproducing the specific IOB crash on Android 9.0, but the overall concept makes sense to me, as well as the workaround.

I actually couldn't see this specific crash on Android 9 either (seen other weirder ones though).

I only wonder why the current behavior doesn't wreak more havoc in other scenarios. I.e. what is it about this app that surfaces this underlying Android behavior?

And that's a good question, my take on this is I think something particular that other apps don't have here is pretty much the Android focus is always on the same EditText (AztecText) here, which makes things quite different from other typical screens where you have some buttons maybe various EditText and you won't spend much time on these screens either, as opposed to the situation of writing a blog post where the screen is the same, you may switch to different kind of inputs (voice, text, change keyboard language), things you probably wouldn't do on the same EditText normally on a typical form screen.

And this was the same run, with the PR applied:
// java.lang.IndexOutOfBoundsException: setSpan (569 ... 597) ends beyond length 569
// at android.text.SpannableStringBuilder.checkRange(SpannableStringBuilder.java:1309)
// at android.text.SpannableStringBuilder.setSpan(SpannableStringBuilder.java:680)
// at android.text.SpannableStringBuilder.setSpan(SpannableStringBuilder.java:672)
// at android.widget.Editor$SuggestionsPopupWindow.updateSuggestions(Editor.java:3719)
// at android.widget.Editor$SuggestionsPopupWindow.show(Editor.java:3591)

I got that one too without the patch - in fact I forgot to mention earlier I also did tests with suggestions turned off (by adding noSuggestions value to the inputType attribute in the manifest) to avoid getting stopped there as well. I tracked that one here as well: #833.

I was tempted to try and do some more research on that one as it looked a bit easier to grasp but, since it doesn't seem to be affecting many users (haven't heard of that being reported by actual users before) I only tracked it as an issue and continue onto chasing the other IOB problem.

@mzorz
Copy link
Contributor Author

mzorz commented Aug 2, 2019

Thank you so much for taking the time to try and review this one @mkevins !! ❤️

Since this touches some "deeper" parts, it may be helpful to have an extra set of eyes on this one.

💯 , hoping to get maybe @daniloercoli and / or @hypest as you find time to play with monkeys a bit? 🙏 🐒

@mkevins
Copy link
Contributor

mkevins commented Aug 2, 2019

I got that one too without the patch - in fact I forgot to mention earlier I also did tests with suggestions turned off (by adding noSuggestions value to the inputType attribute in the manifest) to avoid getting stopped there as well. I tracked that one here as well: #833.

Ah, thanks for the extra context. That makes sense, and I guess it's a good sign that this monkey can find all these obscure bugs.. even if they aren't currently being reported. I suppose there is always a chance a code change later on could put more traffic in these code paths, and it's good to find these kind of things beforehand.

@@ -291,6 +294,9 @@ open class AztecText : AppCompatEditText, TextWatcher, UnknownHtmlSpan.OnUnknown

private var focusOnVisible = true

var inputConnection: InputConnection? = null
var inputConnectionEditorInfo: EditorInfo? = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should keep a WeakReference to those objects. These are objects used deeply withing the Android editing subsystem and I'm not sure if we'll introduce other bugs if we keep strong references to them. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, updated the inputConnection to use a WeakReference in d20fb85.

The var inputConnectionEditorInfo is an actual new instance that copies the attributes values by value on this line so, left as it was 👍

@hypest
Copy link
Contributor

hypest commented Aug 16, 2019

I wasn't able to reproduce the crash on develop (hash: 167ffe6).

Still, the fix in this PR is interesting since it seems to fix the issue on your tests @mzorz . By the way @mkevins , I'm, not sure from you comments so far: does this PR fix the crash for you on any API levels?

There's only one thing that bothers me and not sure how to decide upon: we're breaking the constructor's assumption that it will create a new InputConnection. That is, the framework code that calls it is most probably assuming that a new InputConnection will be created and so, not sure how disruptive it can be to break this assumption.

The IndexOutOfBounds crash seems to be a whole class of issues, with different root causes and different manifestations (some crashes are caused by the DynamicLayout class, some have the indexes only be off by one from the text's length and others have the indexes be far away for the text's length. Which particular subcase(s) do you think this fix addresses @mzorz ?

By the way, the original issue mentioned here (wordpress-mobile/WordPress-Android#9904) seems to be about DynamicLayout reflow, and there's at least this PR that mentions that DynamicLayout had a bug before Android P (taken from this comment).

I guess what I'm trying to say is, I'd prefer if we could "focus" this fix, maybe only apply it in Android versions lower than Android P, and see if the reports for the particular subcases decrease.

@mzorz
Copy link
Contributor Author

mzorz commented Aug 16, 2019

Thanks for all those good questions @hypest!

Which particular subcase(s) do you think this fix addresses?

This is the full stacktrace for the IOB crash mentioned in #9904, which this PR says in its description is aiming at fixing:

java.lang.IndexOutOfBoundsException: 105, -99
    at android.text.PackedIntVector.deleteAt(PackedIntVector.java:222)
    at android.text.DynamicLayout.reflow(DynamicLayout.java:317)
    at android.text.DynamicLayout.-wrap0
    at android.text.DynamicLayout$ChangeWatcher.reflow(DynamicLayout.java:750)
    at android.text.DynamicLayout$ChangeWatcher.onSpanChanged(DynamicLayout.java:779)
    at android.text.SpannableStringBuilder.sendSpanChanged(SpannableStringBuilder.java:1296)
    at android.text.SpannableStringBuilder.sendToSpanWatchers(SpannableStringBuilder.java:651)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:581)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:504)
    at android.text.SpannableStringBuilder.replace(SpannableStringBuilder.java:502)
    at android.view.inputmethod.BaseInputConnection.replaceText(BaseInputConnection.java:848)
    at android.view.inputmethod.BaseInputConnection.commitText(BaseInputConnection.java:197)
    at com.android.internal.widget.EditableInputConnection.commitText(EditableInputConnection.java:183)
    at com.android.internal.view.IInputConnectionWrapper.executeMessage(IInputConnectionWrapper.java:344)
    at com.android.internal.view.IInputConnectionWrapper$MyHandler.handleMessage(IInputConnectionWrapper.java:85)
    at android.os.Handler.dispatchMessage(Handler.java:108)
    at android.os.Looper.loop(Looper.java:166)
    at android.app.ActivityThread.main(ActivityThread.java:7529)
    at java.lang.reflect.Method.invoke(Method.java)
    at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:245)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:921)

This crash I managed to reproduce in my tests with monkeys (as well as the other mentioned crashes - my efforts in investigating this crash were mostly about trying to avoid hitting the other crashes first, and managed to do so by following the steps described above). So it's true there is a set of causes that make an IOB Exception occur - the investigation done for this PR only tried to isolate the one described in #9904 and the code in this PR only tries to workaround the situation in which those conditions are met.

With the help of much logging using this wrapper for InputConnection (I may have forgotten to link it here), I was able to determine the relation between a beginBatchEdit starting and endBatchEdit finishing, and seeing the change in IME connection in between was something that appeared as a pattern in my tests right a few lines before crashing.

So, that crash pasted here in this comment 2 paragraphs above is the particular crash this PR aims at fixing, but the true information about context of when it's about to happen is given by the logs one can see once implementing the wrapper. Note, the wrapper is not part of the solution, it was just a tool to be able to gain more insight on why this was happening.

There's only one thing that bothers me and not sure how to decide upon: we're breaking the constructor's assumption that it will create a new InputConnection.

The docs say here:

An editor should start by implementing View.onCreateInputConnection(EditorInfo) to return its own input connection.

It doesn't say anything about an assumption that it should create a new one each time it gets called. I'll take documentation may not be updated or not directly reflecting reality, but also we could argue reality is what we can experiment and observe for sure and not what documentation or method names suggest 😅 (this is supposed to be nothing else but a funny comment BTW)

So, the InputConnection we're returning is that of AztecText (actually, its super class EditText) first, and then keeping its reference. When the IME changes, we can tell them to use the already created InputConnection.

By the way, the original issue mentioned here (wordpress-mobile/WordPress-Android#9904) seems to be about DynamicLayout reflow, and there's at least this PR that mentions that DynamicLayout had a bug before Android P (taken from this comment).

True, that's another different issue. This PR does not intend to fix that one. In fact, it was sometimes hard to be able to get the crash I was investigating without getting that other crash first, as described above.

I guess what I'm trying to say is, I'd prefer if we could "focus" this fix, maybe only apply it in Android versions lower than Android P, and see if the reports for the particular subcases decrease.

💯 I think I haven't seen it on Android 9, and @mkevins apparently has seen it once as per this:

⃰I did get an IOB crash at one point, on API 29, without the PR, but realized later the trace was on stderr, so did not get saved to the log.
Since then, I have not been able to observe the crash on API 29, neither with nor without the PR applied.

It would look like API 29 is more stable in this regard so, if we'd like to focus this better, I'd say we can make the code run conditionally only for Android 8.0 and 8.1.

What do you think? @mkevins @hypest ?

@mkevins
Copy link
Contributor

mkevins commented Oct 7, 2019

Hi @mzorz 👋

My apologies for the long delay in getting back on this one. Do we have any numbers to guide us on this? I certainly like the idea of limiting the fix to affected versions only, since the workaround has the potential for side effects undocumented in the Android source code and docs. Also, do you know if we have a way to try the fix in an extremely "conservative" way (i.e. very limited rollout), to observe effectiveness in a low-risk way?

@designsimply
Copy link

designsimply commented Oct 8, 2019

Do we have any numbers to guide us on this?

At the details link at https://sentry.io/share/issue/25c5482018784560b498f2fddb147974/ you can see that in the last 90 days there were ~2000 crash events logged by Sentry and the percentages of OSs affected were:

  • 62% Android 7.0
  • 17% Android 6.0
  • 9% Android 7.1.1
  • 6% Android 7.1.2
  • 3% Android 7.1
  • <1% Android 6.0.1

Do you know if we have a way to try the fix in an extremely "conservative" way (i.e. very limited rollout), to observe effectiveness in a low-risk way?

I can check with our release managers about this, however, I think rolling out a change that's ready to revert in a hotfix would be preferable in this case because it would be far easier to measure the impact using a tool such as Sentry.

@designsimply
Copy link

It is possible to do a staged rollout, FWIW – we do it for our Android apps – you have to update the % by hand manually, but it is doable.

@hypest
Copy link
Contributor

hypest commented Oct 14, 2019

It would look like API 29 is more stable in this regard so, if we'd like to focus this better, I'd say we can make the code run conditionally only for Android 8.0 and 8.1.

and

in the last 90 days there were ~2000 crash events logged by Sentry and the percentages of OSs affected were:

62% Android 7.0
17% Android 6.0
9% Android 7.1.1
6% Android 7.1.2
3% Android 7.1
<1% Android 6.0.1

Judging from these stats Sheri shared in the comment above, I'd suggest we run the code conditionally for Android v7.x only, and see if the v7.x stats decrease. We can roll this out in the wild for two weeks (one sprint or app release cycle) and see. @designsimply , can you also share here the stats for just the last 15days perhaps, so we can compare at the end of the app release cycle?

@mzorz
Copy link
Contributor Author

mzorz commented Oct 14, 2019

Thank you all for keeping up with this from where we left! 🙇

Judging from these stats Sheri shared in the comment above, I'd suggest we run the code conditionally for Android v7.x only, and see if the v7.x stats decrease

Agreed on that approach @hypest 👍 just a note though - Sentry reports the same problem as a different issue in https://sentry.io/share/issue/81b90713f6fe4893b463255d6895b21b/ (first reported here).

In this other issue, the affected platform is 100% Android 8.0:

100.00% | Android 8.0.0

Affecting 314 users and ~1.8k events in the past 14 days (that's more than the 62% of 340 users / 2000 events happening on Android 7.0). I'd lean towards trying on Android 8 (also since it's where I based most of my monkey tests setup on).

I'll make the version check and update here 👍

@mzorz
Copy link
Contributor Author

mzorz commented Oct 14, 2019

Conditional Android version check added in 2169aff - tested on a few emulators to check it effectively goes through / bypasses the fix.

Ready for another 👀pass @mkevins @hypest 🙇

Copy link
Contributor

@mkevins mkevins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limiting to most affected platform is nice, and I like the use of WeakReference as well. Looks good to try, and observe the crash stats as it rolls out.

Copy link
Contributor

@hypest hypest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @mzorz !

Let's give this focused-to-Android-8.0.0 solution a try. I'll 👍 the PR too. Good luck!

Note, I did encounter the crash @mkevins saw and mentioned here. Kept it in a gist for reference, just in case: https://gist.github.com/hypest/011bc56b85980cdef7e7ece1e57a1664

@hypest
Copy link
Contributor

hypest commented Oct 16, 2019

I'll merge this PR and also include it in the currently open gutenberg-mobile release.

@hypest hypest merged commit 1f257c7 into develop Oct 16, 2019
@mzorz mzorz mentioned this pull request Dec 5, 2019
@hypest hypest deleted the fix/ime-iob-fix branch April 30, 2020 09:31
@mzorz mzorz mentioned this pull request Aug 25, 2020
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants