-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/ime iob fix #834
Fix/ime iob fix #834
Conversation
…r childStyle are null
…passed remain the same per each call
return true | ||
} | ||
|
||
if (ed1.actionId == ed2.actionId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np: Perhaps some comments here can be helpful to provide context on this comparison
@@ -41,7 +41,9 @@ class CssUnderlinePlugin : ISpanPostprocessor, ISpanPreprocessor { | |||
if (hiddenSpan.TAG == SPAN_TAG) { | |||
val parentStyle = hiddenSpan.attributes.getValue(CssStyleFormatter.STYLE_ATTRIBUTE) | |||
val childStyle = calypsoUnderlineSpan.attributes.getValue(CssStyleFormatter.STYLE_ATTRIBUTE) | |||
hiddenSpan.attributes.setValue(CssStyleFormatter.STYLE_ATTRIBUTE, CssStyleFormatter.mergeStyleAttributes(parentStyle, childStyle)) | |||
if (parentStyle != null && childStyle != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part of the InputConnection issue, or solely part of #832 and just bundled with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the PR has been reverted, the guard here also got reverted. So, #831 got opened again but I think it makes sense to re-address it on its own now. Can you wrangle that @mkevins ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this particular commit was included in both #832, and #834, and was accidentally reverted in the revert of #834, so we need to revert the revert. Does that match your understanding @hypest ? I've created this branch here: https://github.com/wordpress-mobile/AztecEditor-Android/compare/issue/831-css-underline-illegal-state-revert-revert, but so far I've not been successful in reproducing the issue on develop using the monkeys (adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 1000000
). Do we have any documented steps to manually reproduce the crash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we need to revert the revert. Does that match your understanding @hypest ?
Heh, I wouldn't just call it "revert the revert" anymore as it's not clear if it would bring back other things besides the guard but, judging from the branch you shared I'd say that yeah, re-instating the guard is the goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any documented steps to manually reproduce the crash?
From what I recall, there weren't steps to reliably reproduce the crash, not sure if @mzorz has a better recollection on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 ok been looking into this, this is how it goes: the reverted code made this issue reappear #831, and #832 has the fix we need to re-apply so, basically adding that check back will prevent the crash from happening (that means it is my understanding @mkevins 's branch has the correct code to re-apply the condition check 👍 ).
Also back then, I wasn't able to reproduce manually, but only when running the monkeys.
I'd say let's create a PR from @mkevins branch and let's apply it. Thanks for looking into this @hypest and @mkevins
Great work in this investigation @mzorz 🎉 Nice work on logging those details! Very clever way of "peeking under the hood". I found it useful to redirect Also, at one point, I had closed the emulator while the monkey was running, forgetting that the monkey process is actually running on the emulator, not the host. I was reminded of this the next time I launched that particular image. 😆 I left a few comments in the code. I had a difficult time reproducing the specific IOB crash on Android 9.0, but the overall concept makes sense to me, as well as the workaround. I only wonder why the current behavior doesn't wreak more havoc in other scenarios. I.e. what is it about this app that surfaces this underlying Android behavior? As you mentioned earlier, the seed does not guarantee a deterministic outcome. I had some crashes both with and without this PR applied on API 27, but I could not reproduce any of mentioned crashes with or without the PR on API 29 ⃰. E.g. this was Pixel 2 emulator with API 27, seed 24004:
And this was the same run, with the PR applied:
⃰I did get an IOB crash at one point, on API 29, without the PR, but realized later the trace was on Since this touches some "deeper" parts, it may be helpful to have an extra set of eyes on this one. |
Thank you! Nice one there too
Lol - forgot to add this bit of info, we talked over DM but for reference of others reading this here, you can kill the monkey process running on the device by issuing this adb command:
I actually couldn't see this specific crash on Android 9 either (seen other weirder ones though).
And that's a good question, my take on this is I think something particular that other apps don't have here is pretty much the Android focus is always on the same
I got that one too without the patch - in fact I forgot to mention earlier I also did tests with suggestions turned off (by adding I was tempted to try and do some more research on that one as it looked a bit easier to grasp but, since it doesn't seem to be affecting many users (haven't heard of that being reported by actual users before) I only tracked it as an issue and continue onto chasing the other IOB problem. |
Thank you so much for taking the time to try and review this one @mkevins !! ❤️
💯 , hoping to get maybe @daniloercoli and / or @hypest as you find time to play with monkeys a bit? 🙏 🐒 |
Ah, thanks for the extra context. That makes sense, and I guess it's a good sign that this monkey can find all these obscure bugs.. even if they aren't currently being reported. I suppose there is always a chance a code change later on could put more traffic in these code paths, and it's good to find these kind of things beforehand. |
@@ -291,6 +294,9 @@ open class AztecText : AppCompatEditText, TextWatcher, UnknownHtmlSpan.OnUnknown | |||
|
|||
private var focusOnVisible = true | |||
|
|||
var inputConnection: InputConnection? = null | |||
var inputConnectionEditorInfo: EditorInfo? = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should keep a WeakReference
to those objects. These are objects used deeply withing the Android editing subsystem and I'm not sure if we'll introduce other bugs if we keep strong references to them. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, updated the inputConnection
to use a WeakReference
in d20fb85.
The var inputConnectionEditorInfo
is an actual new instance that copies the attributes values by value on this line so, left as it was 👍
I wasn't able to reproduce the crash on develop (hash: 167ffe6). Still, the fix in this PR is interesting since it seems to fix the issue on your tests @mzorz . By the way @mkevins , I'm, not sure from you comments so far: does this PR fix the crash for you on any API levels? There's only one thing that bothers me and not sure how to decide upon: we're breaking the constructor's assumption that it will create a new InputConnection. That is, the framework code that calls it is most probably assuming that a new InputConnection will be created and so, not sure how disruptive it can be to break this assumption. The By the way, the original issue mentioned here (wordpress-mobile/WordPress-Android#9904) seems to be about DynamicLayout reflow, and there's at least this PR that mentions that DynamicLayout had a bug before Android P (taken from this comment). I guess what I'm trying to say is, I'd prefer if we could "focus" this fix, maybe only apply it in Android versions lower than Android P, and see if the reports for the particular subcases decrease. |
Thanks for all those good questions @hypest!
This is the full stacktrace for the IOB crash mentioned in #9904, which this PR says in its description is aiming at fixing:
This crash I managed to reproduce in my tests with monkeys (as well as the other mentioned crashes - my efforts in investigating this crash were mostly about trying to avoid hitting the other crashes first, and managed to do so by following the steps described above). So it's true there is a set of causes that make an IOB Exception occur - the investigation done for this PR only tried to isolate the one described in #9904 and the code in this PR only tries to workaround the situation in which those conditions are met. With the help of much logging using this wrapper for InputConnection (I may have forgotten to link it here), I was able to determine the relation between a So, that crash pasted here in this comment 2 paragraphs above is the particular crash this PR aims at fixing, but the true information about context of when it's about to happen is given by the logs one can see once implementing the wrapper. Note, the wrapper is not part of the solution, it was just a tool to be able to gain more insight on why this was happening.
The docs say here:
It doesn't say anything about an assumption that it should create a new one each time it gets called. I'll take documentation may not be updated or not directly reflecting reality, but also we could argue reality is what we can experiment and observe for sure and not what documentation or method names suggest 😅 (this is supposed to be nothing else but a funny comment BTW) So, the InputConnection we're returning is that of AztecText (actually, its super class EditText) first, and then keeping its reference. When the IME changes, we can tell them to use the already created InputConnection.
True, that's another different issue. This PR does not intend to fix that one. In fact, it was sometimes hard to be able to get the crash I was investigating without getting that other crash first, as described above.
💯 I think I haven't seen it on Android 9, and @mkevins apparently has seen it once as per this:
It would look like API 29 is more stable in this regard so, if we'd like to focus this better, I'd say we can make the code run conditionally only for Android 8.0 and 8.1. |
Hi @mzorz 👋 My apologies for the long delay in getting back on this one. Do we have any numbers to guide us on this? I certainly like the idea of limiting the fix to affected versions only, since the workaround has the potential for side effects undocumented in the Android source code and docs. Also, do you know if we have a way to try the fix in an extremely "conservative" way (i.e. very limited rollout), to observe effectiveness in a low-risk way? |
At the details link at https://sentry.io/share/issue/25c5482018784560b498f2fddb147974/ you can see that in the last 90 days there were ~2000 crash events logged by Sentry and the percentages of OSs affected were:
I can check with our release managers about this, however, I think rolling out a change that's ready to revert in a hotfix would be preferable in this case because it would be far easier to measure the impact using a tool such as Sentry. |
|
and
Judging from these stats Sheri shared in the comment above, I'd suggest we run the code conditionally for Android v7.x only, and see if the v7.x stats decrease. We can roll this out in the wild for two weeks (one sprint or app release cycle) and see. @designsimply , can you also share here the stats for just the last 15days perhaps, so we can compare at the end of the app release cycle? |
Thank you all for keeping up with this from where we left! 🙇
Agreed on that approach @hypest 👍 just a note though - Sentry reports the same problem as a different issue in https://sentry.io/share/issue/81b90713f6fe4893b463255d6895b21b/ (first reported here). In this other issue, the affected platform is 100% Android 8.0:
Affecting 314 users and ~1.8k events in the past 14 days (that's more than the 62% of 340 users / 2000 events happening on Android 7.0). I'd lean towards trying on Android 8 (also since it's where I based most of my monkey tests setup on). I'll make the version check and update here 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limiting to most affected platform is nice, and I like the use of WeakReference
as well. Looks good to try, and observe the crash stats as it rolls out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @mzorz !
Let's give this focused-to-Android-8.0.0 solution a try. I'll 👍 the PR too. Good luck!
Note, I did encounter the crash @mkevins saw and mentioned here. Kept it in a gist for reference, just in case: https://gist.github.com/hypest/011bc56b85980cdef7e7ece1e57a1664
I'll merge this PR and also include it in the currently open gutenberg-mobile release. |
Fix
This PR aims at fixing or alleviating the crash being observed in wordpress-mobile/WordPress-Android#9904
Preamble
Long story short as per what I could observe: each time focus is changed, or when the input method is changed (for example, you change the keyboard or tap the microphone on the keyboard), this will create a new input method and the actively focused EditText will be called on its
onCreateInputConnection
. Here, the base implementation (same one as AztecText is using, as it's not overriding that method) will create a new instance and return that one.By using an
InputConnectionWrapper
and plenty of logs I was able to identify the following situation:Then we have this: a
beginBatchEdit
is followed by a constructor call. This means, a newInputConnection
is going to be established, but also a batch edit has already started on the InputConnection that is being detached.This was followed by many entries like this one:
At some point this can be followed by an IndexOutOfBounds crash, for example looking in the logs here:
These logs were added by using an
InputConnectionWrapper
that didn't affect behavior (just added the logs).This piece here shows an insertion of a $ character at position 718. When an insertion / addition happens, the new length should be added by one and the new cursor position should be also augmented, as it will stay to the right of the appended/inserted character. The problem we can see here in that log snippet is newCursorPosition should have moved the selection by one (1), but still previousSelectionStart and previousSelectionEnd is 718, and currentSelectionStart after moving should be 718+1=719, but instead it's still 718. See how things can go wrong there? On the other runs where no new constructor call is seen in between, things are added up correctly.
What this PR does
This PR just keeps a reference for the latest
InputConnection
created, as long as no new attributes for a newInputConnection
to be created are sent (so we're working under the premise that there is no need to create a newInputConnection
if the parameters passed are the same as per the current one).Test
have been trying to test this using monkey https://developer.android.com/studio/test/monkey, you can test it the same way issuing a command like this one from Terminal:
adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000
To make conditions more convenient for this crash to happen:
https://github.com/wordpress-mobile/AztecEditor-Android/blob/develop/app/src/main/kotlin/org/wordpress/aztec/demo/MainActivity.kt#L748
(just comment out that line)
android:inputType="textNoSuggestions"
add this to the xml https://github.com/wordpress-mobile/AztecEditor-Android/blob/develop/app/src/main/res/layout-v17/activity_main.xml#L26echo -n '' | pbcopy
on a terminal window), then run:adb -s emulator-5554 shell monkey -p org.wordpress.aztec -v 10000
Alternatively, you can also try pinning the Aztec demo screen so no unneeded app switches happen, and in this case do not disable sourceView, as you'll want to have focus to alternate so the new InputConnection is created when switching between EditText:
If you need to kill the monkey process running in the device/emulator, issue the following command:
adb shell ps | awk '/com.android.commands.monkey/ { system("adb shell kill " $2) }'
Make sure strings will be translated:
strings.xml
as a part of the integration PR.