Crash in nativeOnUrlComplete with 0.9.0 - 0.9.6 #1772

Closed
westnordost opened this issue Feb 8, 2018 · 25 comments

Comments

@westnordost
Contributor

westnordost commented Feb 8, 2018

The new beta of StreetComplete 4.0 is out and I can see in the Google Play console a few crashes in native code from tangram-es 0.9.0 that various people get.

The stacktrace I get looks like this:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> de.westnordost.streetcomplete <<<

backtrace:
 
  #00  pc 0000000000123b8c  /data/app/de.westnordost.streetcomplete-1/lib/arm64/libtangram.so (Java_com_mapzen_tangram_MapController_nativeOnUrlComplete+40)
 
  #01  pc 00000000004188a8  /data/app/de.westnordost.streetcomplete-1/oat/arm64/base.odex
  • It happens both with arm and arm64 architectures
  • It happens since 0.9.0
  • It seems to be independent of Android version (currently I have crash reports from any version between Android 6 and 8)
@westnordost
Contributor Author

I am happy to see that the development of tangram-es gained some momentum again and I would love to upgrade to 0.9.6, but I am stuck on 0.8.1 because of this issue here. IIRC it happened quite often.

@matteblair
Member

@westnordost Do you mean that this crash does not occur on version 0.8.1 but does occur on higher versions? I'd love to finally get to the bottom of this.

@tallytalwar
Member

I also did some refactoring of the Android network handling which is not released yet. We should try these changes too, though if there is a race on the core C++ side then it will still be present.

@westnordost
Contributor Author

Well, it doesn't happen with the latest 0.8 version but does with 0.9.0. I didn't try any later versions because it hasn't been announced that this was fixed.

@tallytalwar
Member

I'll see if I can run StreetComplete with a debug version of tangram to get a proper native stack trace.

@westnordost
Contributor Author

westnordost commented Aug 28, 2018

Any luck with this? IIRC I never experienced the crash myself but only saw it on the Google Play Console. If nothing turns up, I might just use the newest tangram version for the next beta release and see if beta-people are still experiencing the crash. (ETA: 2 weeks)

@matteblair
Member

The change to move the suspect condition variable did get merged into master: cfb25f0

So it's possible that this is fixed in the latest release. However, we haven't been able to reproduce your crash either :\

@westnordost
Contributor Author

I got this crash right now with 0.9.6 while debugging. I did not really do anything (with the map), iirc I did not even move it when the crash happened:

A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x4 in tid 21718 (map-data.de/...)

@tallytalwar
Member

I had another shot at this on 0.9.6, but could not reproduce it.

Though I learned a few things here which might help everyone :D.

  • To debug native tangram code with a debuggable StreetComplete APK, I followed the instructions from: https://developer.android.com/studio/debug/apk-debugger

    • To get the debug symbols for libtangram.so, build tangram (0.9.6, not master, as master has breaking API changes) with ./gradlew tangram:assembleDebug -Ptangram.abis=armeabi-v7a and include the symbols for libtangram.so from platforms/android/tangram/build/intermediates/cmake/debug/obj/armeabi-v7a/libtangram.so.
    • This will allow you to get a native stack trace or put breakpoints in the tangram C++ code, and possibly help figure out this crash.
    • Because I was not able to reproduce the crash, I couldn't figure out the main culprit here.
  • Another tool which I think can help us figure out this issue is: https://developer.android.com/ndk/guides/ndk-stack

    • Given the native library debug symbols (mentioned above) and the logcat output or a tombstone detailing the SIGSEGV, one can figure out the stack trace resulting in the crash.
    • @westnordost If you have the tombstone for the above native crash, we can probably try out the ndk-stack tool to get the native stack trace causing the crash.

@westnordost
Contributor Author

Will do and report back!

@westnordost
Contributor Author

Hmm sorry, I have not worked with the NDK before. What do you mean by

and include the symbols for libtangram.so from platforms/android/tangram/build/intermediates/cmake/debug/obj/armeabi-v7a/libtangram.so.

? So I built it, and I got the libtangram.so, but what to do with it then?

By the way, I was completely flabbergasted to see that it compiles, out of the box, with no additional software or configuration required on Windows. Never had that before for a C++ application, cool! Also, it is funny to watch, because during build, a cascade of console windows pops up and vanishes right after several times ;-)

@westnordost
Contributor Author

Nevermind!

@westnordost
Contributor Author

westnordost commented Sep 15, 2018

I have been trying for several hours to reproduce this now with a debuggable tangram-es version, but to no avail. To be honest, even before plugging in the debuggable tangram-es, I hadn't experienced that crash for quite a while. It seems as if the crash only happened in the days after migrating from 0.8.1 to 0.9.6.

Perhaps this is not coincidence: Is it possible that this is actually a migration issue? I.e. a caching data format changed so very slightly or something?

@westnordost changed the title from "Crash in nativeOnUrlComplete with 0.9.0" to "Crash in nativeOnUrlComplete with 0.9.6" on Sep 17, 2018
@westnordost
Contributor Author

westnordost commented Sep 17, 2018

I released a new StreetComplete beta with the new 0.9.6 built in one day ago. I have now got this crash report:

pid: 0, tid: 0 >>> de.westnordost.streetcomplete <<<

backtrace:
 
  #00  pc 0000000000121364  /data/app/de.westnordost.streetcomplete-HnQYZT1aaSJ5iePL1lxDFA==/lib/arm64/libtangram.so (Java_com_mapzen_tangram_MapController_nativeOnUrlComplete+40)
 
  #01  pc 0000000000011d14  /data/app/de.westnordost.streetcomplete-HnQYZT1aaSJ5iePL1lxDFA==/oat/arm64/base.odex

But this is probably not much of a help, is it? Is there any chance that I can somehow include something in a beta version of StreetComplete to get a proper full stack trace of the native code?

Also, since I seem to be the only user of tangram-es to report this problem, maybe it is only reproducible with the vector tile server I use because of an oddity there(?)

@westnordost changed the title from "Crash in nativeOnUrlComplete with 0.9.6" to "Crash in nativeOnUrlComplete with 0.9.0 - 0.9.6" on Sep 17, 2018
@tallytalwar
Member

tallytalwar commented Sep 17, 2018

Hi @westnordost

I would say that if you can get a tombstone file, that would be great. As per this:

When a dynamically linked executable starts, several signal handlers are registered that, 
in the event of a crash, cause a basic crash dump to be written to logcat and a more 
detailed "tombstone" file to be written to /data/tombstones/. The tombstone is a file 
with extra data about the crashed process. In particular, it contains stack traces for all 
the threads in the crashing process (not just the thread that caught the signal), a full 
memory map, and a list of all open file descriptors.

@tallytalwar
Member

I should have checked this before, but we did have a major URL handler change in 0.9.0:
#1659

@westnordost
Contributor Author

@tallytalwar Hmm, this is only possible when you have access to the device, so it is the same as simply running an APK with a debuggable .so in there. The crash reports I get are collected in Google Play and do not include this information (as you can see).

@westnordost
Contributor Author

@tallytalwar I have now got this crash report from an Android 9 device. Perhaps Google has now built in better stack tracing for native code?

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0 >>> de.westnordost.streetcomplete <<<

backtrace:
  #00  pc 0000000000121364  /data/app/de.westnordost.streetcomplete-u8kj4EzEivEcow_Jb9giiw==/lib/arm64/libtangram.so (Java_com_mapzen_tangram_MapController_nativeOnUrlComplete+40)
  #01  pc 0000000000011d14  /data/app/de.westnordost.streetcomplete-u8kj4EzEivEcow_Jb9giiw==/oat/arm64/base.odex (com.mapzen.tangram.MapController.nativeOnUrlComplete+196)
  #02  pc 000000000055c988  /system/lib64/libart.so (art_quick_invoke_stub+584)
  #03  pc 00000000000cf740  /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+200)
  #04  pc 00000000002823b0  /system/lib64/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+344)
  #05  pc 000000000027d478  /system/lib64/libart.so (bool art::interpreter::DoCall<true, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+752)
  #06  pc 000000000052ef4c  /system/lib64/libart.so (MterpInvokeDirectRange+244)
  #07  pc 000000000054f494  /system/lib64/libart.so (ExecuteMterpImpl+15252)
  #08  pc 0000000000175fac  /data/app/de.westnordost.streetcomplete-u8kj4EzEivEcow_Jb9giiw==/oat/arm64/base.vdex (com.mapzen.tangram.MapController.access$1000)
  #09  pc 0000000000255e68  /system/lib64/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2146680767+496)
  #10  pc 000000000025b9e8  /system/lib64/libart.so (art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*, art::JValue*)+216)
  #11  pc 000000000027d45c  /system/lib64/libart.so (bool art::interpreter::DoCall<true, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+724)
  #12  pc 000000000052f0ec  /system/lib64/libart.so (MterpInvokeStaticRange+148)
  #13  pc 000000000054f514  /system/lib64/libart.so (ExecuteMterpImpl+15380)
  #14  pc 0000000000174c7e  /data/app/de.westnordost.streetcomplete-u8kj4EzEivEcow_Jb9giiw==/oat/arm64/base.vdex (com.mapzen.tangram.MapController$13.onResponse+242)
  #15  pc 0000000000255e68  /system/lib64/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2146680767+496)
  #16  pc 000000000025b9e8  /system/lib64/libart.so (art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*, art::JValue*)+216)
  #17  pc 000000000027c350  /system/lib64/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+920)
  #18  pc 000000000052d350  /system/lib64/libart.so (MterpInvokeInterface+1392)
  #19  pc 000000000054f294  /system/lib64/libart.so (ExecuteMterpImpl+14740)
  #20  pc 00000000001c409a  /data/app/de.westnordost.streetcomplete-u8kj4EzEivEcow_Jb9giiw==/oat/arm64/base.vdex (okhttp3.RealCall$AsyncCall.execute+78)
  #21  pc 0000000000255e68  /system/lib64/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2146680767+496)
  #22  pc 000000000051cb18  /system/lib64/libart.so (artQuickToInterpreterBridge+1032)
  #23  pc 0000000000565afc  /system/lib64/libart.so (art_quick_to_interpreter_bridge+92)
  #24  pc 0000000000076f50  /dev/ashmem/dalvik-jit-code-cache (deleted)

@matteblair
Member

HA that's cool, looks like a native stack trace of the Java runtime!

@hjanetzek
Member

@westnordost could you try if this change already resolves the issue? I'll dig into this further here
#1881

@westnordost
Contributor Author

@hjanetzek The problem is that I have never been able to reproduce this crash in a debug environment; it seems to happen very rarely. But if you may have fixed it, perhaps you can tell me under which circumstances it would be reproducible.

@westnordost
Contributor Author

By the way, StreetComplete v8 with the new tangram-es 0.9.6 is now public. I figured the crash happens rarely enough that it can be released. Only 12 times during the beta phase last week.

@hjanetzek
Member

It was definitely possible that the UrlCallback got called after the call to MapController.dispose(). Do you know at which points during runtime the crash happened?
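
To illustrate the kind of race described here, below is a minimal Java sketch of a controller that drops a network callback arriving after disposal instead of forwarding it to native code. It is not the tangram-es implementation or the actual fix: GuardedController, its fields, and its methods are hypothetical names invented for this example, and the "native pointer" is just a placeholder long.

```java
// Hypothetical stand-in for a controller that owns a native map pointer.
// None of these names are tangram-es APIs; this only illustrates the race.
final class GuardedController {
    // 0 means "disposed": the native object behind the pointer is gone.
    private long nativePointer = 1L; // placeholder for a real native handle
    private int delivered = 0;
    private int dropped = 0;

    // Called from a network thread when a URL request finishes.
    synchronized void onUrlComplete(byte[] data) {
        if (nativePointer == 0L) {
            // dispose() already ran; passing the pointer to native code now
            // would dereference a null or freed object.
            dropped++;
            return;
        }
        // A real controller would call its native completion method here.
        delivered++;
    }

    // Shares the lock with onUrlComplete, so a callback cannot run between
    // the pointer check and the (simulated) native call.
    synchronized void dispose() {
        nativePointer = 0L;
    }

    synchronized int delivered() { return delivered; }
    synchronized int dropped() { return dropped; }
}

class DisposeGuardSketch {
    public static void main(String[] args) throws InterruptedException {
        GuardedController controller = new GuardedController();
        controller.onUrlComplete(new byte[0]); // response arrives while alive
        controller.dispose();

        // A late response on another thread, as an HTTP client might deliver it.
        Thread late = new Thread(() -> controller.onUrlComplete(new byte[0]));
        late.start();
        late.join();

        System.out.println("delivered=" + controller.delivered()
                + " dropped=" + controller.dropped());
    }
}
```

In this sketch the check and the disposal share one lock, so a late response is either handled fully before dispose() or dropped afterwards. Whether the real fix takes this shape is not something this thread confirms.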

@westnordost
Contributor Author

No

@westnordost
Contributor Author

I get no more crash reports when using the new version of tangram-es!
