This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Firebase Android tests flapping #6546

Closed
friedbunny opened this issue Oct 1, 2016 · 20 comments
Assignees
Labels
Android Mapbox Maps SDK for Android tests

Comments

@friedbunny
Contributor

In the last week or so, Firebase tests have been flapping:

https://www.bitrise.io/build/74f7be21c7dedd19 — two firebase test failures
https://www.bitrise.io/build/a1ae533de44ce8bd — one firebase test failure
https://www.bitrise.io/build/fa9e4efdc1c76881 — took 45 minutes to timeout when downloading Java
https://www.bitrise.io/build/e3f62dfb41987c22 — firebase networking timeout
https://www.bitrise.io/build/73e363e750edb3d1 — five firebase test failures
https://www.bitrise.io/build/91f588750e7ed5af — device farm timed out

Even after digging through the (very long) logcat, it’s not obvious to me why these are failing. These builds all passed after restarting them. Some are likely because of Bitrise networking instability, but others could represent bugs in our code or tests — it’s hard to say.

Let’s investigate these failures and see if there’s anything we can do to improve reliability.

/cc @tobrun @zugaldia

@friedbunny friedbunny added Android Mapbox Maps SDK for Android tests labels Oct 1, 2016
@tobrun
Member

tobrun commented Oct 3, 2016

Thank you for listing these, @friedbunny! I will keep 👀 on it.
From my perspective, the tests were running pretty reliably for the last 2-3 weeks (after we fixed some native crashes); I was actually surprised by the quality of service that Firebase offers. Now, after last week, I'm a bit in doubt. Let's see what this week brings!

@tobrun
Member

tobrun commented Oct 6, 2016

I looked at all of yesterday's builds and was able to locate one flaky build out of ~40 builds:

I will continue to monitor the job queue in the following days.

@jfirebaugh
Contributor

@jfirebaugh
Contributor

@tobrun
Member

tobrun commented Oct 10, 2016

We need to determine if the source of the behaviour is:

  • a bug in the testing framework (espresso)
  • instability in the firebase infrastructure

Short term actions:

  • @tobrun tries reproducing the flaky tests locally and, if they are reproducible, tries optimising/improving stability
  • @zugaldia contacts Firebase support

What can we do if we can't resolve this issue? If the failing tests are perceived as more of a hindrance than a benefit, we should look into scaling down the number of tests we run. IMO the most important thing these tests need to flag is the regression where core crashes on startup. Having one test that validates this, instead of the current 150+ tests, could make the test run more reliable.

E.g., if you take 20 PRs, you will find one with a flaky test. With 150 tests per PR, that is roughly one flaky failure per 3000 test executions (20 × 150). If we executed 1 test case per PR, we could expect to test on the order of 3000 PRs before a test failed because of flakiness.
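
The estimate above can be sanity-checked with a quick calculation. The sketch below assumes each test flakes independently with the same probability (1 in 3000, derived from the 20 PRs and 150 tests mentioned above); the class and method names are made up for illustration.

```java
class FlakeEstimate {
    /**
     * Probability that a run of testsPerRun independent tests sees at least
     * one flaky failure, given a per-test flake rate. Independence is an
     * assumption made for this sketch.
     */
    static double runFailureProbability(double perTestFlakeRate, int testsPerRun) {
        return 1.0 - Math.pow(1.0 - perTestFlakeRate, testsPerRun);
    }

    public static void main(String[] args) {
        int testsPerRun = 150; // size of the current suite, per the discussion
        int prs = 20;          // roughly one flaky PR in 20
        double perTestFlakeRate = 1.0 / (prs * testsPerRun); // 1 in 3000

        System.out.printf("150 tests per PR: %.4f%n",
                runFailureProbability(perTestFlakeRate, testsPerRun));
        System.out.printf("1 test per PR:    %.6f%n",
                runFailureProbability(perTestFlakeRate, 1));
    }
}
```

Under that assumption, a 150-test run fails from flakiness roughly 5% of the time (about 1 PR in 20), while a single-test run fails about 0.03% of the time.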

@zugaldia
Member

@zugaldia contacts Firebase support

Just did.

@tobrun
Member

tobrun commented Oct 10, 2016

I've been running some test runs locally and was able to reproduce it occasionally. Going to see if I can make the test harness more robust. I'm also thinking that changes to the OnMapReady callback could have made the tests less reliable.
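
If the OnMapReady change is involved, one way a test could avoid racing the map is to block on the callback with a timeout before asserting anything. A minimal plain-Java sketch of that pattern follows; `ReadyCallback` and `loadAsync` are hypothetical stand-ins, not the SDK's API, and the real test harness may work differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ReadyLatchSketch {
    /** Hypothetical stand-in for a "map is ready" callback; not the SDK's interface. */
    interface ReadyCallback {
        void onReady();
    }

    /** Hypothetical component that signals readiness from a background thread after a delay. */
    static void loadAsync(long delayMillis, ReadyCallback callback) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            callback.onReady();
        });
        worker.setDaemon(true); // do not keep the JVM alive for a slow load
        worker.start();
    }

    /** Blocks until the callback fires or the timeout elapses; true means ready in time. */
    static boolean awaitReady(long delayMillis, long timeoutMillis) {
        CountDownLatch ready = new CountDownLatch(1);
        loadAsync(delayMillis, ready::countDown);
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast load ready in time: " + awaitReady(50, 2000));
        System.out.println("slow load ready in time: " + awaitReady(5000, 100));
    }
}
```

A test built this way fails with an explicit timeout when the callback never arrives, rather than asserting against a map that may not be ready yet.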

This was referenced Oct 11, 2016
@tobrun
Member

tobrun commented Oct 12, 2016

Found another failing build; this time it couldn't download a dependency.

@friedbunny
Contributor Author

A whopping 30 failed tests: https://www.bitrise.io/build/006abf10f975dbbf

@tobrun
Member

tobrun commented Oct 13, 2016

In light of the above reports, and to unblock other contributors, I'm going to scale down our 150+ tests to just one. This should still catch the most important regressions: those that show up when a MapView is rendered on screen. I'm going to look into #6366 to run scheduled builds with the full set of tests on a daily basis, and run manual tests in the Firebase GUI to resolve our issues.

@tobrun
Member

tobrun commented Oct 19, 2016

I was able to identify #6667 as the source of the unreliably executing tests and resolved it in #6747.

@tobrun
Member

tobrun commented Nov 9, 2016

#6667 got resolved, and I'm re-enabling our instrumentation tests in #6980. Going to close this for now since I haven't heard of any other flakiness in the last few weeks. Feel free to reopen if there are any unreliable builds.

@tobrun tobrun closed this as completed Nov 9, 2016
@tobrun
Member

tobrun commented Dec 9, 2016

Reopening as I have been seeing some runtime style tests fail intermittently with:

android.support.test.espresso.AppNotIdleException: Looped for 6 iterations over 60 SECONDS. The following Idle Conditions failed ASYNC_TASKS_HAVE_IDLED.
	at dalvik.system.VMStack.getThreadStackTrace(Native Method)
	at java.lang.Thread.getStackTrace(Thread.java:580)
	at android.support.test.espresso.base.DefaultFailureHandler.getUserFriendlyError(DefaultFailureHandler.java:92)
	at android.support.test.espresso.base.DefaultFailureHandler.handle(DefaultFailureHandler.java:56)
	at android.support.test.espresso.ViewInteraction.runSynchronouslyOnUiThread(ViewInteraction.java:184)
	at android.support.test.espresso.ViewInteraction.check(ViewInteraction.java:158)
	at com.mapbox.mapboxsdk.testapp.style.BaseStyleTest.checkViewIsDisplayed(BaseStyleTest.java:25)
	at com.mapbox.mapboxsdk.testapp.style.BackgroundLayerTest.testBackgroundColorAsInt(BackgroundLayerTest.java:84)
	at java.lang.reflect.Method.invoke(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:372)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at android.support.test.internal.statement.UiThreadStatement.evaluate(UiThreadStatement.java:55)
	at android.support.test.rule.ActivityTestRule$ActivityStatement.evaluate(ActivityTestRule.java:270)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at android.support.test.internal.runner.TestExecutor.execute(TestExecutor.java:59)
	at android.support.test.runner.AndroidJUnitRunner.onStart(AndroidJUnitRunner.java:262)
	at android.app.Instrumentation$InstrumentationThread.run(Instrumentation.java:1853)
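
For context on the failure above: the message says Espresso kept waiting for background work to go idle and gave up after 60 seconds. The plain-Java sketch below is a simplified model of that kind of bounded idle wait, not Espresso's implementation; `BusyCounter` and `waitUntilIdle` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IdleWaitSketch {
    /** Counts in-flight background work; idle when the count is zero. */
    static final class BusyCounter {
        private final AtomicInteger inFlight = new AtomicInteger();

        void increment() { inFlight.incrementAndGet(); }

        void decrement() { inFlight.decrementAndGet(); }

        boolean isIdle() { return inFlight.get() == 0; }
    }

    /** Polls until the counter is idle; returns false if the timeout elapses first. */
    static boolean waitUntilIdle(BusyCounter counter, long timeoutMillis, long pollMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (!counter.isIdle()) {
            if (System.nanoTime() - start >= timeoutNanos) {
                return false; // analogous to giving up with a "not idle" failure
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BusyCounter stuck = new BusyCounter();
        stuck.increment(); // never decremented, like a task that never completes
        System.out.println("stuck work goes idle: " + waitUntilIdle(stuck, 200, 10));

        stuck.decrement(); // the work finally completes
        System.out.println("after completion goes idle: " + waitUntilIdle(stuck, 200, 10));
    }
}
```

In this model, a test only fails this way when some piece of work never reports completion within the timeout, which is consistent with an intermittent failure that depends on timing.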

@tobrun tobrun reopened this Dec 9, 2016
@tobrun tobrun self-assigned this Dec 9, 2016
@tobrun
Member

tobrun commented Dec 9, 2016

Also seeing occurrences of native crashes:

12-08 09:33:59.196: A/libc(12537): Fatal signal 6 (SIGABRT), code -6 in tid 12537 (pboxsdk.testapp)
12-08 09:33:59.301: I/DEBUG(357): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-08 09:33:59.301: I/DEBUG(357): Build fingerprint: 'google/shamu/shamu:5.1.1/LMY48Y/2364368:user/release-keys'
12-08 09:33:59.301: I/DEBUG(357): Revision: '33696'
12-08 09:33:59.301: I/DEBUG(357): ABI: 'arm'
12-08 09:33:59.302: I/DEBUG(357): pid: 12537, tid: 12537, name: pboxsdk.testapp  >>> com.mapbox.mapboxsdk.testapp <<<
12-08 09:33:59.302: I/DEBUG(357): signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
12-08 09:33:59.395: I/DEBUG(357): Abort message: '/usr/local/google/buildbot/src/android/ndk-r13-release/external/libcxx/../../external/libcxxabi/src/abort_message.cpp:74: void abort_message(const char *, ...): assertion "terminating with uncaught exception of type std::out_of_range: vector" failed'
12-08 09:33:59.395: I/DEBUG(357):     r0 00000000  r1 000030f9  r2 00000006  r3 00000000
12-08 09:33:59.395: I/DEBUG(357):     r4 b6febe38  r5 00000006  r6 0000000b  r7 0000010c
12-08 09:33:59.395: I/DEBUG(357):     r8 bed39dd8  r9 bed39fb0  sl bed39ea8  fp bed39e98
12-08 09:33:59.395: I/DEBUG(357):     ip 000030f9  sp bed397c8  lr b6e683c5  pc b6e8c0ec  cpsr 600f0010
12-08 09:33:59.395: I/DEBUG(357): backtrace:
12-08 09:33:59.395: I/DEBUG(357):     #00 pc 0003b0ec  /system/lib/libc.so (tgkill+12)
12-08 09:33:59.395: I/DEBUG(357):     #01 pc 000173c1  /system/lib/libc.so (pthread_kill+52)
12-08 09:33:59.395: I/DEBUG(357):     #02 pc 00017fd3  /system/lib/libc.so (raise+10)
12-08 09:33:59.395: I/DEBUG(357):     #03 pc 00014795  /system/lib/libc.so (__libc_android_abort+36)
12-08 09:33:59.395: I/DEBUG(357):     #04 pc 00012f44  /system/lib/libc.so (abort+4)
12-08 09:33:59.395: I/DEBUG(357):     #05 pc 00015ab1  /system/lib/libc.so (__libc_fatal+16)
12-08 09:33:59.395: I/DEBUG(357):     #06 pc 00014819  /system/lib/libc.so (__assert2+20)
12-08 09:33:59.395: I/DEBUG(357):     #07 pc 0035c78b  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.395: I/DEBUG(357):     #08 pc 0035c853  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.395: I/DEBUG(357):     #09 pc 0035abc9  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #10 pc 0035a4d3  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so (__cxa_throw+122)
12-08 09:33:59.396: I/DEBUG(357):     #11 pc 00057939  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #12 pc 0012923b  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #13 pc 001290e7  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #14 pc 00086843  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #15 pc 00086827  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #16 pc 000867ef  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #17 pc 000867b9  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #18 pc 00086583  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #19 pc 00089e4f  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #20 pc 00089cd9  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #21 pc 0007686d  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #22 pc 0005cc95  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #23 pc 0009f3fd  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #24 pc 0009f92f  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #25 pc 0009f107  /data/app/com.mapbox.mapboxsdk.testapp-1/lib/arm/libmapbox-gl.so
12-08 09:33:59.396: I/DEBUG(357):     #26 pc 000119ff  /system/lib/libutils.so (android::SimpleLooperCallback::handleEvent(int, int, void*)+10)
12-08 09:33:59.396: I/DEBUG(357):     #27 pc 00012661  /system/lib/libutils.so (android::Looper::pollInner(int)+484)
12-08 09:33:59.396: I/DEBUG(357):     #28 pc 00012709  /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
12-08 09:33:59.396: I/DEBUG(357):     #29 pc 00081711  /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, int)+22)
12-08 09:33:59.396: I/DEBUG(357):     #30 pc 000b3863  /data/dalvik-cache/arm/system@framework@boot.oat

Don't have the symbolicated stack trace for this due to #7358.

@jfirebaugh
Contributor

Bump. I'm having to restart most Android builds at least once.

@tobrun
Member

tobrun commented Dec 15, 2016

@jfirebaugh thank you for the bump to prioritize this:

I've been going through the latest 80 Bitrise builds and noticed that 21 builds failed. A couple of those failures were related to the PR, but most of them were not. Aside from 2 infrastructure-related failures (internet/timeout), the remaining failures are related to the following 2 issues:

cc @mapbox/android

@jfirebaugh
Contributor

I can't get #7513 to pass at all (3 retries), although the same commit was passing yesterday.

12-22 11:32:03.905: E/Surface(2112): dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -2147483646
12-22 11:32:03.906: I/Adreno(2112): Native window GetBuffer failed
12-22 11:32:03.906: E/libEGL(2112): eglMakeCurrent:777 error 300d (EGL_BAD_SURFACE)
12-22 11:32:03.906: A/OpenGLRenderer(2112): Failed to make current on surface 0x8f751fc0, error=EGL_BAD_SURFACE
12-22 11:32:04.038: I/WindowState(841): WIN DEATH: Window{31ea2e01 u0 com.mapbox.mapboxsdk.testapp/com.mapbox.mapboxsdk.testapp.activity.style.RuntimeStyleTestActivity}

@tobrun
Member

tobrun commented Dec 22, 2016

I haven't seen that crash before. This has really spun out of control since we are seeing different crashes. For now, with the upcoming holidays and limited bandwidth, I'm going to scale down the number of instrumentation tests run on CI to one. I will not scale them back up until we are sure the issues in this ticket are addressed.

@friedbunny
Contributor Author

It may be unrelated, but we saw this failure via #7725, a simple macOS/iOS PR on the current release branch:

AWS Device Farm Plugin version 1.2
Could not locate AWS_ACCESS_KEY_ID_DEVICE_FARM in gradle.properties
Could not locate AWS_SECRET_ACCESS_KEY_DEVICE_FARM in gradle.properties
configuring spoon
Download https://jcenter.bintray.com/junit/junit/4.12/junit-4.12.pom
...
Download https://jcenter.bintray.com/org/mockito/mockito-core/1.10.19/mockito-core-1.10.19.jar
AWS Device Farm configuration is NOT VALID

FAILURE: Build failed with an exception.

* What went wrong:
A problem occurred configuring project ':MapboxGLAndroidSDKWearTestApp'.
> failed to find Build Tools revision 24.0.2

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 1 mins 50.192 secs
Makefile:508: recipe for target 'android-test' failed
make: *** [android-test] Error 1

@tobrun
Member

tobrun commented Jul 12, 2017

Root cause has been fixed with #9198. Since #9353 we are running our generated runtime style tests on Firebase.

@tobrun tobrun closed this as completed Jul 12, 2017
4 participants