Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ListenableActionFutureTests.testListenersNotifiedOnCorrectThreads failed #68772

Closed
mark-vieira opened this issue Feb 9, 2021 · 8 comments · Fixed by #68805
Closed

[CI] ListenableActionFutureTests.testListenersNotifiedOnCorrectThreads failed #68772

mark-vieira opened this issue Feb 9, 2021 · 8 comments · Fixed by #68805
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@mark-vieira
Copy link
Contributor

This looks like a newly added test.

Build scan:
https://gradle-enterprise.elastic.co/s/benbyofejft46/tests/:server:test/org.elasticsearch.action.support.ListenableActionFutureTests/testListenersNotifiedOnCorrectThreads#1

Repro line:
./gradlew ':server:test' --tests "org.elasticsearch.action.support.ListenableActionFutureTests.testListenersNotifiedOnCorrectThreads" -Dtests.seed=6F4C360FA7AEA154 -Dtests.security.manager=true -Dtests.locale=und -Dtests.timezone=America/Panama -Druntime.java=11

Reproduces locally?:
Nope.

Applicable branches:
Only master so far.

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.timeZoneId=America/Los_Angeles&tests.container=org.elasticsearch.action.support.ListenableActionFutureTests&tests.disabledDistributions=WyJvdXRjb21lOnNraXBwZWQiXQ&tests.sortField=FAILED&tests.test=testListenersNotifiedOnCorrectThreads&tests.unstableOnly=true

Failure excerpt:


org.elasticsearch.action.support.ListenableActionFutureTests > testListenersNotifiedOnCorrectThreads FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=336, name=adder-3, state=RUNNABLE, group=TGRP-ListenableActionFutureTests]

        Caused by:
        java.lang.AssertionError
            at __randomizedtesting.SeedInfo.seed([6F4C360FA7AEA154]:0)
            at org.junit.Assert.fail(Assert.java:86)
            at org.junit.Assert.assertTrue(Assert.java:41)
            at org.junit.Assert.assertTrue(Assert.java:52)
            at org.elasticsearch.action.support.ListenableActionFutureTests.lambda$testListenersNotifiedOnCorrectThreads$0(ListenableActionFutureTests.java:105)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=339, name=completer-6, state=RUNNABLE, group=TGRP-ListenableActionFutureTests]

        Caused by:
        java.lang.AssertionError: 
        Expected: "adder-3"
             but: was "completer-6"
            at __randomizedtesting.SeedInfo.seed([6F4C360FA7AEA154]:0)
            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
            at org.junit.Assert.assertThat(Assert.java:956)
            at org.junit.Assert.assertThat(Assert.java:923)
            at org.elasticsearch.action.support.ListenableActionFutureTests$3.onResponse(ListenableActionFutureTests.java:97)
            at org.elasticsearch.action.support.ListenableActionFutureTests$3.onResponse(ListenableActionFutureTests.java:93)
            at org.elasticsearch.action.support.ListenableActionFuture.executeListener(ListenableActionFuture.java:89)
            at org.elasticsearch.action.support.ListenableActionFuture.done(ListenableActionFuture.java:72)
            at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:133)
            at org.elasticsearch.action.support.AdapterActionFuture.onResponse(AdapterActionFuture.java:58)
            at org.elasticsearch.action.support.ListenableActionFutureTests.lambda$testListenersNotifiedOnCorrectThreads$1(ListenableActionFutureTests.java:136)
@mark-vieira mark-vieira added >test-failure Triaged test failures from CI :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 9, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 9, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@mark-vieira
Copy link
Contributor Author

@DaveCTurner you mind taking a look here since you recently authored this test?

@mark-vieira
Copy link
Contributor Author

Since this test was just added and has already failed I went ahead and muted it in master.

@DaveCTurner
Copy link
Contributor

Ugh. Thanks Mark, I'll mute it on 7.x too and investigate tomorrow.

@mark-vieira
Copy link
Contributor Author

Thanks, David.

@DaveCTurner
Copy link
Contributor

I left this running in a loop overnight and it failed similarly for me after ~100k runs. I think this means that future#onResponse(...) can return before future#done() has finished executing, if there are multiple threads completing the future.

I don't see much that might explain this until I get down to BaseFuture which is too complicated to form an immediate opinion on. For the record, we copied this class from Guava in c1b86f5 and then ported a related-looking bugfix in fbd352b. Guava's version has seen many changes in the meantime. I've scanned its history and don't see any other changes with a related-looking description.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Feb 10, 2021

This test fails pretty reliably:

/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.common.util.concurrent;

import org.elasticsearch.test.ESTestCase;

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class BaseFutureTests extends ESTestCase {

    public void testConcurrentCompletion() throws InterruptedException {
        final AtomicReference<TestFuture> futureRef = new AtomicReference<>();

        final Thread[] threads = new Thread[2];
        final CyclicBarrier cyclicBarrier = new CyclicBarrier(threads.length + 1);
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                while (true) {
                    awaitSafe(cyclicBarrier);

                    final TestFuture testFuture = futureRef.get();
                    if (testFuture == null) {
                        return;
                    }

                    testFuture.set(null);
                    assertTrue(testFuture.doneExecuted.get()); // fails on this line

                    awaitSafe(cyclicBarrier);
                }

            });
            threads[i].start();
        }

        for (int i = 0; i < 100; i++) {
            futureRef.set(new TestFuture());
            awaitSafe(cyclicBarrier);
            awaitSafe(cyclicBarrier);
        }

        futureRef.set(null);
        awaitSafe(cyclicBarrier);

        for (Thread thread : threads) {
            thread.join();
        }
    }

    private void awaitSafe(CyclicBarrier cyclicBarrier) {
        try {
            cyclicBarrier.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
            throw new AssertionError("unexpected", e);
        }
    }

    private static class TestFuture extends BaseFuture<Void> {

        final AtomicBoolean doneExecuted = new AtomicBoolean();

        @Override
        protected void done(boolean success) {
            assertTrue(doneExecuted.compareAndSet(false, true));
        }
    }

}

I think that's bad, right?

@DaveCTurner
Copy link
Contributor

I think that's bad, right?

No, it's ok, set() might return false indicating that it lost a race, and then we have to take evasive action.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Feb 10, 2021
This test assumed, incorrectly, that `future#done()` completes before
`future#set()` returns, but this isn't true if there are multiple
threads racing to complete the future. In other words listeners added
before calling `onResponse()` are not necessarily notified by the time
`onResponse()` returns. This commit fixes the test to account for this
subtle point.

Closes elastic#68772
DaveCTurner added a commit that referenced this issue Feb 10, 2021
This test assumed, incorrectly, that `future#done()` completes before
`future#set()` returns, but this isn't true if there are multiple
threads racing to complete the future. In other words listeners added
before calling `onResponse()` are not necessarily notified by the time
`onResponse()` returns. This commit fixes the test to account for this
subtle point.

Closes #68772
DaveCTurner added a commit that referenced this issue Feb 10, 2021
This test assumed, incorrectly, that `future#done()` completes before
`future#set()` returns, but this isn't true if there are multiple
threads racing to complete the future. In other words listeners added
before calling `onResponse()` are not necessarily notified by the time
`onResponse()` returns. This commit fixes the test to account for this
subtle point.

Closes #68772
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants