Adding `RemainingActivityListener` for diagnosis #643

jglick · 2023-08-23T19:47:01Z

Helpful to see cases where #198 revealed something nondeterministic about test execution.

MarkEWaite

This looks very good to me.

As far as I can tell, this is an "always on" extension that will be run on every test. Let me know if I've misunderstood.

jglick · 2023-08-23T21:21:31Z

this is an "always on" extension that will be run on every test

Correct; the default behavior is to print a warning about a potential detected problem, but otherwise take no action.

jglick · 2023-08-23T21:22:47Z

ICYMI jenkinsci/copyartifact-plugin#206 (comment) (in that case .wait=true)

basil · 2023-08-24T19:46:31Z

This flags builds that are executed with FreeStyleBuild b = j.buildAndAssertSuccess(f);. This does not seem right to me.

jglick · 2023-08-24T19:49:46Z

Not that I am aware of; do you have an example?

basil · 2023-08-24T19:54:25Z

Try running the core test suite with org.jvnet.hudson.test.RemainingActivityListener.fatal=true; tons of examples. We might need to add waitForCompletion(r) to JenkinsRule#assertBuildStatus.

jglick · 2023-08-24T20:04:27Z

Possibly. Is this output visible somewhere? Some PR? Any specific test cases for example?

basil · 2023-08-24T20:05:26Z

We might need to add waitForCompletion(r) to JenkinsRule#assertBuildStatus.

That doesn't seem to help. Even with

diff --git a/src/main/java/org/jvnet/hudson/test/JenkinsRule.java b/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
index 424bad72..9c460f96 100644
--- a/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
+++ b/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
@@ -1442,6 +1442,7 @@ public class JenkinsRule implements TestRule, MethodRule, RootAction {
      * Asserts that the outcome of the build is a specific outcome.
      */
     public <R extends Run> R assertBuildStatus(Result status, R r) throws Exception {
+        waitForCompletion(r);
         if(status==r.getResult())
             return r;

I still get e.g.

[ERROR] Tests run: 5, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 33.21 s <<< FAILURE! -- in hudson.LauncherTest
[ERROR] hudson.LauncherTest.correctlyExpandEnvVars -- Time elapsed: 16.01 s <<< FAILURE!
java.lang.AssertionError: test0 #1 still seems to be running, which could break deletion of log files or metadata
        at org.jvnet.hudson.test.RemainingActivityListener.onTearDown(RemainingActivityListener.java:62)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:496)
        at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:625)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:833)

where that test is only running

FreeStyleBuild build = rule.buildAndAssertSuccess(project);

I get dozens of similar errors. I am afraid this utility is not very useful with this high rate of false positives.

Note that I am running with -DforkCount=8 on an Intel Core i7-9700K with only 8 physical cores, so I am intentionally overtaxing the CPU here to create scheduling delays.

jglick · 2023-08-24T20:29:14Z

I will check if I can reproduce, and adjust accordingly.

jglick · 2023-08-24T21:27:20Z

I see some failures using -DforkCount=1C (which nearly melts my laptop) after

diff --git test/pom.xml test/pom.xml
index 75035b367d..19e08b39d0 100644
--- test/pom.xml
+++ test/pom.xml
@@ -116,7 +116,7 @@ THE SOFTWARE.
     <dependency>
       <groupId>${project.groupId}</groupId>
       <artifactId>jenkins-test-harness</artifactId>
-      <version>2058.va_7b_41a_286207</version>
+      <version>2062.v3efc79721e45</version>
       <scope>test</scope>
       <exclusions>
         <exclusion>
@@ -322,6 +322,7 @@ THE SOFTWARE.
           <systemPropertyVariables>
             <hudson.maven.debug>${mavenDebug}</hudson.maven.debug>
             <buildDirectory>${project.build.directory}</buildDirectory>
+            <org.jvnet.hudson.test.RemainingActivityListener.fatal>true</org.jvnet.hudson.test.RemainingActivityListener.fatal>
           </systemPropertyVariables>
           <reuseForks>false</reuseForks>
         </configuration>

mostly of queue items, which may or may not matter depending on the circumstances, and some executors. For now, fatal is intended to be used for diagnostic purposes rather than being configured permanently. Will dig into the cause more deeply in a couple weeks. Seems harmless in the meantime since by default it merely logs a warning.

basil · 2023-08-24T21:34:24Z

I'm not sure I would really consider this change complete until it is proven that the fatal mode can be configured permanently in some significant test suite. I would personally not have approved this PR in its current form.

Another problem is these false positives:

[ERROR] hudson.model.QueueCrashTest.doNotPersistQueueOnCrashBeforeSave -- Time elapsed: 14.70 s <<< FAILURE!
java.lang.AssertionError: hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@5ba1b54f[test0]:1 is still scheduled, which if it ever runs, could break deletion of log files or metadata
        at org.jvnet.hudson.test.RemainingActivityListener.onTearDown(RemainingActivityListener.java:62)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:496)
        at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:625)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:833)

[ERROR] hudson.model.QueueCrashTest.persistQueueOnCrash -- Time elapsed: 6.571 s <<< FAILURE!
java.lang.AssertionError: hudson.model.Queue$BuildableItem:hudson.model.FreeStyleProject@6ddc90f[test0]:2 is still scheduled, which if it ever runs, could break deletion of log files or metadata
        at org.jvnet.hudson.test.RemainingActivityListener.onTearDown(RemainingActivityListener.java:62)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:496)
        at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:625)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.lang.Thread.run(Thread.java:833)

basil · 2023-08-24T22:50:11Z

src/main/java/org/jvnet/hudson/test/RemainingActivityListener.java

+
+    private static String problem() {
+        for (Computer c : Jenkins.get().getComputers()) {
+            for (Executor x : c.getAllExecutors()) {


This code is incorrect. QueueTaskFuture#get (which is what JenkinsRule uses to determine build completion) waits until AsyncFutureImpl#set is called to complete the future before notifying consumers. That call is done by WorkUnitContext#synchronizeEnd via Executor#finish1. At that point, Executor#isIdle is still false and the executor is still in Computer#getAllExecutors (which is what the code in this PR is checking). Executor#finish2 is called after Executor#finish1 and calls Computer#removeExecutor which (possibly asynchronously!) calls executors.remove(e) to remove the executor, at which point Computer#getAllExecutors will finally stop returning the non-idle executor. If Executor#finish1 completes before the executor is removed by Executor#finish2 (possible in certain scheduling scenarios) then the code in this PR will result in a false positive. Similarly, if Executor#finish2 decides to run the removal asynchronously (possible in certain scheduling scenarios) then the code in this PR will also result in a false positive.

There are really two things that need to be solved to make this code correct:

The code in this PR needs to wait until both Executor#finish1 and Executor#finish2 have been called before calling Computer#getAllExecutors. Currently this code runs after QueueTaskFuture#get which means that Executor#finish1 has been called but Executor#finish2 has not necessarily been called.

If Executor#finish2 has been called but has placed its task onto Computer#threadPoolForRemoting (which happens only sometimes) then the code in this PR needs to wait until that task has completed, for example by having Computer#removeExecutor keep track of how many pending removals are in progress.

Adding RemainingActivityListener for diagnosis

3d1fe7e

jglick added the enhancement label Aug 23, 2023

Added wait option

04bf078

jglick mentioned this pull request Aug 23, 2023

Update dependencies and stabilize tests jenkinsci/copyartifact-plugin#206

Merged

jglick marked this pull request as ready for review August 23, 2023 20:59

jglick requested a review from MarkEWaite August 23, 2023 20:59

MarkEWaite approved these changes Aug 23, 2023

View reviewed changes

jglick merged commit 3efc797 into jenkinsci:master Aug 24, 2023
13 checks passed

jglick deleted the RemainingActivityListener branch August 24, 2023 19:11

jglick added a commit to jglick/copyartifact-plugin that referenced this pull request Aug 24, 2023

jenkinsci/jenkins-test-harness#643 released

1d2e081

MarkEWaite pushed a commit to jenkinsci/copyartifact-plugin that referenced this pull request Aug 24, 2023

jenkinsci/jenkins-test-harness#643 released

0662a9b

basil reviewed Aug 24, 2023

View reviewed changes

basil mentioned this pull request Aug 24, 2023

Clear the queue before tearing down tests jenkinsci/jenkins#8428

Merged

This was referenced Oct 27, 2023

Temporary memory leak in FutureImpl.executors jenkinsci/jenkins#8640

Merged

False positives from RemainingActivityListener #672

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding `RemainingActivityListener` for diagnosis #643

Adding `RemainingActivityListener` for diagnosis #643

jglick commented Aug 23, 2023

MarkEWaite left a comment

jglick commented Aug 23, 2023

jglick commented Aug 23, 2023 •

edited

Loading

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

basil Aug 24, 2023

Adding RemainingActivityListener for diagnosis #643

Adding RemainingActivityListener for diagnosis #643

Conversation

jglick commented Aug 23, 2023

MarkEWaite left a comment

Choose a reason for hiding this comment

jglick commented Aug 23, 2023

jglick commented Aug 23, 2023 • edited Loading

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

jglick commented Aug 24, 2023

jglick commented Aug 24, 2023

basil commented Aug 24, 2023

basil Aug 24, 2023

Choose a reason for hiding this comment

Adding `RemainingActivityListener` for diagnosis #643

Adding `RemainingActivityListener` for diagnosis #643

jglick commented Aug 23, 2023 •

edited

Loading