[CI] LazyRolloverDuringDisruptionIT testRolloverIsExecutedOnce failing #112634

Closed
elasticsearchmachine opened this issue Sep 7, 2024 · 3 comments · Fixed by #115075
Labels:

  • :Data Management/Data streams (Data streams and their lifecycles)
  • low-risk (An open issue or test failure that is a low risk to future releases)
  • Team:Data Management (Meta label for data/management team)
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Sep 7, 2024

Build Scans:

Reproduction Line:

./gradlew ':modules:data-streams:internalClusterTest' --tests "org.elasticsearch.datastreams.LazyRolloverDuringDisruptionIT.testRolloverIsExecutedOnce" -Dtests.seed=75EB7EE7A566E476 -Dtests.locale=ff -Dtests.timezone=Asia/Jerusalem -Druntime.java=23

Applicable branches:
8.15

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=137, name=long_gc_simulation_1, state=RUNNABLE, group=TGRP-LazyRolloverDuringDisruptionIT]

Issue Reasons:

  • [8.15] 9 consecutive failures in step openjdk23_checkpart1_java-matrix
  • [8.15] 9 failures in test testRolloverIsExecutedOnce (2.2% fail rate in 417 executions)
  • [8.15] 9 failures in step openjdk23_checkpart1_java-matrix (100.0% fail rate in 9 executions)
  • [8.15] 9 failures in pipeline elasticsearch-periodic (64.3% fail rate in 14 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the labels :Data Management/Data streams (Data streams and their lifecycles) and >test-failure (Triaged test failures from CI) on Sep 7, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine added the labels Team:Data Management (Meta label for data/management team) and needs:risk (requires assignment of a risk label: low, medium, or blocker) on Sep 7, 2024
@masseyke
Member

This looks like a Java 23 problem that is unrelated to this test: #112618 (comment).
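
For anyone wanting to confirm the Java 23 angle locally: the commit message further down this thread attributes the failure to `Thread.resume` having been removed in JDK 23. A minimal sketch, using only standard reflection, that reports whether the running JDK still declares that method. The class and method names here (`ThreadResumeCheck`, `hasThreadResume`) are hypothetical and not part of Elasticsearch.

```java
import java.lang.reflect.Method;

public class ThreadResumeCheck {

    /** Returns true if java.lang.Thread still declares a public no-arg resume() method. */
    static boolean hasThreadResume() {
        try {
            // getMethod throws NoSuchMethodException when the method does not exist.
            Method resume = Thread.class.getMethod("resume");
            return resume != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major version, e.g. 21 or 23.
        System.out.println("Java feature version: " + Runtime.version().feature());
        System.out.println("Thread.resume present: " + hasThreadResume());
    }
}
```

Running this under the same runtime as the failing CI step (`-Druntime.java=23` in the reproduction line) versus an older JDK should show whether the method is available to test infrastructure that relies on it.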

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 8 consecutive failures in step openjdk23_checkpart1_java-matrix
  • [8.x] 9 failures in test testRolloverIsExecutedOnce (1.4% fail rate in 627 executions)
  • [8.x] 8 failures in step openjdk23_checkpart1_java-matrix (100.0% fail rate in 8 executions)
  • [8.x] 8 failures in pipeline elasticsearch-periodic (57.1% fail rate in 14 executions)

Build Scans:

gmarouli self-assigned this on Oct 18, 2024
gmarouli added the low-risk label (An open issue or test failure that is a low risk to future releases) and removed the needs:risk label on Oct 18, 2024
elasticsearchmachine pushed a commit that referenced this issue Oct 18, 2024
…tes (#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: #115045

The backport will close:
#112634
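
The barrier-based approach described in this commit message can be illustrated with a small standalone sketch. This is not the code from #115075: it assumes a single-threaded executor as a stand-in for the master's cluster state update queue, and every name in it (`BarrierBlockSketch`, `updateQueue`, the event strings) is hypothetical. A blocking task parks on a two-party `CyclicBarrier`, so anything queued behind it cannot run until the test thread releases it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BarrierBlockSketch {

    /** Runs the scenario and returns the events in the order they were recorded. */
    static List<String> run() throws Exception {
        List<String> events = new CopyOnWriteArrayList<>();
        // Hypothetical stand-in for a single-threaded queue of state updates.
        ExecutorService updateQueue = Executors.newSingleThreadExecutor();
        // Two parties: the blocking task and the test thread.
        CyclicBarrier barrier = new CyclicBarrier(2);
        try {
            // The blocking task occupies the only worker thread.
            updateQueue.submit(() -> {
                events.add("blocker started");
                barrier.await(10, TimeUnit.SECONDS); // 1st await: signal that we are blocking
                barrier.await(10, TimeUnit.SECONDS); // 2nd await: stay parked until released
                events.add("blocker released");
                return null;
            });
            // This update is queued behind the blocker and cannot run yet.
            Future<?> queued = updateQueue.submit(() -> {
                events.add("queued update ran");
                return null;
            });

            barrier.await(10, TimeUnit.SECONDS); // wait until the blocker is in place
            events.add("work during disruption"); // the "disrupted" window
            barrier.await(10, TimeUnit.SECONDS); // release the blocker
            queued.get(10, TimeUnit.SECONDS);    // the queued update can now complete
        } finally {
            updateQueue.shutdownNow();
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The two `await` calls per side give the test thread a deterministic window: the first rendezvous proves the queue is blocked, the second ends the block. Unlike suspending and resuming threads, this needs no deprecated or removed `Thread` methods, and the timeouts keep a broken test from hanging indefinitely.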
gmarouli added a commit to gmarouli/elasticsearch that referenced this issue Oct 18, 2024
…tes (elastic#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: elastic#115045

The backport will close:
elastic#112634
elasticsearchmachine pushed a commit that referenced this issue Oct 18, 2024
…te updates (#115075) (#115085)

* Replace IntermittentLongGCDisruption with blocking cluster state updates (#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: #115045

The backport will close:
#112634

* Unmute LazyRolloverDuringDisruptionIT
elasticsearchmachine pushed a commit that referenced this issue Oct 21, 2024
…tes (#115075) (#115086)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: #115045

The backport will close:
#112634

Co-authored-by: Elastic Machine
gmarouli added a commit to gmarouli/elasticsearch that referenced this issue Oct 24, 2024
…tes (elastic#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: elastic#115045

The backport will close:
elastic#112634

(cherry picked from commit 5dec36e)

# Conflicts:
#	modules/data-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/LazyRolloverDuringDisruptionIT.java
elasticsearchmachine pushed a commit that referenced this issue Oct 24, 2024
…tes (#115075) (#115580)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: #115045

The backport will close:
#112634

(cherry picked from commit 5dec36e)

# Conflicts:
#	modules/data-streams/src/internalClusterTest/java/org/elasticsearch/datastreams/LazyRolloverDuringDisruptionIT.java
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this issue Oct 25, 2024
…tes (elastic#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: elastic#115045

The backport will close:
elastic#112634
jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Nov 4, 2024
…tes (elastic#115075)

`Thread.resume` has been removed in JDK 23, so we can no longer use
`IntermittentLongGCDisruption`, which depends on it.

We simulate the master node disruption with a `CyclicBarrier` that
blocks cluster state updates.

Closes: elastic#115045

The backport will close:
elastic#112634
3 participants