Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky-test: PerformanceTransactionTest.testConsumeTxnMessage #14109

Closed
lhotari opened this issue Feb 3, 2022 · 8 comments · Fixed by #16981 or #17837
Closed

Flaky-test: PerformanceTransactionTest.testConsumeTxnMessage #14109

lhotari opened this issue Feb 3, 2022 · 8 comments · Fixed by #16981 or #17837

Comments

@lhotari
Copy link
Member

lhotari commented Feb 3, 2022

Flaky-test: org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage
Number of failures: 5

org.apache.pulsar.testclient.PerformanceTransactionTest is flaky. The testConsumeTxnMessage test method fails sporadically.

java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@9317087]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2022-02-02T16:48:35.3692126Z

Full exception stacktrace
java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@9317087]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@7858aa00]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2022-01-29T05:03:33.6909916Z

Full exception stacktrace
java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@7858aa00]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@360348b7]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2022-01-28T12:46:12.5203828Z

Full exception stacktrace
java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@360348b7]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@2ac4931c]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2022-02-02T13:38:31.6649715Z

Full exception stacktrace
java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@2ac4931c]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@54f7543]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)

Usage tip: To enable automatic navigation to failure message, open the following links with CTRL/CMD-click.
example failure 2022-01-29T05:24:40.1685311Z

Full exception stacktrace
java.lang.AssertionError: expected [null] but found [org.apache.pulsar.client.impl.MessageImpl@54f7543]
	at org.testng.Assert.fail(Assert.java:99)
	at org.testng.Assert.failNotSame(Assert.java:1033)
	at org.testng.Assert.assertNull(Assert.java:965)
	at org.testng.Assert.assertNull(Assert.java:953)
	at org.apache.pulsar.testclient.PerformanceTransactionTest.testConsumeTxnMessage(PerformanceTransactionTest.java:235)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

@lhotari
Copy link
Member Author

lhotari commented Feb 3, 2022

The test is flaky that was introduced by the changes in #11933. @liangyepianzhou would you mind taking a look at #14109 ? Thank you

@liangyepianzhou
Copy link
Contributor

This is an unstable test caused by message retransmissions. I think it should be possible to fix this test by reducing the accuracy. I'll give a PR with the fix later.

@lhotari
Copy link
Member Author

lhotari commented Feb 3, 2022

This is an unstable test caused by message retransmissions. I think it should be possible to fix this test by reducing the accuracy. I'll give a PR with the fix later.

Thanks @liangyepianzhou , that would be great.

@github-actions
Copy link

github-actions bot commented Mar 6, 2022

The issue had no activity for 30 days, mark with Stale label.

@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@codelipenghui
Copy link
Contributor

@liangyepianzhou Do you want to push a PR to fix the issue?

liangyepianzhou added a commit that referenced this issue Aug 9, 2022
* [Fix][Flaky-test] Fix testConsumeTxnMessage
Master #14109
## Motivation
The transaction commit is async, so the consumer can still receive message when the consumer rebuilds.
## Modification
Add  Awaitility.await() for check-ing whether the ongoingTxns = 0.
Technoboy- pushed a commit that referenced this issue Aug 10, 2022
* [Fix][Flaky-test] Fix testConsumeTxnMessage
Master #14109
## Motivation
The transaction commit is async, so the consumer can still receive message when the consumer rebuilds.
## Modification
Add  Awaitility.await() for check-ing whether the ongoingTxns = 0.
Technoboy- pushed a commit to merlimat/pulsar that referenced this issue Aug 16, 2022
* [Fix][Flaky-test] Fix testConsumeTxnMessage
Master apache#14109
## Motivation
The transaction commit is async, so the consumer can still receive message when the consumer rebuilds.
## Modification
Add  Awaitility.await() for check-ing whether the ongoingTxns = 0.
@codelipenghui
Copy link
Contributor

The flaky test still happens on the master branch https://github.com/apache/pulsar/runs/7899134537?check_suite_focus=true, reopen the ticket. @liangyepianzhou Could you please help check again?

@codelipenghui codelipenghui reopened this Aug 18, 2022
Jason918 pushed a commit that referenced this issue Sep 2, 2022
* [Fix][Flaky-test] Fix testConsumeTxnMessage
Master #14109
## Motivation
The transaction commit is async, so the consumer can still receive message when the consumer rebuilds.
## Modification
Add  Awaitility.await() for check-ing whether the ongoingTxns = 0.

(cherry picked from commit c29503e)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Sep 2, 2022
* [Fix][Flaky-test] Fix testConsumeTxnMessage
Master apache#14109
## Motivation
The transaction commit is async, so the consumer can still receive message when the consumer rebuilds.
## Modification
Add  Awaitility.await() for check-ing whether the ongoingTxns = 0.

(cherry picked from commit c29503e)
(cherry picked from commit 6db6679)
@poorbarcode
Copy link
Contributor

congbobo184 pushed a commit that referenced this issue Sep 29, 2022
…7837)

Fixes: #14109


### Motivation

The expected execution flow for this test is: 

1. send 505 messages
2. commit 10 transactions, every transaction ack 50 messages
3. receive the last 5 messages in the last transaction, wait for transaction timeout
4. confirm that the last 5 messages can be consumed by new consumer

<strong>(High light)</strong> The default value for transaction TTL is 10 seconds, and the default value for `Awaitility.await` is also 10 seconds,  so this test is not stable.

Note: This is a guess cause, the problem is not reproduced locally. But after transaction TTL is set to 11s, the probability of the problem occurring is 100%.

### Modifications

Fix flaky test
- set transaction TTL to 5s

Other changes
- define a name for the task thread
- acknowledge the last 5 messages

### Matching PR in forked repository

PR in forked repository: 

- poorbarcode#13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment