
KAFKA-14580: Moving EndToEndLatency from core to tools module #13095

Merged · 16 commits · Mar 2, 2023

Conversation

@vamossagar12 (Contributor)

Move EndToEndLatency to tools

@ijuma (Member) commented Jan 9, 2023

Thanks for the PR. Can we add a test in that case? We'd want to verify manually that the test matches the previous behavior.

@ijuma (Member) commented Jan 9, 2023

Also, I'm currently focused on completing KAFKA-14470. @mimaison since you fleshed out KAFKA-14525, do you have cycles to do these reviews?

@vamossagar12 (Contributor, Author)

Actually, I pinged too soon :) Before getting it reviewed, I will test locally and also add a couple of tests. Thanks.

@mimaison (Member) commented Jan 9, 2023

Yes I can review this.

I started looking at KAFKA-14525 because we were stepping on each other's toes in KAFKA-14470, but we should finish that first.

Many of the tests for these commands start full clusters and all that test logic is currently in core. We should be able to move it to server-common but I'm not quite sure if we want to drag many ZooKeeper bits there.

@ijuma (Member) commented Jan 9, 2023

We already depend on core when it comes to the tools test module, so we don't necessarily have to move things for that.

@vamossagar12 (Contributor, Author)

Hi @mimaison, I added a few basic unit tests and updated the system test (end_to_end_latency.py). I haven't set it up locally, but I am hoping those can run from here to validate that the changes work. Thanks!

@vamossagar12 (Contributor, Author)

Looks like there are some more checkstyle failures. Will fix them.

@fvaleri (Contributor) left a comment

Hi @vamossagar12, thanks for working on this.

I left some comments.

@vamossagar12 (Contributor, Author)

#13095 (comment)

I looked at other classes in tools and saw argparse4j being used in them, so I assumed that this is the direction that has been chosen. Is that not the case? If so, I can revert to using plain args. Please let me know.

@vamossagar12 (Contributor, Author)

@fvaleri, I removed the code changes related to argparse4j. That way the interface of the tool is identical to what it was previously. Thanks.

@fvaleri (Contributor) left a comment

Some more comments.

In general we should be as close as possible to the original code, to avoid having an impact on correctness and/or performance. Any improvement or big refactoring can be discussed in a separate PR.

int numMessages = Integer.parseInt(args[2]);
String acks = args[3];
int messageSizeBytes = Integer.parseInt(args[4]);
String propertiesFile = args.length > 5 ? args[5] : null;
Contributor:

This is not equivalent and there is some missing logic (filter).

Contributor Author:

Yes that was a miss. Added the filter.

Contributor:

The original code is using Optional, which is a much better approach, and this also requires some changes further down.

Contributor Author:

Hmm, I can do that, but IMO it's not going to make much difference in terms of readability or other factors. Of course, using Optional would help attain parity with the Scala code, but that's all we get.

Contributor:

It would make the code much more readable IMO and it's a little change.

Member:

Yeah, please don't replace Option with null. The equivalent is Optional in such cases.

Contributor Author:

Thanks @ijuma, @fvaleri, I made the changes. Let me know how it's looking now.
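For context, the change the reviewers ask for here can be illustrated with a small, self-contained sketch. This is not the PR's actual code: `ArgsSketch` and `propertiesFile` are hypothetical names, and the empty-string filter is only an assumption about the "missing logic (filter)" mentioned earlier. It shows the general idea of modelling an optional trailing argument as `Optional<String>` (the Java counterpart of Scala's `Option`) rather than a nullable `String`:

```java
import java.util.Optional;

public class ArgsSketch {
    // Hypothetical helper: model the optional sixth argument (the properties
    // file) as Optional<String> instead of a nullable String. The filter is
    // an assumption: it treats an empty string like an absent argument.
    static Optional<String> propertiesFile(String[] args) {
        return args.length > 5 ? Optional.of(args[5]).filter(s -> !s.isEmpty())
                               : Optional.empty();
    }

    public static void main(String[] args) {
        // With a sixth argument present:
        System.out.println(propertiesFile(
            new String[]{"localhost:9092", "topic", "100", "all", "200", "client.properties"}));
        // Without it:
        System.out.println(propertiesFile(
            new String[]{"localhost:9092", "topic", "100", "all", "200"}));
    }
}
```

Callers can then use `Optional` combinators (`map`, `orElse`, `ifPresent`) instead of null checks, which is what keeps the Java port close to the Scala original.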

long begin = System.nanoTime();
//Send message (of random bytes) synchronously then immediately poll for it
producer.send(new ProducerRecord<>(topic, message)).get();
ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(POLL_TIMEOUT_MS));
Contributor:

Why are you not using the iterator as in the original code?

Contributor Author:

The reason I am not using the iterator is that ConsumerRecords exposes methods like isEmpty and count, which seemed easier to understand when used in the validate method. Let me know if you think otherwise.

Contributor:

I would prefer to stick with the original code.

@ijuma (Member), Jan 16, 2023:

I don't think we need to keep the original code when it's strictly worse (for example the original code ends up exhausting the iterator to get the size).

We should keep the original code if there isn't a clear improvement from changing and whatever changes we do should be localized - changes that affect many methods, other files, etc. are best avoided if possible.

Contributor Author:

Yeah in this case I feel using the methods provided by ConsumerRecords seems cleaner.

We should keep the original code if there isn't a clear improvement from changing and whatever changes we do should be localized - changes that affect many methods, other files, etc. are best avoided if possible.

Ack. Would keep that in mind.

Contributor:

I don't think we need to keep the original code when it's strictly worse (for example the original code ends up exhausting the iterator to get the size).

Fair enough. Thanks.
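The validation discussed above can be sketched without a running broker by using a plain `List<byte[]>` as a stand-in for `org.apache.kafka.clients.consumer.ConsumerRecords`. This is an illustrative sketch, not the PR's actual code; the class name and error messages are made up for the example. The point is the one ijuma made: `isEmpty()` and `count()` answer the questions directly, where the old Scala code exhausted the iterator just to learn the size.

```java
import java.util.Arrays;
import java.util.List;

public class ValidateSketch {
    // Stand-in for the tool's validate step: the real code would call
    // records.isEmpty() and records.count() on ConsumerRecords, and compare
    // the polled payload against the message just sent.
    static void validate(List<byte[]> records, byte[] sent) {
        if (records.isEmpty())
            throw new RuntimeException("poll() timed out before finding a result");
        if (records.size() > 1)
            throw new RuntimeException("Only one result was expected, found " + records.size());
        if (!Arrays.equals(sent, records.get(0)))
            throw new RuntimeException("The message read did not match the message sent");
    }

    public static void main(String[] args) {
        byte[] msg = {1, 2, 3};
        validate(Arrays.asList(new byte[][]{msg}), msg); // passes silently
        System.out.println("validation ok");
    }
}
```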

@vamossagar12 (Contributor, Author)

@fvaleri, thanks. I made the changes.

Properties adminProps = loadPropsWithBootstrapServers(propertiesFile, brokers);
Admin adminClient = Admin.create(adminProps);
NewTopic newTopic = new NewTopic(topic, defaultNumPartitions, defaultReplicationFactor);
adminClient.createTopics(Collections.singletonList(newTopic));
@fvaleri (Contributor), Jan 17, 2023:

You need to remove this line to avoid a TopicExistsException when the test topic already exists. Once you do that, I'll approve.

Contributor Author:

Sorry that was a miss. Removed it.
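The resolution here was simply to drop the `createTopics` call, since the test framework pre-creates the topic. For tools that do need to create their own topic, a common alternative pattern is to treat "topic already exists" as success so repeated runs are idempotent. A stand-in sketch (a `Set` replaces a real Admin client, and `IllegalStateException` stands in for `TopicExistsException`; all names here are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

public class CreateTopicSketch {
    // Stand-in "cluster": the set of topics that already exist.
    static final Set<String> cluster = new HashSet<>();

    // Stand-in for Admin.createTopics, which surfaces TopicExistsException
    // (simplified here to IllegalStateException) when the topic is present.
    static void createTopic(String topic) {
        if (!cluster.add(topic))
            throw new IllegalStateException("TopicExistsException: " + topic);
    }

    // Treat "already exists" as benign, so repeated runs are idempotent.
    static void ensureTopic(String topic) {
        try {
            createTopic(topic);
        } catch (IllegalStateException e) {
            // Topic already exists; nothing to do.
        }
    }

    public static void main(String[] args) {
        ensureTopic("latency-test");
        ensureTopic("latency-test"); // second call is a no-op
        System.out.println(cluster.contains("latency-test")); // prints "true"
    }
}
```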

@fvaleri (Contributor) left a comment

Builds on JDK 8 and 17 are OK.
The test failure on JDK 11 is unrelated, and it works fine on my machine.

Thanks.

@vamossagar12 (Contributor, Author)

Builds on JDK 8 and 17 are OK. The test failure on JDK 11 is unrelated, and it works fine on my machine.

Thanks.

Thanks @fvaleri. @mimaison, would you be able to take a look as well? Thanks.

@vamossagar12 (Contributor, Author)

Hi @mimaison, would you please review this PR whenever you get a chance? It's already approved by @fvaleri. Thanks.

@mimaison (Member) left a comment

Sorry for the delay. Thanks for the PR, I left a few minor suggestions.

List<TopicPartition> topicPartitions = consumer.
partitionsFor(topic).
stream().map(p -> new TopicPartition(p.topic(), p.partition()))
.collect(Collectors.toList());
Member:

Sometimes we put dots at the end of lines and other times at the front. Can you make it consistent in this file?

Contributor Author:

I made the usage of dots consistent in this block of code. I couldn't find other dot inconsistencies, but there was one with + when used for concatenation, which I have also made consistent.

@vamossagar12 (Contributor, Author)

Sorry for the delay. Thanks for the PR, I left a few minor suggestions.

No problem! I addressed the comments. Thanks for the review.

@mimaison (Member) left a comment

LGTM, thanks for the PR

@mimaison (Member) left a comment

Actually, we also need to update EndToEndLatencyService to point to the new class: https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/performance/end_to_end_latency.py#L93

@vamossagar12 (Contributor, Author) commented Jan 28, 2023

EndToEndLatencyService

Thanks @mimaison. Actually, I lack some context here. The class TestEndToEndLatency was renamed to EndToEndLatency in this very old PR: e43c9af#diff-52dbfa7ab683a53b91a84c35f309b56ff1b2a1cd94e4ccb86c5843e9e44a050f, and in a subsequent PR the support for zk_connect was removed. So the line you highlighted above may not need to be changed. Also, there's no reference to TestEndToEndLatency anywhere in trunk anymore, so I am assuming it only ships with older versions (< 0.9) maybe?

Were you referring to this line instead? https://github.com/apache/kafka/blob/trunk/tests/kafkatest/services/performance/end_to_end_latency.py#L127
which is invoked on line #91? I agree, that needs to be changed.

@mimaison (Member)

I simply meant that it looks like some changes in the system tests are required too.

@vamossagar12 (Contributor, Author)

I simply meant that it looks like some changes in the system tests are required too.

Got it. Thanks for the confirmation.

@vamossagar12 (Contributor, Author)

@mimaison , I updated the system test to point to the new class. That one place seemed to be the only one relevant in this case.

@fvaleri (Contributor) commented Jan 31, 2023

@mimaison , I updated the system test to point to the new class. That one place seemed to be the only one relevant in this case.

Do you have test run output that shows it works and that the run time is similar? You can look at what I did for the JmxTool migration, which is also used by system tests. I would also suggest discarding the first run of such a test, because the test framework needs to start a bunch of containers.

@vamossagar12 (Contributor, Author)

@fvaleri, @mimaison here's a sample run from my local setup:

(ducktape) sarao@C02GG1KCML7H kafka % TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py::Benchmark.test_end_to_end_latency" bash tests/docker/run_tests.sh

> Configure project :
Starting build with version 3.5.0-SNAPSHOT (commit id 4ecc30e0) using Gradle 7.6, Java 1.8 and Scala 2.13.10
Build properties: maxParallelForks=8, maxScalacThreads=8, maxTestRetries=0

BUILD SUCCESSFUL in 10s
170 actionable tasks: 170 up-to-date
docker exec ducker01 bash -c "cd /opt/kafka-dev && ducktape --cluster-file /opt/kafka-dev/tests/docker/build/cluster.json  ./tests/kafkatest/benchmarks/core/benchmark_test.py::Benchmark.test_end_to_end_latency "
/usr/local/lib/python3.9/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
[INFO:2023-02-12 00:59:22,204]: starting test run with session id 2023-02-12--002...
[INFO:2023-02-12 00:59:22,206]: running 10 tests...
[INFO:2023-02-12 00:59:22,208]: Triggering test 1 of 10...
[INFO:2023-02-12 00:59:22,271]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/benchmarks/core', 'file_name': 'benchmark_test.py', 'cls_name': 'Benchmark', 'method_name': 'test_end_to_end_latency', 'injected_args': {'security_protocol': 'SASL_PLAINTEXT', 'compression_type': 'none'}}
[INFO:2023-02-12 00:59:22,300]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: on run 1/1
[INFO:2023-02-12 00:59:22,321]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: Setting up...
[INFO:2023-02-12 00:59:27,898]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: Running...
[INFO:2023-02-12 01:03:02,248]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: Tearing down...
[INFO:2023-02-12 01:03:34,257]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: PASS
[INFO:2023-02-12 01:03:34,263]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=none: Data: {'latency_50th_ms': 7.0, 'latency_99th_ms': 47.0, 'latency_999th_ms': 169.0}
[INFO:2023-02-12 01:03:34,347]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO:2023-02-12 01:03:34,350]: Triggering test 2 of 10...
[INFO:2023-02-12 01:03:34,399]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/benchmarks/core', 'file_name': 'benchmark_test.py', 'cls_name': 'Benchmark', 'method_name': 'test_end_to_end_latency', 'injected_args': {'security_protocol': 'SASL_PLAINTEXT', 'compression_type': 'snappy'}}
[INFO:2023-02-12 01:03:34,434]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: on run 1/1
[INFO:2023-02-12 01:03:34,446]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: Setting up...
[INFO:2023-02-12 01:03:47,646]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: Running...
[INFO:2023-02-12 01:07:06,578]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: Tearing down...
[INFO:2023-02-12 01:07:34,367]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: PASS
[INFO:2023-02-12 01:07:34,372]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_PLAINTEXT.compression_type=snappy: Data: {'latency_50th_ms': 9.0, 'latency_99th_ms': 50.0, 'latency_999th_ms': 131.0}
[INFO:2023-02-12 01:07:34,442]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO:2023-02-12 01:07:34,445]: Triggering test 3 of 10...
[INFO:2023-02-12 01:07:34,485]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/benchmarks/core', 'file_name': 'benchmark_test.py', 'cls_name': 'Benchmark', 'method_name': 'test_end_to_end_latency', 'injected_args': {'security_protocol': 'SASL_SSL', 'compression_type': 'none'}}
[INFO:2023-02-12 01:07:34,506]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: on run 1/1
[INFO:2023-02-12 01:07:34,520]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: Setting up...
[INFO:2023-02-12 01:07:38,538]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: Running...
[INFO:2023-02-12 01:11:25,549]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: Tearing down...
[INFO:2023-02-12 01:11:49,491]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: PASS
[INFO:2023-02-12 01:11:49,495]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=none: Data: {'latency_50th_ms': 9.0, 'latency_99th_ms': 114.0, 'latency_999th_ms': 390.0}
[INFO:2023-02-12 01:11:49,554]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO:2023-02-12 01:11:49,556]: Triggering test 4 of 10...
[INFO:2023-02-12 01:11:49,608]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/benchmarks/core', 'file_name': 'benchmark_test.py', 'cls_name': 'Benchmark', 'method_name': 'test_end_to_end_latency', 'injected_args': {'security_protocol': 'SASL_SSL', 'compression_type': 'snappy'}}
[INFO:2023-02-12 01:11:49,638]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: on run 1/1
[INFO:2023-02-12 01:11:49,651]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: Setting up...
[INFO:2023-02-12 01:11:53,742]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: Running...
[INFO:2023-02-12 01:15:14,809]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: Tearing down...
[INFO:2023-02-12 01:15:46,057]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: PASS
[INFO:2023-02-12 01:15:46,061]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=SASL_SSL.compression_type=snappy: Data: {'latency_50th_ms': 8.0, 'latency_99th_ms': 62.0, 'latency_999th_ms': 125.0}
[INFO:2023-02-12 01:15:46,131]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO:2023-02-12 01:15:46,136]: Triggering test 5 of 10...
[INFO:2023-02-12 01:15:46,190]: RunnerClient: Loading test {'directory': '/opt/kafka-dev/tests/kafkatest/benchmarks/core', 'file_name': 'benchmark_test.py', 'cls_name': 'Benchmark', 'method_name': 'test_end_to_end_latency', 'injected_args': {'security_protocol': 'PLAINTEXT', 'compression_type': 'none'}}
[INFO:2023-02-12 01:15:46,228]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none: on run 1/1
[INFO:2023-02-12 01:15:46,319]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none: Setting up...
[INFO:2023-02-12 01:15:50,523]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none: Running...
[INFO:2023-02-12 01:17:00,744]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none: Tearing down...
[INFO:2023-02-12 01:17:30,336]: RunnerClient: kafkatest.benchmarks.core.benchmark_test.Benchmark.test_end_to_end_latency.security_protocol=PLAINTEXT.compression_type=none: FAIL: TimeoutError("Kafka servers didn't register at ZK within 30 seconds")

I stopped after #5 as there was some flakiness.
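The `latency_50th_ms` / `latency_99th_ms` / `latency_999th_ms` values in the Data lines above are percentiles over the per-message latencies the tool collects. As a rough, self-contained sketch of how such percentiles can be read off a sorted array (the index convention here is an assumption; the real tool may round differently):

```java
import java.util.Arrays;

public class PercentileSketch {
    // Read percentile p (0.0..1.0) off a sorted array of latencies.
    // Index convention is an assumption for this sketch.
    static double percentile(double[] sortedLatencies, double p) {
        int index = (int) (p * sortedLatencies.length);
        if (index >= sortedLatencies.length) index = sortedLatencies.length - 1;
        return sortedLatencies[index];
    }

    public static void main(String[] args) {
        double[] latenciesMs = new double[1000];
        for (int i = 0; i < latenciesMs.length; i++) latenciesMs[i] = i; // synthetic data
        Arrays.sort(latenciesMs);
        System.out.printf("50th: %.1f, 99th: %.1f, 99.9th: %.1f%n",
            percentile(latenciesMs, 0.5),
            percentile(latenciesMs, 0.99),
            percentile(latenciesMs, 0.999));
    }
}
```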

@fvaleri (Contributor) left a comment

Hi @vamossagar12,

In end_to_end_latency.py there is a reference to a class that doesn't exist: kafka.tools.TestEndToEndLatency. The class EndToEndLatencyService contained in that file is used in benchmark_test.py and test_performance_services.py.

With benchmark_test I have 1 failure when running the entire suite (62 tests), but it works if I run the failing test in isolation.

kafkatest.benchmarks.core.benchmark_test.Benchmark.test_long_term_producer_throughput.security_protocol=PLAINTEXT.compression_type=none

With sanity_checks I have 3 failures, because it also runs with old versions, where we still have the old package name. For more context see #13233.

kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=0.9.0.1

kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=0.9.0.1.new_consumer=False

kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=1.1.1.new_consumer=False

@vamossagar12 (Contributor, Author)

Thanks @fvaleri for pointing me to the fix you made for the JmxTool issue. I made the changes to also look at the version when choosing the EndToEndLatency class.

I did point out the usage of TestEndToEndLatency earlier, but I don't have context on it. Also, when I try to run kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version, I get this:

[INFO:2023-02-13 03:03:31,147]: RunnerClient: kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=dev.metadata_quorum=ZK: FAIL: gaierror(-2, 'Name or service not known')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 185, in _do_run
    self.setup_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 237, in setup_test
    self.test.setup()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/test.py", line 74, in setup
    self.setUp()
  File "/opt/kafka-dev/tests/kafkatest/sanity_checks/test_performance_services.py", line 38, in setUp
    self.zk.start()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/services/service.py", line 265, in start
    self.start_node(node, **kwargs)
  File "/opt/kafka-dev/tests/kafkatest/services/zookeeper.py", line 95, in start_node
    node.account.ssh("mkdir -p %s" % ZookeeperService.DATA)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/cluster/remoteaccount.py", line 35, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/cluster/remoteaccount.py", line 300, in ssh
    client = self.ssh_client
  File "/usr/local/lib/python3.9/dist-packages/ducktape/cluster/remoteaccount.py", line 215, in ssh_client
    self._set_ssh_client()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/cluster/remoteaccount.py", line 35, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/cluster/remoteaccount.py", line 189, in _set_ssh_client
    client.connect(
  File "/usr/local/lib/python3.9/dist-packages/paramiko/client.py", line 340, in connect
    to_try = list(self._families_and_addresses(hostname, port))
  File "/usr/local/lib/python3.9/dist-packages/paramiko/client.py", line 203, in _families_and_addresses
    addrinfos = socket.getaddrinfo(
  File "/usr/lib/python3.9/socket.py", line 953, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

@vamossagar12 (Contributor, Author) commented Feb 16, 2023

@fvaleri, I fixed the above error; it occurred because some of the containers had died. I now have a clean run of the system test:

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.11.3
session_id:       2023-02-15--005
run time:         7 minutes 46.389 seconds
tests run:        7
passed:           7
flaky:            0
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=0.8.2.2.new_consumer=False
status:     PASS
run time:   56.724 seconds
{"producer_performance": {"records_per_sec": 17123.287671, "mb_per_sec": 1.63}, "end_to_end_latency": {"latency_50th_ms": 1.0, "latency_99th_ms": 5.0, "latency_999th_ms": 16.0}, "consumer_performance": {"records_per_sec": 1428571.4286, "mb_per_sec": 136.2392}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=0.9.0.1
status:     PASS
run time:   1 minute 7.159 seconds
{"producer_performance": {"records_per_sec": 10152.284264, "mb_per_sec": 0.97}, "end_to_end_latency": {"latency_50th_ms": 1.0, "latency_99th_ms": 10.0, "latency_999th_ms": 23.0}, "consumer_performance": {"records_per_sec": 62305.296, "mb_per_sec": 3.5948}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=0.9.0.1.new_consumer=False
status:     PASS
run time:   58.971 seconds
{"producer_performance": {"records_per_sec": 13262.599469, "mb_per_sec": 1.26}, "end_to_end_latency": {"latency_50th_ms": 1.0, "latency_99th_ms": 9.0, "latency_999th_ms": 23.0}, "consumer_performance": {"records_per_sec": 1428571.4286, "mb_per_sec": 136.2392}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=1.1.1.new_consumer=False
status:     PASS
run time:   1 minute 6.057 seconds
{"producer_performance": {"records_per_sec": 11350.737798, "mb_per_sec": 1.08}, "end_to_end_latency": {"latency_50th_ms": 2.0, "latency_99th_ms": 10.0, "latency_999th_ms": 21.0}, "consumer_performance": {"records_per_sec": 1666666.6667, "mb_per_sec": 158.9457}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=dev.metadata_quorum=COLOCATED_KRAFT
status:     PASS
run time:   1 minute 8.146 seconds
{"producer_performance": {"records_per_sec": 4918.839154, "mb_per_sec": 0.47}, "end_to_end_latency": {"latency_50th_ms": 2.0, "latency_99th_ms": 12.0, "latency_999th_ms": 34.0}, "consumer_performance": {"records_per_sec": 12199.5134, "mb_per_sec": 1.1609}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=dev.metadata_quorum=REMOTE_KRAFT
status:     PASS
run time:   1 minute 16.390 seconds
{"producer_performance": {"records_per_sec": 11520.737327, "mb_per_sec": 1.1}, "end_to_end_latency": {"latency_50th_ms": 2.0, "latency_99th_ms": 13.0, "latency_999th_ms": 42.0}, "consumer_performance": {"records_per_sec": 11722.807, "mb_per_sec": 1.1159}}
--------------------------------------------------------------------------------
test_id:    kafkatest.sanity_checks.test_performance_services.PerformanceServiceTest.test_version.version=dev.metadata_quorum=ZK
status:     PASS
run time:   1 minute 12.011 seconds
{"producer_performance": {"records_per_sec": 7547.169811, "mb_per_sec": 0.72}, "end_to_end_latency": {"latency_50th_ms": 2.0, "latency_99th_ms": 13.0, "latency_999th_ms": 29.0}, "consumer_performance": {"records_per_sec": 11990.8362, "mb_per_sec": 1.1031}}
--------------------------------------------------------------------------------

cc @mimaison

@vamossagar12 (Contributor, Author)

Tests passed.

@fvaleri (Contributor) left a comment

LGTM. Thanks!

@mimaison (Member) left a comment

Thanks for the PR. It looks good overall; I left a couple of small comments.

@vamossagar12 (Contributor, Author)

Thanks @mimaison. I addressed the comments.

@mimaison (Member) left a comment

Thanks for the updates! Should we also have a golden path test?

@vamossagar12 (Contributor, Author)

Thanks Mickael. I added a happy-path test case.

@vamossagar12 (Contributor, Author)

Tests have passed for one of the builds.

@mimaison (Member) left a comment

LGTM, thanks for the PR

@mimaison mimaison merged commit bb3111f into apache:trunk Mar 2, 2023
@fvaleri fvaleri added the tools label Jul 7, 2023