tests: integration testing for compaction with compression #9594

Merged
jcsp merged 8 commits into redpanda-data:dev from tests-compaction-compression
May 3, 2023

Conversation

@jcsp jcsp commented Mar 21, 2023

Issue #9521 exposed a lack of coverage in this area.

The tests are separate from the fixes, because the fixes will be backportable and the tests won't.

In this PR:

  • Update kgo-verifier, kgo-repeater and client-swarm to versions that include compression options, including a mixed mode that exercises several codecs at the same time, and an optional highly compressible payload that lets huge records (100MB+) sneak into the cluster under test by compressing to below the batch size limit.
  • Add a variant of ManyClientsTest that uses compaction. This works well as a reproducer for issues like #9521 (LZ4 decompression allocates contiguous memory for output buffer), where many concurrent produce requests can try to allocate memory for their uncompressed batches at the same time.
  • Modify ManyPartitionsTest to use compaction + compression, since this is a stress test to validate scale, and we should use the most stressful type of traffic.
  • Revamp ManyClientsTest message counts etc. to adapt its runtime to the different variants (compacted, non-compacted), and use rate limiting to make the runtime more consistent by preventing clients from stalling on long backoffs.
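
To illustrate why a highly compressible payload can slip past a size check on the compressed batch, here is a minimal sketch. It uses Python's zlib as a stand-in codec (the issue above concerns LZ4, which is not in the standard library), and `BATCH_SIZE_LIMIT` is an assumed value for illustration, not a setting taken from this PR.

```python
import os
import zlib


def compressed_size(payload: bytes) -> int:
    """Return the size in bytes of `payload` after zlib (DEFLATE) compression."""
    return len(zlib.compress(payload))


# Assumed numbers for illustration only; the real limit is a cluster setting.
BATCH_SIZE_LIMIT = 1 * 1024 * 1024  # hypothetical 1 MiB cap on the compressed batch
PAYLOAD_SIZE = 32 * 1024 * 1024     # 32 MiB uncompressed payload

payloads = {
    "zeros (pathological)": bytes(PAYLOAD_SIZE),            # compresses extremely well
    "random (incompressible)": os.urandom(PAYLOAD_SIZE),    # does not compress
}

for name, payload in payloads.items():
    size = compressed_size(payload)
    verdict = "fits under" if size <= BATCH_SIZE_LIMIT else "exceeds"
    print(f"{name}: {len(payload)} -> {size} bytes, {verdict} the assumed limit")
```

The all-zero payload is accepted by a check on compressed size, yet the receiver still has to allocate the full uncompressed size when it decompresses the batch.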

Fixes: #10092

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@jcsp jcsp changed the title from "tests: cover compaction with compression" to "tests: integration testing for compaction with compression" Mar 21, 2023

jcsp commented Mar 22, 2023

/cdt tests/rptest/scale_tests/many_clients_test.py tests/rptest/scale_tests/many_partitions_test.py

@jcsp jcsp force-pushed the tests-compaction-compression branch from 4dd0bed to 0b5a4ff Compare April 13, 2023 14:16

jcsp commented Apr 14, 2023

/cdt tests/rptest/scale_tests

mmaslankaprv
mmaslankaprv previously approved these changes Apr 14, 2023

jcsp commented Apr 14, 2023

The new compacted variant of ManyClientsTest failed and needs investigation.


jcsp commented Apr 18, 2023

/cdt tests/rptest/scale_tests/many_clients_test.py

@jcsp jcsp force-pushed the tests-compaction-compression branch from c7d2334 to 167d215 Compare April 20, 2023 16:54

jcsp commented Apr 20, 2023

/cdt tests/rptest/scale_tests/many_clients_test.py


jcsp commented Apr 21, 2023

Clustered ducktape run of ManyClientsTest was OK. Rebased since #10235 merged; this should be good to go now.

@jcsp jcsp marked this pull request as ready for review April 21, 2023 08:28
@jcsp jcsp requested a review from a team as a code owner April 21, 2023 08:28
@jcsp jcsp requested review from gousteris and removed request for a team April 21, 2023 08:28
@jcsp jcsp requested review from dotnwat, mmaslankaprv, andrwng and VladLazar and removed request for gousteris April 21, 2023 14:02
VladLazar
VladLazar previously approved these changes May 2, 2023
tests/rptest/scale_tests/many_clients_test.py (outdated, resolved)
tests/rptest/services/producer_swarm.py (outdated, resolved)
tests/rptest/scale_tests/many_clients_test.py (resolved)
jcsp added 7 commits May 2, 2023 17:39
This test was designed to run with minimal memory, and
it necessarily needs more memory when handling huge 32MiB
batches that must be compressed and decompressed.

This adds support for --messages-per-second.
Running all clients at max speed against a rate-limited
cluster results in a very unpredictable runtime, because
some clients end up backing off for a very long time.

Fixes redpanda-data#10092
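
The idea of pacing sends in the client, rather than running flat out and being throttled by the cluster, can be sketched as below. This is an illustrative Python sketch only: `Pacer` is a hypothetical helper, not the client-swarm implementation of --messages-per-second (client-swarm is a separate tool whose internals are not shown in this PR).

```python
import time


class Pacer:
    """Spaces out sends so that at most `rate` messages are sent per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_send = clock()

    def wait(self) -> None:
        """Block until the next send slot, then reserve the following one."""
        now = self._clock()
        if now < self._next_send:
            self._sleep(self._next_send - now)
            now = self._next_send
        # Schedule from the slot actually used, so a slow send is not
        # followed by a burst of catch-up messages.
        self._next_send = now + self._interval


# Usage: pace five sends at 100 messages per second.
pacer = Pacer(rate=100)
for i in range(5):
    pacer.wait()
    # A real client would produce message `i` here.
```

Because each client holds itself to a known rate, no client depends on how the cluster happens to distribute throttling delays, which is the source of the unpredictable runtime described above.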
- Use rate limiting in the client, so that it is
  not vulnerable to very long delays from rate limiting,
  causing rare timeouts due to statistical unfairness of
  which clients get limited.
- Add a 'realistic' compaction case to accompany the
  pathological case.  Realistic is incompressible data,
  so we're just paying the CPU tax; pathological is zeros,
  where we hit the memory inflation risk.
- Make the test adaptively choose message counts for a
  target runtime.
- Configure a node rate limit that is aligned with the
  IOPS throughput of i3en.xlarge nodes when we are sending
  lots of tiny messages.
- Set a heuristic "effective message size" for the pathological
  compaction/compression case, which reflects the equivalent
  uncompressed message size for throughput calculation
  purposes.

Fixes redpanda-data#10092
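
As a rough sketch of how message counts might be derived from a target runtime, a rate limit, and an effective message size: the function name, parameters, and numbers below are hypothetical and are not taken from many_clients_test.py.

```python
def messages_per_client(
    target_runtime_s: int,
    node_rate_bytes_per_s: int,
    node_count: int,
    client_count: int,
    effective_msg_size: int,
) -> int:
    """Pick a per-client message count so the whole run takes roughly
    `target_runtime_s` seconds when the cluster is saturated at its rate limit.

    `effective_msg_size` is the size used for throughput purposes; for highly
    compressible payloads this would be the equivalent uncompressed size.
    """
    cluster_rate = node_rate_bytes_per_s * node_count
    total_messages = (cluster_rate * target_runtime_s) // effective_msg_size
    # Every client sends at least one message, even for very large payloads.
    return max(1, total_messages // client_count)


# Hypothetical example: 3 nodes at 1 MB/s each, 100 clients, 60 s target.
print(messages_per_client(60, 1_000_000, 3, 100, effective_msg_size=1_000))
print(messages_per_client(60, 1_000_000, 3, 100, effective_msg_size=100_000))
```

Using a larger effective size for the pathological case lowers the message count, so that variant does not run far longer than the others.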

jcsp commented May 2, 2023

/cdt tests/rptest/scale_tests/many_clients_test.py


jcsp commented May 3, 2023

/cdt tests/rptest/scale_tests/many_clients_test.py

@jcsp jcsp merged commit 0a1a103 into redpanda-data:dev May 3, 2023
@jcsp jcsp deleted the tests-compaction-compression branch May 3, 2023 13:22
Development

Successfully merging this pull request may close these issues.

CI Failure (client times out after error sending) in ManyClientsTest.test_many_clients
3 participants