
redpanda: Clients hang when bootstrapping to a node which had its data directory deleted #381

Closed
rkruze opened this issue Jan 7, 2021 · 4 comments

@rkruze (Contributor) commented Jan 7, 2021

To reproduce:

  1. Bring up a three-node cluster.
  2. On the node with node id 2, delete the data directory and kill -9 the redpanda process.
  3. Bring the redpanda process back up.
  4. Point a client to bootstrap from node id 2.
  5. Most produce and consume operations will now hang.
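The steps above can be sketched as shell commands. This is a hedged sketch: the data-directory path, service name, broker address, and rpk flags are assumptions, not taken from the issue.

```shell
# On the node with node id 2 (path and service name are assumptions):
sudo kill -9 "$(pgrep -x redpanda)"     # kill the running process
sudo rm -rf /var/lib/redpanda/data      # delete the data directory
sudo systemctl start redpanda           # bring the process back up

# From a client machine, bootstrap only from node 2 and try to
# produce/consume (broker address is a placeholder):
rpk topic create test --brokers node2:9092
echo "hello" | rpk topic produce test --brokers node2:9092
```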
rkruze added the kind/bug and area/redpanda labels on Jan 7, 2021
@emaxerrno (Contributor)

@mmaslankaprv this looks like we're holding some semaphore of sorts.

@mmaslankaprv (Member)

I've tried this but am unable to reproduce it. The only thing I see is the client trying to update metadata. After the node successfully rejoins the cluster, the metadata is updated and the client is able to continue.

@mmaslankaprv (Member)

I was able to make openmessaging-benchmark hang, with the message below as the last entry in the logs.

It can be reproduced with the following steps:

  1. Start the benchmark workload.
  2. Stop the benchmark with Ctrl+C (SIGINT).
  3. Remove the redpanda folder.
  4. Restart redpanda.
  5. Start the workload again.

It seems to be an openmessaging benchmark worker issue:

Jan 07 13:16:20 ip-10-0-0-202 benchmark-worker[58633]: org.apache.kafka.common.KafkaException: Producer closed while allocating memory
        at org.apache.kafka.clients.producer.internals.BufferPool.allocate(BufferPool.java:157) ~[org.apache.kafka-kafka-clients-2.6.0.jar:?]
        at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:214) ~[org.apache.kafka-kafka-clients-2.6.0.jar:?]
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:949) ~[org.apache.kafka-kafka-clients-2.6.0.jar:?]
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:870) ~[org.apache.kafka-kafka-clients-2.6.0.jar:?]
        at io.openmessaging.benchmark.driver.redpanda.RedpandaBenchmarkProducer.sendAsync(RedpandaBenchmarkProducer.java:45) ~[io.openmessaging.benchmark-driver-redpanda-0.0.1-SNAPSHOT.jar:?]
        at io.openmessaging.benchmark.worker.LocalWorker.lambda$submitProducersToExecutor$10(LocalWorker.java:243) ~[io.openmessaging.benchmark-benchmark-framework-0.0.1-SNAPSHOT.jar:?]
        at java.util.ArrayList.forEach(ArrayList.java:1541) ~[?:?]
        at io.openmessaging.benchmark.worker.LocalWorker.lambda$submitProducersToExecutor$11(LocalWorker.java:240) [io.openmessaging.benchmark-benchmark-framework-0.0.1-SNAPSHOT.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [io.netty-netty-all-4.1.12.Final.jar:4.1.12.Final]
        at java.lang.Thread.run(Thread.java:834) [?:?]
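The "Producer closed while allocating memory" frame comes from the Kafka producer's BufferPool: sender threads block in allocate() waiting for buffer memory to be freed, and throw if the producer is closed while they wait. When sends stop completing (e.g. after the broker's data directory is wiped), no memory is ever returned, so the workload appears to hang until close. A minimal Python analogue of that blocking/close interaction (a hypothetical simplification, not the kafka-clients code):

```python
import threading

class ProducerClosedError(Exception):
    """Raised when the pool is closed while a thread waits for memory."""

class BufferPool:
    """Toy analogue of the Kafka producer's BufferPool (assumption:
    simplified to a single counter, no per-size free lists)."""

    def __init__(self, total_bytes: int):
        self._free = total_bytes
        self._closed = False
        self._cond = threading.Condition()

    def allocate(self, size: int) -> None:
        with self._cond:
            # Block until enough memory is free -- this is where the
            # benchmark worker threads sit when sends stop completing.
            while self._free < size and not self._closed:
                self._cond.wait()
            if self._closed:
                raise ProducerClosedError("Producer closed while allocating memory")
            self._free -= size

    def deallocate(self, size: int) -> None:
        # Normally called when a send completes and its batch is freed.
        with self._cond:
            self._free += size
            self._cond.notify_all()

    def close(self) -> None:
        # Wake every waiter; they raise instead of waiting forever.
        with self._cond:
            self._closed = True
            self._cond.notify_all()
```

In this sketch the hang is the wait loop in allocate(); the exception in the trace is the waiter being woken by close() rather than by freed memory.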

@mmaslankaprv (Member)

@rkruze can you share the driver setup that was reproducing the issue?

rkruze closed this as not planned (won't fix / can't repro / duplicate / stale) on Nov 11, 2022