
Data Loss at replication test #8214

Closed
CalvinSchulze opened this issue Aug 1, 2019 · 4 comments
@CalvinSchulze

Affected Version

Druid 0.15-incubating on Ubuntu, using JDK 8
ZooKeeper 3.4.11
Kafka 2.2.0 (Scala 2.12 build)

Description

I decided to test Druid's fault tolerance today and set up a little test:

A piece of logging software sends a small batch of events to Kafka every second, and the events are stream-ingested into Druid. All data is supposed to be replicated across 2 historical nodes. Everything runs on one machine.
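
For context (not part of the original report): in Druid, a replication factor of 2 on the historicals is usually expressed as a load rule with two tiered replicants. Below is a minimal sketch of setting such a rule through the Coordinator API; the Coordinator address and the datasource name `logging-events` are assumptions, not taken from the report.

```python
# A sketch, not from the report: request 2 replicas of every segment on the
# default tier by overriding the datasource's load rules via the Coordinator.
# The Coordinator address and the datasource name are assumptions.
import requests

COORDINATOR = "http://localhost:8081"   # assumed Coordinator host:port
DATASOURCE = "logging-events"           # hypothetical datasource name

rules = [
    {
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 2},  # keep 2 copies on historicals
    }
]

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    json=rules,
)
resp.raise_for_status()
print("rules updated:", resp.status_code)
```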

How I tested:

  • I produced roughly 10 kB of data and ingested it
  • I stopped the logging software
  • I waited for the data to be handed off (or at least until the web GUI showed it)
  • I shut down one of the historicals
  • I restarted the logging software
  • I produced another 10 kB of data
    -> First weird behaviour: the data didn't get sent to the running historical and the tasks didn't terminate
  • I started the second historical again
    -> It took a very long time and >100 failing tasks (duration = 0:00:00, no error, no log) for the
       second historical to queue the missing segments
    -> This step made the first historical crash
  • I restarted the first historical
    -> The load queue stays empty forever
    -> The missing 5 segments are no longer realtime
    -> Neither historical contains the missing 5 segments (this can be cross-checked against the Coordinator API; see the sketch after this list)
    -> The data is apparently unavailable forever
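
Not part of the original report, but for anyone reproducing this: the observations about the empty load queue and the missing segments can be cross-checked against the Coordinator's HTTP API instead of the web GUI. A small sketch, assuming the same hypothetical Coordinator address and datasource name as above:

```python
# A sketch, not from the report: ask the Coordinator which segments are
# available and what is sitting in the load queue, instead of relying on
# the web GUI. Host/port and datasource name are assumptions.
import requests

COORDINATOR = "http://localhost:8081"   # assumed Coordinator host:port
DATASOURCE = "logging-events"           # hypothetical datasource name

# Percent of segments per datasource that are actually loaded on historicals
# (100.0 means fully available; anything less points at missing segments).
loadstatus = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadstatus").json()
print("load status:", loadstatus)

# Per-historical load queue ("Queue stays empty forever" in the report).
loadqueue = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadqueue?simple").json()
print("load queue:", loadqueue)

# Summary for the datasource itself: segment count, size, intervals.
summary = requests.get(
    f"{COORDINATOR}/druid/coordinator/v1/datasources/{DATASOURCE}?simple"
).json()
print("datasource summary:", summary)
```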

This didn't go too well, I'd say. Is this intended behaviour, or did I just do something wrong?

Greetings,
Calvin


CalvinSchulze commented Aug 1, 2019

Seems to be related to #8137. I'll test it again as soon as 0.15.1 is available as a binary download.

@vogievetsky

Have you had a chance to retest this with 0.15.1, or with 0.16.0, which is going to be up in a few hours?


stale bot commented Jul 2, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

stale bot added the stale label Jul 2, 2020

stale bot commented Aug 1, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

stale bot closed this as completed Aug 1, 2020