
Data Loss at replication test #8214

Closed
CalvinSchulze opened this issue Aug 1, 2019 · 4 comments
@CalvinSchulze

Affected Version

Druid 0.15-incubating on Ubuntu, using JDK 8
ZooKeeper 3.4.11
Kafka 2.2.0 (Scala 2.12 build)

Description

I decided to test Druid's fault tolerance today and set up a little test:

A piece of logging software sends a small batch of events to Kafka every second, and the events are stream-ingested into Druid. All data is supposed to be replicated across 2 historical nodes. Everything runs on one machine.
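
For context (not part of the original report): in Druid, a replication factor of 2 on the historicals is usually expressed as a load rule with two tiered replicants. Below is a minimal sketch of setting such a rule through the Coordinator API; the Coordinator address and the datasource name `logging-events` are assumptions, not taken from the report.

```python
# A sketch, not from the report: request 2 replicas of every segment on the
# default tier by overriding the datasource's load rules via the Coordinator.
# The Coordinator address and the datasource name are assumptions.
import requests

COORDINATOR = "http://localhost:8081"   # assumed Coordinator host:port
DATASOURCE = "logging-events"           # hypothetical datasource name

rules = [
    {
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 2},  # keep 2 copies on historicals
    }
]

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    json=rules,
)
resp.raise_for_status()
print("rules updated:", resp.status_code)
```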

How I tested:

  • I produced roughly 10 kB of data and ingested it
  • I stopped the logging software
  • I waited for the data to be handed off (or at least until the web GUI showed it)
  • I shut down one of the historicals
  • I restarted the logging software
  • I produced another 10 kB of data
    -> First weird behaviour: the data didn't get sent to the running historical and the tasks didn't terminate
  • I started the second historical again
    -> It took a very long time and >100 failing tasks (duration = 0:00:00, no error, no log) for the
       second historical to queue the missing segments
    -> This step made the first historical crash
  • I restarted the first historical
    -> The load queue stays empty forever
    -> The missing 5 segments are no longer realtime
    -> Neither historical contains the missing 5 segments (this can be cross-checked against the Coordinator API; see the sketch after this list)
    -> The data is apparently unavailable forever
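
Not part of the original report, but for anyone reproducing this: the observations about the empty load queue and the missing segments can be cross-checked against the Coordinator's HTTP API instead of the web GUI. A small sketch, assuming the same hypothetical Coordinator address and datasource name as above:

```python
# A sketch, not from the report: ask the Coordinator which segments are
# available and what is sitting in the load queue, instead of relying on
# the web GUI. Host/port and datasource name are assumptions.
import requests

COORDINATOR = "http://localhost:8081"   # assumed Coordinator host:port
DATASOURCE = "logging-events"           # hypothetical datasource name

# Percent of segments per datasource that are actually loaded on historicals
# (100.0 means fully available; anything less points at missing segments).
loadstatus = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadstatus").json()
print("load status:", loadstatus)

# Per-historical load queue ("Queue stays empty forever" in the report).
loadqueue = requests.get(f"{COORDINATOR}/druid/coordinator/v1/loadqueue?simple").json()
print("load queue:", loadqueue)

# Summary for the datasource itself: segment count, size, intervals.
summary = requests.get(
    f"{COORDINATOR}/druid/coordinator/v1/datasources/{DATASOURCE}?simple"
).json()
print("datasource summary:", summary)
```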

This didn't go too well, I'd say. Is this intended behaviour, or did I just do something wrong?

Greetings,
Calvin


CalvinSchulze commented Aug 1, 2019

Seems to be related to #8137. I'll test it again as soon as 0.15.1 is available as a binary download.

@vogievetsky

Have you had a chance to retest this with 0.15.1, or with 0.16.0, which is going to be up in a few hours?


stale bot commented Jul 2, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

stale bot added the stale label Jul 2, 2020

stale bot commented Aug 1, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

stale bot closed this as completed Aug 1, 2020