safer tracking of issued leveldb iterators #231

jrudolph · 2016-03-10T13:45:53Z

Previously, creating an iterator and tracking wasn't an atomic operation
resulting in iterator operations being run after the db had been closed.
This resulted in all kinds of crashes because of illegal native state
during the shutdown procedure.

/cc @volkerstampa

jrudolph · 2016-03-10T14:05:52Z

(I will update this after your module restructuring but it's pretty self-contained so this shouldn't be a big problem)

krasserm · 2016-03-11T06:43:54Z

LGTM, many thanks for fixing! Do you also have a running test for reproducing the crashes, for example, one that shuts down the log actors while several readers are trying to read/replay events?

Also, please use the wip-<issue-nr>-<descritpion> or fix-<issue-nr>-<descritpion> pattern for branch names. No need to change for this PR though 😃

jrudolph · 2016-03-11T09:28:47Z

I'll try to write a test that exposes the behavior and fix the suggested issues in the next days.

krasserm · 2016-03-11T09:31:31Z

Great, thanks! I'll also have #169 ready soon.

krasserm · 2016-03-12T07:23:09Z

@jrudolph please sign the CLA so that I can merge your contribution. Btw, #169 is already merged to master.

jrudolph · 2016-03-14T08:31:05Z

Cool, I'm now at writing a test and merging to the new project structure.

jrudolph · 2016-03-14T10:10:29Z

I updated this PR to the current master and addressed your suggestions. I tried writing a test but didn't managed to write one that would evoke the crash without having to make changes to the actor code itself. I guess the reason is that the crash relies heavily on the scheduling of threads so that the previous Thread.sleep(500) in postStop ends at the exact right moment to catch the EventIterator code in the right moment.

Even with a test for this issue in place it wouldn't make sure that the new code works correctly. A better long term solution would be putting the whole native LevelDB API behind another wrapper that takes care for these concerns and which can be tested in isolation.

Regarding the CLA, I sent a mail to [email protected] and will report back once I've signed it.

jrudolph · 2016-03-17T13:54:41Z

Signed the CLA now.

krasserm · 2016-03-17T15:48:35Z

Thanks. Can you please investigate the build failures? https://travis-ci.org/RBMHTechnology/eventuate/builds/115815784#L1167

Previously, creating an iterator and tracking wasn't an atomic operation resulting in iterator operations being run after the db had been closed. This resulted in all kinds of crashes because of illegal native state during the shutdown procedure.

jrudolph · 2016-03-17T15:59:25Z

Looks like I made a last-minute config name change... Let's wait what the next build says.

jrudolph · 2016-03-17T16:28:23Z

Now all checks have passed.

krasserm · 2016-03-18T06:05:27Z

In this commit I experimented with an alternative solution proposed by @volkerstampa. Here, usage of iterators is done in child actors of the parent LeveldbEventLog actor. When the parent actor stops, Akka will ensure that the child actors (using the iterators) will be stopped first, hence there won't be active iterators the the parent actors stops LevelDB. This approach is IMO less complex than using explicit iterator management with a global reference counter. WDYT?

jrudolph · 2016-03-18T06:37:40Z

Yes, that sounds like a nice solution as well. Maybe it could be made a bit more explicit in the code why it is done like this. The flow of data is a bit subtle to understand because the new child actor closes over the state of the parent actor which makes it somewhat hard to see in which way the invariants are enforced.

krasserm · 2016-03-18T06:43:03Z

@jrudolph can you make a proposal how to improve on that?

volkerstampa · 2016-03-18T07:56:55Z

@krasserm I just tried the child-actor based approach with my local setup and can confirm that it seems to fix the problems of crashing tests.

@jrudolph The child-actor closes in the same way over the state of the parent actor as the future did before, right? So, at least to me this does not make things harder to understand than before!?!

jrudolph · 2016-03-18T08:13:30Z

So, at least to me this does not make things harder to understand than before!?!

Yes, I agree. It's a pragmatic solution that requires minimal changes, it's certainly not worse than this PR. From my side it would probably be enough to add a comment to postStop why this is now safe.

So, if there's something left to be desired it's the same as with the current state of this PR. It's more about code organization. It would be nice if there was a single component which alone is responsible for making the LevelDB access crash-safe. In the best case, leveldbjni would ensure crash-safety itself (IMO it's a reasonable expectation for a Java lib not to crash the JVM in any cases...).

krasserm · 2016-03-19T06:53:10Z

@volkerstampa @jrudolph thanks for feedback. I just created #239 (superseding this PR) with an additional comment in postStop() how the issue has been fixed. Please review, thanks!

jrudolph mentioned this pull request Mar 14, 2016

LevelDB event log crashes JVM on shutdown frequently #234

Closed

krasserm added in progress enhancement labels Mar 14, 2016

krasserm added this to the 0.7 milestone Mar 14, 2016

jrudolph force-pushed the w/fix-leveldb-crashes-during-shutdown branch from 91b554d to e80cac3 Compare March 14, 2016 10:01

jrudolph force-pushed the w/fix-leveldb-crashes-during-shutdown branch from e80cac3 to 8b0fd58 Compare March 17, 2016 15:58

krasserm added bug and removed enhancement labels Mar 19, 2016

krasserm removed this from the 0.7 milestone Mar 19, 2016

krasserm closed this Mar 19, 2016

krasserm removed the in progress label Mar 19, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

safer tracking of issued leveldb iterators #231

safer tracking of issued leveldb iterators #231

jrudolph commented Mar 10, 2016

jrudolph commented Mar 10, 2016

krasserm commented Mar 11, 2016

jrudolph commented Mar 11, 2016

krasserm commented Mar 11, 2016

krasserm commented Mar 12, 2016

jrudolph commented Mar 14, 2016

jrudolph commented Mar 14, 2016

jrudolph commented Mar 17, 2016

krasserm commented Mar 17, 2016

jrudolph commented Mar 17, 2016

jrudolph commented Mar 17, 2016

krasserm commented Mar 18, 2016

jrudolph commented Mar 18, 2016

krasserm commented Mar 18, 2016

volkerstampa commented Mar 18, 2016

jrudolph commented Mar 18, 2016

krasserm commented Mar 19, 2016

safer tracking of issued leveldb iterators #231

safer tracking of issued leveldb iterators #231

Conversation

jrudolph commented Mar 10, 2016

jrudolph commented Mar 10, 2016

krasserm commented Mar 11, 2016

jrudolph commented Mar 11, 2016

krasserm commented Mar 11, 2016

krasserm commented Mar 12, 2016

jrudolph commented Mar 14, 2016

jrudolph commented Mar 14, 2016

jrudolph commented Mar 17, 2016

krasserm commented Mar 17, 2016

jrudolph commented Mar 17, 2016

jrudolph commented Mar 17, 2016

krasserm commented Mar 18, 2016

jrudolph commented Mar 18, 2016

krasserm commented Mar 18, 2016

volkerstampa commented Mar 18, 2016

jrudolph commented Mar 18, 2016

krasserm commented Mar 19, 2016