Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fix OOM via memory limit on DB row results #52

Merged
merged 3 commits into from
Apr 3, 2021
Merged

Conversation

david-allison
Copy link
Member

@david-allison david-allison commented Apr 1, 2021

We now use a 2MB page size, the same as CursorWindow.sCursorWindowSize

We occasionally got OOMs on methods which returned unbounded data
eg. getting the field data for notes: "Check Database" crashed

To fix this, instead of saying to the Rust "we want 1000 rows max",
we say "we want max 2MB of data"

The calculation for the data is inexact - string length, and 8 bytes
for doubles/longs.
Hasn't been tested thoroughly, but seems to only be < ~7%
off the protobuf size for a string-only column (underestimate).

We measure the rust in-memory usage rather than the size of the
serialized protobuf

Measuring the serialized size wasn't researched, but was assumed to be
hard, as we would need to stream into a protobuf collection, and be
able to efficiently query the new size for each row we add.

Main changes:
StreamingProtobufSQLiteCursor - no longer use fixed size pages
dbcommand: allow access via an offset to the result set rather than
via pages

Adds: setDbPageSize to allow the change of the size for debugging
Adds: Unit tests for the rust - not yet executed in CI (#51)
rename: getPage -> getNextSlice

bumps anki commit to add field sqlite.proto#DbResponse:start_index

Fixes #14 (no longer necessary)
Fixes ankidroid/Anki-Android#8178

fold_while was deprecated in 0.9.0
and undeprecated in 0.10.0
@@ -139,10 +139,10 @@ public String getPath() {
}

@Override
public Sqlite.DBResponse getPage(int page, int sequenceNumber) {
public Sqlite.DBResponse getNextSlice(long startIndex, int sequenceNumber) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Page did not have this effect but now with Slice the code review is making me hungry
in seriousness - between "chunk" or this, I like slice, nice

Copy link
Member

@mikehardy mikehardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely not qualified to review Rust after looking at it for a moment, and don't have the time to get there but the Java bits looked sane (although if you step back for a moment: it's kind of insanely complicated for a flashcard app! until you realize everyone wants multi-device syncing online backup and fast performance from 10,000 cards with 100,000 reviews and about 4 new features a week and then, yep - looks good!)

@Arthur-Milchior
Copy link
Member

The calculation for the data is inexact - string length, and 8 bytes
for doubles/longs.

Not clear to me. Do you mean that you are choosing to make an approximation here. That you are correcting this bug. That it can't be better

@Arthur-Milchior
Copy link
Member

although if you step back for a moment: it's kind of insanely complicated for a flashcard app! until you realize everyone wants multi-device syncing online backup and fast performance from 10,000 cards with 100,000 reviews and about 4 new features a week and then, yep - looks good!

Actually, I believe the number of cards is really not the problem here. It's the fact that deck and note type lists are saved as a big json. Each data is small enough otherwise

@Arthur-Milchior
Copy link
Member

I don't know what page/slices are, what it means that it changes.

what limit the size. The part between Rust and Java, or the sqlite access?

Copy link
Member

@Arthur-Milchior Arthur-Milchior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also hard to review. I don't know what should be done with DB. I can read the rust but don't know what constraints must be satisfied.

Do you think it can eventually be reused and shared as an external library, seems a lot of work and I can imagine other people want to use rust in android as backend

@david-allison
Copy link
Member Author

david-allison commented Apr 1, 2021

Can I just say that I really appreciate the review comments, thank you!!!!


The calculation for the data is inexact - string length, and 8 bytes
for doubles/longs.

Not clear to me. Do you mean that you are choosing to make an approximation here. That you are correcting this bug. That it can't be better

Yes, I'm choosing an approximation: realistically it's not worth the time to get an exact size of the protobuf.

A bad implementation would be ridiculously slow, and a good implementation would likely be far too complicated to be worth the engineering effort.

We also estimate the Rust in-memory size. With all these estimations, we were <10% off the 'proper' value, 2MB is a conservative number anyway, I'd expect that we could double it and not need to worry, so none of the estimations bother me.

Actually, I believe the number of cards is really not the problem here. It's the fact that deck and note type lists are saved as a big json. Each data is small enough otherwise

Neither decks nor note types matter, as we should handle them via loadColumn and chunking based on the window size.

This is for the few queries where saying "1000 rows" could blow up and cause an OOM. This mostly occurs on the notes table - the flds column is unlimited length and a few queries iterate over this.

I don't know what page/slices are, what it means that it changes.
what limit the size. The part between Rust and Java, or the sqlite access?

Native code (Rust) has an effectively unlimited heap on Android. The JVM is severely limited.

I was advised to perform the same operation that the Python does with rslib: avoid streaming the results from queries from the database and pull everything into memory in one operation. Streaming is problematic due to locking on the collection object. We might need to revisit this in future, but to be honest, I doubt it.

So: we pull in the entire resultset into the Rust, and stream to the Java Cursor in 2MB "slices" to the Java, to avoid the JVM's heap limit.

@mikehardy
Copy link
Member

although if you step back for a moment: it's kind of insanely complicated for a flashcard app! until you realize everyone wants multi-device syncing online backup and fast performance from 10,000 cards with 100,000 reviews and about 4 new features a week and then, yep - looks good!

Actually, I believe the number of cards is really not the problem here. It's the fact that deck and note type lists are saved as a big json. Each data is small enough otherwise

I was stepping even farther. Why is a flashcards program rendering HTML/CSS/JS (with plugins!), based out of Java, going across a Protobuf bridge to Rust, where it does database access etc. It's sort of like a Rube Goldberg machine. But it all seems justifiable, just so many layers...

@dae
Copy link
Contributor

dae commented Apr 1, 2021

Well, to be fair, you had database cursors in Java calling out to a C++ wrapper over JNI which forwarded the requests on to the C SQLite bindings - most of the layers were already there, they were just someone else's responsibility when you were using requery :-)

All this complexity surrounding the database cursors is unfortunate, and it's considerably more complicated than the computer version's DB layer. But the low heap limit on Java means you're working with constraints that the computer version doesn't have.

The good news is that these changes are a stop-gap, and could potentially be made simpler in the future. Once AnkiDroid has been updated to call backend methods to mutate notes in bulk, it will no longer be necessary to divide large queries into chunks and go through all the extra ceremony of passing them to Java and back - the backend can just make the changes directly. Most of the backend functionality is already implemented; the one remaining task that springs to mind is the import/export code.

@mikehardy
Copy link
Member

I appreciate that you actually responded to my Rube-Goldberging of the system @dae haha - for the avoidance of doubt I honestly do believe each layer is suiting it's purpose well. To take it farther in the humor direction, I hope that someone implements a card template in WASM using our javascript methods, and I will personally send them a dollar if their original language before compiling to WASM was...Rust :)

We now use a 2MB page size, the same as CursorWindow.sCursorWindowSize

We occasionally got OOMs on methods which returned unbounded data
eg. getting the field data for notes: "Check Database" crashed

To fix this, instead of saying to the Rust "we want 1000 rows max",
we say "we want max 2MB of data"

The calculation for the data is inexact - string length, and 8 bytes
for doubles/longs.
Hasn't been tested thoroughly, but seems to only be < ~7%
off the protobuf size for a string-only column (underestimate).

We measure the rust in-memory usage rather than the size of the
serialized protobuf

Measuring the serialized size wasn't researched, but was assumed to be
hard, as we would need to stream into a protobuf collection, and be
able to efficiently query the new size for each row we add.

Main changes:
StreamingProtobufSQLiteCursor - no longer use pages
dbcommand: allow access via an offset to the result set rather than
via pages

Adds: setDbPageSize to allow the change of the size for debugging
Adds: Unit tests for the rust - not yet executed in CI (#51)
rename: getPage -> getNextSlice

bumps `anki` commit to add field sqlite.proto#DbResponse:start_index

Fixes #14 (no longer necessary)
Fixes ankidroid/#8178
Allows setting the page size from a reference to the backend
@david-allison david-allison merged commit d09b392 into main Apr 3, 2021
@david-allison david-allison deleted the fix-oom branch April 3, 2021 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants