Suggested data modelling approach #1529

Open
alexvinidiktov opened this issue Jun 28, 2021 · 3 comments
Labels
stale For marking issues as stale. Labeled issues will be closed soon if label is not removed.

Comments

@alexvinidiktov

Hello,

I'm evaluating Kinto for use in a language learning web app.

Users can learn any number of languages, they can create shelves of flashcard lists (card decks), and they can learn the card lists using spaced repetition.

How would I model the data for optimal sync?

If I use one collection (languages) and nest everything under it, would sync be very slow for dozens of shelves containing hundreds of decks which contain thousands of cards?

Another approach I can think of is using one kinto collection per language. But I'm not sure this will speed up sync enough.

What about splitting the data into several collections (one per entity): languages, shelves, decks, cards?
Then I would need to link them together in some manner. This would presumably make sync faster, but it would cause relationship management issues, wouldn't it?

```
languages
-- shelves
---- decks
------ cards
```

@dstaley
Member

dstaley commented Jul 2, 2021

(pinging @leplatrem in case he wants to also share his thoughts)

I think your best bet would be to store shelves, decks, and cards as three separate collections, and use a property on the deck and card records to point to the parent record (that is, decks would have a property pointing to the parent shelf, and cards would have a property pointing to the parent deck). This would mean that when a new shelf/deck/card is created, your client will pull only a single record representing the data unique to the newly created resource (technically it will pull all updated records in that collection).
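To make the parent-pointer idea concrete, here is a minimal sketch of what the record shapes could look like. The field names (`shelfId`, `deckId`) are illustrative assumptions, not a Kinto requirement: Kinto records are arbitrary JSON, so any property can serve as the link, and the "join" happens client-side after sync.

```javascript
// Hypothetical records from three separate collections, linked by
// parent-id properties (field names are illustrative).
const shelves = [{ id: "s1", name: "Spanish" }];
const decks = [
  { id: "d1", shelfId: "s1", name: "Verbs" },
  { id: "d2", shelfId: "s1", name: "Food" },
];
const cards = [
  { id: "c1", deckId: "d1", front: "hablar", back: "to speak" },
  { id: "c2", deckId: "d1", front: "comer", back: "to eat" },
  { id: "c3", deckId: "d2", front: "pan", back: "bread" },
];

// After sync, resolving children is a client-side filter by parent id.
function cardsInDeck(deckId) {
  return cards.filter((card) => card.deckId === deckId);
}

function decksOnShelf(shelfId) {
  return decks.filter((deck) => deck.shelfId === shelfId);
}
```

Because each card is its own record, editing or creating one card touches exactly one record, which is what keeps incremental sync cheap.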

The most painful part of this setup would be the initial sync. If you have dozens of shelves (~36) with hundreds of decks each (~400), each with thousands of cards (~5000), your initial sync would be 36 shelf records, 14,400 deck records, and 72,000,000 card records. While I doubt these numbers are accurate, I'm yielding to your opinion on that matter. With a dataset this large, you'll need to tweak a few important Kinto server settings to adjust the maximum collection size and the default page size (since kinto.js currently defaults to pulling as many records as the kinto server will let it in a single HTTP request). However, after the initial sync, incremental syncs will only upload records that have changed, which would be quite small. So this approach is heavy for the initial sync, but highly optimized for incremental syncs, which carry only the modified data.
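For reference, the settings in question are plain INI entries in the Kinto server config. The names below are from my recollection of the Kinto settings documentation, and the values are illustrative rather than recommendations, so double-check them against your server version:

```ini
# kinto.ini -- illustrative values, not recommendations
kinto.paginate_by = 1000              # default page size for list endpoints
kinto.storage_max_fetch_size = 100000 # cap on records returned per query
```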

However, you could choose to merge decks and cards into a single resource to reduce the number of records required for storage. In the above example, you'd only have 36+14,400 records per user, which is a lot more reasonable than tens of millions of records! The downside here is that you'd need to rely heavily on Kinto's JSON Patch operation support, which isn't used for synchronization with the kinto.js library. If you choose to go this route, kinto-http.js might be a better option, where you can be a lot more granular with the HTTP requests you make.
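A sketch of the merged shape, under the assumption that each deck record embeds its cards as an array. The patch below is a standard JSON Patch (RFC 6902) document appending one card; how you send it (e.g. via kinto-http.js requests) is up to the client, and the tiny `applyAppend` helper here only exists to demonstrate the operation locally:

```javascript
// Hypothetical merged resource: one deck record embedding its cards.
const deck = {
  id: "d1",
  shelfId: "s1",
  name: "Verbs",
  cards: [{ id: "c1", front: "hablar", back: "to speak" }],
};

// JSON Patch (RFC 6902) document appending a card; "/cards/-" means
// "end of the cards array".
const patch = [
  {
    op: "add",
    path: "/cards/-",
    value: { id: "c2", front: "comer", back: "to eat" },
  },
];

// Minimal local handler for the single "append to array" case above,
// just to show the effect of the patch (not a full RFC 6902 engine).
function applyAppend(record, op) {
  if (op.op === "add" && op.path.endsWith("/-")) {
    const arrayPath = op.path.slice(1, -2); // "cards"
    return { ...record, [arrayPath]: [...record[arrayPath], op.value] };
  }
  return record;
}
```

The trade-off is visible here: one record instead of thousands, but every card edit now rewrites part of a much larger document.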

If you have more questions, please feel free to reach out again! I encourage you to give it a shot using real data so you can get a feel for what the performance characteristics are like. Kinto works fairly well for small and medium sized datasets, but when you're talking about datasets as large as the one you're describing, it would take a bit of tinkering to get it working smoothly. The Kinto server API has all the bits you'd need to get this working; it's just that our JS client is optimized for the more common use cases.

@leplatrem
Contributor

I agree with Dylan's suggestions.

If I understood correctly, the data that you sync is readonly and coming from the server. So if the cost of the initial sync is your only issue, you can also ship "dumps" of the server records as JSON along with your application's assets, and load them into your local DB without pulling anything from the network.
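A framework-free sketch of the dump-loading idea. Recent kinto.js versions expose a bulk-import API on collections for this purpose (named `importBulk` in my recollection, `loadDump` in older releases; verify against your version); here the local store is just a Map so the shape of the operation is visible:

```javascript
// Simulated local store; in kinto.js this would be the IndexedDB-backed
// collection, filled via its bulk-import API instead of a network pull.
const localStore = new Map();

function loadDump(records) {
  for (const record of records) {
    // Mark records as already synced so they are not re-uploaded.
    localStore.set(record.id, { ...record, _status: "synced" });
  }
  return localStore.size;
}

// The dump itself would be a JSON file shipped with the app's assets.
const dump = [
  { id: "c1", deckId: "d1", front: "hablar", back: "to speak" },
  { id: "c2", deckId: "d1", front: "comer", back: "to eat" },
];
```

After loading, a normal sync only has to fetch whatever changed since the dump was produced.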

With regards to managing links between different collections, it really depends on how often you would create/delete/update the records behind the different links.

From my experience, beyond a couple of thousand records that change often, synchronization can be painful. Make sure you measure synchronization performance with thousands of records before going too far with coding.

@alexvinidiktov
Author

alexvinidiktov commented Jul 8, 2021

Thanks @dstaley, @leplatrem!

> The most painful part of this setup would be the initial sync. If you have dozens of shelves (~36) with hundreds of decks each (~400), each with thousands of cards (~5000), your initial sync would be 36 shelf records, 14,400 deck records, and 72,000,000 card records.

The most realistic scenario for a very active app user (like myself) who is learning multiple languages is to have from 10,000 to 50,000 card records overall.

Would this number of records be OK for initial sync?

> However, after the initial sync, incremental syncs will only upload records that have changed, which would be quite small.

Most updates should be fairly small, except for when the user imports cards into a deck from an external source. I want to allow them to import up to 1,000 cards at once, but I could limit it to a smaller number, and sync after each import.

> The downside here is that you'd need to rely heavily on Kinto's JSON Patch operation support, which isn't used for synchronization with the kinto.js library.

I'd like to avoid this if at all possible.

> I encourage you to give it a shot using real data so you can get a feel for what the performance characteristics are like.

I will try and do that.

> Kinto works fairly well for small and medium sized datasets, but when you're talking about datasets as large as the one you're describing, it would take a bit of tinkering to get it working smoothly.

Would 50,000 records constitute a medium sized dataset?

> If I understood correctly, the data that you sync is readonly and coming from the server.

Overall I expect the data to be mostly static (most of it should rarely change), but it is not readonly. It is the data that the user has generated while working with the app: creating new shelves, decks, cards, learning their cards.

> With regards to managing links between different collections, it really depends on how often you would create/delete/update the records behind the different links.

I expect the most frequent operations to be updating cards learning statistics (each card has a stage and due date_time) which change during a learning session, and creating new cards.

I would expect a typical user to learn from 10 to 100 cards in a typical day, and to create from 0 to maybe 50 new cards per day.
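Since the per-card state described above (a stage plus a due date-time) drives the most frequent writes, here is a hedged sketch of what a review update could look like. The interval table and field names are invented for illustration; the relevant point is that each review rewrites exactly one small card record, which is the cheap incremental-sync case:

```javascript
// Hypothetical spaced-repetition state: "stage" indexes an interval
// table (in days), and dueDateTime is recomputed after each review.
// The intervals below are made up for illustration.
const INTERVALS_DAYS = [0, 1, 3, 7, 14, 30];

function reviewCard(card, correct, now = new Date()) {
  const stage = correct
    ? Math.min(card.stage + 1, INTERVALS_DAYS.length - 1)
    : 0; // reset to the first stage on a miss
  const due = new Date(
    now.getTime() + INTERVALS_DAYS[stage] * 24 * 60 * 60 * 1000
  );
  // A new object representing the single-record write that sync uploads.
  return { ...card, stage, dueDateTime: due.toISOString() };
}
```

At 10-100 reviews and up to 50 new cards per day, a daily incremental sync would carry on the order of a hundred small records.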

> From my experience, beyond a couple of thousand records that change often, synchronization can be painful.

What do you mean by 'painful'? Slow?

@alexcottner alexcottner added the stale For marking issues as stale. Labeled issues will be closed soon if label is not removed. label Jul 23, 2024
4 participants