Skip to content
This repository has been archived by the owner on May 3, 2022. It is now read-only.

Restrict allowed key types #39

Open
pwnall opened this issue Oct 17, 2018 · 10 comments
Open

Restrict allowed key types #39

pwnall opened this issue Oct 17, 2018 · 10 comments

Comments

@pwnall
Copy link

pwnall commented Oct 17, 2018

IndexedDB supports a fairly large set of key types, and comparing between them isn't very intuitive. How about supporting an explicit subset?

Proposal:

  • number
  • string
  • (maybe) exactly one type of typed array; Uint8Array seems the most intuitive

The ordering has to remain the same as in IndexedDB, if we want the option of async iterators over keys / values.

Dates can be serialized to numbers when passed in -- I think this would be consistent with WebIDL.

The main missing element here is Array objects. This does preclude some nice use cases, but I claim those are more advanced. If I'm wrong, we can always expand the set of supported types, whereas it's harder / impossible to narrow it.

@domenic
Copy link
Collaborator

domenic commented Oct 17, 2018

Previous discussion in #2. To be clear, you are only talking about restricting keys, not values?

One thing I'm missing is, what are the advantages of restricting?

@pwnall
Copy link
Author

pwnall commented Oct 17, 2018

Sorry, I have no idea how I missed #2. Would you prefer that I close this and comment there?

I haven't opened an issue for values yet because I'm still thinking through that. I think we can mostly discuss keys and values separately.

I'd like to discuss this for two reasons:

  1. Databases (especially Chrome's backing store) perform better with small (<= thousands of bytes) keys. Arrays invite larger keys.
  2. I'd prefer a simpler conceptual model to a more complicated one, and fewer key / value types help there, IMHO

If we adopt all types right away, I'd prefer that we have good use cases. I can see reasons for number / string keys right away. I can see raw buffers used to squeeze more storage out of the DB.

@pwnall
Copy link
Author

pwnall commented Oct 17, 2018

Having written the above, I'm starting to think that I'd rather that we limit key sizes instead of types. What are your thoughts on the two axes?

@domenic
Copy link
Collaborator

domenic commented Oct 17, 2018

No worries, a new fresh issue seems like a good idea in this case.

I'm not a heavy IDB user, but array keys seem really convenient for "tuple"-keyed storage. I'd love to get more feedback, either from users or from implementers who have worked with IDB users, to either substantiate or refute my intuition here. I agree that we could start out small though and expand upon encountering demand.

Restricting sizes seems pretty reasonable, but is there an easy way to determine key size? Maybe what we want here is a new option for IDB databases which allows them to reject all keys over a certain size? Is key size computation interoperable?

@pwnall
Copy link
Author

pwnall commented Oct 17, 2018

Answering the last question: key serialization is implementation-dependent. To be consistent across browsers, we'd have to define some cost model and use that.

We can have a simple and efficient model as long as we leave Arrays out -- numbers would be in, strings and native arrays would have caps on their lengths / buffer sizes. Once we add Arrays, we'd have to come up with a full cost model, so the cost of an Array would be the sum of its elements + fixed overhead.

This is additional complexity for the implementation and spec. As a high-level API, I think it's worth taking up the extra complexity in order to keep developers on a "happy" path.

@vweevers
Copy link

I'm not sure how relevant this is, but array keys could be useful for compound indexes. For example [author, publisher] when storing books. Then, through key ranges you can answer queries like "give me all books by Roald Dahl", e.g. greater than ["Roald Dahl", ""].

I haven't used IndexedDB exactly like this, because my background is in Level (and level-js which wraps IndexedDB) where things like this are usually handled on a higher level - IndexedDB only sees string or binary keys. That said, compound keys are very powerful conceptually.

I do agree with:

IndexedDB supports a fairly large set of key types, and comparing between them isn't very intuitive

The approach we took in level-js is: you can use any key type you want. If you want to normalize types (e.g. serialize numbers to strings) or avoid the cost of the structured clone algorithm, then we recommend to wrap level-js with encoding-down. Array keys can then be encoded into binary keys with bytewise for example, which has a more intuitive sorting order. This encoding has a performance cost too, which might be an acceptable trade-off for a given app. So TLDR; we leave the choice of key type to the user.

@asutherland
Copy link

asutherland commented Oct 22, 2018

I think any sizing logic would want to go in the HTML standard with the StructuredSerialize algorithm and friends. This could be quite useful in other specs like the Notifiations API which uses the algorithm for Notification.data but does not impose any size limits. As APIs continue to add more spaces to stash user data, it could be useful to bound the data size for implementation sanity and to limit the impacts of broken or malicious sites.

If we did this, perhaps it would make sense to also spec the key size limit for IndexedDB as an immutable limitation from when a database is first opened. This would avoid edge-cases related to direct manipulation of the backing store, and potentially let IndexedDB implementations do some optimizations. Without this, we'd want to spec whether "illegal" key/value pairs are hidden from the async-local-storage API, etc.

Edit: Note that I had made the above comments previously as two separate comments that it looked like Github had eaten due to bad hotel wi-fi. They just appeared when I submitted this comment, and so I've deleted them since their thoughts have been superseded.

@tophf
Copy link

tophf commented Apr 25, 2019

A use case for Uint8Array keys: encoding string keys in UTF8 manually via TextEncoder/TextDecoder to reduce the disk size of the db because at least in Chrome the string keys are always stored in UCS2 (two bytes per character), unlike values which get automatically compressed into one byte ASCII when possible.

@asutherland
Copy link

While it's unfortunately not unusual to optimize to a specific implementation, altering API design due to a single browser's implementation decisions runs counter to the point of web standards and requiring multiple implementations.

@tophf
Copy link

tophf commented Apr 25, 2019

I wasn't suggesting anything, nothing needs to be changed AFAICT, it was just an informational side note confirming the proposed inclusion of Uint8Array.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants