Skip to content
This repository has been archived by the owner on Sep 9, 2021. It is now read-only.

disinguish between key and value queries #38

Open
Gozala opened this issue May 21, 2020 · 8 comments
Open

disinguish between key and value queries #38

Gozala opened this issue May 21, 2020 · 8 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@Gozala
Copy link
Contributor

Gozala commented May 21, 2020

In the process of adding some types into js-ipfs (ipfs/js-ipfs#2945) I have discovered what appeared to be an error either in declared interface (in readme here or more specifically https://github.com/ipfs/js-ipfs-repo/blob/77a0cdc944ce7fc47f520ab38705259a28cea1dc/src/blockstore.js#L25-L34)
Docs say query returns docs say query returns AsyncIterator<Block> while in practice it appeared to return AsyncIterator<{key:Key, ...?}>

As that is how code in js-ipfs used it:
https://github.com/ipfs/js-ipfs/blob/80c6fdf1c3ad6aaae14eb4297b7b183fd4fa1311/packages/ipfs/src/core/components/refs/local.js#L9

After digging around this repo it appears that:

  1. If you do not pass {keysOnly:true} it returns AsyncIterable<{key:Key, value:Buffer}>, meaning all of the documentation is wrong.
  2. If you do pass {keysOnly:true} it returns AsyncIterable<{key:Key}> that non of the documentation seems to cover.

I would like to propose:

  1. Make a DataStore interface generic as in DataStore<Key, Value, Entry> this would make
    • Raw data store DataStore<Key, Buffer, [Key, Buffer]> (entry follows Object.entries() idiom of representing key values as tuples)
  • Block data store DataStore<CID, Block, Block> as blocks represent both values and entries.
  1. Update documentation so that query reutrns AsyncIterable<Entry> instead.
    • In raw data store that is AsyncIterable<[Key, Buffer]>
    • In block data store that is AsyncIterable<Block>
  2. Consider alternative to keysOnly options. Because not only it complicates API and makes it hard to document but it's also easy to miss if query options are passed into making it unclear what result would be.
    • Option 1: Just create another queryKeys(...):AsyncIterable<Key> method. This is my preference because in some cases it can be better optimized (e.g. when queries occur across process or thread boundaries).
    • Option 2: If adding a method seems like no-go, how about just remove keysOnly in favor of destructuring in JS for await ([key] of store.query({...}) {...}.
@Gozala Gozala added the need/triage Needs initial labeling and prioritization label May 21, 2020
@welcome
Copy link

welcome bot commented May 21, 2020

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Stebalien
Copy link
Member

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

@Gozala
Copy link
Contributor Author

Gozala commented May 21, 2020

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

I believe that is also the case in js-ipfs, however interface is similar in spirit & documentation, but different in practice & implementation.

I think having a high level generic interface seems like reasonable way to simplify mental overhead and also avoid some of the pitfalls I've mentioned in regards to {keysOnly: true}, however it is hard for me to asses how would that align with go-ipfs.

@Stebalien do you have opinions or feedback on the proposed changes ?

@Stebalien
Copy link
Member

I'm having trouble understanding it, could you take a quick pass over it?

Is this two proposals (a separate blockstore/datastore?)? Or are you proposing a typed datastore?

@Gozala
Copy link
Contributor Author

Gozala commented May 25, 2020

@Stebalien reason I opened this ticket is:

  1. Inline documentation of the blockstore API is incorrect. As in it does not match actual behavior.
  2. As I was looking into No 1, I've discovered that documentation of query API in this repo is also incorrect (it omits keysOnly option and implication it has on return type)
  3. Above discovery lead to a specific proposal which:
    • Attempts to disambiguate between key and value queries.
    • Generalized DataStore interface over Key, Value, Entry parameters making most (if not all) data-stores in JS-IPFS being specialization over that generic type.

You have made a point that

Blocks live at a higher layer of abstraction than the datastore, at least in go-ipfs. The datastore is a simple key/value store.

Which I interpreted (I suspect incorrectly) as an argument against No 3, specifically that blockstore and datastore have different interface because they are different layers of abstraction. My response to that was mostly that I think generic data-store interface would still make sense.

@Stebalien
Copy link
Member

Which I interpreted (I suspect incorrectly) as an argument against No 3, specifically that blockstore and datastore have different interface because they are different layers of abstraction. My response to that was mostly that I think generic data-store interface would still make sense.

You correctly interpreted my comment. Maybe it's fine in JavaScript? I know that the type system in go is weak enough that abstracting over arbitrary types became a problem for us. We:

  1. Were constantly casting between types (not an issue in JS).
  2. Only really supported "bytes" because we couldn't serialize anything else. This was the main problem. In theory, we supported arbitrary types. However, in practice, all of our concrete persistent datastore implementations only supported bytes.

I expect you'll run into issue 2 in JS as well, unless I'm missing something. I guess with typescript you may be able to enforce reasonable constraints such that datastore adapters operate over arbitrary types while concrete backends operate over specific datatypes?

@achingbrain
Copy link
Member

The blockstore doesn't implement the interface-datastore contract, though it does look similar.

If you like you can think of the blockstore as CIDs in, Blocks out and the datastore it wraps is Keys in, Buffers out.

I'm not sure the .query method on the blockstore is commonly used TBH, we might even be able to remove it.

@Gozala
Copy link
Contributor Author

Gozala commented May 26, 2020

The blockstore doesn't implement the interface-datastore contract, though it does look similar.

If you like you can think of the blockstore as CIDs in, Blocks out and the datastore it wraps is Keys in, Buffers out.

I understand that. It is somewhat separate discussion, which unfortunately I end up mixing up here. I have created #39 to discuss datastore interface and it's relation to blockstore there.

I'm not sure the .query method on the blockstore is commonly used TBH, we might even be able to remove it.

query API was intended subject of this thread. Specifically:

  1. Documentation omits keysOnly option which changes behavior in profound ways.
  2. Which lead me to propose alternative to keysOnly options in original comment.

If removing query all together is better option, that sounds good to. Either way it would be great to do something about query API because in my experience reading code using query can be very confusing and surprising. I encounter handful of times where options were passed form caller and while I was assuming results would be values they were keys.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

3 participants