No descriptor ids in spk txout index #1463

LLFourn · 2024-06-06T03:26:35Z

This reverts part of the changes in #1203. There, SpkTxOutIndex<(K, u32)> was changed to SpkTxOutIndex<(DescriptorId, u32)>. This led to complicated translation logic in KeychainTxOutIndex (whose API is based on K) to transform calls to it into calls to the underlying SpkTxOutIndex (which now indexes by DescriptorId). The translation layer was broken when it came to translating range queries from the KeychainTxOutIndex. My solution was just to revert this part of the change and remove the need for a translation layer (almost) altogether. A thin translation layer remains to ensure that un-revealed spks are filtered out before being returned from the KeychainTxOutIndex methods.
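
To illustrate why the key type matters for range queries, here is a minimal sketch. It assumes a toy `Keychain` enum and strings standing in for scripts (both hypothetical, not the bdk_chain types). With an ordered `(K, u32)` key, all spks of a keychain range sit next to each other in the map, so one `range` call finds them; a hash-like id in place of `K` would scatter them.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical stand-in for the keychain type `K`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Keychain {
    External,
    Internal,
}

/// Toy index: `(keychain, derivation index) -> spk` (a string stands in for a script).
fn toy_index() -> BTreeMap<(Keychain, u32), String> {
    let mut index = BTreeMap::new();
    for keychain in [Keychain::External, Keychain::Internal] {
        for i in 0..3u32 {
            index.insert((keychain, i), format!("{keychain:?}/{i}"));
        }
    }
    index
}

/// All spks of the keychains in `range`, found with a single ordered range query.
fn spks_in_keychain_range(
    index: &BTreeMap<(Keychain, u32), String>,
    range: RangeInclusive<Keychain>,
) -> Vec<String> {
    // Tuple ordering sorts by keychain first, then by derivation index.
    let start = (*range.start(), u32::MIN);
    let end = (*range.end(), u32::MAX);
    index.range(start..=end).map(|(_, spk)| spk.clone()).collect()
}
```

Calling `spks_in_keychain_range(&toy_index(), Keychain::Internal..=Keychain::Internal)` returns only the three `Internal` entries, in derivation order.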

I feel like this PR could be extended to include a bunch of ergonomics improvements that are easier to implement now. But I think that's the point of #1451 so I held off and should probably go and scope creep that one instead.

Checklists

All Submissions:

I've signed all my commits
I followed the contribution guidelines
I ran cargo fmt and cargo clippy before committing

Bugfixes:

This pull request breaks the existing API
I've added tests to reproduce the issue which are now passing
I'm linking the issue being fixed by this PR

notmandatory

LGTM but I have a few suggestions before ACKing.

clippy.toml

crates/chain/src/keychain/txout_index.rs

evanlinjin

Thanks for working on this. The range-based method bugs definitely need to be fixed. However, I'm not sure if a total-revamp of the behavior is the correct way forward?

I'm okay with disallowing reassignment of a keychain's descriptor. However, I still think assigning the same descriptor to multiple keychains is important (for consistency). Let me explain.

Currently, our API allows adding overlapping descriptors. I.e. a non-wildcard descriptor which derives a spk that exists in a wildcard descriptor. There is just no good way to detect for this. With the changes presented in this PR, doing the above (which is impossible to detect) will cause undefined behavior, whereas the previous implementation handles this perfectly.

Therefore, for consistency, I think we should just enable both scenarios and handle things consistently (as we were doing before).

crates/chain/src/keychain/txout_index.rs

evanlinjin · 2024-06-08T06:56:36Z

crates/chain/src/keychain/txout_index.rs

+        // The reason we need a tricky algorithm is because of the "lookahead" feature which means
+        // that some of the spks in the SpkTxoutIndex will not have been revealed yet. So we need to
+        // filter out those spks that are above the last_revealed for that keychain. To do this we
+        // iterate through the last_revealed for each keychain and the spks for each keychain in
+        // tandem. This minimizes BTreeMap queries.
+        core::iter::from_fn(move || loop {
+            let ((keychain, index), spk) = iter_spks.next()?;
+            // We need to find the last revealed that matches the current spk we are considering so
+            // we skip ahead.
+            while current_keychain?.0 < keychain {
+                current_keychain = iter_last_revealed.next();
+            }
+            let (current_keychain, last_revealed) = current_keychain?;
+
+            if current_keychain == keychain && Some(*index) <= last_revealed {
+                break Some(((keychain.clone(), *index), spk.as_script()));
+            }
+        })
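
The hunk above depends on surrounding state that is not shown. A self-contained sketch of the same tandem walk, under simplifying assumptions (keychains are `u8`, spks are strings, and a keychain with nothing revealed is simply absent from `last_revealed`), might look like this:

```rust
use std::collections::BTreeMap;

/// Return only revealed spks, walking `spks` and `last_revealed` in tandem.
///
/// `spks` holds every derived spk (including lookahead); `last_revealed` maps a
/// keychain to its last revealed index. Both maps iterate in keychain order.
fn revealed_spks<'a>(
    spks: &'a BTreeMap<(u8, u32), String>,
    last_revealed: &'a BTreeMap<u8, u32>,
) -> impl Iterator<Item = ((u8, u32), &'a str)> + 'a {
    let mut iter_spks = spks.iter();
    let mut iter_last_revealed = last_revealed.iter();
    let mut current = iter_last_revealed.next();
    std::iter::from_fn(move || loop {
        let (&(keychain, index), spk) = iter_spks.next()?;
        // Skip ahead to the last-revealed entry for this spk's keychain.
        // If none is left, no remaining spk can be revealed, so we stop.
        while *current?.0 < keychain {
            current = iter_last_revealed.next();
        }
        let (&current_keychain, &last) = current?;
        if current_keychain == keychain && index <= last {
            break Some(((keychain, index), spk.as_str()));
        }
        // Otherwise this spk is lookahead-only; keep looping.
    })
}
```

Each map is traversed once, so no per-keychain lookups are needed, but every lookahead spk is still visited and discarded.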


I'm trying to understand the performance implications of this (since this complex-ish logic seems to be here for performance reasons, otherwise a .filter can work too).

I.e. let's say the caller is using a large lookahead (1000+) with a large number of keychains. Would it be more performant to loop through all lookahead spks (except for the last keychain), as we are doing here? Or would it be more performant to do multiple .range calls on inner.all_spks(), one per keychain?

My argument is that performance is more of an issue for use cases with larger keychain counts, spk counts, or lookahead values.

Indeed this is written so it hits the optimal time complexity in the number of keychains.

Let $k$ be the total number of keychains, $s$ the average number of spks per keychain, and $h$ the lookahead value.

Let's consider the worst case performance when the query matches all keychains and $h = 0$. For the code above, there are two map queries which take negligible time and then we iterate through all the spks of every matched keychain so the time is dominated by the $sk$ spks you iterate through.

For the previous implementation, where you do two lookups, you have to do $sk$ iterations but you have another term that could take more time: $k\log(sk)$ time to do the btree map lookups, which dominates when $\log(sk) > s$. The bottom line is that, keeping everything other than $k$ constant, the complexity of this PR's impl is $O(k)$ while the previous impl was $O(k\log(k))$, so it's a significant improvement in theory.

Setting $h > 0$, it becomes $k(s+h)$ (this PR) vs $k\log(k(s+h))$ (prev). In the case that $h$ dominates $s$, which maximally advantages the previous implementation, you get $kh$ vs $k(\log(k) + \log(h))$. Therefore the prev impl would always be faster where the lookahead dominates the actual number of spks. To improve performance for applications where $h \gg s$, I'd say we should simply add a revealed spks cache BTreeMap.
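
For comparison, the per-keychain lookup strategy discussed above can be sketched as follows. This is an illustration only, using the same simplifying assumptions as before (keychains are `u8`, spks are strings, hypothetical function name): it does one bounded `range` query per keychain, so lookahead spks above the last revealed index are never visited, at the cost of one logarithmic lookup per keychain.

```rust
use std::collections::BTreeMap;

/// One bounded `range` query per keychain with a revealed index.
fn revealed_spks_by_range(
    spks: &BTreeMap<(u8, u32), String>,
    last_revealed: &BTreeMap<u8, u32>,
) -> Vec<((u8, u32), String)> {
    last_revealed
        .iter()
        // Each keychain costs one O(log n) seek, then only revealed spks are read.
        .flat_map(|(&keychain, &last)| spks.range((keychain, 0)..=(keychain, last)))
        .map(|(&key, spk)| (key, spk.clone()))
        .collect()
}
```

Which variant is faster in practice depends on how large the lookahead is relative to the number of revealed spks, as the analysis above argues.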

evanlinjin · 2024-06-08T07:43:46Z

However, I really like the API changes done in this PR. I.e.

Returning a Vec of revealed spks (instead of iterator)
Introducing KeychainIndexed and Indexed type aliases (however, can we think of a better name?)

ValuedMammal · 2024-06-08T18:06:00Z

> There is just no good way to detect for this.

The way to detect it is to check the value returned by self.inner.insert_spk() instead of ignoring it.
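
A minimal sketch of that idea, assuming a toy index whose `insert_spk` returns `false` when the spk is already present (`ToySpkIndex` and `insert_checked` are hypothetical names, not the bdk_chain API):

```rust
use std::collections::HashMap;

/// Toy spk index (hypothetical; not the real `SpkTxOutIndex`).
#[derive(Default)]
struct ToySpkIndex {
    spk_to_key: HashMap<Vec<u8>, (u8, u32)>,
}

impl ToySpkIndex {
    /// Returns `false` (and changes nothing) if `spk` is already indexed under some key.
    fn insert_spk(&mut self, key: (u8, u32), spk: Vec<u8>) -> bool {
        if self.spk_to_key.contains_key(&spk) {
            return false;
        }
        self.spk_to_key.insert(spk, key);
        true
    }
}

/// Insert that surfaces an overlap instead of ignoring the returned flag.
/// On conflict, the error carries the key that already owns the spk.
fn insert_checked(
    index: &mut ToySpkIndex,
    key: (u8, u32),
    spk: Vec<u8>,
) -> Result<(), (u8, u32)> {
    if index.insert_spk(key, spk.clone()) {
        Ok(())
    } else {
        Err(index.spk_to_key[&spk])
    }
}
```

The trade-off is that every caller that may derive new spks then has to handle this error.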

crates/chain/src/keychain/txout_index.rs

evanlinjin · 2024-06-10T05:29:29Z

> > There is just no good way to detect for this.
>
> The way to detect it is to check the value returned by self.inner.insert_spk() instead of ignoring it.

Yes, but that means we need to error when that happens (since it's non-viable to iterate through all of the wildcard-descriptor's spks). This means all methods that may need to derive more spks have to return an error. I don't think this is a reasonable API.

crates/chain/src/keychain.rs

LLFourn · 2024-06-11T04:30:56Z

> With the changes presented in this PR, doing the above (which is impossible to detect) will cause undefined behavior

The two claims made here are wrong in my view:

impossible to detect: Figuring out if two descriptors can produce the same spk is not a hard problem of computer science. I bet you can come up with a pretty good algorithm and implementation for this in less than a day. You can do a decent job of it just by replacing the definite derivation indices with wildcards in any descriptor you're adding and using that as a key in a lookup. I really don't think you should do this -- it's not worth your time, because checking whether they're equal and documenting that the thing should not be used that way is good enough. This is just the place to solve the problem. Making the internals here complicated is certainly not.
undefined behavior: The first keychain to derive that descriptor will be the owner of it. To me this seems like a simple, good and well defined behavior. Perhaps you mean it hasn't been defined in terms of writing a test. Feel free to do so.

update: I improved docs around this in d2629cb also mentioning that it's up to the user not to insert overlapping descriptors.

evanlinjin · 2024-06-12T02:56:34Z

I've been thinking about KeychainTxOutIndex overnight, and the direction that this PR takes us is growing on me.

The keychain K is useful for the lifetime of KeychainTxOutIndex so that we can do range queries using K or (K, u32) for spks & indexed outputs.

For persisting, it's a more accurate representation to persist the last_revealed_index with the corresponding descriptor (instead of the keychain K). It's also handy if the user wishes to reconstruct the KeychainTxOutIndex later with a different set of keychains.

In addition, I was in discussion with @LLFourn yesterday. He suggested that we should consider NOT persisting descriptors via the keychain::ChangeSet since some users do not care about persisting the descriptor here.
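
As a rough sketch of what persisting by descriptor could look like, assume a changeset that stores only a map from descriptor id to last revealed index and merges by taking the maximum. The `DescriptorId` alias, `ChangeSet` shape and `merge` name here are illustrative assumptions, not the actual keychain::ChangeSet definition.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a descriptor identifier (e.g. a hash of the descriptor).
type DescriptorId = [u8; 4];

/// Sketch of a changeset that persists only `descriptor id -> last revealed index`.
#[derive(Debug, Default, Clone, PartialEq)]
struct ChangeSet {
    last_revealed: BTreeMap<DescriptorId, u32>,
}

impl ChangeSet {
    /// Merge `other` into `self`; the last revealed index only ever increases.
    fn merge(&mut self, other: ChangeSet) {
        for (id, index) in other.last_revealed {
            let entry = self.last_revealed.entry(id).or_insert(index);
            *entry = (*entry).max(index);
        }
    }
}
```

Because no keychain appears in the persisted data, the same changeset could be replayed into an index built with a different set of keychains.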

evanlinjin · 2024-06-12T13:28:18Z

Steps after this PR (not in any specific order):

Rename generic K (keychain) to L (label). Label represents how we want to "label" the keychains for the lifetime of KeychainTxOutIndex.
Introduce trait Keychain that produces KeychainId (which replaces DescriptorId) and a method that produces spks. A keychain always produces the same spks.
KeychainId is created from a hash of the concatenation of 0th derived spk and a flag for wildcard/non-wildcard. This makes the KeychainId better for uniquely-identifying a keychain (than the hash of the miniscript descriptor).
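
The proposed steps could be sketched roughly as below. Everything here is an assumption for illustration: the trait methods, the `ToyKeychain` type, and in particular the use of std's `DefaultHasher`, which is not a cryptographic hash and only stands in for whatever hash a real implementation would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical id: hash of the 0th derived spk plus a wildcard flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct KeychainId(u64);

/// Hypothetical trait: a keychain always produces the same spks.
trait Keychain {
    /// Script pubkey at `index` as raw bytes; `None` if it cannot be derived.
    fn spk_at(&self, index: u32) -> Option<Vec<u8>>;
    fn is_wildcard(&self) -> bool;

    fn keychain_id(&self) -> Option<KeychainId> {
        let spk0 = self.spk_at(0)?;
        // Non-cryptographic stand-in hasher, used only to keep the sketch std-only.
        let mut hasher = DefaultHasher::new();
        hasher.write(&spk0);
        hasher.write_u8(self.is_wildcard() as u8);
        Some(KeychainId(hasher.finish()))
    }
}

/// Toy keychain whose "spks" are just a seed byte followed by the index bytes.
struct ToyKeychain {
    seed: u8,
    wildcard: bool,
}

impl Keychain for ToyKeychain {
    fn spk_at(&self, index: u32) -> Option<Vec<u8>> {
        // A non-wildcard keychain only has a single spk, at index 0.
        if !self.wildcard && index > 0 {
            return None;
        }
        let mut spk = vec![self.seed];
        spk.extend_from_slice(&index.to_be_bytes());
        Some(spk)
    }

    fn is_wildcard(&self) -> bool {
        self.wildcard
    }
}
```

Two keychains that derive the same 0th spk and share the wildcard flag get the same id, regardless of how their descriptors are written.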

evanlinjin · 2024-06-12T13:42:36Z

I'm tempted to remove all instances where we prematurely collect, but can leave it for another PR.

evanlinjin

self-ACK 3070b2d

ValuedMammal

ACK 3070b2d

plus an additional comment about reveal_to_target

crates/chain/src/keychain/txout_index.rs

notmandatory

tACK 3070b2d

I also manually tested the examples with signet descriptors and got back no errors and expected balance after scan and sync.

The underlying SpkTxOutIndex should not use DescriptorIds to index because this loses the ordering relationship of the spks that makes queries on subranges of keychains work. Along with that, we enforce a strict 1-to-1 relationship between descriptors and keychains. Violating this now leads to an error in insert_descriptor. In general I try to make the translation layer between the SpkTxOutIndex and the KeychainTxOutIndex thinner. Ergonomics of this will be improved in the next commit. The test from the previous commit passes.
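
A minimal sketch of enforcing the 1-to-1 relationship, assuming string keychains and descriptors and a hypothetical error enum (the type and variant names below are illustrative and may not match the crate):

```rust
use std::collections::HashMap;

/// Hypothetical error type for a violated 1-to-1 invariant.
#[derive(Debug, PartialEq)]
enum InsertError {
    /// The descriptor is already assigned to another keychain.
    DescriptorAlreadyAssigned { existing_keychain: String },
    /// The keychain is already assigned to another descriptor.
    KeychainAlreadyAssigned { existing_descriptor: String },
}

/// Toy index that tracks the mapping in both directions.
#[derive(Default)]
struct ToyKeychainIndex {
    keychain_to_descriptor: HashMap<String, String>,
    descriptor_to_keychain: HashMap<String, String>,
}

impl ToyKeychainIndex {
    /// `Ok(true)` if newly inserted, `Ok(false)` if the same pair already existed.
    fn insert_descriptor(&mut self, keychain: &str, descriptor: &str) -> Result<bool, InsertError> {
        if let Some(existing) = self.keychain_to_descriptor.get(keychain) {
            if existing == descriptor {
                return Ok(false);
            }
            return Err(InsertError::KeychainAlreadyAssigned {
                existing_descriptor: existing.clone(),
            });
        }
        if let Some(existing) = self.descriptor_to_keychain.get(descriptor) {
            return Err(InsertError::DescriptorAlreadyAssigned {
                existing_keychain: existing.clone(),
            });
        }
        self.keychain_to_descriptor
            .insert(keychain.to_owned(), descriptor.to_owned());
        self.descriptor_to_keychain
            .insert(descriptor.to_owned(), keychain.to_owned());
        Ok(true)
    }
}
```

Re-inserting an identical pair is treated as a no-op here, while reassigning either side is rejected.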


evanlinjin

ACK 8dd1744

notmandatory assigned LLFourn Jun 6, 2024

notmandatory added the module-blockchain label Jun 6, 2024

notmandatory added this to the 1.0.0-alpha milestone Jun 6, 2024

LLFourn force-pushed the no-descriptor-ids-in-spk-txout-index branch 2 times, most recently from a652892 to 165f06b Compare June 6, 2024 04:59

notmandatory requested changes Jun 6, 2024

evanlinjin reviewed Jun 7, 2024

evanlinjin reviewed Jun 8, 2024

ValuedMammal reviewed Jun 9, 2024

ValuedMammal reviewed Jun 9, 2024

ValuedMammal reviewed Jun 11, 2024

LLFourn force-pushed the no-descriptor-ids-in-spk-txout-index branch from 403620e to d2629cb Compare June 11, 2024 04:31

evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 1b6fe8c to f456a04 Compare June 12, 2024 10:59

evanlinjin approved these changes Jun 12, 2024

evanlinjin mentioned this pull request Jun 12, 2024

Various improvements not included in #1203 (new) #1451

Closed

8 tasks

ValuedMammal reviewed Jun 12, 2024

notmandatory approved these changes Jun 13, 2024

LLFourn and others added 6 commits June 13, 2024 22:52

Write failing test for keychain range querying

3b2ff0c

chore(chain): misc docs and insert_descriptor fixes

4d2442c

chore(chain): add type IndexSpk, fix clippy type complexity warning

bce070b

chore(chain): Standardise KeychainTxOutIndex return types

101a09a

The previous commit b9c5b9d added IndexSpk. This goes further and adds `Indexed` and `KeychainIndexed` type aliases (IndexSpk is Indexed<ScriptBuf>) and attempts to standardize the structure of return types more generally.
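
For readers unfamiliar with the aliases, one plausible shape is sketched below. The exact definitions are an assumption inferred from the names and from "IndexSpk is Indexed<ScriptBuf>"; `ScriptBuf` is replaced by a byte vector and `drop_keychain` is a hypothetical helper.

```rust
/// Assumed shape: a value paired with its derivation index.
type Indexed<T> = (u32, T);
/// Assumed shape: a value paired with its keychain and derivation index.
type KeychainIndexed<K, T> = ((K, u32), T);

/// Stand-in for a script buffer.
type ScriptBuf = Vec<u8>;

/// Hypothetical helper: forget the keychain, keep the index and the value.
fn drop_keychain<K, T>(item: KeychainIndexed<K, T>) -> Indexed<T> {
    let ((_keychain, index), value) = item;
    (index, value)
}

/// Example of a signature that reads more simply with the alias.
fn highest_index(spks: &[Indexed<ScriptBuf>]) -> Option<u32> {
    spks.iter().map(|(index, _)| *index).max()
}
```

Aliases like these mainly keep return types short and consistent across methods, which also addresses clippy's type-complexity warning mentioned in the commit list.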

chore(chain): Improve documentation of keychain::ChangeSet

b8ba5a0

LLFourn and others added 7 commits June 13, 2024 22:52

chore(chain): Fix Indexed and KeychainIndexed documentaion

5a584d0

Co-authored-by: ValuedMammal <[email protected]>

refactor(chain): improve replenish lookeahd internals

69f2a69

see: https://github.com/bitcoindevkit/bdk/pull/1463/files/4eb1e288a9362803b034590e0c56c8cf9cf8b0c2#r1630943639

chore(chain): document insert_descriptor invariants better

8779afd

chore(chain): s/replenish_lookahead/replenish_inner_index/

4d3846a

refactor(chain): KeychainTxOutIndex use HashMap for fields

c77e12b

Instead of `BTreeMap` which is less performant.

docs(chain): fix docs

5a02f40

refactor(chain): change field names to be more sane

639d735

evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 3070b2d to 0017037 Compare June 13, 2024 14:57

evanlinjin approved these changes Jun 13, 2024

refactor(chain): compute txid once for KeychainTxOutIndex::index_tx

8dd1744

evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 0017037 to 8dd1744 Compare June 13, 2024 15:04

evanlinjin merged commit 1c593a3 into bitcoindevkit:master Jun 13, 2024
12 checks passed

This was referenced Jun 13, 2024

txout_index, Descriptor not needed in keychains_to_descriptors #1439

Closed

Release 1.0.0-alpha.13 #1471

Closed

LLFourn deleted the no-descriptor-ids-in-spk-txout-index branch June 27, 2024 02:44

LLFourn restored the no-descriptor-ids-in-spk-txout-index branch June 27, 2024 02:45
