Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No descriptor ids in spk txout index #1463

Merged

Conversation

LLFourn
Copy link
Contributor

@LLFourn LLFourn commented Jun 6, 2024

Fixes #1459

This reverts part of the changes in #1203. There the SpkTxOutIndex<(K,u32)> was changed to SpkTxOutIndex<(DescriptorId, u32>). This led to a complicated translation logic in KeychainTxOutIndex (where the API is based on K) to transform calls to it to calls to the underlying SpkTxOutIndex (which now indexes by DescriptorId). The translation layer was broken when it came to translating range queries from the KeychainTxOutIndex. My solution was just to revert this part of the change and remove the need for a translation layer (almost) altogether. A thin translation layer remains to ensure that un-revealed spks are filtered out before being returned from the KeychainTxOutIndex methods.

I feel like this PR could be extended to include a bunch of ergonomics improvements that are easier to implement now. But I think that's the point of #1451 so I held off and should probably go and scope creep that one instead.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@notmandatory notmandatory added this to the 1.0.0-alpha milestone Jun 6, 2024
@LLFourn LLFourn force-pushed the no-descriptor-ids-in-spk-txout-index branch 2 times, most recently from a652892 to 165f06b Compare June 6, 2024 04:59
Copy link
Member

@notmandatory notmandatory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I have a few suggestions before ACKing.

clippy.toml Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. The range-based method bugs definitely need to be fixed. However, I'm not sure if a total-revamp of the behavior is the correct way forward?

I'm okay with disallowing reassignment of a keychain's descriptor. However, I still think assigning the same descriptor to multiple keychains is important (for consistency). Let me explain.

Currently, our API allows adding overlapping descriptors. I.e. a non-wildcard descriptor which derives a spk that exists in a wildcard descriptor. There is just no good way to detect for this. With the changes presented in this PR, doing the above (which is impossible to detect) will cause undefined behavior, whereas the previous implementation handles this perfectly.

Therefore, for consistency, I think we should just enable both and scenarios and handle things consistently (as we are doing before).

crates/chain/src/keychain/txout_index.rs Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Show resolved Hide resolved
crates/chain/src/keychain/txout_index.rs Outdated Show resolved Hide resolved
Comment on lines +524 to +519
// The reason we need a tricky algorithm is because of the "lookahead" feature which means
// that some of the spks in the SpkTxoutIndex will not have been revealed yet. So we need to
// filter out those spks that are above the last_revealed for that keychain. To do this we
// iterate through the last_revealed for each keychain and the spks for each keychain in
// tandem. This minimizes BTreeMap queries.
core::iter::from_fn(move || loop {
let ((keychain, index), spk) = iter_spks.next()?;
// We need to find the last revealed that matches the current spk we are considering so
// we skip ahead.
while current_keychain?.0 < keychain {
current_keychain = iter_last_revealed.next();
}
let (current_keychain, last_revealed) = current_keychain?;

if current_keychain == keychain && Some(*index) <= last_revealed {
break Some(((keychain.clone(), *index), spk.as_script()));
}
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the performance implications of this (since this complex-ish logic seems to be here for performance reasons, otherwise a .filter can work too).

I.e. let's say the caller is using a large lookahead (1000+) with a large amount of keychains. Would it be more performant to loop through all lookahead spks (except for the last keychain) - as we are doing here. Or, would it be more performant to do multiple .range calls on inner.all_spks() per keychain?

My argument is that performance is more of an issue for usecases with larger keychains-counts/spk-counts/lookahead-values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is written so it hits the optimal time complexity in the number of keychains.

Let $k$ is the number of keychains total, $s$ is the average number of spks per keychain and $h$ is the lookahead value.

Let's consider the worst case performance when the query matches all keychains and $h = 0$. For the code above, there are two map queries which take negligible time and then we iterate through all the spks of every matched keychain so the time is dominated by the $sk$ spks you iterate through.

For the previous implementation where you do two lookups you have to do $sk$ iterations but you have another term that could take more time: $k\log(sk)$ time to do the btree map lookups which domainates when $\log(sk) &gt; s$. The bottom line is that keeping everything other than $k$ constant the complexity is of PR impl is $O(k)$ while the previous impl was $O(k\log(k))$ so it's a significant improvement in theory.

Setting $h &gt; 0$ it becomes $k(s+h)$ (this PR) vs $k\log(k(s+h))$ (prev). In the case that $h$ dominates $s$ which is maximally advantages the former implementation you get $kh$ vs $k(\log(k) + \log(h))$. Therefore the prev impl would always be faster where the lookahead dominates the actual number of spks. To improve performance for applications where $h &gt;&gt; s$ I'd say we should simply add a revleaed spks cache BTreeMap.

@evanlinjin
Copy link
Member

However, I really like the API changes done in this PR. I.e.

  • Returning a Vec of revealed spks (instead of iterator)
  • Introducing KeychainIndexed and Indexed type parameters (however, can we think of a better name?)

@ValuedMammal
Copy link
Contributor

There is just no good way to detect for this.

The way to detect it is to check the value returned by self.inner.insert_spk() instead of ignoring it.

@evanlinjin
Copy link
Member

There is just no good way to detect for this.

The way to detect it is to check the value returned by self.inner.insert_spk() instead of ignoring it.

Yes, but that means we need to error when that happens (since it's non-viable to iterate through all of the wildcard-descriptor's spks). This means all methods that may need to derive more spks have to return an error. I don't think this is a reasonable API.

@LLFourn
Copy link
Contributor Author

LLFourn commented Jun 11, 2024

With the changes presented in this PR, doing the above (which is impossible to detect) will cause undefined behavior

The two claims made here are wrong in my view:

  1. impossible to detect: Figuring out if two descriptors can produce the same spk is not a hard problem of computer science. I bet you can come up with a pretty good algorithm and implementation for this in less than a day. You can do a decent job of it by just by replacing the definite derivation indices with wildcards of any descriptor your adding and using that as a key in a lookup. I really don't think you should do this -- it's not worth your time because checking whether they're equal and documenting that the thing should not be used that way is good enough. This is just the place to solve the problem. Making the internals here complicated is certainly not.
  2. undefined behavior: The first first keychain to derive that descriptor will be the owner of it. To me this seems like a simple, good and well defined behavior. Perhaps you mean it hasn't been defined in terms of writing a test. Feel free to do so.

update: I improved docs around this in d2629cb also mentioning that it's up to the user not to insert overlapping descriptors.

@LLFourn LLFourn force-pushed the no-descriptor-ids-in-spk-txout-index branch from 403620e to d2629cb Compare June 11, 2024 04:31
@evanlinjin
Copy link
Member

evanlinjin commented Jun 12, 2024

I've been thinking about KeychainTxOutIndex overnight, and the direction that this PR takes us is growing on me.

The keychain K is useful for the lifetime of KeychainTxOutIndex so that we can do range queries using K or (K, u32) for spks & indexed outputs.

For persisting, it's a more accurate representation to persist the last_revealed_index with the corresponding descriptor (instead of the keychain K). It's also handy if the user wishes to reconstruct the KeychainTxOutIndex later with a different set of keychains.

In addition, I was in discussion with @LLFourn yesterday. He suggested that we should consider NOT persisting descriptors via the keychain::ChangeSet since some users do not care about persisting the descriptor here.

@evanlinjin evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 1b6fe8c to f456a04 Compare June 12, 2024 10:59
@evanlinjin
Copy link
Member

Steps after this PR (not in any specific order):

  • Rename generic K (keychain) to L (label). Label represents how we want to "label" the keychains for the lifetime of KeychainTxOutIndex.
  • Introduce trait Keychain that produces KeychainId (which replaces DescriptorId) and a method that produces spks. A keychain always produces the same spks.
  • KeychainId is created from a hash of the concatenation of 0th derived spk and a flag for wildcard/non-wildcard. This makes the KeychainId better for uniquely-identifying a keychain (than the hash of the miniscript descriptor).

@evanlinjin
Copy link
Member

I'm tempted to remove all instances where we prematurely collect, but can leave it for another PR.

Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-ACK 3070b2d

Copy link
Contributor

@ValuedMammal ValuedMammal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3070b2d

plus an additional comment about reveal_to_target

crates/chain/src/keychain/txout_index.rs Show resolved Hide resolved
Copy link
Member

@notmandatory notmandatory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 3070b2d

I also manually tested the examples with signet descriptors and got back no errors and expected balance after scan and sync.

LLFourn and others added 6 commits June 13, 2024 22:52
The underlying SpkTxOutIndex should not use DescriptorIds to index
because this loses the ordering relationship of the spks so queries on
subranges of keychains work.

Along with that we enforce that there is a strict 1-to-1 relationship
between descriptors and keychains. Violating this leads to an error in
insert_descriptor now.

In general I try to make the translation layer between the SpkTxOutIndex
and the KeychainTxOutIndex thinner. Ergonomics of this will be improved
in next commit.

The test from the previous commit passes.
The previous commit b9c5b9d added
IndexSpk. This goes further and adds `Indexed` and `KeychainIndexed`
type alises (IndexSpk is Indexed<ScriptBuf>) and attempts to standardize
the structure of return types more generally.
@evanlinjin evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 3070b2d to 0017037 Compare June 13, 2024 14:57
Copy link
Member

@evanlinjin evanlinjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8dd1744

@evanlinjin evanlinjin force-pushed the no-descriptor-ids-in-spk-txout-index branch from 0017037 to 8dd1744 Compare June 13, 2024 15:04
@evanlinjin evanlinjin merged commit 1c593a3 into bitcoindevkit:master Jun 13, 2024
12 checks passed
@LLFourn LLFourn deleted the no-descriptor-ids-in-spk-txout-index branch June 27, 2024 02:44
@LLFourn LLFourn restored the no-descriptor-ids-in-spk-txout-index branch June 27, 2024 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

KeychainTxOutIndex range-based methods are broken
4 participants