Skip to content
This repository has been archived by the owner on Dec 2, 2024. It is now read-only.

PLT-172: Marconi index transactions that have been issued to a particular script #629

Conversation

eyeinsky
Copy link
Contributor

@eyeinsky eyeinsky commented Jul 27, 2022

This is a draft PR for the jira issue marconi indexer: transactions that have been issued to a particular script. (This PR is also wrongly based off of next-node branch because it can understand the blocks since Vasil hard-fork -- will rebase once I manage to resolve the issues below.)

A concrete code thing I need help with is extracting script hashes from a transaction here: how/where would I find the type(s)?

Another question is that the Jira issue says to set script address as the primary key, but if this is an index from transaction to script, then this is an 1:n relationship because (as far as I understand) a single script will be possibly targeted/ran by many transactions.

I've also read the indexer's code (the hysterical-screams) but can't say it's fully clear yet :), so would be good to see if the onInsert/store/query triple is correct.

Any other comments welcome as well!

(The main commits are 9a2e24e and 79a9d7f, the two previous ones already exist as separate PRs.)


Pre-submit checklist:

  • Branch
    • Tests are provided (if possible)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
    • Formatting, PNG optimization, etc. are updated
  • PR
    • Self-reviewed the diff
    • Useful pull request description
    • Reference the ADR in the PR and reference the PR in the ADR (if revelant)
    • Reviewer requested

@eyeinsky eyeinsky requested a review from raduom July 27, 2022 11:03
@koslambrou
Copy link
Contributor

(This PR is also wrongly based off of next-node branch because it can understand the blocks since Vasil hard-fork -- will rebase once I manage to resolve the issues below.)

I don't think you need to understand blocks from Vasil HF to make this PR work. You can just run the indexer on Cardano mainnet (which there is no Vasil HF), but not on testnet.

instance SQL.ToField ScriptAddress where
toField (ScriptAddress hash) = SQL.SQLBlob . toStrict . serialise $ hash
instance SQL.FromField ScriptAddress where
fromField _ = undefined -- todo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this before merging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's finding the script hash (linked in the PR description) that I haven't been able to find yet (but working on it!).

open dbPath (Depth k) = do
ix <- fromJust <$> Ix.newBoxed query store onInsert k ((k + 1) * 2) dbPath
let c = ix ^. Ix.handle
SQL.execute_ c "CREATE TABLE IF NOT EXISTS script_transactions (scriptAddress TEXT NOT NULL, txCbor BLOB NOT NULL)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would rather have a primary key index on scriptAddress. Is there a reason why we are not doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong here, but since a script can be targeted by many transactions (?), then the scriptAddress won't be unique and so it can't be a primary key. (Though it still can have a index on it though, so it would be faster to search for with sql).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The scriptAddress primary key issue is also mentioned in the PR description.)


store :: ScriptTxIndex -> IO ()
store ix = do
persisted <- Ix.getEvents $ ix ^. Ix.storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug I just found. We only want to save things in the buffer, not in the events.

type Query = ScriptAddress
type Result = [TxCbor]

data ScriptTxUpdate = ScriptTxUpdate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably make everything strict since these will eventually be persisted to disk.

map fst $ filter (\(_, addrs) -> scriptAddress' `elem` addrs) update

both :: [TxCbor]
both = buffered <> map (\(SQL.Only txCbor') -> txCbor') persisted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an Iso may look better here, but this also works.

@@ -14,7 +14,7 @@ You need to download the configurations for the node, genesis blocks and the top
I am using a shell script to start the node that I will paste here:

```shell
#!/bin/bash
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future if you see a quick fix that is possible create a new PR for it as there are many more people who can review and accept it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, already did #618! (As that one is now merged, then can rebase it now and the commit won't be part of this PR anymore)

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch from 18abf13 to 3139436 Compare July 28, 2022 12:49
@eyeinsky eyeinsky changed the base branch from next-node to main July 28, 2022 15:11
@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 3 times, most recently from 28081a4 to 8209c09 Compare August 1, 2022 16:40
@eyeinsky
Copy link
Contributor Author

eyeinsky commented Aug 2, 2022

@koslambrou @raduom With regarding to adding tests: marconi is currently an executable within the plutus-chain-index package, but to add tests I think it needs to be converted into a library (+ executable)? so that in the tests I could import functions from it.

@eyeinsky
Copy link
Contributor Author

eyeinsky commented Aug 2, 2022

In discussion with @raduom we found that:

  • there isn't anything to property-test specifically in the script transactions indexer and that an integration test would be what is needed
  • there is a PR Transaction state indexer. #524 that moves Marconi to the plutus-chain-index-core package, and which makes the Marconi.* modules a library (so that they could be imported by tests)

Thus, would it make sense to wait for #524 to be merged and then add an integration in a later PR/Jira to test that all works as expected? @koslambrou

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch from 8209c09 to 793e904 Compare August 2, 2022 16:24
@eyeinsky eyeinsky requested a review from koslambrou August 3, 2022 10:09
@eyeinsky
Copy link
Contributor Author

eyeinsky commented Aug 3, 2022

This branch is ready for review, given that:

  • attempted to replace this with this, but it adds dependencies and doesn't actually improve performance much (plutusScriptsFromTxBody isn't doing extra work to also return scritps as map values)
  • tests will come in a future PR

@eyeinsky eyeinsky marked this pull request as ready for review August 3, 2022 10:11
@koslambrou
Copy link
Contributor

@eyeinsky

there isn't anything to property-test specifically in the script transactions indexer and that an integration test would be what is needed

Right, not in the indexer itself. The pure function that would be nice to test is txScripts, because then we can actually use in the integration test itself. You're using plutusScriptsFromTxBody , but I don't think Marconi should be using that. I'll comment more on the code itself.

Thus, would it make sense to wait for #524 to be merged and then add an integration in a later PR/Jira to test that all works as expected? @koslambrou

That's right, you can work on the integration test on a later PR :)

txScripts :: forall era . Tx era -> [ScriptTx.ScriptAddress]
txScripts tx = let
Tx (body :: C.TxBody era) _ws = tx
map' = plutusScriptsFromTxBody body :: M.Map Ledger.ScriptHash Ledger.Script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't suggest using plutusScriptsFromTxBody. Marconi is an off-chain component, so it should be using types for off-chain use (like cardano-api). Anything with with plutus-ledger-api types are really meant for on-chain use.

I suggest reimplementing it in Marconi (something named like getScriptHashesFromTx) and write tests for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plutusScriptsFromTxBody hashes the script by

  • deserialising it into a plutus-ledger Script (defined in Plutus.V1.Ledger.Scripts)
  • scriptHash serialises it again to bytes and makes a cardano-api
    Script (defined in Cardano.Api.Script; that script wraps the
    serialised bytes), then hashes that script

If the goal is to skip plutus-ledger Script then is it ok if I wrap the received bytes into cardano-api's Script directly? I do this here, specifically within the mkCardanoApiScript which wraps the incoming bytes into a cardano-api Script (the previous way to get this script included going through deserialisation/serialisation steps).

Otherwise, I haven't found a way to go from the bytes I receive to a cardano-api Script without using anything from the ledger.


Regarding tests, since it's a property test, then if I generate arbitrary scripts, serialize them and get a hash, and then put the scripts into a transaction to see if I get the same scripts (= same hashes) out again, then what would it be testing? (they are the same by definition) Or, perhaps there are multiple locations to put those scripts and I'd see if I lose any by not handling a branch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... ideally we would use types from cardano-api, not cardano-ledger. I recommend not pattern matching on the body of the transaction. I recommend pattern matching with pattern TxBody :: TxBodyContent ViewTx era -> TxBody era in Cardano.Api.TxBody.

There are some examples in Ledger.Tx.CardanoAPI.

Regarding property testing, here's a starting point.
You can test the txScripts function as follows. Generate a random number of plutus scripts (see genPlutusScript from Gen.Cardano.Api.Typed), then generate an TxIns which spends funds from these scripts, then generate a transaction with this TxIns (based on genTxBodyContent), run your txScripts on this generated tx, and verify that the initial generated plutus scripts are part of the function's output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the scripts I'm picking out and using are not included in the pattern TxBody: https://github.com/input-output-hk/cardano-node/blob/master/cardano-api/src/Cardano/Api/TxBody.hs#L2070

Question is, if the scripts that I pick out are the right ones? As I found also that inside TxBody the Script type also exist under:

  • TxBodyContent{txOuts} -> TxOut -> ReferenceScript -> ScriptInAnyLang
  • TxBodyContent{txReturnCollateral} -> TxReturnCollateral -> TxOut
  • TxBodyContent{txAuxScripts} -> TxAuxScripts -> ScriptInEra

I would guess the scripts at the inputs (the ones that I'm using already and that come in as serialised) are the important ones, as those are the ones that are run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is that maybe plutus-streaming library (the one that provides the withChainSyncEventStream that provides the blocks) can convert scripts to the correct type such that only downstream using cardano-api is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koslambrou After much confusion managed to boil it down to two questions:

generate an TxIns which spends funds from these scripts

When a TxIn spends funds from a script, does that mean that its TxId field must be set to the script's hash? (TxIn is defined as data TxIn = TxIn TxId TxIx and the only other field in it, TxIx, is a newtype for a Word so it can't possibly refer to a script)

generate a transaction with this TxIns (based on genTxBodyContent),

Does this imply using makeTransactionBody which will create a "ledger transaction" out of the cardano-api's TxBodyContent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koslambrou Have managed to come up with this property test:
https://github.com/input-output-hk/plutus-apps/blob/edc8da432275c017ceba3a3aea24bbe9cb8e5e4b/plutus-chain-index-core/test/MarconiSpec.hs#L26-L66

There is still a piece missing: what does it mean to "spend a script that I created"? In the above I've tried to generate the same number of TxIns as there are scripts and replace the scripts' hashes into the TxId field of TxIn -- is this the correct way to think about "spending a script"?

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 3 times, most recently from 2581d7c to 79d644d Compare August 12, 2022 16:21
@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 4 times, most recently from b050fef to edc8da4 Compare August 16, 2022 14:53
@eyeinsky
Copy link
Contributor Author

Pushed another iteration where I create the first tx in which the scripts I've generated are part of the outputs. Then create the second transaction where the inputs are the outputs of the previous transaction, so I could "spend the scripts".

It's still very much WIP as the test can't possibly work as running makeTransactionBody on the second transaction doesn't have access to the generated scripts nor the hashes, because the inputs (TxIn = TxId X TxIx) only refer to the scripts indirectly.

Thus I wonder: is it implied that I need to run a local testnet in this property test too? so that when running the second transaction the TxIns could at least be looked up and the script field thus populated?

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 5 times, most recently from ac12373 to d253cfb Compare August 23, 2022 16:49
@eyeinsky
Copy link
Contributor Author

Ready for review; I now index all types of scripts and also test them, and the tests pass.


-- * Copy-paste
--
-- | TODO: Remove when the following function is exported from Cardano.Api.Script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point the link to the PR here.

Copy link
Contributor

@koslambrou koslambrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch from 7913b1d to 59b2d09 Compare August 29, 2022 10:38
@eyeinsky eyeinsky changed the title Plt 172 marconi indexer transactions that have been issued to a particular script PLT-172: Marconi index transactions that have been issued to a particular script Aug 31, 2022
@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch from 59b2d09 to efc9834 Compare August 31, 2022 10:41
@eyeinsky eyeinsky enabled auto-merge (squash) August 31, 2022 10:43
@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 3 times, most recently from f604ffb to ca502a5 Compare August 31, 2022 18:04
@koslambrou koslambrou disabled auto-merge August 31, 2022 23:33
@koslambrou koslambrou enabled auto-merge (squash) August 31, 2022 23:34
@raduom raduom disabled auto-merge September 1, 2022 09:38
Copy link
Contributor

@raduom raduom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be some bugs in the indexer.

where
txScripts' = map (\tx -> (TxCbor $ C.serialiseToCBOR tx, getTxScripts tx)) txs

getTxBodyScripts :: forall era . C.TxBody era -> [ScriptAddress]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be useful to someone else. Should we not move this to something like CardanoAPI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! For now, I'd say we keep it here until we decide how to structure cardano-api related code. For example, might make sense to create a decidated cardano-api-extended package which contains a bunch of these functions which can then be pushed upstream when stabilized and tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be done in another PR?

But I guess yes, if the function would just return Shelley.ScriptHash (ScriptAddress is a newtype around it) then I guess Cardano.Api.Shelley could be a good location for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A related thought: if we will get a hoogle, then writing this function could be done in two steps: one would be getTxBodyScriptHashes :: forall era . C.TxBody era -> [ScriptHash] and the other one would be the function we currently have, which just wraps ScriptHash into a newtype. This way hooglers can find the function, and if useful enough, can move them into a more suitable module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it for another PR.


store :: ScriptTxIndex -> IO ()
store ix = do
persisted <- Ix.getEvents $ ix ^. Ix.storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not store the events, only what we have in the buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now only store the buffered events here 83ac994 and think there was the same bug in the Utxo indexer as well, does it look ok now there as well?

"SELECT txCbor FROM utxos WHERE scriptAddress = ?" (SQL.Only scriptAddress')

let
buffered :: [TxCbor]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When querying we also need to account for buffered events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates parameter only contains the events though. So basically you need to do what you did before for store. The way this works right now is that we keep K events in memory plus some buffered events so we can batch updates to the database (which makes inserting rows much more efficient than if we would do it one by one).

So when we store things on disk, we only store the things that are buffered and leave the K blocks in memory (since they can still be rolled back); but when we query we want to account for all events, including the ones that are currently being buffered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixes here 1886e12, 800d8f9

let hashesFound = map coerce $ ScriptTx.getTxBodyScripts txBody :: [ScriptHash]
assert $ S.fromList scriptHashes == S.fromList hashesFound

genTxBodyWithTxIns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the node export generators for these data structures already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. They are very limited and would really need to be reworked on. Again, it would make sense to put these generators in a common package so that they can be pushed upstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that generators are pretty difficult to get right. You need to think about efficient ways of shrinking them, and what statistical distribution you need to test your scenarios, and how to check that the distribution you selected is actually testing those scenarios. I am not sure we are the ones that should be doing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The problem is that this is not a priority for the node team. We can probably do what the Hydra team did, use the generators from cardano-ledger and reconvert the datatype to a cardano-api type.

Ultimately, cardano-api would need to have good generators in them, and I think this is a cross-team effort (Hydra, Djed, Plutus Tools). So I think it's okay for us to start contributing in that direction IMO.

@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch 2 times, most recently from d40d7e0 to 800d8f9 Compare September 2, 2022 08:07
@eyeinsky eyeinsky force-pushed the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch from 800d8f9 to 6a56e37 Compare September 2, 2022 08:16
@eyeinsky
Copy link
Contributor Author

eyeinsky commented Sep 2, 2022

@koslambrou This MR is now ready merge!

@koslambrou koslambrou merged commit f592a7b into main Sep 2, 2022
@koslambrou koslambrou deleted the PLT-172-marconi-indexer-transactions-that-have-been-issued-to-a-particular-script branch September 2, 2022 13:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants