Chain Index: Allow more user configurations #73

kk-hainq · 2021-10-31T17:33:16Z

Area

[x] Plutus Application Framework: related to the Plutus application backend (PAB), emulator, Plutus libraries

Describe the feature you'd like

We should extend the configuration file to let users decide what they want to track and store. For example, we only care about Alonzo-era data for our current security work, so the ability to store data by era would likely be helpful. For dApp work, we mainly care about transactions that interact with our protocol, so the ability to skip storing irrelevant data would be excellent.

For example, a small addition to AppendBlock, like the one in #72, lets applications choose whether or not to store a batch of transactions in the DB. Given that tip and UTXO processing are unaffected, most of these configurations are presumably safe for the functioning of the chain index?

Describe alternatives you've considered

We have been customizing the chain index to our needs, but our changes are prone to breaking whenever upstream changes. I guess the more people can share and have upstream, the better.

Additional context / screenshots

This issue is part of the #4 mega-thread.

kk-hainq · 2021-11-26T04:36:37Z

There are over 8 million post-Alonzo transactions on mainnet at the moment, many if not most of which are irrelevant to any specific dApp. As a modular next step, I think we should add a config option to not store transactions that don't reference any scripts (typical asset transfers, NFT drops, SPO voting, etc.).

What do you think? (cc @silky @sjoerdvisscher).

sjoerdvisscher · 2021-12-01T13:14:30Z

I'm not a fan of config options. Our plan is to make the chain index available as a library, which would allow you to filter out transactions with Haskell code. Would that work for you?

kk-hainq · 2021-12-14T11:52:34Z

@sjoerdvisscher Sorry for the late reply, I have been really useless lately...

Anyway, I suggested both config options (item 7) and a library API (item 8) in the original #4 issue. The separation is critical, as config options are a lot friendlier to maintain than Haskell code that achieves the same thing, especially when one would need to maintain a scary cabal.project with heavyweight dependencies just to add some simple filtering logic.

On the other hand, let's say we have a Dockerfile for the chain index like:

FROM haskell:8.10.7-buster AS builder
RUN apt-get update && apt-get install --no-install-recommends -y liblzma-dev libsodium-dev libsystemd-dev pkg-config r-base && rm -rf /var/lib/apt/lists/*

ARG PLUTUS_APPS_COMMIT=c4570793b1251e1a8e79e33a5a40f3e2776c5691
RUN git clone https://github.com/input-output-hk/plutus-apps && cd plutus-apps && git checkout ${PLUTUS_APPS_COMMIT}

WORKDIR ./plutus-apps
RUN echo "\npackage cardano-crypto-praos\n  flags: -external-libsodium-vrf\n" >> cabal.project
RUN cabal update
RUN cabal install exe:plutus-chain-index

FROM debian:buster-slim
RUN apt-get update && apt-get install --no-install-recommends -y libsodium-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /opt/chain-index
COPY --from=builder /root/.cabal/bin/plutus-chain-index .
EXPOSE 9083
CMD ./plutus-chain-index start-index -c ./data/config.json

Then with the following docker-compose.yml and chain-index-config.json:

version: "3"

services:
  cardano-node:
    image: inputoutput/cardano-node:1.31.0
    environment:
      - NETWORK=${NETWORK:-testnet}
    volumes:
      - node-db:/data/db
      - node-ipc:/ipc
    restart: on-failure

  chain-index:
    image: chain-index:latest
    ports:
      - ${PORT:-9083}:9083
    volumes:
      - node-ipc:/node-ipc
      - ./data/${NETWORK:-testnet}:/opt/chain-index/data
    depends_on:
      - cardano-node
    restart: on-failure

volumes:
  node-db:
  node-ipc:

{
  "cicSlotConfig": {
    "scSlotLength": 1000,
    "scSlotZeroTime": 1591566291000
  },
  "cicPort": 9083,
  "cicSocketPath": "/node-ipc/node.socket",
  "cicDbPath": "/opt/chain-index/data/chain-index.db",
  "cicNetworkId": {
    "contents": {
      "unNetworkMagic": 1097911063
    },
    "tag": "Testnet"
  },
  "cicSecurityParam": 2160,
  "cicStoreFrom": {
    "unBlockNo": 2877844
  }
}

Anyone can spin up and maintain a working chain index setup without knowing Haskell, especially once #129 is done (we would love to join hands there). If we also maintain Docker images, as is done for cardano-node, cardano-db-sync, etc., then a DevOps engineer can read the documentation once and version-control a few config files to bring several chain index configurations to infrastructure in 30 minutes. Setting up a development environment to customize the chain index would already take hours. Updates are also much easier in the former case, whereas just bumping a commit hash in cabal.project is scary and time-consuming. Not supporting common configs also risks duplicated code and tests across dApp projects.

In conclusion, we have been customizing the chain index since it was still buried in PAB 1.0 in early May. Even if we had the resources to keep doing so (we barely did...), it would be much better to maintain only our specific application logic, with all generally useful options handled through config files. Both writing code and finding Haskell engineers are hard. Requiring both for simple configuration would be a bottleneck for most, if not all, dApp developers, who already have so many other things to maintain in code :(.

P.S.: I have been intimidated by nginx.conf and its friends my whole life. Nevertheless, I would rather spend hours reading the docs to get it right than contemplate maintaining an nginx fork. I pushed a bug last week and still hear colleagues' blame and laughter in my dreams today...

sjoerdvisscher · 2021-12-14T13:01:19Z

I hear what you're saying, but there's a balance here between your speed and our speed. Having lots of configuration options can really slow our development, since you basically have a combinatorial explosion of small variations of the product. Sure it's nice if we optimise everything so you can deploy a new chain index in 30 minutes, but if you then have to wait for another month until we've finally delivered that new feature you desperately need, it's kind of pointless.

That's hyperbole of course, and the truth is somewhere in the middle. In the meantime, we have just merged a first step towards the chain index as a library; take a look: #186

kk-hainq · 2021-12-14T14:41:41Z

True. In a perfect world, dApp developers would contribute more upstream and help maintain shared patterns, but my previous reply, two weeks late, proves that the pressure is always on the team at the end of the day.

#186 is definitely nice (thanks!) in that it reduces the amount of code one needs to maintain for a customized chain index. That said, the biggest bottleneck, having to set up a development environment and manage dependencies, is still there. Let's use plutus-pab/test-node as a case study.

https://github.com/input-output-hk/plutus-apps/blob/835ce24b7c89fa0d49e985029defc15b6732785e/plutus-pab/test-node/README.md#L133-L137

This shows that the chain index is the bottleneck of an already complicated setup. There have also been issues with synchronization (#171, #189, memory leaks, etc.). I think explaining a few JSON options that speed up syncing (for example, by skipping data that is of no interest to a dApp) is friendly enough. One already has to maintain config files for the wallet.

https://github.com/input-output-hk/plutus-apps/blob/835ce24b7c89fa0d49e985029defc15b6732785e/plutus-pab/test-node/README.md#L62-L71

However, requiring dApp users to clone a template, build a ton of dependencies, and write their own chain index executable in Haskell, whether to sync faster for their application or just to test faster in plutus-pab/test-node, is too heavyweight.

I hope we can still find a balance by discussing, and being strict about, which configs to support. The ones we've found most critical thus far are:

1. The ability to not store transactions that do not reference a script.
2. The ability to only sync transactions that reference an address in a pre-defined list of addresses.

(1) can even replace cicStoreFrom, while (2) allows dApps to sync only the data in their own protocol (many if not most dApps do this) without wasting precious live resources on other protocols' data.
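To make the intended semantics of (1) and (2) concrete, here is a minimal sketch in plain Haskell. Everything in it is an assumption made for illustration: the `Tx`, `Address`, and `StoreConfig` types, their field names, and `shouldStore` are hypothetical stand-ins, not the chain index's real types or config keys. It only models which transactions get written to the DB; tip and UTXO processing are assumed to stay untouched, as with #72.

```haskell
-- Hypothetical, simplified stand-ins; the real chain index types differ.
newtype Address = Address String deriving (Eq, Show)

data Tx = Tx
  { txId        :: String
  , txAddresses :: [Address]  -- addresses appearing in the tx's inputs or outputs
  , txHasScript :: Bool       -- True if the tx references any script
  } deriving (Eq, Show)

-- Hypothetical config options corresponding to (1) and (2).
data StoreConfig = StoreConfig
  { onlyScriptTxs    :: Bool             -- (1) skip txs that reference no script
  , trackedAddresses :: Maybe [Address]  -- (2) Nothing means no address filtering
  }

-- Decide whether a single transaction should be written to the DB.
shouldStore :: StoreConfig -> Tx -> Bool
shouldStore cfg tx = scriptOk && addressOk
  where
    scriptOk  = not (onlyScriptTxs cfg) || txHasScript tx
    addressOk = case trackedAddresses cfg of
      Nothing    -> True
      Just addrs -> any (`elem` addrs) (txAddresses tx)

-- Filter a batch of transactions (for example, one block's worth) before storing.
txsToStore :: StoreConfig -> [Tx] -> [Tx]
txsToStore cfg = filter (shouldStore cfg)

main :: IO ()
main = do
  let dex = Address "addr_dex"
      cfg = StoreConfig { onlyScriptTxs = True, trackedAddresses = Just [dex] }
      txs =
        [ Tx "a" [Address "addr_alice"] False  -- plain transfer
        , Tx "b" [dex] True                    -- script tx touching a tracked address
        , Tx "c" [Address "addr_other"] True   -- script tx of another protocol
        ]
  print (map txId (txsToStore cfg txs))  -- prints ["b"]
```

With both options enabled, a transaction is stored only if it references a script and touches a tracked address; leaving `trackedAddresses` as `Nothing` gives option (1) on its own.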

We can implement and write tests for both if accepted.

sjoerdvisscher · 2021-12-14T15:57:37Z

You have convinced me. I agree that being able to quickly play around with the chain index, without either waiting hours for it to sync or building a custom chain index, is very valuable. So if you want to implement it, please go ahead!

Also, I realise now that with #186 the effect of the new config options can be localised to one function (just like storeFromBlockNo, probably replacing it and using filterTxs), which is comforting.

As for which options to have, I trust you on this one. I don't think you'd need to remove cicStoreFrom since several people had suggested this way of filtering.
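A rough sketch of what that localisation could look like, assuming simplified stand-in types. None of the names below (`TxFilter`, `storeFrom`, `onlyScripts`, `allOf`, `selectTxs`) are taken from the actual code; they are hypothetical and only illustrate keeping the block-number cut-off and any new options behind a single function.

```haskell
-- Hypothetical, simplified types; not the actual plutus-chain-index API.
newtype BlockNo = BlockNo Integer deriving (Eq, Ord, Show)

data Tx = Tx { txId :: String, txHasScript :: Bool } deriving (Eq, Show)

data Block = Block { blockNo :: BlockNo, blockTxs :: [Tx] }

-- A storage decision for one transaction, given the block it appears in.
type TxFilter = BlockNo -> Tx -> Bool

-- The cicStoreFrom-style cut-off expressed as a filter:
-- keep everything from a given block number onward.
storeFrom :: BlockNo -> TxFilter
storeFrom start bn _ = bn >= start

-- A hypothetical new option: keep only txs that reference a script.
onlyScripts :: TxFilter
onlyScripts _ = txHasScript

-- Combine filters: a tx is stored only if every filter accepts it.
allOf :: [TxFilter] -> TxFilter
allOf fs bn tx = all (\f -> f bn tx) fs

-- The single place where configuration decides what gets stored.
selectTxs :: TxFilter -> Block -> [Tx]
selectTxs keep (Block bn txs) = filter (keep bn) txs

main :: IO ()
main = do
  let keep     = allOf [storeFrom (BlockNo 100), onlyScripts]
      oldBlock = Block (BlockNo 99)  [Tx "a" True]
      newBlock = Block (BlockNo 100) [Tx "b" True, Tx "c" False]
  print (map txId (concatMap (selectTxs keep) [oldBlock, newBlock]))  -- prints ["b"]
```

Because the cut-off is just another filter in the list, it can coexist with the new options rather than being replaced by them.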

kk-hainq · 2021-12-15T01:49:41Z

Awesome, thanks! We'll continue this line of work soon 👍.

kk-hainq · 2022-03-10T04:05:39Z

We have realized that this is not a worthwhile direction anymore. One would need a very fast chain index first :(.

kk-hainq added the enhancement (New feature or request) label Oct 31, 2021

This was referenced Oct 31, 2021

Chain Index questions and improvement proposals #4 (Closed)

Chain Index: users can configure to only store txs from a block no onward #72 (Merged)

silky assigned kk-hainq Dec 20, 2021

silky added the chain-index (Issue related to the chain-index component) label Dec 20, 2021

kk-hainq mentioned this issue Dec 29, 2021

More chain index configs #222 (Closed, 8 tasks)

kk-hainq closed this as completed Mar 10, 2022
