This repository has been archived by the owner on Dec 2, 2024. It is now read-only.

Chain Index: Allow more user configurations #73

Closed
kk-hainq opened this issue Oct 31, 2021 · 8 comments
Labels: chain-index (Issue related to the chain-index component), enhancement (New feature or request)

Comments

kk-hainq (Contributor) commented Oct 31, 2021

Area

[x] Plutus Application Framework: related to the Plutus application backend (PAB), emulator, Plutus libraries

Describe the feature you'd like

We should extend the configuration file to let users decide what they want to track and store. For example, our current security work only needs Alonzo-era data, so the ability to store data by era would likely be helpful. For dApp work, we mainly care about transactions that interact with our protocol, so the ability to skip storing irrelevant data would be excellent.

For example, a small addition to AppendBlock like the one in #72 lets applications choose whether or not to store a batch of transactions in the DB. Since tip and UTXO processing are unaffected, most of these configurations should be fairly safe for the functioning of the chain index.

Describe alternatives you've considered

We have been customizing the chain index to our needs, but our customizations are prone to breaking on upstream changes. The more that people can share and keep upstream, the better.

Additional context / screenshots

This issue is part of the #4 mega-thread.

kk-hainq (Contributor, Author) commented Nov 26, 2021

There are currently over 8 million post-Alonzo transactions on mainnet, many if not most of which are irrelevant to any specific dApp. As a modular next step, I think we should add a config option to not store transactions that don't reference any scripts (typical transfers of assets, NFT drops, SPO voting, etc.).

What do you think? (cc @silky @sjoerdvisscher).

sjoerdvisscher (Contributor) commented

I'm not a fan of config options. Our plan is to make the chain index available as a library, which would allow you to filter out transactions with Haskell code. Would that work for you?

kk-hainq (Contributor, Author) commented Dec 14, 2021

@sjoerdvisscher Sorry for the late reply, I have been really useless lately...

Anyway, I suggested both config options (no. 7) and a library API (no. 8) in the original #4 issue. The separation is critical, as config options are a lot friendlier to maintain than Haskell code that achieves the same thing, especially when one would need to maintain a scary cabal.project with heavy-weight dependencies just to add simple filtering logic.

On the other hand, let's say we have a Dockerfile for the chain index like:

```dockerfile
# Build stage: compile plutus-chain-index from a pinned plutus-apps commit.
FROM haskell:8.10.7-buster AS builder
RUN apt-get update && apt-get install --no-install-recommends -y liblzma-dev libsodium-dev libsystemd-dev pkg-config r-base && rm -rf /var/lib/apt/lists/*

ARG PLUTUS_APPS_COMMIT=c4570793b1251e1a8e79e33a5a40f3e2776c5691
RUN git clone https://github.com/input-output-hk/plutus-apps && cd plutus-apps && git checkout ${PLUTUS_APPS_COMMIT}

WORKDIR ./plutus-apps
# Turn off the external-libsodium-vrf flag of cardano-crypto-praos.
RUN echo "\npackage cardano-crypto-praos\n  flags: -external-libsodium-vrf\n" >> cabal.project
RUN cabal update
RUN cabal install exe:plutus-chain-index

# Runtime stage: copy only the compiled executable into a slim image.
FROM debian:buster-slim
RUN apt-get update && apt-get install --no-install-recommends -y libsodium-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /opt/chain-index
COPY --from=builder /root/.cabal/bin/plutus-chain-index .
EXPOSE 9083
CMD ./plutus-chain-index start-index -c ./data/config.json
```

Then with the following docker-compose.yml and chain-index-config.json:

```yaml
# docker-compose.yml
version: "3"

services:
  cardano-node:
    image: inputoutput/cardano-node:1.31.0
    environment:
      - NETWORK=${NETWORK:-testnet}
    volumes:
      - node-db:/data/db
      - node-ipc:/ipc
    restart: on-failure

  chain-index:
    image: chain-index:latest
    ports:
      - "${PORT:-9083}:9083"
    volumes:
      - node-ipc:/node-ipc
      - ./data/${NETWORK:-testnet}:/opt/chain-index/data
    depends_on:
      - cardano-node
    restart: on-failure

volumes:
  node-db:
  node-ipc:
```

```json
{
  "cicSlotConfig": {
    "scSlotLength": 1000,
    "scSlotZeroTime": 1591566291000
  },
  "cicPort": 9083,
  "cicSocketPath": "/node-ipc/node.socket",
  "cicDbPath": "/opt/chain-index/data/chain-index.db",
  "cicNetworkId": {
    "contents": {
      "unNetworkMagic": 1097911063
    },
    "tag": "Testnet"
  },
  "cicSecurityParam": 2160,
  "cicStoreFrom": {
    "unBlockNo": 2877844
  }
}
```

With this, anyone can spin up and maintain a working chain index setup without knowing Haskell, especially once #129 is done (we would love to join hands there). If we also maintain Docker images, as is done for cardano-node, cardano-db-sync, etc., then a DevOps engineer can read the documentation once and version-control a few config files to bring several chain index configurations to infrastructure in 30 minutes. By contrast, just setting up a development environment to customize the chain index already takes hours. Updates are also much easier with config files, whereas merely bumping a commit hash in cabal.project is scary and time-consuming. Not supporting common configs also risks duplicated code and tests across dApp projects.

In conclusion, we have been customizing the chain index since it was still buried in PAB 1.0 in early May. Even if we had the resources to keep doing so (we barely did...), it would be much better to maintain only our specific application logic and have all generally useful options maintained through config files. Both writing code and finding Haskell engineers are hard. Requiring both for simple configurations would be a bottleneck for most if not all dApp developers, who already need to maintain so many other things in code :(.

P.S.: I have been intimidated by nginx.conf and its friends my whole life. Nevertheless, I would rather spend hours reading the docs to get it right than contemplate maintaining an nginx fork. I pushed a bug last week and still hear colleagues' blame and laughter in my dreams today...

sjoerdvisscher (Contributor) commented Dec 14, 2021

I hear what you're saying, but there's a balance here between your speed and our speed. Having lots of configuration options can really slow our development, since you basically have a combinatorial explosion of small variations of the product. Sure it's nice if we optimise everything so you can deploy a new chain index in 30 minutes, but if you then have to wait for another month until we've finally delivered that new feature you desperately need, it's kind of pointless.

That's hyperbole of course, and the truth is somewhere in the middle. In the meantime, we have just merged the start of the chain index as a library; take a look: #186

kk-hainq (Contributor, Author) commented Dec 14, 2021

True. In a perfect world, dApp developers would port more and help maintain shared patterns upstream, but my previous 2-week-late reply proves that the pressure is always on the team at the end of the day.

#186 is definitely nice (thanks!) in that it reduces the amount of code one needs to maintain for a customized chain index. That said, the biggest bottleneck, having to set up a development environment and manage dependencies, is still there. Let's use plutus-pab/test-node as a case study.

https://github.com/input-output-hk/plutus-apps/blob/835ce24b7c89fa0d49e985029defc15b6732785e/plutus-pab/test-node/README.md#L133-L137

This shows that the chain index is the bottleneck of an already complicated setup. There have also been issues with synchronization (#171, #189, memory leaks, etc.). I think explaining a few JSON options that help sync faster (for example, by not syncing data that is uninteresting to the dApp) is friendly enough. One already has to maintain config files for the wallet.

https://github.com/input-output-hk/plutus-apps/blob/835ce24b7c89fa0d49e985029defc15b6732785e/plutus-pab/test-node/README.md#L62-L71

However, requiring dApp users to clone a template, build a ton of dependencies, and write their own chain index executable in Haskell, whether to sync faster for their application or just to test faster in plutus-pab/test-node, is too heavy-weight.

I hope a balance can still be found where we discuss, and are strict about, which configs to support. The ones we've found most critical thus far are:

  1. The ability to not store transactions that do not reference a script.
  2. The ability to only sync transactions that reference an address in a pre-defined list of addresses.

(1) can even replace cicStoreFrom. (2) lets dApps sync only the data in their own protocol (many if not most dApps do this) without wasting precious live resources on others' data.
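To make the two options concrete, here is a minimal, language-agnostic sketch in Python of what they could mean as a per-transaction predicate. Everything in it is an assumption for illustration: the `Tx` shape, the `FilterConfig` type, and the JSON keys `cicOnlyScriptTxs` and `cicTrackedAddresses` are hypothetical and are not part of the existing chain index config or code.

```python
import json
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified transaction; not the chain index's real type."""
    tx_id: str
    addresses: FrozenSet[str]                     # addresses of all inputs and outputs
    script_hashes: FrozenSet[str] = frozenset()   # scripts the transaction references


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical filter options; both JSON key names below are made up."""
    only_script_txs: bool = False
    tracked_addresses: FrozenSet[str] = frozenset()

    @staticmethod
    def from_json(text: str) -> "FilterConfig":
        raw = json.loads(text)
        return FilterConfig(
            only_script_txs=bool(raw.get("cicOnlyScriptTxs", False)),
            tracked_addresses=frozenset(raw.get("cicTrackedAddresses", [])),
        )


def should_store(tx: Tx, cfg: FilterConfig) -> bool:
    """Decide whether a transaction is written to the database."""
    # Option (1): drop transactions that reference no script.
    if cfg.only_script_txs and not tx.script_hashes:
        return False
    # Option (2): with a non-empty list, keep only transactions that
    # touch at least one listed address.
    if cfg.tracked_addresses and tx.addresses.isdisjoint(cfg.tracked_addresses):
        return False
    return True
```

With both options left at their defaults, `should_store` keeps every transaction, so in this sketch an existing config file without the new keys would behave as before.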

We can implement and write tests for both if accepted.

sjoerdvisscher (Contributor) commented

You have convinced me. I agree that being able to quickly play around with the chain index, without either waiting hours for it to sync or building a custom chain index, is very valuable. So if you want to implement it, please go ahead!

Also, I realise now that with #186 the effect of the new config options can be localised to one function (just like storeFromBlockNo, probably replacing it and using filterTxs), which is comforting.

As for which options to have, I trust you on this one. I don't think you'd need to remove cicStoreFrom since several people had suggested this way of filtering.
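As a rough illustration of keeping every storage decision in one place, the sketch below combines a block-number cutoff with an arbitrary per-transaction predicate in a single function. It is written in Python purely for illustration and does not reproduce the signatures of storeFromBlockNo or filterTxs; the names `txs_to_store`, `store_from`, and `keep_tx` are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def txs_to_store(
    block_no: int,
    txs: Iterable[T],
    store_from: Optional[int] = None,
    keep_tx: Callable[[T], bool] = lambda _tx: True,
) -> List[T]:
    """Return the transactions of one block that should be written to the DB."""
    # Block-number cutoff in the spirit of cicStoreFrom (assumed inclusive):
    # nothing from earlier blocks is stored.
    if store_from is not None and block_no < store_from:
        return []
    # Any further option only narrows the selection inside this one function.
    return [tx for tx in txs if keep_tx(tx)]
```

Because the cutoff and the predicate are independent parameters, new filtering options would only add cases to `keep_tx`, and cicStoreFrom-style filtering could stay available alongside them.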

kk-hainq (Contributor, Author) commented

Awesome, thanks! We'll continue this line of work soon 👍.

silky added the chain-index (Issue related to the chain-index component) label Dec 20, 2021
kk-hainq mentioned this issue Dec 29, 2021 (8 tasks)

kk-hainq (Contributor, Author) commented

We have realized that this is not a worthwhile direction anymore. One would need a very fast chain index first :(.
