Contract log fetching / querying is unnecessarily inefficient in multiple respects #1119

Open
fubuloubu opened this issue Nov 2, 2022 · 1 comment

Comments

@fubuloubu
Member

Iterating over event logs is very slow, because collecting them scans the entire history of the blockchain, even though the contract was only deployed at some block within that history:

def get_contract_logs(self, log_filter: LogFilter) -> Iterator[ContractLog]:
    height = self.chain_manager.blocks.height
    start_block = log_filter.start_block
    stop_block_arg = log_filter.stop_block if log_filter.stop_block is not None else height
    stop_block = min(stop_block_arg, height)
    block_ranges = self.block_ranges(start_block, stop_block, self.block_page_size)

    def fetch_log_page(block_range):
        start, stop = block_range
        page_filter = log_filter.copy(update=dict(start_block=start, stop_block=stop))
        # eth-tester expects a different format, let web3 handle the conversions for it
        raw = "EthereumTester" not in self.client_version
        logs = self._get_logs(page_filter.dict(), raw)
        return self.network.ecosystem.decode_logs(logs, *log_filter.events)

    with ThreadPoolExecutor(self.concurrency) as pool:
        for page in pool.map(fetch_log_page, block_ranges):
            yield from page
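
One way to avoid scanning blocks from before the contract existed would be to locate the deployment block first and clamp the filter's start block to it. A minimal sketch, assuming a hypothetical `has_code_at(block)` callable (for example a wrapper around `eth_getCode` at a given block, which typically needs an archive node for old blocks) and assuming the code stays in place once deployed. The names `find_deployment_block` and `clamp_start_block` are illustrative and not part of ape:

```python
from typing import Callable


def find_deployment_block(has_code_at: Callable[[int], bool], latest_block: int) -> int:
    """Binary-search for the first block at which the contract has code.

    Assumption: the contract has code at `latest_block`, and once deployed the
    code is never removed (no SELFDESTRUCT or redeploy in between).
    """
    if not has_code_at(latest_block):
        raise ValueError("contract has no code at latest_block")

    low, high = 0, latest_block
    while low < high:
        mid = (low + high) // 2
        if has_code_at(mid):
            high = mid  # code already present, deployment is at or before mid
        else:
            low = mid + 1  # no code yet, deployment is after mid
    return low


def clamp_start_block(requested_start: int, deployment_block: int) -> int:
    """Never start a log query before the contract existed."""
    return max(requested_start, deployment_block)
```

The search needs roughly log2(chain height) code lookups, so the result would be worth caching per contract address. It does not cover the SELFDESTRUCT or CREATE2 cases, where code can disappear and reappear.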

  1. The block range for event logs should be bounded by the block in which the contract was first deployed and the block in which it last existed (SELFDESTRUCT). If the contract code changes through a CREATE2 proxy, that might change the end block as well.
  2. Batching can be extremely inefficient: pages are fixed-size block ranges, but some blocks have no events and others have many.
  3. Web3py provides a direct API for filtering events; some research should be done to determine whether using it would improve this section of the code.
  4. Work should continue on the query layer, which might vastly speed up obtaining logs via data pipeline plugins.
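
For point 2, one possible direction is to size each page by how many logs the previous page returned, instead of using a fixed block count. A rough sketch only: `fetch_logs(start, stop)` is a hypothetical callable standing in for the provider call, and the parameter names and thresholds are assumptions, not existing ape settings:

```python
from typing import Callable, Iterator, List, TypeVar

Log = TypeVar("Log")


def iter_logs_adaptive(
    fetch_logs: Callable[[int, int], List[Log]],  # hypothetical: logs in [start, stop]
    start_block: int,
    stop_block: int,
    initial_page_size: int = 100,
    target_logs_per_page: int = 1_000,
    max_page_size: int = 100_000,
) -> Iterator[Log]:
    """Yield logs page by page, growing pages over sparse ranges and
    shrinking them over dense ones."""
    page_size = initial_page_size
    start = start_block
    while start <= stop_block:
        stop = min(start + page_size - 1, stop_block)
        logs = fetch_logs(start, stop)
        yield from logs
        start = stop + 1

        if len(logs) < target_logs_per_page // 2:
            # Sparse range: widen the next page, up to the cap.
            page_size = min(page_size * 2, max_page_size)
        elif len(logs) > target_logs_per_page * 2:
            # Dense range: narrow the next page, but keep at least one block.
            page_size = max(page_size // 2, 1)
```

This version is sequential, so it gives up the thread-pool concurrency of the current implementation, and it does not handle providers that reject a range for being too large. Both would need to be worked out before anything like it could replace fixed-size paging.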
@antazoey changed the title from "Event logs don't work very well" to "Event logs is unnecessarily inefficient in multiple respects" on Nov 3, 2022
@antazoey changed the title from "Event logs is unnecessarily inefficient in multiple respects" to "Contract log fetching / querying is unnecessarily inefficient in multiple respects" on Nov 3, 2022
@fubuloubu
Member Author

Some of these points were resolved in #1548, but further research is required to determine whether the remaining points above are still valid (and whether there are more).
