Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid skipping of events #2274

Closed
wants to merge 2 commits into from
Closed

Conversation

claasahl
Copy link

@claasahl claasahl commented Nov 7, 2021

Background

One my projects subscribes to Sync events from Uniswap-like pairs on Binance Smart Chain (BSC). Over time I realised that some of these events were skipped for no obvious reason (e.g. no network issues, no memory issues, etc.).

This example demonstrates this behavior. It requests Sync events for every new block twice using provider.getLogs(...). In an ideal world, both requests yield the same result. In practice, this is not always the case and the results might show a different number of events (e.g. one request has 5 events and the other has zero events).

The simplest explanation that I could come up with, is that the official JSON RPC endpoint for BSC (https://bsc-dataseed.binance.org/) is a load balancer, which distributes requests to various nodes. Not all of these nodes necessarily have the exact same state. Since BSC's average block length is about 3 seconds (https://bscscan.com/chart/blocktime), it can happen that some nodes lag slightly (i.e. they might not yet have caught up with the most recent block).

Every now and then, one of the requests in my example would go to an "up-to-date" node while the other request would go to an "out-of-date" node. Thus creating a situation where the requests return different results. I expect this effect to be most pronounced on smart contracts which frequently emit events (i.e. one or more events per block) and less pronounced on smart contracts which emit few events. The above example uses a smart contract which emits events every few seconds (https://bscscan.com/address/0x8fa59693458289914db0097f5f366d771b7a7c3f#events).

The fact that some nodes can be ever so slightly "out-of-date" is not taken into account in the following snippet. Instead, it is assumed that all requested events are returned regardless of the fact whether the requested data is available (in the node) or not. For slightly "out-of-date" nodes, this is a bit like requesting data for a future block.

https://github.com/ethers-io/ethers.js/blob/master/packages/providers/src.ts/base-provider.ts#L871-L887

Solution

My proposed change is to inspect the returned data, keeping track of the latest returned block number rather than assuming that all events up to toBlock were included (i.e. returned block number <= toBlock). For those cases where no new events become available for a longer period of time, a maximum range of 12 blocks is proposed (i.e. just like when removing obsolete events).

https://github.com/ethers-io/ethers.js/blob/master/packages/providers/src.ts/base-provider.ts#L841-L845


I suppose, it can be argued that this is a problem with BSC. My take is that this is not unique to BSC and an avoidable source of very subtle bugs.

@claasahl claasahl marked this pull request as ready for review November 7, 2021 09:48
@claasahl
Copy link
Author

claasahl commented Nov 7, 2021

related to: #2167

@timaiv
Copy link

timaiv commented Nov 12, 2021

Ethers.js misses logs, cuz GETH nodes return "[]";
OpenEthereum nodes give error "Invalid range".
We fixed this in our apps by polling with filter.fromBlock "_lastBlockNumber-confirmation for chain".
Our confirmations:

  • ETH: 12
  • POL, BSC-30 (12 and 25 even not enough due to chain reorganization by our internal tests, (25 misses 1 log per day, 12 misses 50+ logs))

We store hashes per block with events, and active polling intervals, to prevent raising duplicated event (if hash and block are same).
On new polling, if no any active interval older than filter.fromBlock exists, we delete block from memory.
After this we can detect blockchain reorganization (ex: new tx in old block) with depth equal to our confirmation.

@claasahl
Copy link
Author

Thanks for your message @timaiv That is fair solution for those cases one manually polls logs 🙂

In my case, I would like to be able to rely on ethers.js's internal polling mechanisms and use something like the following without having to worry about missing events.

const contract = new ethers.Contract("0x8fa59693458289914db0097f5f366d771b7a7c3f", [
    "event Sync(uint112 reserve0, uint112 reserve1)"
], provider);
contract.on("Sync", (reserve0, reserve1) => {
  // ...
});

@timaiv
Copy link

timaiv commented Nov 12, 2021

Thanks for your message @timaiv That is fair solution for those cases one manually polls logs 🙂

In my case, I would like to be able to rely on ethers.js's internal polling mechanisms and use something like the following without having to worry about missing events.

const contract = new ethers.Contract("0x8fa59693458289914db0097f5f366d771b7a7c3f", [
    "event Sync(uint112 reserve0, uint112 reserve1)"
], provider);
contract.on("Sync", (reserve0, reserve1) => {
  // ...
});

Our solutions allows this and we use this actually.
We created class that overrides poll function and has some helper functions

export class BaseProviderEx extends ethers.providers.BaseProvider 
{
..
public static pollingFuncNamesToOverride(): string[]
    {
        return [
            BaseProviderEx.prototype.poll.name,
            BaseProviderEx.prototype._logAsync.name, 
            BaseProviderEx.prototype._traceProblem.name, 
            BaseProviderEx.prototype._onLogRangeProcessed.name, 
            BaseProviderEx.prototype._onLogRangeRequested.name,
            BaseProviderEx.prototype._isBlockExistsInAnyActiveRange.name
            ];
    }
}

And helper that changes prototype functions in BaseProvider

export class EthersLibFixer
{
    /**
     * Fixes missed logs due to chain reorganization or empty list from GETH based nodes.
     * @param chainPastOffsets_ Shifts in past to still take logs, should be equal to confirmation in logic per chain.
     */
    public static fixPolling(chainPastOffsets_: Map<number, number>)
    {
        EthersFixerSettings.defaultChainLogPollingPastOffset = chainPastOffsets_;
        let baseProviderPrototype: any = ethers.providers.BaseProvider.prototype;
        let baseProviderExPrototype: any = BaseProviderEx.prototype;

        let functions = BaseProviderEx.pollingFuncNamesToOverride();
        for (let functionName of functions)
        {
            baseProviderPrototype[functionName] = baseProviderExPrototype[functionName];
        }
    }
}

Usage

        EthersLibFixer.fixPolling(confirmationsForPolling)

@claasahl
Copy link
Author

Thank you for sharing @timaiv

I don't doubt that your solution works. It just seems odd to me that such a solution is needed in the first place. If I could choose between patching ethers.js at runtime and simply using ethers.js out-of-the-box, I would prefer the latter.

Have you tried to get your solution integrated into ethers.js?

@claasahl
Copy link
Author

Or is ethers.js behavior (with regard to this PR) correct and intended?

@ricmoo
Copy link
Member

ricmoo commented Nov 21, 2021

Sorry, just saw this issue now for some reason.

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

@timaiv
Copy link

timaiv commented Nov 21, 2021

Sorry, just saw this issue now for some reason.

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

I spent about two weeks in total in investigating on problems with events.
There are 2 different problems.

  1. Loss of events due to desynchronization (racing)
  2. Loss of events due to blockchain reorganization

Desynchronization
ALL the following providers: alchemy, infura, etherscan, bsc. (not only infura)
they say that although the current block is X, but when loading the logs {from:X,to:X}, they return or an empty array (GETH based) or an error(Parity based).
From tests, providers can return empty array even after 8 seconds after block mined (exists on provider by eth_blockNumber).

Blockchain reorganization
I ran tests every night and I can say that this is a very common occurrence.
Ex:
-i detected log, (tx) in block 100, but actually (after 1 min) this tx in block 105.
-i downloaded 2 logs (2 tx) in block 100, but after 1 min, block actually 100 has 5 logs, (5 tx), so i missed 3 events.
__
And yes, i use FallbackProvider also. I can share full source code from our fix, that doesn't skip event.

@claasahl
Copy link
Author

Thanks for the replies @ricmoo and @timaiv

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

No, I have not been using FallbackProvider. It should make it less likely for events to get skipped, but comes at the cost of increased network traffic without fully addressing the issue (i.e. there will still be skipped events, just fewer of them).

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

I don't think this specific to BSC. It is just more obvious in the case of BSC, because of the rather short block length.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

I am assuming that you are referring to something like this #1768 (comment). If I remember correctly, I did not manage to get that working, but it might be worthwhile to give it another try.

@atropos0902
Copy link

atropos0902 commented Feb 4, 2022

@claasahl I have a similar issue as you.
Almost 20% of events are missing with contract.on(...).
I am also on BSC and using https://bsc-dataseed.binance.org/ as Json rpc provider.
Can you please let me know how to resolve the issue?
Thanks!

@atropos0902
Copy link

@claasahl I tried with your updates and seems like it's being triggered 2 times for one event.

@claasahl
Copy link
Author

claasahl commented Feb 5, 2022

@atropos0902 thanks for your reply (and pings in #2652 and #2167). The quickest solution is probably to implement your own polling strategy (i.e. something that repeatedly polls the latest events with getLogs). However, a better solution would be to update ethers's polling strategy to take the returned events into account.

Problem

case "filter": {
const filter = event.filter;
filter.fromBlock = this._lastBlockNumber + 1;
filter.toBlock = blockNumber;
const runner = this.getLogs(filter).then((logs) => {
if (logs.length === 0) { return; }
logs.forEach((log: Log) => {
this._emitted["b:" + log.blockHash] = log.blockNumber;
this._emitted["t:" + log.transactionHash] = log.blockNumber;
this.emit(filter, log);
});
}).catch((error: Error) => { this.emit("error", error); });
runners.push(runner);
break;
}
}
});
this._lastBlockNumber = blockNumber;

The above code snippet also uses getLogs to poll events, but it "blindly" assumes that all events up to blockNumber (see filter.toBlock = blockNumber; at line 912 and this._lastBlockNumber = blockNumber; at line 929) are included in the response. In practise this assumption does not hold for BSC. The reason why this assumption does not hold is that the node, which processes the getLogs request, may not have caught up with the latest block (e.g. the node is missing the latest block or two) and, thus cannot return all the events that were requested by ethers.

While this issue is more noticable on blockchains with short block lengths (e.g. BSC with its 3 second block length), it is not limited to them. The greatest potential for this issue to occur is whenever a new block was just mined. The shorter the time span since the mining of a new block, the higher the risk of this issue occurring (i.e. the risk that a node is lagging behind is greatest at this point). The more time passes, the greater the chance that all / enough nodes in the blockchain will have caught with the newly mined block and thus reducing the frequency of this issue.

Example

  • assuming BSC's latest block is 1000
  • there is 1 "some event"-event in block 995
  • there are 5 "some event"-events in block 1000
  • node A has fully caught up (i.e. it is at block 1000)
  • node B is slightly behind (i.e. it is still at block 999)

We are subscribing to "some event"-events like so:

const contract = new ethers.Contract(...);
contract.on('some event', () => {
 // handle event
});

ethers will poll BSC for "some event"-events. Let's say that ethers is polling from block 990 to block 1000.

If the request goes to node A, the response will include all 6 "some event"-events (i.e. the events from block 995 and 1000). ethers will correctly assume that it got all events up to and including block 1000. The next poll will start from block 1001.

If the request goes to node B, the response will only include 1 "some event"-event (i.e. the event from block 995). ethers will incorrectly assume that it got all events up to and including block 1000. The next poll will also start from block 1001 and ignore / skip the 5 "some event"-events in block 1000.

Solution

ethers should inspect the returned events and know that it only got events up to a certain block (i.e. block 995 in this example). During the next poll, ethers can poll from the next block (i.e. 996).

@ricmoo
Copy link
Member

ricmoo commented Mar 10, 2022

Ok... It was a lot of experimentation and research, but I think this has been fixed in v5.6.

Please give it a try and let me know if you still have any problems. :)

@ricmoo ricmoo added enhancement New feature or improvement. fixed/complete This Bug is fixed or Enhancement is complete and published. labels Mar 10, 2022
@atropos0902
Copy link

Ok... It was a lot of experimentation and research, but I think this has been fixed in v5.6.

Please give it a try and let me know if you still have any problems. :)

Amazing!!!
I've been waiting for this moment for 2 months.
Let me try now!!!
Thank you @ricmoo

@ricmoo
Copy link
Member

ricmoo commented Mar 17, 2022

This was fixed, so I'll close it. Please re-open or create a new issue if there are any problems.

Thanks! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement. fixed/complete This Bug is fixed or Enhancement is complete and published.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants