Avoid skipping of events #2274

claasahl · 2021-11-07T09:46:48Z

Background

One my projects subscribes to Sync events from Uniswap-like pairs on Binance Smart Chain (BSC). Over time I realised that some of these events were skipped for no obvious reason (e.g. no network issues, no memory issues, etc.).

This example demonstrates this behavior. It requests Sync events for every new block twice using provider.getLogs(...). In an ideal world, both requests yield the same result. In practice, this is not always the case and the results might show a different number of events (e.g. one request has 5 events and the other has zero events).

The simplest explanation that I could come up with, is that the official JSON RPC endpoint for BSC (https://bsc-dataseed.binance.org/) is a load balancer, which distributes requests to various nodes. Not all of these nodes necessarily have the exact same state. Since BSC's average block length is about 3 seconds (https://bscscan.com/chart/blocktime), it can happen that some nodes lag slightly (i.e. they might not yet have caught up with the most recent block).

Every now and then, one of the requests in my example would go to an "up-to-date" node while the other request would go to an "out-of-date" node. Thus creating a situation where the requests return different results. I expect this effect to be most pronounced on smart contracts which frequently emit events (i.e. one or more events per block) and less pronounced on smart contracts which emit few events. The above example uses a smart contract which emits events every few seconds (https://bscscan.com/address/0x8fa59693458289914db0097f5f366d771b7a7c3f#events).

The fact that some nodes can be ever so slightly "out-of-date" is not taken into account in the following snippet. Instead, it is assumed that all requested events are returned regardless of the fact whether the requested data is available (in the node) or not. For slightly "out-of-date" nodes, this is a bit like requesting data for a future block.

https://github.com/ethers-io/ethers.js/blob/master/packages/providers/src.ts/base-provider.ts#L871-L887

Solution

My proposed change is to inspect the returned data, keeping track of the latest returned block number rather than assuming that all events up to toBlock were included (i.e. returned block number <= toBlock). For those cases where no new events become available for a longer period of time, a maximum range of 12 blocks is proposed (i.e. just like when removing obsolete events).

https://github.com/ethers-io/ethers.js/blob/master/packages/providers/src.ts/base-provider.ts#L841-L845

I suppose, it can be argued that this is a problem with BSC. My take is that this is not unique to BSC and an avoidable source of very subtle bugs.

claasahl · 2021-11-07T09:52:22Z

related to: #2167

timaiv · 2021-11-12T12:52:08Z

Ethers.js misses logs, cuz GETH nodes return "[]";
OpenEthereum nodes give error "Invalid range".
We fixed this in our apps by polling with filter.fromBlock "_lastBlockNumber-confirmation for chain".
Our confirmations:

ETH: 12
POL, BSC-30 (12 and 25 even not enough due to chain reorganization by our internal tests, (25 misses 1 log per day, 12 misses 50+ logs))

We store hashes per block with events, and active polling intervals, to prevent raising duplicated event (if hash and block are same).
On new polling, if no any active interval older than filter.fromBlock exists, we delete block from memory.
After this we can detect blockchain reorganization (ex: new tx in old block) with depth equal to our confirmation.

claasahl · 2021-11-12T18:23:43Z

Thanks for your message @timaiv That is fair solution for those cases one manually polls logs 🙂

In my case, I would like to be able to rely on ethers.js's internal polling mechanisms and use something like the following without having to worry about missing events.

const contract = new ethers.Contract("0x8fa59693458289914db0097f5f366d771b7a7c3f", [
    "event Sync(uint112 reserve0, uint112 reserve1)"
], provider);
contract.on("Sync", (reserve0, reserve1) => {
  // ...
});

timaiv · 2021-11-12T18:45:01Z

Thanks for your message @timaiv That is fair solution for those cases one manually polls logs 🙂

In my case, I would like to be able to rely on ethers.js's internal polling mechanisms and use something like the following without having to worry about missing events.
const contract = new ethers.Contract("0x8fa59693458289914db0097f5f366d771b7a7c3f", [
    "event Sync(uint112 reserve0, uint112 reserve1)"
], provider);
contract.on("Sync", (reserve0, reserve1) => {
  // ...
});

Our solutions allows this and we use this actually.
We created class that overrides poll function and has some helper functions

export class BaseProviderEx extends ethers.providers.BaseProvider 
{
..
public static pollingFuncNamesToOverride(): string[]
    {
        return [
            BaseProviderEx.prototype.poll.name,
            BaseProviderEx.prototype._logAsync.name, 
            BaseProviderEx.prototype._traceProblem.name, 
            BaseProviderEx.prototype._onLogRangeProcessed.name, 
            BaseProviderEx.prototype._onLogRangeRequested.name,
            BaseProviderEx.prototype._isBlockExistsInAnyActiveRange.name
            ];
    }
}

And helper that changes prototype functions in BaseProvider

export class EthersLibFixer
{
    /**
     * Fixes missed logs due to chain reorganization or empty list from GETH based nodes.
     * @param chainPastOffsets_ Shifts in past to still take logs, should be equal to confirmation in logic per chain.
     */
    public static fixPolling(chainPastOffsets_: Map<number, number>)
    {
        EthersFixerSettings.defaultChainLogPollingPastOffset = chainPastOffsets_;
        let baseProviderPrototype: any = ethers.providers.BaseProvider.prototype;
        let baseProviderExPrototype: any = BaseProviderEx.prototype;

        let functions = BaseProviderEx.pollingFuncNamesToOverride();
        for (let functionName of functions)
        {
            baseProviderPrototype[functionName] = baseProviderExPrototype[functionName];
        }
    }
}

Usage

        EthersLibFixer.fixPolling(confirmationsForPolling)

claasahl · 2021-11-13T09:07:13Z

Thank you for sharing @timaiv

I don't doubt that your solution works. It just seems odd to me that such a solution is needed in the first place. If I could choose between patching ethers.js at runtime and simply using ethers.js out-of-the-box, I would prefer the latter.

Have you tried to get your solution integrated into ethers.js?

claasahl · 2021-11-21T09:15:28Z

Or is ethers.js behavior (with regard to this PR) correct and intended?

ricmoo · 2021-11-21T09:24:19Z

Sorry, just saw this issue now for some reason.

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

timaiv · 2021-11-21T09:47:42Z

Sorry, just saw this issue now for some reason.

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

I spent about two weeks in total in investigating on problems with events.
There are 2 different problems.

Loss of events due to desynchronization (racing)
Loss of events due to blockchain reorganization

Desynchronization
ALL the following providers: alchemy, infura, etherscan, bsc. (not only infura)
they say that although the current block is X, but when loading the logs {from:X,to:X}, they return or an empty array (GETH based) or an error(Parity based).
From tests, providers can return empty array even after 8 seconds after block mined (exists on provider by eth_blockNumber).

Blockchain reorganization
I ran tests every night and I can say that this is a very common occurrence.
Ex:
-i detected log, (tx) in block 100, but actually (after 1 min) this tx in block 105.
-i downloaded 2 logs (2 tx) in block 100, but after 1 min, block actually 100 has 5 logs, (5 tx), so i missed 3 events.
__
And yes, i use FallbackProvider also. I can share full source code from our fix, that doesn't skip event.

claasahl · 2021-11-21T10:20:55Z

Thanks for the replies @ricmoo and @timaiv

Are you using a FallbackProvider? The FallbackProvider uses stallUntilSync to ensure the backends are synced to matching blocks before their response counts towards the quorum.

No, I have not been using FallbackProvider. It should make it less likely for events to get skipped, but comes at the cost of increased network traffic without fully addressing the issue (i.e. there will still be skipped events, just fewer of them).

If there is something special for BSC, we can add the custom logic to the ethers-ancillary/bsc package.

I don't think this specific to BSC. It is just more obvious in the case of BSC, because of the rather short block length.

In v6, there is a larger variety of Subscriber models, including one which uses getChanges. Does the BSC load balancer honour the getChanges for a given filter ID? The INFURA backends have the same issue with their load balancer.

I am assuming that you are referring to something like this #1768 (comment). If I remember correctly, I did not manage to get that working, but it might be worthwhile to give it another try.

atropos0902 · 2022-02-04T18:36:15Z

@claasahl I have a similar issue as you.
Almost 20% of events are missing with contract.on(...).
I am also on BSC and using https://bsc-dataseed.binance.org/ as Json rpc provider.
Can you please let me know how to resolve the issue?
Thanks!

atropos0902 · 2022-02-05T00:03:26Z

@claasahl I tried with your updates and seems like it's being triggered 2 times for one event.

claasahl · 2022-02-05T18:17:25Z

@atropos0902 thanks for your reply (and pings in #2652 and #2167). The quickest solution is probably to implement your own polling strategy (i.e. something that repeatedly polls the latest events with getLogs). However, a better solution would be to update ethers's polling strategy to take the returned events into account.

Problem

ethers.js/packages/providers/src.ts/base-provider.ts

Lines 909 to 929 in 7b134bd

    
                   case "filter": { 
        
                       const filter = event.filter; 
        
                       filter.fromBlock = this._lastBlockNumber + 1; 
        
                       filter.toBlock = blockNumber; 
        
                       const runner = this.getLogs(filter).then((logs) => { 
        
                           if (logs.length === 0) { return; } 
        
                           logs.forEach((log: Log) => { 
        
                               this._emitted["b:" + log.blockHash] = log.blockNumber; 
        
                               this._emitted["t:" + log.transactionHash] = log.blockNumber; 
        
                               this.emit(filter, log); 
        
                           }); 
        
                       }).catch((error: Error) => { this.emit("error", error); }); 
        
                       runners.push(runner); 
        
                       break; 
        
                   } 
        
               } 
        
           }); 
        
           this._lastBlockNumber = blockNumber;

The above code snippet also uses getLogs to poll events, but it "blindly" assumes that all events up to blockNumber (see filter.toBlock = blockNumber; at line 912 and this._lastBlockNumber = blockNumber; at line 929) are included in the response. In practise this assumption does not hold for BSC. The reason why this assumption does not hold is that the node, which processes the getLogs request, may not have caught up with the latest block (e.g. the node is missing the latest block or two) and, thus cannot return all the events that were requested by ethers.

While this issue is more noticable on blockchains with short block lengths (e.g. BSC with its 3 second block length), it is not limited to them. The greatest potential for this issue to occur is whenever a new block was just mined. The shorter the time span since the mining of a new block, the higher the risk of this issue occurring (i.e. the risk that a node is lagging behind is greatest at this point). The more time passes, the greater the chance that all / enough nodes in the blockchain will have caught with the newly mined block and thus reducing the frequency of this issue.

Example

assuming BSC's latest block is 1000
there is 1 "some event"-event in block 995
there are 5 "some event"-events in block 1000
node A has fully caught up (i.e. it is at block 1000)
node B is slightly behind (i.e. it is still at block 999)

We are subscribing to "some event"-events like so:

const contract = new ethers.Contract(...);
contract.on('some event', () => {
 // handle event
});

ethers will poll BSC for "some event"-events. Let's say that ethers is polling from block 990 to block 1000.

If the request goes to node A, the response will include all 6 "some event"-events (i.e. the events from block 995 and 1000). ethers will correctly assume that it got all events up to and including block 1000. The next poll will start from block 1001.

If the request goes to node B, the response will only include 1 "some event"-event (i.e. the event from block 995). ethers will incorrectly assume that it got all events up to and including block 1000. The next poll will also start from block 1001 and ignore / skip the 5 "some event"-events in block 1000.

Solution

ethers should inspect the returned events and know that it only got events up to a certain block (i.e. block 995 in this example). During the next poll, ethers can poll from the next block (i.e. 996).

…#2274, #2652).

…1830, #2274, #2652).

ricmoo · 2022-03-10T00:57:51Z

Ok... It was a lot of experimentation and research, but I think this has been fixed in v5.6.

Please give it a try and let me know if you still have any problems. :)

atropos0902 · 2022-03-10T03:45:26Z

Ok... It was a lot of experimentation and research, but I think this has been fixed in v5.6.

Please give it a try and let me know if you still have any problems. :)

Amazing!!!
I've been waiting for this moment for 2 months.
Let me try now!!!
Thank you @ricmoo

ricmoo · 2022-03-17T07:44:21Z

This was fixed, so I'll close it. Please re-open or create a new issue if there are any problems.

Thanks! :)

…ers-io#1814, ethers-io#1830, ethers-io#2274, ethers-io#2652).

…ethers-io#1814, ethers-io#1830, ethers-io#2274, ethers-io#2652).

claasahl added 2 commits October 25, 2021 19:26

revised tracking of lastBlockNumber

21b01e7

npm run build

6ce6947

claasahl marked this pull request as ready for review November 7, 2021 09:48

ricmoo added a commit that referenced this pull request Mar 9, 2022

Fix missing events on certain network conditions (#1798, #1814, #1830, …

f67a9a8

…#2274, #2652).

ricmoo added a commit that referenced this pull request Mar 9, 2022

Fix events getting blocked from fix for missed events (#1798, #1814, #…

f46aa75

…1830, #2274, #2652).

ricmoo added enhancement New feature or improvement. fixed/complete This Bug is fixed or Enhancement is complete and published. labels Mar 10, 2022

ricmoo closed this Mar 17, 2022

Woodpile37 pushed a commit to Woodpile37/ethers.js that referenced this pull request Jan 14, 2024

Fix missing events on certain network conditions (ethers-io#1798, eth…

1bc0905

…ers-io#1814, ethers-io#1830, ethers-io#2274, ethers-io#2652).

Woodpile37 pushed a commit to Woodpile37/ethers.js that referenced this pull request Jan 14, 2024

Fix events getting blocked from fix for missed events (ethers-io#1798, …

bcad0ab

…ethers-io#1814, ethers-io#1830, ethers-io#2274, ethers-io#2652).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid skipping of events #2274

Avoid skipping of events #2274

claasahl commented Nov 7, 2021

claasahl commented Nov 7, 2021

timaiv commented Nov 12, 2021

claasahl commented Nov 12, 2021

timaiv commented Nov 12, 2021

claasahl commented Nov 13, 2021

claasahl commented Nov 21, 2021

ricmoo commented Nov 21, 2021

timaiv commented Nov 21, 2021 •

edited

Loading

claasahl commented Nov 21, 2021

atropos0902 commented Feb 4, 2022 •

edited

Loading

atropos0902 commented Feb 5, 2022

claasahl commented Feb 5, 2022

ricmoo commented Mar 10, 2022

atropos0902 commented Mar 10, 2022

ricmoo commented Mar 17, 2022

Avoid skipping of events #2274

Avoid skipping of events #2274

Conversation

claasahl commented Nov 7, 2021

Background

Solution

claasahl commented Nov 7, 2021

timaiv commented Nov 12, 2021

claasahl commented Nov 12, 2021

timaiv commented Nov 12, 2021

claasahl commented Nov 13, 2021

claasahl commented Nov 21, 2021

ricmoo commented Nov 21, 2021

timaiv commented Nov 21, 2021 • edited Loading

claasahl commented Nov 21, 2021

atropos0902 commented Feb 4, 2022 • edited Loading

atropos0902 commented Feb 5, 2022

claasahl commented Feb 5, 2022

Problem

Example

Solution

ricmoo commented Mar 10, 2022

atropos0902 commented Mar 10, 2022

ricmoo commented Mar 17, 2022

timaiv commented Nov 21, 2021 •

edited

Loading

atropos0902 commented Feb 4, 2022 •

edited

Loading