Unexpected error messages after upgrading from 0.17.0 to 0.21.0 #454

bajtos · 2023-10-24T09:13:26Z

While upgrading rusty-lassie from Lassie 0.17.0 to 0.21.0 (see filecoin-station/rusty-lassie#68), I noticed new error messages in the test output.

These messages seem to be printed only when the retrieval is aborted via "max blocks" or "global timeout" feature.

Is this a known issue? How can I get rid of these messages?

The messages are repeated many times in my test output; I am posting one example for each. I also changed the formatting to make things easier to read.

2023-10-24T11:04:22.574+0200	ERROR	bitswap
providerquerymanager/providerquerymanager.go:344	
Received provider (12D3KooWNy2SmgY384B9EHVHdbSkvXEADwxjzd4ejXQqicxWpxxT) for cid (Qme3KPUshfASLi2kn78dr69hsNsW6DnpnFHnbBHyECiTjp) not requested

2023-10-24T11:04:23.191+0200	ERROR	lassie/retriever	
retriever/parallelpeerretriever.go:277	
Error recording retrieval failure for protocol transport-ipfs-gateway-http: internal error, no such active retrieval 80e3c2b4-0797-4bfc-bc72-80ee3df5fb90

cc @hannahhoward @rvagg

The text was updated successfully, but these errors were encountered:

rvagg · 2023-10-25T00:17:00Z

Thanks for the log output @bajtos; this is likely related to some context handling done across all the parallel workloads where cancellation needs to interrupt it all and there's been a few changes in that version range that adjust the timing subtly such that these errors may pop up. I haven't seen these particular ones but I have been bothered by some odd "context cancelled" log messages that shouldn't be seen!

What I can say is that neither of these errors is fatal, they're more of the type of "I probably shouldn't get in this state". They're annoying and they point to something that should be addressed, but I'm pretty confident they're just related to the retrieval bail state - the timing of "shut it all down" not hitting all the right things in the right order, but that doesn't indicate clean-up problems.

The first error is bitswap - we got a peer from the indexer and fed it in to bitswap but at the same time we also told bitswap to stop bothering with this CID.

The second error is from the SP scoring mechanism—keeping track of how SP's perform so we can make better choices about prioritising. "you told me a candidate failed a retrieval, but you already told me that retrieval had ended".

I'll look in to it and see if I can replicate it these.

bajtos · 2023-10-25T08:09:36Z

Thank you, @rvagg, for the detailed explanation! ❤️

I also assumed these errors were not fatal, so I am not worried. I agree with you these logs are annoying; that's why I opened this issue.

I can reliably reproduce this issue on my machine when I run cargo test in filecoin-station/rusty-lassie@5b0f37c

❯ cargo version
cargo 1.72.0 (103a7ff2e 2023-08-15)
❯ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)
❯ go version
go version go1.20.4 darwin/arm64

rvagg · 2023-11-27T02:38:32Z

(a note for myself that I started tinkering with test cases to try and capture this in my local branch rvagg/max-block-limit-probs-tmp, didn't make enough progress to share anything yet though, just don't want to forget what that branch is for!)

rvagg self-assigned this Oct 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected error messages after upgrading from 0.17.0 to 0.21.0 #454

Unexpected error messages after upgrading from 0.17.0 to 0.21.0 #454

bajtos commented Oct 24, 2023 •

edited

Loading

rvagg commented Oct 25, 2023

bajtos commented Oct 25, 2023 •

edited

Loading

rvagg commented Nov 27, 2023

Unexpected error messages after upgrading from 0.17.0 to 0.21.0 #454

Unexpected error messages after upgrading from 0.17.0 to 0.21.0 #454

Comments

bajtos commented Oct 24, 2023 • edited Loading

rvagg commented Oct 25, 2023

bajtos commented Oct 25, 2023 • edited Loading

rvagg commented Nov 27, 2023

bajtos commented Oct 24, 2023 •

edited

Loading

bajtos commented Oct 25, 2023 •

edited

Loading