handle feed consistency errors more gracefully #40

Open
cryptix opened this issue Dec 20, 2019 · 2 comments
Labels
bug Something isn't working

Comments

@cryptix
Member

cryptix commented Dec 20, 2019

Even though the gossip plugin uses strict locking so that each feed is only fetched once at a time, it seems like there is still a problem with this...

Whatever the cause may be, if the gossip scheduler throws `consistency error: wrong stored message sequence for ..` we need to handle that gracefully.
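A minimal sketch of what "gracefully" could look like: treat the consistency error as a per-feed problem, set that feed aside, and keep syncing the others. Everything here is an assumption for illustration: `ErrConsistency`, `fetchAll` and the `fetch` callback are hypothetical names, not the actual go-ssb types or the scheduler's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConsistency is a hypothetical typed error; the real error in the
// codebase may be shaped differently.
type ErrConsistency struct {
	Feed   string
	Stored int64 // sequence the index claims is the latest
	Got    int64 // sequence actually found
}

func (e ErrConsistency) Error() string {
	return fmt.Sprintf("consistency error: wrong stored message sequence for %s (stored %d, got %d)",
		e.Feed, e.Stored, e.Got)
}

// fetchAll runs fetch for every feed. Feeds that fail with a consistency
// error are collected for later repair instead of aborting the whole run;
// any other error still stops the loop.
func fetchAll(feeds []string, fetch func(string) error) (quarantined []string, err error) {
	for _, f := range feeds {
		ferr := fetch(f)
		var ce ErrConsistency
		switch {
		case ferr == nil:
			continue
		case errors.As(ferr, &ce):
			quarantined = append(quarantined, ce.Feed)
		default:
			return quarantined, fmt.Errorf("fetch %s: %w", f, ferr)
		}
	}
	return quarantined, nil
}

func main() {
	// Fake fetcher: one feed reports a (wrapped) consistency error.
	fetch := func(feed string) error {
		if feed == "@bob" {
			return fmt.Errorf("gossip: %w", ErrConsistency{Feed: feed, Stored: 12, Got: 10})
		}
		return nil
	}
	bad, err := fetchAll([]string{"@alice", "@bob", "@carol"}, fetch)
	fmt.Println("quarantined:", bad, "err:", err)
}
```

The quarantined feeds could then be handed to whatever repair step ends up existing, rather than taking the whole scheduler down.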

first reported here: sunrise-choir/sunrise-social-android-app#2

@cryptix
Member Author

cryptix commented Dec 20, 2019

Since messages are appended to the root offset log first and the indexes are updated after that, it's hard to diagnose without a good reproduction story. In the normal case the KV shouldn't complain about having too many messages.

One possible idea is that the log-append (and file system write) call goes through, the message(s) get indexed and the KV updated and THEN the app crashes.

I'm not 100% certain but I think if the offset log is written without a filesystem sync, the index COULD be further along since the new messages never made it to disk.
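To make the suspected ordering problem concrete, here is a toy sketch, not the actual offset log or KV code. A plain file stands in for the log and a map stands in for the index; `store`, `appendMsg` and `indexAhead` are made-up names. It syncs the log before touching the index, and shows a check for the "index further along than the log" state.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// store is a toy stand-in: a plain file plays the offset log and a map
// plays the KV index (feed -> latest stored sequence).
type store struct {
	log   *os.File
	index map[string]int64
}

// appendMsg writes to the log, asks the OS to flush it to disk, and only
// then updates the index, so the index should not get ahead of the log.
func (s *store) appendMsg(feed string, seq int64, msg []byte) error {
	if _, err := s.log.Write(msg); err != nil {
		return err
	}
	if _, err := s.log.Write([]byte{'\n'}); err != nil {
		return err
	}
	if err := s.log.Sync(); err != nil {
		return err
	}
	s.index[feed] = seq
	return nil
}

// indexAhead lists feeds whose indexed sequence is higher than the sequence
// actually found in the log, i.e. the suspected post-crash state.
func indexAhead(index, inLog map[string]int64) []string {
	var ahead []string
	for feed, seq := range index {
		if seq > inLog[feed] {
			ahead = append(ahead, feed)
		}
	}
	sort.Strings(ahead)
	return ahead
}

func main() {
	dir, err := os.MkdirTemp("", "offsetlog")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	f, err := os.OpenFile(filepath.Join(dir, "log"), os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	s := &store{log: f, index: map[string]int64{}}
	if err := s.appendMsg("@alice", 1, []byte("hello")); err != nil {
		panic(err)
	}

	// Simulated crash aftermath: the index remembers seq 5, the log holds 4.
	fmt.Println(indexAhead(map[string]int64{"@alice": 5}, map[string]int64{"@alice": 4}))
}
```

Syncing on every append is expensive, so a startup check along the lines of `indexAhead` might be the more practical half of this.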

cc @keks

@cryptix cryptix changed the title from "handle local storage fuck-ups better" to "handle feed consistency errors more gracefully" Dec 20, 2019
cryptix added a commit that referenced this issue Dec 20, 2019
since .Seq() wasn't called on the stored field it printed the internal struct value of the message instead.

updates #40
@cryptix cryptix added the bug Something isn't working label Jun 5, 2020
@cryptix
Member Author

cryptix commented Jun 5, 2020

There is now filesystem check (fsck) code in package sbot, and the go-sbot tool exposes it via `-fsck` and `-repair`.

The root cause is still undiagnosed even though I tried multiple times...

The main protection against fetching feeds multiple times should currently be this activeFetch map, which should make the scheduler skip a feed that already has a running createHistoryStream, but something is still off...
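For reference, a sketch of the kind of guard this is meant to be. It is not the actual activeFetch code; `fetchGuard` and `tryStart` are hypothetical names. The one thing it tries to illustrate is that the "is it running?" check and the "mark as running" write happen under the same lock, since a gap between the two would let two fetches of the same feed through.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchGuard tracks which feeds currently have a fetch in flight.
type fetchGuard struct {
	mu     sync.Mutex
	active map[string]struct{}
}

func newFetchGuard() *fetchGuard {
	return &fetchGuard{active: make(map[string]struct{})}
}

// tryStart marks feed as being fetched. It returns ok=false if a fetch is
// already running; otherwise it returns a done func that clears the mark.
// Check and insert share one critical section on purpose.
func (g *fetchGuard) tryStart(feed string) (done func(), ok bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, running := g.active[feed]; running {
		return nil, false
	}
	g.active[feed] = struct{}{}
	return func() {
		g.mu.Lock()
		delete(g.active, feed)
		g.mu.Unlock()
	}, true
}

func main() {
	g := newFetchGuard()

	done, ok := g.tryStart("@alice")
	fmt.Println("first attempt started:", ok)

	_, again := g.tryStart("@alice")
	fmt.Println("second attempt started:", again)

	done() // fetch finished, feed may be fetched again
	_, later := g.tryStart("@alice")
	fmt.Println("after done started:", later)
}
```

If the real map is already used like this, the duplicate fetch has to come from somewhere else, for example a code path that bypasses the guard entirely.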

The current test base makes it hard to reproduce this bug. I feel like I need to write a new test: create a couple hundred feeds, sync them all to a 2nd bot, truncate that bot's log a little to get lots of incomplete feeds, and sync again to try to trigger the bug.
