Internal assert on corrupted filesystem caused boot loop #390
Comments
Hi @vonnieda, sorry for such a late response. Hopefully it's still useful to have your questions answered.
The asserts in littlefs are used to indicate a situation that should not happen. littlefs isn't programmed to behave itself if an assert fails, so the assert provides a last-minute safeguard to prevent it from corrupting itself. The asserts should not normally trigger; however, there are still enough bugs in littlefs that seeing an assert failure is not uncommon. I'm working on improving this here: #372, but it's built on the v2 branch. I would suggest keeping the asserts on if you can afford the code space in production, since there's no other reason to disable them. One big thing that's improved in #372 is how littlefs responds to corrupted disks. Currently, littlefs asserts on a corrupted disk, but #372 changes these asserts to return errors, giving users a chance to recover. Unfortunately, there aren't any plans to backport #372 to v1.
It's the second option. If an assert fails, you can't expect littlefs to work afterwards. However, one thing you can do is replace the assert with a condition that returns an error:
- LFS_ASSERT(block < lfs->cfg->block_count);
+ if (block >= lfs->cfg->block_count) {
+     return LFS_ERR_CORRUPT;
+ }
This should work in more cases and cause littlefs to propagate the error upwards. It's important to note you still can't expect littlefs to work in this situation. This is ok:
This is not a good idea and will likely break later in surprising ways:
If you have threads, another option is killing the thread inside the assert and restarting the "process".
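As a rough illustration of the assert-to-error replacement suggested above, here is a minimal standalone sketch. It does not use littlefs itself: `toy_config`, `toy_check_block`, and `TOY_ERR_CORRUPT` are hypothetical stand-ins, and the error value is a placeholder, not the real value of `LFS_ERR_CORRUPT`. The point is only that the error path tests for the violation (`>=`), which is the negation of the condition the assert states.

```c
#include <stdint.h>

/* Placeholder error code for this sketch only; it is NOT the actual
 * numeric value of littlefs's LFS_ERR_CORRUPT. */
#define TOY_ERR_CORRUPT (-1)

/* Hypothetical, trimmed-down config: only the field the check needs. */
struct toy_config {
    uint32_t block_count;
};

/* Returns 0 if `block` is addressable, TOY_ERR_CORRUPT otherwise.
 * The assert states what must hold (block < block_count); the error
 * path therefore tests for the opposite (block >= block_count). */
static int toy_check_block(const struct toy_config *cfg, uint32_t block) {
    if (block >= cfg->block_count) {
        return TOY_ERR_CORRUPT;
    }
    return 0;
}
```

With a 1024-block disk, as in the report, a stored block number of 1001106 would take the error path instead of aborting, while blocks 0 through 1023 would pass.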
Thanks for the response @geky! I can afford the code space, but the issue is that this causes a boot loop that I can't predict. If the filesystem has become corrupt, the only way to know is to try to use it and then it crashes due to the assert.
Okay, thanks. I'll look into upgrading to 2.
This is similar to how I've fixed it for now. In my case, the only assert I've ever seen has been the one reported in this issue. Given that that's the only assert I've seen, it would probably be safer to modify the code to return the error as you've mentioned, but I don't want to risk another assert causing a boot loop.
If I were to turn asserts back on, do you have any suggestions for a robust check I could perform? Or would you recommend going straight to v2? Thanks,
If you're able to, I would recommend adopting v2 (once #372 lands). Unfortunately, the only check I have in MCU-friendly code is littlefs's mount, which as you note doesn't fail gracefully (until #372).
Thanks @geky - appreciate the response. I'll look towards migrating to v2.
LFS: v1.3 9ee112a
Also reproduced with LFS v1.7.2
I have a filesystem that became corrupt during a power cycle, I believe. When the device next booted it went into a boot loop due to an internal assert:
Assertion failed: (block < lfs->cfg->block_count), function lfs_cache_read, file littlefs/lfs.c, line 18.
The assert is this one:
The corruption seems to have resulted in a stored block number of 1001106, when the disk is only 1024 blocks long.
I can supply the filesystem image privately if that's helpful. It contains proprietary information, so I am not able to post it publicly. I could probably post just the bad block publicly.
I'm not sure how the corruption happened in the first place, but my bigger concern is that lfs_cache_read asserts in this situation instead of returning an error code. If it returned an error code I could capture that and try to reformat the filesystem.
I have disabled the asserts and then I am able to get an error code and continue by reformatting, instead of crashing. The function tries to read that high block number, the flash subsystem returns an error and the error bubbles up.
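The recovery flow described here (try to use the filesystem, and reformat if an error comes back) might look roughly like the sketch below. It is written against hypothetical function-pointer hooks (`toy_fs_ops`) and a fake disk (`toy_disk`) so that it stands alone; in a real build the hooks would presumably wrap `lfs_mount` and `lfs_format`. It only helps when mount reports corruption as an error code rather than asserting.

```c
/* Hypothetical storage hooks for this sketch. Each returns 0 on
 * success and a negative error code on failure. */
struct toy_fs_ops {
    int (*mount)(void *ctx);
    int (*format)(void *ctx);
    void *ctx;
};

/* Try to mount; if that fails, format once and retry the mount.
 * Formatting destroys existing data, so this is a last resort. */
static int toy_mount_or_reformat(const struct toy_fs_ops *ops) {
    int err = ops->mount(ops->ctx);
    if (err == 0) {
        return 0;
    }
    err = ops->format(ops->ctx);
    if (err != 0) {
        return err;
    }
    return ops->mount(ops->ctx);
}

/* Fake disk for demonstration: mounting fails while `corrupt` is set,
 * and formatting clears the flag. */
struct toy_disk {
    int corrupt;
    int format_calls;
};

static int toy_disk_mount(void *ctx) {
    struct toy_disk *disk = ctx;
    return disk->corrupt ? -1 : 0;
}

static int toy_disk_format(void *ctx) {
    struct toy_disk *disk = ctx;
    disk->corrupt = 0;
    disk->format_calls++;
    return 0;
}
```

Retrying only once is a deliberate choice in this sketch: if a freshly formatted disk still fails to mount, the error is returned to the caller rather than looping, which avoids trading one boot loop for another.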
So, I have a couple questions:
Thanks,
Jason