assert block != LFS_BLOCK_NULL in lfs_bd_read failing in lfs_ctz_traverse #295

Open
Ralim opened this issue Sep 25, 2019 · 10 comments

@Ralim

Ralim commented Sep 25, 2019

Hi,

Currently working on getting littlefs running on a NAND flash.
The flash is 256MB, with erase pages of 128K and read/write sizes of 2048 bytes.

Running into an issue where, after a long-running series of reads and writes to the flash, something becomes unhappy and the assert here fails.

When this assert fails, this is the current call stack:

  • lfs_bd_read
  • lfs_ctz_traverse
  • lfs_alloc
  • lfs_dir_alloc
  • lfs_dir_split
  • lfs_dir_compact
  • lfs_dir_commit
  • lfs_file_opencfg

Looking in lfs_ctz_traverse: the read call just before the one that fails the assert reads from block 1820 (head), and the data read into the buffer is all 0xFF (I validated that this read is correct by manually reading the block as well).
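
To illustrate why an all-0xFF read can lead straight to this assert, here is a minimal sketch. It assumes the traversal takes the next block pointer from a little-endian 32-bit word at the start of the block it just read; that layout is an assumption of the sketch, not something confirmed in this thread. `BLOCK_NULL`, `decode_block_pointer` and `is_erased` are hypothetical names, not littlefs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for LFS_BLOCK_NULL, which the thread equates with 0xffffffff. */
#define BLOCK_NULL ((uint32_t)0xffffffff)

/* Hypothetical helper: decode a little-endian 32-bit block pointer
 * from the first four bytes of a buffer. */
static uint32_t decode_block_pointer(const uint8_t *buf) {
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Hypothetical helper: returns 1 if every byte looks erased (0xFF). */
static int is_erased(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xff) {
            return 0;
        }
    }
    return 1;
}
```

Under that assumption, a block that reads back as erased decodes to 0xffffffff, so the next read would target BLOCK_NULL. Calling something like `is_erased` on the buffer just before following the pointer could help tell "the block was never written" apart from "the pointer was corrupted".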

Any ideas on what could have led to this?
This appears to be showing up on all units after a decent number of file actions (around 100 files created and then deleted). Files are a mix of small (2k) and large (25MB).

I can look into dumping the entire flash memory out if that would be of use?

This is the config structure:


const struct lfs_config Filesystem::LittleFScfg = {
		// block device operations
		context : 0,
		read : Filesystem::raw_read,
		prog : Filesystem::raw_prog,
		erase : Filesystem::raw_erase,
		sync : Filesystem::raw_sync,

		// block device configuration
		read_size : RawFlash::get_read_size(),  // Minimum read size
		prog_size : RawFlash::get_program_size(),  // Minimum programming size
		block_size : RawFlash::get_erase_size(),  // Erase block size
		block_count : RawFlash::size() / RawFlash::get_erase_size(),  // Number of erasable blocks
		block_cycles : 1000,  // Cycle blocks around every 1k writes
		cache_size : LFS_BUFFER_SIZE,  // Size of the cache buffer
		lookahead_size : LFS_LOOKAHEAD_SIZE,  // Size of the lookahead scratch buffer
		read_buffer : lfs_read_buffer,  // Static read buffer
		prog_buffer : lfs_prog_buffer,  // Static programming buffer
		lookahead_buffer : lfs_lookahead_buffer,  // Static lookahead buffer
		name_max : 65,  // Max length of file names
		file_max : LFS_FILE_MAX,  // Max file size
		attr_max : LFS_ATTR_MAX,  // Max size of custom attributes
};
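
As a quick sanity check on the numbers this config would produce, the sketch below works through the geometry stated above. It assumes "256MB" and "128K" mean binary units (MiB and KiB), and `struct flash_geometry` and its helpers are hypothetical names used only for this illustration, not part of RawFlash or littlefs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the flash geometry quoted in the issue. */
struct flash_geometry {
    uint32_t total_bytes;  /* whole device size */
    uint32_t erase_bytes;  /* erase block size, used as block_size */
    uint32_t page_bytes;   /* read/program size */
};

/* Number of erase blocks, i.e. what block_count would evaluate to. */
static uint32_t geometry_block_count(const struct flash_geometry *g) {
    return g->total_bytes / g->erase_bytes;
}

/* Number of read/program pages inside one erase block. */
static uint32_t geometry_pages_per_block(const struct flash_geometry *g) {
    return g->erase_bytes / g->page_bytes;
}

/* Returns 1 if the sizes divide evenly, so no partial blocks or pages exist. */
static int geometry_is_consistent(const struct flash_geometry *g) {
    return g->page_bytes != 0
        && g->erase_bytes != 0
        && g->erase_bytes % g->page_bytes == 0
        && g->total_bytes % g->erase_bytes == 0;
}
```

With 256 MiB total, 128 KiB erase blocks and 2048-byte pages, this gives 2048 blocks of 64 pages each, so valid block addresses would run from 0 to 2047 and block 1820 is in range.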
@arduzilla

I have a very similar issue.

@raulmcriado

I get the same assertion in lfs_bd_read, LFS_ASSERT(block != LFS_BLOCK_NULL);, but with a different call stack. It only happens sometimes, when my system reboots and inspects a directory after mounting the file system without problems:

lfs_bd_read (ASSERT)
lfs_dir_getslice
lfs_dir_get
lfs_dir_getinfo
lfs_dir_read

Once I get the assertion and my system reboots, I don't get the problem anymore. I will update this message with the value of block once I have it. I can't intentionally reproduce the failure, because it only happens sometimes after booting (following regular creation and deletion of files).

@raulmcriado

raulmcriado commented Nov 15, 2019

I am going to test it and will post again with more info. Thank you, Ralim, for referencing it.

EDIT:
It did not solve the assertion for me; I see the same behavior. After some startups (after some run-sleep cycles) I get the assertion, my system restarts and does exactly the same thing, and then the error is gone for some time, with the file system functional.

@raulmcriado

raulmcriado commented Nov 20, 2019

Hi again. It seems that although I get the assertion message "assertion in lfs.c line 32", the assertion actually occurs on line 75:
// load to cache, first condition can no longer fail
LFS_ASSERT(block < lfs->cfg->block_count);

I am using the NuttX RTOS. I updated lfs.c/h to version v2.1.2 and then updated it to the fix.

Here I attach some screenshots of each frame just before I get the assertion in case it is useful:
Assertion_littlefs_readdir

geky added the "needs investigation" (no idea what is wrong) label Nov 26, 2019
@geky
Member

geky commented Nov 26, 2019

Hi guys, thanks for creating an issue.

Some notes on this: This assertion is the same as assert(block != 0xffffffff) in previous versions and similar to the assert(block < lfs->cfg->block_count). There are a few other posts, though I don't think they are related (#12, #142, #12, #271).

Both LFS_ASSERT(block != LFS_BLOCK_NULL) and LFS_ASSERT(block < lfs->cfg->block_count) indicate the same thing: that littlefs attempted to read from an invalid block address.
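
As a rough illustration of how the two checks relate, here is a small sketch. `BLOCK_NULL` and `block_is_valid` are hypothetical names for this example only; the sketch assumes the null value is 0xffffffff, as in the older assert mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for LFS_BLOCK_NULL (assumed to be 0xffffffff). */
#define BLOCK_NULL ((uint32_t)0xffffffff)

/* Hypothetical helper: true if a block address may be read.
 * The first test mirrors LFS_ASSERT(block != LFS_BLOCK_NULL),
 * the second mirrors LFS_ASSERT(block < lfs->cfg->block_count). */
static bool block_is_valid(uint32_t block, uint32_t block_count) {
    if (block == BLOCK_NULL) {
        return false;  /* null pointer, e.g. decoded from erased flash */
    }
    return block < block_count;  /* must lie inside the device */
}
```

Since block_count can never exceed 0xffffffff, the range check alone already rejects BLOCK_NULL. Which of the two asserts fires therefore mostly depends on the littlefs version, not on the nature of the bad address. A guard like this in the block device read callback could log the offending address before the assert halts the system.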

Unfortunately, this is sort of like a hard fault: there's not much info to go on without more context.

This stack trace should help quite a bit, and if you are able to get a filesystem dump that would help. Especially if you're able to consistently trigger the assertion through mounting the same image.
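
For anyone wanting to capture such a dump, one possible approach is sketched below, assuming the device can be read block by block through a callback. `bd_read_fn`, `dump_device`, `struct ram_bd` and `ram_read` are hypothetical names for this sketch; the callback is only loosely modelled on a block device read hook and is not the littlefs signature. The RAM-backed reader is there so the sketch can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical read callback: fill buf with size bytes from block at off.
 * Returns 0 on success, nonzero on error. */
typedef int (*bd_read_fn)(void *ctx, uint32_t block, uint32_t off,
                          void *buf, uint32_t size);

/* Write every block of the device to out, chunk_size bytes at a time.
 * chunk must point to at least chunk_size bytes of scratch space. */
static int dump_device(bd_read_fn read, void *ctx,
                       uint32_t block_count, uint32_t block_size,
                       void *chunk, uint32_t chunk_size, FILE *out) {
    if (chunk_size == 0 || block_size % chunk_size != 0) {
        return -1;  /* chunks must tile a block exactly */
    }
    for (uint32_t block = 0; block < block_count; block++) {
        for (uint32_t off = 0; off < block_size; off += chunk_size) {
            int err = read(ctx, block, off, chunk, chunk_size);
            if (err) {
                return err;
            }
            if (fwrite(chunk, 1, chunk_size, out) != chunk_size) {
                return -1;
            }
        }
    }
    return 0;
}

/* RAM-backed stand-in for real flash, for trying the dump on a host. */
struct ram_bd {
    const uint8_t *image;
    uint32_t block_size;
};

static int ram_read(void *ctx, uint32_t block, uint32_t off,
                    void *buf, uint32_t size) {
    const struct ram_bd *bd = ctx;
    memcpy(buf, bd->image + (size_t)block * bd->block_size + off, size);
    return 0;
}
```

On a real target, the callback would wrap the raw flash read and `out` would be whatever transport is available (a host file, a UART stream, and so on). The resulting image has blocks in address order, so block N starts at byte offset N * block_size.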

It's also entirely possible your issues are unrelated even though you're both hitting the same assert.

@geky
Member

geky commented Dec 5, 2019

Hi all, an update. These asserts are unfortunately very generic, and power-loss bugs require the entire filesystem to work correctly, so they can be difficult to track down.

My current plan is to take a step back and rework the testing infrastructure to more aggressively cover these sorts of power-resilience bugs. Once that is in place, it should be much easier to build reproducible test cases and work through these sorts of bugs.

Hopefully most of the bugs will just fall out with the additional coverage.

Anyways, sorry I don't have much immediate help. I just wanted to give an update so you all know these issues aren't being ignored. All of these issues are valuable, I've just had to prioritize issues based on the time I have.

@raulmcriado

Thank you for your help! I understand that it is a very generic assertion (and hard to reproduce, as is the case in my environment).

Thanks a lot again for your work.

@eastmoutain

@raulmcriado have you invoked lfs_rename in your code? I have reproduced the assertion; it's caused by lfs_rename, see #343.

@geky
Member

geky commented Jan 27, 2020

Ok, sorry about the wait. I think I've been able to hammer out most of the LFS_ASSERT(block != LFS_BLOCK_NULL) issues. The fixes should be on this branch: #372, which I will be working on merging.

If you all find cases where littlefs is still failing, please let me know.

The benefit of the wait is that the testing framework on that branch is now quite a bit more powerful, so I should be better prepared to find these sort of power-loss bugs.

geky added the "fixed?" and "v2.2" labels and removed the "needs investigation" (no idea what is wrong) and "needs test" (all fixes need test coverage to prevent regression) labels Jan 27, 2020
@Ralim
Author

Ralim commented Jan 28, 2020

Very impressed with the new testing framework.

I'll be keen to deploy this to a few test units to see if it performs better :)

Will let you know if I run into any more issues. Currently also chasing Winbond NAND failing in odd ways 😢
