lfs_file_sync (lfs_file_close) failure, lfs_dir_commitattr returning LFS_ERR_NOSPC; #478

Open
Karlhead opened this issue Oct 28, 2020 · 13 comments
Labels
enospc needs investigation no idea what is wrong

Comments

@Karlhead

Hello,

I'm facing some issues that I'm having a hard time understanding. I've been using lfs for some time without experiencing this, but lately it has surfaced more than once.

I'm downloading data into several files, one at a time: around 10 files with approx. 80 KB of data each. The first 9 files are successfully filled with data and closed correctly, but when I try to close the last file, lfs_file_close returns the LFS_ERR_NOSPC error code. The same thing happens if I call lfs_file_sync before lfs_file_close. The problem persists until I re-format the filesystem.

While debugging lfs_dir_commitattr, I can see that commit->off + dsize is larger than commit->end (490 + 16 = 506 > 504), which triggers the following return:
dsize = 16
commit->block = 7712
commit->off = 490
commit->begin = 0
commit->end = 504
if (commit->off + dsize > commit->end) {
return LFS_ERR_NOSPC;
}
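
The values above can be checked in isolation. The following is a minimal sketch, not littlefs code: `commit_fits` and `commit_remaining` are hypothetical helpers that mirror only the comparison quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper (not part of littlefs): mirrors the bounds check
// quoted above, where a tag of dsize bytes must end at or before `end`.
static bool commit_fits(uint32_t off, uint32_t dsize, uint32_t end) {
    return off + dsize <= end;
}

// Bytes still available in the commit region, or 0 if already past it.
static uint32_t commit_remaining(uint32_t off, uint32_t end) {
    return (off < end) ? (end - off) : 0;
}
```

With off = 490, dsize = 16 and end = 504, `commit_remaining` gives 14 bytes, so the 16-byte tag misses by 2 bytes. The check concerns the space left in one commit region, not the free space on the whole device.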

However, there is no way that my device is out of space. There are a total of 8388608 blocks (512 B each) at lfs's disposal.
When calling lfs_fs_size, I can verify that only 1960 blocks are in use, leaving 8386648 blocks free.
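
The capacity figures can be double-checked with a few lines of arithmetic. This is only a sketch: `free_blocks` and `blocks_to_bytes` are hypothetical helpers, and the used-block count is assumed to be the value reported by lfs_fs_size.

```c
#include <stdint.h>

// Hypothetical helper: blocks still free given the total block count and
// the number of blocks in use (e.g. as reported by lfs_fs_size).
static uint64_t free_blocks(uint64_t block_count, uint64_t used_blocks) {
    return (used_blocks < block_count) ? (block_count - used_blocks) : 0;
}

// Total bytes covered by `blocks` blocks of `block_size` bytes each.
static uint64_t blocks_to_bytes(uint64_t blocks, uint32_t block_size) {
    return blocks * (uint64_t)block_size;
}
```

For 8388608 blocks of 512 B this is a 4 GiB device, of which the 1960 used blocks account for 980 KiB.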

Any help would be greatly appreciated.

@geky geky added needs investigation no idea what is wrong bug and removed bug labels Nov 14, 2020
@geky
Member

geky commented Dec 18, 2020

Hi @Karlhead, sorry for the late response

The fact that NOSPC is coming from lfs_dir_commitattr suggests that the total file metadata can't fit in the metadata block. Do you have fairly large custom attributes attached to the files? The combined total of the custom attributes, file name, and a bit of extra metadata all need to fit in a single metadata block.

One option is to increase the block size to a multiple of the block device's block size. So 1KiB, 2KiB, 4KiB, etc. With 8M blocks this would also slightly improve the allocator performance.
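
As a rough sketch of what grouping device blocks into larger logical blocks means for the geometry (the struct and function below are hypothetical and are not the fields of littlefs's struct lfs_config):

```c
#include <stdint.h>

// Hypothetical geometry description; field names are illustrative and are
// not the fields of littlefs's struct lfs_config.
struct fs_geometry {
    uint32_t block_size;   // logical block size seen by the filesystem
    uint32_t block_count;  // number of logical blocks
};

// Group `factor` device sectors into one logical block.
// Returns 0 on success, -1 if the inputs do not divide evenly.
// Overflow of sector_size * factor is not checked in this sketch.
static int scale_geometry(uint32_t sector_size, uint32_t sector_count,
        uint32_t factor, struct fs_geometry *out) {
    if (factor == 0 || sector_count % factor != 0) {
        return -1;
    }
    out->block_size = sector_size * factor;
    out->block_count = sector_count / factor;
    return 0;
}
```

With 512-byte sectors and a factor of 8, 8388608 sectors become 1048576 logical blocks of 4 KiB. The block device callbacks would then presumably have to translate a logical block number into `factor` consecutive sectors.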

@Karlhead
Author

Hi @geky, no worries.

I have no custom attributes attached to the files, but as you pointed out, the metadata does not fit in the block anyway.
I will consider increasing the block size to a multiple.

Thanks!

If you have the time: 8M blocks would slightly improve the allocator performance, how so?

@geky
Member

geky commented Dec 22, 2020

8M blocks would slightly improve the allocator performance, how so?

Huh, I don't know what I meant by "8M", maybe that was a typo.

With larger (1/2/4KiB) blocks, the performance of the allocator slightly improves because there are fewer blocks in the filesystem. When the allocator runs, it doesn't actually read each block, but it needs to read metadata referencing the blocks. So fewer blocks == less metadata == faster allocator.

The tradeoff is that the filesystem may waste more space. LittleFS has inline files, but if your file is larger than the inline size (cache size), it will use full blocks for the file.
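
To put a number on the wasted space, here is a simplified sketch. It assumes a non-inlined file occupies whole blocks and ignores any per-block bookkeeping the filesystem stores alongside the data; the helper names and the 600-byte file size are made up for illustration.

```c
#include <stdint.h>

// Number of whole blocks needed to hold `size` bytes (ceiling division).
static uint32_t blocks_needed(uint32_t size, uint32_t block_size) {
    return (size + block_size - 1) / block_size;
}

// Bytes allocated but unused in the last block (slack space).
static uint32_t slack_bytes(uint32_t size, uint32_t block_size) {
    return blocks_needed(size, block_size) * block_size - size;
}
```

Under these assumptions a 600-byte file takes two 512-byte blocks with 424 bytes of slack, but one 4096-byte block with 3496 bytes of slack.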

Other filesystems do something similar for similar reasons:
https://support.microsoft.com/en-us/help/140365/default-cluster-size-for-ntfs-fat-and-exfat

@Karlhead
Author

Thanks!

@remfan77

Hello, I can reproduce a similar scenario on my system.
The error is triggered from lfs_dir_commitattr.

static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
        lfs_tag_t tag, const void *buffer) {
    // check if we fit
    lfs_size_t dsize = lfs_tag_dsize(tag);
    if (commit->off + dsize > commit->end) {
        return LFS_ERR_NOSPC;
    }

This happens using a clean formatted partition, so I can exclude power-loss related bugs.
I'm using littlefs-fuse.
Littlefs partition is 128MB.

To trigger the problem I have to copy a folder from my PC to my ARM target over a Samba connection. The problem occurs every time.
The folder size is about 18MB.
sector size = 512
dsize = 16
commit->off = 490
commit->end = 504

If I use a sector size = 1024 (as @geky suggested), I do not see the problem.

Is this a real solution?

The find -ls output is attached.
folder.txt
The problem is related to copying one of these (possibly the creation of ./plc/TestFastcat_data):
128 1 drwxr-xr-x 1 root root 512 Jan 13 09:03 ./plc/TestFastcat_data
129 1 drwxr-xr-x 1 root root 512 Jan 13 09:03 ./plc/TestFastcat_data/Alarms
130 1 -rwxr-xr-x 1 root root 832 Dec 22 12:05 ./plc/TestFastcat_data/Alarms/Log.a

Thanks for support and this great project.
Best regards,

Paolo

@geky
Member

geky commented Jan 15, 2021

Hmm, do you have any custom attributes? If so how many bytes of custom attributes do you have on each file?

In theory if the size of the file name + custom attributes for a single file is < 1/2 the block size you shouldn't see this. The filesystem should split metadata blocks until each file gets its own metadata block worst case. It's possible there is a bug that is leading to the filesystem not splitting metadata blocks when it needs to.
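
The rule of thumb above can be written as a quick check. This is only a sketch: `entry_under_half_block` is a hypothetical helper, and it does not account for the extra per-entry metadata the filesystem adds.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical check of the rule of thumb above: the name plus custom
// attributes of a single file should stay under half a block.
static bool entry_under_half_block(uint32_t name_len, uint32_t attr_bytes,
        uint32_t block_size) {
    return name_len + attr_bytes < block_size / 2;
}
```

For example, a 16-character name such as "TestFastcat_data" with no custom attributes is far below the 256-byte limit for 512-byte blocks, so by this rule it should not run out of space.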

Other info that would help:

  • A stack trace would be beautiful
  • Knowing if this comes from lfs_dir_compact
  • What is your cache_size?

@geky geky reopened this Jan 15, 2021
@remfan77

Hello @geky
block_size=512
cache_size=512 (set to the default, block_size)
I have not changed anything related to custom attributes.
To tell the truth, I don't know exactly what they are. I only looked at the code briefly. Are they used to store custom values (for example date and time)?

void TRIGGER_BUG(void)
{
        printf("TRIGGER_BUG\n");
}
 
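static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
        lfs_tag_t tag, const void *buffer) {
    // check if we fit
    lfs_size_t dsize = lfs_tag_dsize(tag);
    // debug print added for this investigation
    printf("%s : commit->off=%u dsize=%u commit->end=%u commit->block=%u\n",
            __func__, (unsigned)commit->off, (unsigned)dsize,
            (unsigned)commit->end, (unsigned)commit->block);
    if (commit->off + dsize > commit->end) {
        // breakpoint hook: called just before the NOSPC return
        TRIGGER_BUG();
        return LFS_ERR_NOSPC;
    }
    // ... rest of the function unchanged ...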

gdb --args lfs /dev/mmcblk3p3 /data2 -f
b TRIGGER_BUG
r

(gdb) backtrace
#0 0x0000b8cc in TRIGGER_BUG ()
#1 0x0000b910 in lfs_dir_commitattr ()
#2 0x0000cd66 in lfs_dir_compact ()
#3 0x0000d2b6 in lfs_dir_commit ()
#4 0x0000df0e in lfs_mkdir ()
#5 0xb6fbcfae in fuse_fs_mkdir (fs=0x1a718, path=0x2aa58 "/plc/TestFastcat_data", mode=493) at fuse.c:2224
#6 0xb6fbf358 in fuse_lib_mkdir (req=0x1a520, parent=2, name=0xb6e5b038 "TestFastcat_data", mode=)
at fuse.c:2945
#7 0xb6fc24f2 in do_mkdir (req=, nodeid=, inarg=) at fuse_lowlevel.c:1126
#8 0xb6fc293c in fuse_ll_process_buf (data=, buf=0xbefffbe8, ch=)
at fuse_lowlevel.c:2443
#9 0xb6fc443a in fuse_session_process_buf (se=se@entry=0x1a4f8, buf=buf@entry=0xbefffbe8, ch=)
at fuse_session.c:87
#10 0xb6fc07ee in fuse_session_loop (se=0x1a4f8) at fuse_loop.c:40
#11 0xb6fbc4ea in fuse_loop (f=f@entry=0x1a618) at fuse.c:4322
#12 0xb6fc55b0 in fuse_main_common (argc=, argv=, op=,
op_size=, user_data=user_data@entry=0x0, compat=compat@entry=0) at helper.c:371
#13 0xb6fc5640 in fuse_main_real (argc=, argv=, op=,
op_size=, user_data=0x0) at helper.c:383
#14 0x00009860 in main ()

I hope this helps.
Thanks

@remfan77

One more small piece of information:
I made an archive containing the files/folders that show the problem.
If I extract it on the target (using tar xvfz ...), it works correctly.

If the files/folders are written by smbd (the Samba daemon), I see the problem.

@remfan77

I tried adding a mutex around all FUSE functions, so that each call into lfs is serialized.
I still see the same problem.

@remfan77

remfan77 commented Jan 20, 2021

Today I tried downgrading littlefs while keeping the same littlefs-fuse.
I found that:

  • with tag v2.0.5 I do not see the problem
  • with tag v2.1.0 I see the problem

Next I will try to find the commit that introduced the bug.

@remfan77

In my case, commit 0d4c0b1 introduces the problem.

I tried this on different versions: if I revert this commit, I no longer see the problem.

I don't understand the internals of littlefs well, so this may not be the real solution.
It is based only on trial and error.

@remfan77

remfan77 commented Feb 19, 2021

I am now able to reproduce the problem very easily on a Linux PC.
I fetched https://github.com/littlefs-project/littlefs-fuse (v2.4) and ran
make

In the same folder as the generated lfs binary, I put the following shell script in a file, for example go.sh:

mkdir mnt
dd if=/dev/zero of=lfs.img bs=256K count=1
losetup /dev/loop0 lfs.img  
./lfs /dev/loop0 --format  
./lfs /dev/loop0 mnt  
cd mnt
for i in $(seq 1 8192)
do
        if ! mkdir $i; then
                echo error mkdir $i
                exit 1
        fi
        if ! touch _$i; then
                echo error touch _$i
                exit 1
        fi
done
echo all is OK!

It simply creates a 256K image, formats it, and mounts it. It then creates
1 (directory)
_1 (file, length 0)
2 (directory)
_2 (file, length 0)
3 (directory)
_3 (file, length 0)
4 (directory)
_4 (file, length 0)
....and so on.

If a mkdir or touch command fails, it stops with a message.

When I run go.sh, I see this message:
mkdir: cannot create directory '66': No space left on device
This is the bug! There is free space.

If I now manually type
mkdir _66
it works.

Now also
mkdir 66
works.

@mrchristian6161

mrchristian6161 commented Mar 18, 2022

I continue to see this issue with version 2.4.1. Has there been any progress in resolving this issue?

Additional information:
If I have read_size and prog_size set to the same value as block_size, this issue occurs. If I reduce read_size and prog_size, the error does not occur. The documentation in the source says that read_size and prog_size must be a "factor" of block_size. Even though an equal value is technically a factor, I found that I had to make them smaller.
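
The two conditions described above can be sketched as follows. The struct and functions are hypothetical stand-ins rather than littlefs's own config type, and the second check encodes only the observation from this report, not a documented requirement.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical mirror of three size settings; not littlefs's own struct.
struct io_sizes {
    uint32_t read_size;
    uint32_t prog_size;
    uint32_t block_size;
};

// True when read_size and prog_size are non-zero and divide block_size.
static bool sizes_are_factors(const struct io_sizes *s) {
    if (s->read_size == 0 || s->prog_size == 0 || s->block_size == 0) {
        return false;
    }
    return s->block_size % s->read_size == 0
        && s->block_size % s->prog_size == 0;
}

// True when both sizes are strictly smaller than block_size, which is the
// workaround reported above (an observation, not a documented rule).
static bool sizes_strictly_smaller(const struct io_sizes *s) {
    return s->read_size < s->block_size && s->prog_size < s->block_size;
}
```

A configuration of 512/512/512 passes the first check but not the second, which matches the case where the error was seen; something like 16/16/512 passes both.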
