Segfault in encfs::FileNode::open(int) #214

Closed
m13253 opened this issue Sep 15, 2016 · 60 comments

@m13253

m13253 commented Sep 15, 2016

Description:
After upgrading to encfs 1.9-2, using pam_encfs to automatically decrypt my home directory causes a core dump.
I cannot confirm whether this is a bug in encfs or in pam_encfs.
The log is attached in log.txt.
I reproduced it 3 times, but I did not manage to debug the program because I cannot afford any data loss.
Downgrading to 1.8.1-7 solves the problem.

Additional info:
ArchLinux x86_64
encfs: 1.9-2
pam_encfs: 0.1.4.4-4

Cross post: https://bugs.archlinux.org/task/50789

@vgough
Owner

vgough commented Sep 18, 2016

I haven't been able to reproduce this issue. Makes me wonder if it depends on a particular version of FUSE or the kernel. I committed an additional null check in this path just in case.

@m13253
Author

m13253 commented Sep 20, 2016

Thank you.
I will try to reproduce this problem later this week. (I'll need a backup before testing.)

Could you please tell me what additional info you need to help diagnose this problem, in addition to a core dump?

@vgough
Owner

vgough commented Sep 25, 2016

It would be helpful if you can reproduce this without pam and document the steps you took to see the issue.

Also, please provide the encfs configuration used.

@m13253
Author

m13253 commented Sep 26, 2016

Also, please provide the encfs configuration used.

encfs -S --idle=1 -v /home/.Private/brilliant /home/brilliant -- -o allow_other,allow_other,nonempty

Filesystem cipher: "ssl/aes", version 3:0:0 (using 3:0:2)
Filename encoding: "nameio/block", version 4:0:0 (using 4:0:2)
Key Size: 192 bits
Using PBKDF2, with 372806 iterations
Salt Size: 160 bits
Block Size: 1024 bytes
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File holes passed through to ciphertext.

It would be helpful if you can reproduce this without pam and document the steps you took to see the issue.

I will try to capture a core dump next time, both with and without pam (between Oct 1 and 7, when I have holidays).
The crash happens while KDE is loading. It does not crash when I log in from a virtual terminal.
Although I have no core dump, there is a log, which is provided in the link.

I am terribly sorry to have kept you waiting so long.

@m13253
Author

m13253 commented Sep 27, 2016

I reinstalled the old version, and produced a core dump.
I will analyze it. What extra action would you like me to do with my core dump?

@m13253
Author

m13253 commented Sep 27, 2016

This is a transcript of the disassembly:

; encfs::FileNode::open(int) const ()
push rbp
push rbx
mov  rbx,rdi
mov  ebp,esi
call pthread_mutex_lock@plt
mov  rdi,QWORD PTR [rbx+0x38]
mov  esi,ebp
mov  rax,QWORD PTR [rdi] ; <== crash here, rdi=0x00000000
call QWORD PTR [rax+0x38]
mov  rdi,rbx
mov  ebp,eax
call pthread_mutex_unlock@plt
add  rsp,0x8
mov  eax,ebp
pop  rbx
pop  rbp
ret
mov  rbp,rax
mov  rdi,rbx
call pthread_mutex_unlock@plt
mov  rdi,rbp
call _Unwind_Resume@plt
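
For readers who do not read x86-64 assembly: the listing appears to lock the node's mutex, load a pointer from offset 0x38 of the object, and make a virtual call through it. With that pointer null, the vtable load faults. A minimal C++ sketch of that shape, with a guard added, might look like the following. `FileIO`, `DummyIO` and this `FileNode` layout are hypothetical stand-ins, not the real encfs classes, and returning `-EIO` is only one possible way to handle the null case.

```cpp
#include <cassert>
#include <cerrno>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the I/O backend that a node forwards to.
struct FileIO {
  virtual ~FileIO() = default;
  virtual int open(int flags) = 0;
};

// Trivial backend, used only to exercise the sketch.
struct DummyIO : FileIO {
  int open(int flags) override { return flags; }
};

// Simplified stand-in for encfs::FileNode; the member layout is illustrative.
class FileNode {
 public:
  explicit FileNode(std::shared_ptr<FileIO> backend) : io(std::move(backend)) {}

  int open(int flags) const {
    // Corresponds to the pthread_mutex_lock / pthread_mutex_unlock pair.
    std::lock_guard<std::mutex> lock(mutex);
    if (!io) {
      // Without this guard, the virtual call below would load the vtable
      // through a null pointer (the faulting `mov rax,QWORD PTR [rdi]`).
      return -EIO;
    }
    return io->open(flags);  // the indirect `call QWORD PTR [rax+0x38]`
  }

 private:
  mutable std::mutex mutex;
  std::shared_ptr<FileIO> io;
};
```

A guard like this only turns the crash into an error code. It does not explain why a node with an empty `io` was reached in the first place.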

This is the traceback of the crashing thread:

encfs::FileNode::open(int) const (); // from libencfs.so.1.9
encfs::_do_flush(encfs::FileNode*) (); // from libencfs.so.1.9
?? () // from libencfs.so.1.9
?? () // from libencfs.so.1.9
encfs::encfs_flush(char const*, fuse_file_info*) (); // from libencfs.so.1.9
?? () // from libfuse.so.2
// some more from libfuse.so.2

@m13253
Author

m13253 commented Sep 27, 2016

One clue: I was starting some apps that write into ~/.cache just before the crash was triggered.
I think there might be a corrupt file in ~/.cache that causes encfs to fail.

I contacted the ArchLinux packager, who said that starting from 1.9, encfs on ArchLinux is linked against the system-wide tinyxml. I will link it against the embedded tinyxml and try again.
I will also enable -g, which will give us debugging information.

@m13253
Author

m13253 commented Sep 27, 2016

I contacted ArchLinux packager, who said starting from 1.9, encfs on ArchLinux is linked to system-wide tinyxml. I will link it against embedded tinyxml and try again.

Rebuilt. Confirmed tinyxml is not the cause.
Now that I have debugging information, I will report my analysis in a few hours.

This time, however, the traceback changed: the crashing function is now "read":

#0 encfs::FileNode::read (this=0x7fe73c001290, other args optimized out) (FileNode.cpp:209)
#1 encfs::_do_read (args optimized out) (encfs.cpp:603)
#2 std::function<int (encfs::FileNode*)>::operator()(encfs::FileNode*) const (args optimized)
#3 encfs::<lambda(encfs::FileNode*)>::operator()(encfs::FileNode *) const (__closure=0x7fe75a7fbab0, fnode=0x7fe73c001290) (encfs.cpp:136)
#4 encfs::withFileNode(const char*, const char*, fuse_file_info*, std::function<int(encfs::FileNode*)>) (opname="read", path="/.config/fcitx/dbus/a69ac****4dd3-0", fi=fi@entry=0x7fe75a7fbd00, op=...) (encfs.cpp:140)
#5 encfs::encfs_read (path="/.config/fcitx/dbus/a69ac****4dd3-0", buf=optimized, size=optimized, file=0x7fe75a7fbd00) (encfs.cpp:609)
#6 fuse_fs_read_buf()

I examined *this and got the following:

*this = {
    mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0, __next = 0}}, __size = '\0' repeat 16, "\xff\xff\xff\xff", '\0' repeat 19, __align = 0},
    fsConfig = std::shared_ptr (count 81, weak 0) 0x1520e50,
    io = std::shared_ptr (empty) 0x0,
    _pname = "",
    _cname = "",
    parent = 0x15223c0
}

It seems that this is an invalid FileNode. I traced back the creation of this FileNode and found this:

139     if (fi != nullptr && fi->fh != 0)
140       res = do_op(reinterpret_cast<FileNode *>(fi->fh));
141     else
142       res = do_op(FSRoot->lookupNode(path, opName).get());

So this FileNode is retrieved from the "file handle" property of a fuse data type.
Tracing back to where this property is stored, I found this:

530     std::shared_ptr<FileNode> fnode =
531         FSRoot->openNode(path, "open", file->flags, &res);
...
538         file->fh =
539             reinterpret_cast<uintptr_t>(ctx->putNode(path, std::move(fnode)));

This change was introduced in commit af64702, made in April this year, right before 1.9.1 was released.

First, I am not sure whether it is safe to use an rvalue reference (std::move) here. Will it corrupt the data?
Second, it reinterpret_casts what came from a std::shared_ptr into an integer, so C++ loses track of the reference count. Maybe the pointer is released earlier than expected, and the memory then gets filled with random data.

Since this affects my data integrity and I don't want to lose anything, I did not try to modify the code blindly to see if it works. I would like to hear your instructions.

I will later use valgrind to mount encfs again. I hope I could find something more.
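
The two concerns above (std::move and the cast to an integer) can be checked in isolation with a small sketch. `Node` is a hypothetical stand-in for FileNode, and nothing here is taken from the encfs sources. Assuming standard `std::shared_ptr` semantics, moving only transfers ownership, while an integer copy of the raw pointer does not keep the object alive.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

struct Node { int value = 42; };  // hypothetical stand-in for FileNode

// std::move on a shared_ptr hands ownership to the destination and leaves
// the source empty. The pointed-to object itself is untouched.
bool moveTransfersOwnership() {
  auto a = std::make_shared<Node>();
  std::shared_ptr<Node> b = std::move(a);
  return a == nullptr && b.use_count() == 1 && b->value == 42;
}

// An integer copy of the raw pointer does not take part in reference counting.
bool handleOutlivesNode() {
  auto owner = std::make_shared<Node>();
  std::weak_ptr<Node> observer = owner;

  // Same pattern as `file->fh = reinterpret_cast<uintptr_t>(...)`.
  std::uintptr_t handle = reinterpret_cast<std::uintptr_t>(owner.get());

  owner.reset();  // last shared_ptr dropped, so the Node is destroyed here

  // The integer still holds the old address, but the node has expired.
  // Casting `handle` back and dereferencing it now would be undefined behavior.
  return handle != 0 && observer.expired();
}
```

So std::move by itself is not a problem. The risk is in whatever releases the last shared_ptr while the integer handle is still in use.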

@m13253
Author

m13253 commented Sep 27, 2016

I will later use valgrind to mount encfs again. I hope I could find something more.

It seems that valgrind does not work with suid programs (e.g. fuse).

fuse: failed to exec fusermount: Permission denied

I found it difficult to set up an isolated sandbox environment. It seems that high load (e.g. the startup of KDE) is required to trigger this bug. A virtual machine with a full KDE installation might be one way to run experiments.

I will downgrade from 1.9.1-2 to 1.8.1-7 to maintain my everyday work. I hope you can give me further instructions.

@Sesshu

Sesshu commented Oct 2, 2016

I'm also using ArchLinux and pam_encfs and having this crash. encfs complains a lot about "getattr error: No such file or directory" before it crashes.

@m13253
Author

m13253 commented Oct 9, 2016

I'm also using ArchLinux and pam_encfs and having this crash. encfs complains a lot about "getattr error: No such file or directory" before it crashes.

Try the package encfs18 from AUR repo -- that's the last working version of encfs. And let's hope this problem could be fixed.

It's hard to tell whether it is a "pam_encfs" problem yet -- because I can not make a test without this module: that will break my display manager.
Does your login architecture permit you to disable "pam_encfs"? If so, please conduct a test, with pam_encfs disabled.

@ial0
Contributor

ial0 commented Nov 3, 2016

So, I have been attempting to reproduce this segfault using pam_encfs. Using a test copy of my own KDE config, I got a segfault in encfs_flush; using a new KDE configuration did not trigger it.

Next I tried to get this to fail on an encfs instance outside of pam_encfs, but so far I have not been able to do so. However, what I did notice was a hang while waiting on the logging output to the terminal.
I retested pam_encfs without verbose logging and it did not segfault.

While I cannot pinpoint any particular error, I believe there is a race, and it could possibly be in the logging system.

@dickerpulli

dickerpulli commented Jan 2, 2017

I have the same problem with my MacBook Pro (2016) with OSX Sierra.

Last line with "encfs -v"

2017-01-02 20:26:10,553 VER [encfs.cpp:128] op: flush :
/Users/thomas/bin/mount-encfs-dropbox.sh: line 20: 3189 Segmentation fault: 11 $ENCFS -v -f $SOURCE $TARGET --extpass="security 2>&1 >/dev/null find-generic-password -gl '$KEYCHAIN_PASSWORD' |grep password|cut -d \" -f 2" -ovolname=$VOLUME_TITLE -oallow_other -olocal -ohard_remove -oauto_xattr -onolocalcaches

The problem occurs if I enable spotlight indexing in the encrypted folder.

CrashReport attached.

encfs.txt

@t-dan

t-dan commented Feb 18, 2017

I don't know if this is the same issue (well, it seems to be...), but I am attaching coredump info from my computer.

I am using Arch distro with:
community/encfs 1.9.1-3
extra/fuse-common 3.0.0-1
extra/fuse2 2.9.7-3
Linux version 4.9.8-1-ARCH (builduser@tobias) (gcc version 6.3.1 20170109 (GCC) ) #1 SMP PREEMPT Mon Feb 6 12:59:40 CET 2017

Thank you for dealing with the issue.

encfs.txt

@m13253
Author

m13253 commented Feb 20, 2017

@t-dan
Thank you for your coredump.
You can install encfs18 from AUR, as a temporary workaround. That's the last working version.

@benrubson
Contributor

Sounds like Easylogging 9.93 solves these issues (#214 & #244).

@benrubson
Contributor

#298 merged, issue should be solved 👍

@benrubson
Contributor

@m13253 would you test master branch and close this issue if solved ?
Thank U 👍

@m13253
Author

m13253 commented Apr 3, 2017

Tested. Not solved.

Encfs crashed about 2 minutes after I logged in to GNOME, when I tried to open the Chrome browser.

Apr 03 14:27:27 brilliant-laptop systemd-coredump[3676]: Process 1418 (encfs) of user 1000 dumped core.
                                                         
                                                         Stack trace of thread 3131:
                                                         #0  0x00007fe10353c2f6 _ZNK5encfs8FileNode4openEi (/usr/lib/libencfs.so.1.9.1)
                                                         #1  0x00007fe103533bdb _ZN5encfs9_do_flushEPNS_8FileNodeE (/usr/lib/libencfs.so.1.9.1)
                                                         #2  0x00007fe103533f30 _ZZN5encfsL12withFileNodeEPKcS1_P14fuse_file_infoSt8functionIFiPNS_8FileNodeEEEENKUlS6_E_clES6_ (/usr/lib/libencfs.so.1.9.1)
                                                         #3  0x00007fe1035351fd withFileNode (/usr/lib/libencfs.so.1.9.1)
                                                         #4  0x00007fe1035359ef _ZN5encfs11encfs_flushEPKcP14fuse_file_info (/usr/lib/libencfs.so.1.9.1)
                                                         #5  0x00007fe1031337b7 n/a (libfuse.so.2)
                                                         #6  0x00007fe103133a40 n/a (libfuse.so.2)
                                                         #7  0x00007fe10313a066 n/a (libfuse.so.2)
                                                         #8  0x00007fe10313af91 n/a (libfuse.so.2)
                                                         #9  0x00007fe103137738 n/a (libfuse.so.2)
                                                         #10 0x00007fe102f0d2e7 start_thread (libpthread.so.0)
                                                         #11 0x00007fe1026af54f __clone (libc.so.6)
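
The frames above use mangled C++ symbol names. As a reading aid, here is a small sketch that demangles them with `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang (an assumption about the toolchain; the `c++filt` command-line tool does the same job). The helper name `demangle` is made up for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);  // the buffer is malloc-allocated, so the caller frees it
  return result;
}
```

Frame #0, `_ZNK5encfs8FileNode4openEi`, demangles to `encfs::FileNode::open(int) const`, and frame #1, `_ZN5encfs9_do_flushEPNS_8FileNodeE`, to `encfs::_do_flush(encfs::FileNode*)`. That is the same call path as in the original report.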

@benrubson
Contributor

benrubson commented Apr 3, 2017

Thx for your test, sorry for its result...
Would be nice I think to make a test getting rid of Easylogging++, so that we would know whether crash comes from this lib or not.

@m13253
Author

m13253 commented Apr 3, 2017

Would be nice I think to make a test getting rid of Easylogging++

I couldn't get your point, what should I do now?

@benrubson
Contributor

We should try to make an encfs test version which does not include Easylogging++ library (easylogging++.* files).

@benrubson
Contributor

Please test this version :
git clone -b test214 https://github.com/benrubson/encfs.git
I have totally removed easylogging++ so that we will know if the issue comes from this lib.
Thank you 👍

@m13253
Author

m13253 commented Apr 18, 2017

Thank you for your patch.

But I'm sorry, I just messed up my Linux installation and had to reinstall. So I need to find some time to rebuild my environment and reproduce this bug.

Please wait for me for some days. Then I'll test your patch.

Thank you 👍

@rfjakob
Collaborator

rfjakob commented Jul 7, 2017

Good idea. However, I tried this, and libfuse prevents an open file from being unlinked (overwritten by the rename) by renaming it to ".fuse_hiddenXXX":

$ cat >> file-a &
$ cat >> file-b &
$ mv file-b file-a
$ ls -la
total 16
drwxrwxr-x. 2 jakob jakob 4096  7. Jul 22:47 .
drwxrwxr-x. 8 jakob jakob 4096  7. Jul 22:43 ..
-rw-rw-r--. 1 jakob jakob   24  7. Jul 22:46 file-a
-rw-rw-r--. 1 jakob jakob   15  7. Jul 22:47 .fuse_hidden0000000300000001

@rfjakob
Collaborator

rfjakob commented Jul 7, 2017

I added a canary value to FileNode and I can now reproduce the issue in seconds. Commit is: rfjakob@1021593

Running fsstress-encfs from the gocryptfs test suite I get in syslog:

Jul 08 00:46:34 brikett encfs[23277]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 08 00:46:34 brikett encfs[23277]: ERROR withFileNode: error caught in flush: dead canary
Jul 08 00:46:34 brikett encfs[23277]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 08 00:46:34 brikett encfs[23277]: ERROR withFileNode: error caught in flush: dead canary
Jul 08 00:46:34 brikett encfs[23277]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 08 00:46:34 brikett encfs[23277]: ERROR withFileNode: error caught in flush: dead canary
Jul 08 00:46:36 brikett encfs[23277]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 08 00:46:36 brikett encfs[23277]: ERROR withFileNode: error caught in flush: dead canary

Unless my canary is buggy, this should give us some leverage.

@erikman

erikman commented Jul 8, 2017

@rfjakob Nice! Reproduction is a key in solving this. Should you also set CANARY_OK in EncFS_Context::putNode? I can't really wrap my head around all the different code paths.

@rfjakob
Collaborator

rfjakob commented Jul 16, 2017

@DominikChmiel Can you test if rfjakob@1021593 fixes the crash you were seeing and/or whether the dead canary error triggers?

@DominikChmiel

@rfjakob ran the build process that previously had a good chance of crashing encfs with your version 5x now. No crash so far, but the following in syslog:


Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR withFileNode: error caught in getattr: dead canary
Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR canary=CANARY_RELEASED. File node accessed after it was released.
Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR withFileNode: error caught in flush: dead canary
Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR canary=CANARY_DESTROYED. File node accessed after it was destroyed.
Jul 17 11:44:51 _hostname_ encfs[22348]: ERROR withFileNode: error caught in flush: dead canary
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR canary=0x6c000078. Corruption?
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR withFileNode: error caught in read: dead canary
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR canary=0xe8f7f560. Corruption?
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR withFileNode: error caught in read: dead canary
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR canary=0x64756c63. Corruption?
Jul 17 11:45:13 _hostname_ encfs[22348]: ERROR withFileNode: error caught in flush: dead canary

<SNIP>

Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR canary=CANARY_DESTROYED. File node accessed after it was destroyed.
Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR withFileNode: error caught in read: dead canary
Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR canary=CANARY_DESTROYED. File node accessed after it was destroyed.
Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR withFileNode: error caught in read: dead canary
Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR canary=CANARY_DESTROYED. File node accessed after it was destroyed.
Jul 17 11:46:17 _hostname_ encfs[22348]: ERROR withFileNode: error caught in flush: dead canary

@rfjakob
Collaborator

rfjakob commented Jul 17, 2017

Ok, excellent, we are on the right track. Thanks for testing!

rfjakob added a commit that referenced this issue Jul 19, 2017
Adds a uint32 value to FileNode that is initialized to an
arbitrary value (CANARY_OK) in the constructor, and reset
when the reference is dropped (CANARY_RELEASED,
CANARY_DESTROYED).

The canary is checked on each withFileNode call.

Makes it much easier to trigger the bug seen in
#214 .
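
A minimal sketch of the canary idea described in that commit message, for readers who want to see its shape. The constant names come from the commit message and the log output. The numeric values, the `markReleased` helper and this stripped-down `FileNode` are assumptions made for illustration and are not copied from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Names follow the commit message; the values are arbitrary and made up here.
constexpr std::uint32_t CANARY_OK = 0x46040975;
constexpr std::uint32_t CANARY_RELEASED = 0x70c5610d;
constexpr std::uint32_t CANARY_DESTROYED = 0x52cdad90;

// Stripped-down stand-in for FileNode that carries only the canary.
struct FileNode {
  std::uint32_t canary = CANARY_OK;           // set when the node is constructed
  ~FileNode() { canary = CANARY_DESTROYED; }  // overwritten on destruction
};

// Hypothetical hook for the point where the file handle's reference is dropped.
void markReleased(FileNode& node) { node.canary = CANARY_RELEASED; }

// Meant to run before every operation that receives a node from a handle.
void checkCanary(const FileNode& node) {
  if (node.canary == CANARY_OK) {
    return;
  }
  throw std::runtime_error("dead canary");
}
```

Reading the canary of an already destroyed object is itself undefined behavior, and an optimizing compiler may drop the store in the destructor. A check like this is a debugging aid that makes the bug loud, not a fix.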
@rfjakob
Collaborator

rfjakob commented Jul 23, 2017

@DominikChmiel I finished a proper fix yesterday, could you

git clone https://github.com/rfjakob/encfs.git

and try again? The canary errors should be gone now, as well.

@DominikChmiel

Thanks for your work @rfjakob , will test it when I'm at my productive setup tomorrow.

@DominikChmiel

DominikChmiel commented Jul 24, 2017

@rfjakob No Crash + no canary error messages, but the following:

Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=40345 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=40345 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=40345 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=41519 not found in fuseFhMap
Jul 24 12:00:16 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=41519 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=48715 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=48715 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=48715 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=48961 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in read: fh=48961 not found in fuseFhMap
Jul 24 12:00:28 _hn_ encfs[12110]: ERROR withFileNode: error caught in flush: fh=48961 not found in fuseFhMap

Checked with commit rfjakob/encfs-next@bce3cee (branch issue214 from your repo)

@rfjakob
Collaborator

rfjakob commented Jul 24, 2017

Ok, thanks, these messages were expected. I should probably downgrade them to warnings.

The build completes successfully, right?

@benrubson
Contributor

Really strange to have flush after filehandle has been released.
Would be interesting to know if these messages still appear when mounting with -o direct_io.
Thx 👍

@DominikChmiel

Just installed that version again to recheck:
The build-process I'm using to trigger this bug does not complete successfully. I'm getting I/O Errors on random files during the process (which are not present using the older 1.8 version).

As for the build of encfs: The build itself runs fine, however make test fails for reasons I can't quite track down:

Running tests...
Test project /home/dominik/projects/encfs_patched/src/build
    Start 1: checkops
1/2 Test #1: checkops .........................   Passed    0.11 sec
    Start 2: scriptedtests
2/2 Test #2: scriptedtests ....................***Failed    0.19 sec

50% tests passed, 1 tests failed out of 2

Total Test time (real) =   0.30 sec

The following tests FAILED:
          2 - scriptedtests (Failed)

After remounting with -o direct_io these I/O errors + entries in syslog persist.

@benrubson
Contributor

Run make check instead of make test to have further error details.

@DominikChmiel

Seems to fail during a sanity-check of the resulting binary:

2/2 Test #2: scriptedtests ....................***Failed    0.19 sec

runTests: mode=standard

#   Failed test 'encfs command returns 0'
#   at /home/dominik/projects/encfs_patched/src/encfs/tests/normal.t.pl line 312.
Bailout called.  Further testing stopped:  
FAILED--Further testing stopped.


50% tests passed, 1 tests failed out of 2

https://github.com/rfjakob/encfs/blob/issue214/tests/normal.t.pl#L312

@DominikChmiel

Seems to me that this is due to the hardcoded path of ./build/encfs. I'm currently using the folder layout of the arch-repo PKGBUILD: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/encfs

@DominikChmiel

Okay, just checked by going through the build process manually as detailed in INSTALL.md. Tests pass fine now:

Test project /home/dominik/projects/tmp/encfs/build
    Start 1: checkops
1/2 Test #1: checkops .........................   Passed    0.12 sec
    Start 2: scriptedtests
2/2 Test #2: scriptedtests ....................   Passed   22.51 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) =  22.63 sec
Built target check

Was a folder layout error on my part.

@rfjakob
Collaborator

rfjakob commented Jul 24, 2017

Thanks for testing, dominik. That your build system receives i/o errors means that fixing the crashes was only half of the story.

I'll try reproducing with something like a linux kernel build.

@DominikChmiel

DominikChmiel commented Jul 24, 2017

Went back to rfjakob/encfs-next@c8ff1f9 (where you had just added the canary) because I wasn't sure about the build-result with that version. Can confirm the I/O Errors were already present there, my bad for missing that earlier.

If there are any further tests I can do to help you just let me know.

@rfjakob
Collaborator

rfjakob commented Jul 24, 2017

I can reproduce the I/O error using a Linux kernel build. I think I found the root cause: Concurrent opens of the same file can result in two FileNodes for the same path. Commit af64702 does not take that into account.

Excerpt from a debug log:

encfs_open: path=/torvalds-linux/arch/x86/include/uapi/asm/errno.h fh=195535
encfs_open: path=/torvalds-linux/arch/x86/include/uapi/asm/errno.h fh=195533

Patch is in the works but not ready to publish yet.
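
To illustrate the root cause for later readers: if every open gets its own integer handle that maps to an owning pointer, two concurrent opens of the same path can no longer invalidate each other. The sketch below is not the actual patch. The name `fuseFhMap` appears in the error messages earlier in this thread; `HandleTable`, its methods and this `FileNode` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

struct FileNode { std::string path; };  // hypothetical stand-in

// Maps integer file handles to owning pointers, one entry per open().
class HandleTable {
 public:
  // Stores the node and returns a fresh non-zero handle (a value for fi->fh).
  std::uint64_t put(std::shared_ptr<FileNode> node) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::uint64_t fh = nextFh_++;
    map_[fh] = std::move(node);
    return fh;
  }

  // Returns an owning pointer, so the node stays alive while the caller uses it.
  std::shared_ptr<FileNode> get(std::uint64_t fh) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = map_.find(fh);
    if (it == map_.end()) {
      throw std::out_of_range("fh=" + std::to_string(fh) + " not found in fuseFhMap");
    }
    return it->second;
  }

  // Called on release; only this handle's entry is dropped.
  void erase(std::uint64_t fh) {
    std::lock_guard<std::mutex> lock(mutex_);
    map_.erase(fh);
  }

 private:
  std::mutex mutex_;
  std::uint64_t nextFh_ = 1;
  std::unordered_map<std::uint64_t, std::shared_ptr<FileNode>> map_;
};
```

With a table like this, releasing one handle for a path leaves the other handle valid, and a stale handle yields a lookup error instead of a dangling pointer.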

@benrubson
Contributor

What about simply reverting this commit ?
What are its benefits ?

@rfjakob
Collaborator

rfjakob commented Jul 25, 2017

Unfortunately does not revert cleanly

@rfjakob rfjakob marked this as a duplicate of #348 Jul 25, 2017
@rfjakob
Collaborator

rfjakob commented Jul 25, 2017

@DominikChmiel I have pushed v2 of the patch series:

git clone -b issue214-v2 https://github.com/rfjakob/encfs.git
cd encfs
./build.sh

Kernel build runs A-OK for me now. Let's hope you find the same.

@DominikChmiel

DominikChmiel commented Jul 25, 2017

Ran 5 rebuilds, no I/O errors, no syslog messages, no crash, good build result. Looks like that fixed it to me. Thanks again @rfjakob!

@rfjakob
Collaborator

rfjakob commented Jul 25, 2017

Fix released in EncFS v1.9.2. Thanks to everybody involved!

@rfjakob rfjakob closed this as completed Jul 25, 2017
@benrubson benrubson added this to the 1.9.2 milestone Nov 7, 2017