Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap(NULL, ...) on zero-length file fails with ENOEXEC #3451

Closed
thorsteneb opened this issue Aug 9, 2018 · 19 comments
Closed

mmap(NULL, ...) on zero-length file fails with ENOEXEC #3451

thorsteneb opened this issue Aug 9, 2018 · 19 comments
Labels

Comments

@thorsteneb
Copy link

thorsteneb commented Aug 9, 2018

  • Your Windows build number: (Type ver at a Windows Command Prompt)

April Update, 10.0.17134.191
Also tested on 19H1, build 18204

  • What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)

The package "liblmdb" does not work on WSL. This impacts any projects that use lmdb. I've narrowed this down to mmap(NULL,...) on a 0-length file failing with ENOEXEC. This is likely related to #658 , yet distinct enough that the @therealkenc suggested I file a new issue, to avoid confusion.

To reproduce with lmdb:

git clone https://github.com/LMDB/lmdb.git
cd lmdb/libraries/liblmdb
make test

mtest.c:50: mdb_env_open(env, "./testdb", MDB_FIXEDMAP , 0664): Exec format error
Aborted (core dumped)
Makefile:61: recipe for target 'test' failed
make: *** [test] Error 134

strace shows:

mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)

Just before it cores.

Sample code showing just the ENOEXEC, but not going on long enough after that to core:

#include <stddef.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main (void) {
  void *map;
  int fd;

// mmap an empty file, fails with ENOEXEC
// Test case to recreate an issue with lmdb and WSL
// mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)
  remove("./testfile");
  fd = open("./testfile", O_RDWR | O_CREAT, 0664);

  map = mmap(NULL, 1024, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    printf("mmap(NULL,...) of empty file failed, strace shows ENOEXEC\n");
  else {
    printf("mmap(NULL,...) of empty file succeeded, issue fixed\n");
    munmap(map, 1024);
  }

  close(fd);
  remove("./testfile");

// By contrast, with a file that contains some data
  fd = open("./testfile", O_RDWR | O_CREAT, 0664);
  write(fd,"Content",7);

  map = mmap(NULL, 1024, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    printf("mmap(NULL,...) of non-empty file failed, that's unexpected\n");
  else {
    printf("mmap(NULL,...) of non-empty file succeeded, as expected\n");
    munmap(map, 1024);
  }

  close(fd);
  remove("./testfile");

  return 0;
}
  • Strace of the failing command, if applicable:

strace of the sample code:
mmap-example.strace.txt

strace of liblmdb itself:
liblmdb.strace.txt

User voice for this issue lives here: https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/35053207-support-oracle-berkeley-db-and-lmdb

@benhillis
Copy link
Member

benhillis commented Aug 9, 2018

I'm testing a fix for this now, but it looks like lmdb is hitting another error after this:
mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found

I'm trying to determine if this is related to my "fix" or another issue.

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 10, 2018

The major use case for "mmap's problems" is BerkeleyDB. Instead of deep diving lmdb (or boltdb #2304 #3162 or foo-db) you may consider:

sudo apt install tcl tcl-dev   # plus the usual suspects
mkdir bdb && cd bdb
wget http://download.oracle.com/berkeley-db/db-5.3.21.tar.gz
tar xf db-5.3.21.tar.gz
cd db-5.3.21/build_unix
../dist/configure --prefix=/usr/local --enable-compat185 --enable-dbm \
    --disable-static --enable-test --enable-debug --enable-cxx \
    --enable-tcl --with-tcl=/usr/lib
make -j4
tclsh  # in tickle shell after this, most unfortunately
source ../test/tcl/test.tcl  
# run_std     # worthy goal but it's a big test
# r btree     # more reasonable but still a big ask
test042 btree # first random fail (segfault) I hit on WSL, fine on Real Linux

There is some information on running the individual tests here. If you get the BerkeleyDB tests to pass it will squash #3149 #2627 #2852 #2871 (and everything else in the BDB universe). The other foo-db will almost certainly follow suit.

@drujd
Copy link

drujd commented Aug 22, 2018

I guess this is related to #902 ? It sure would be nice to see the mmap issues finally fixed...

@thorsteneb
Copy link
Author

@drujd Vote on user voice, get your friends to vote. Right now, there's not exactly an outcry to get BerkeleyDB to work, or any other project relying on these mmap features.

https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/35053207-support-oracle-berkeley-db-and-lmdb

@drujd
Copy link

drujd commented Aug 22, 2018

The issue has much broader implications, I have voted, but I do not think this is the right thing to vote on. The dysfunct mmap functionality breaks everything that depends on the correct mmap behavior, including bup or building Android from source. Personally I don't really care about BerkeleyDB.

@drujd
Copy link

drujd commented Aug 22, 2018

Also, I really think this should be considered a bug, not a feature request.

@thorsteneb
Copy link
Author

Could be a combination of bugs and missing features. I don’t understand WSL even nearly well enough to have an informed opinion on that.

I think it’s very likely that supporting Berkeley DB will also “knock down” a range of other related issues. Any DB with the same pedigree will work.

For Android source development - does that still require 32-bit elf support? If so, the uservoice for that is here: https://wpdev.uservoice.com/forums/266908-command-prompt-console-windows-subsystem-for-l/suggestions/13377507-please-enable-wsl-to-run-32-bit-elf-binaries

@benhillis
Copy link
Member

@yorickdowne - The distinction of bug vs feature request is fairly arbitrary, we call things bugs that are simple fixes and a feature request would take a lot longer to implement.

I've looked into this and #902 and they share the commonality that both rely on functionality that the Windows kernel does not support which makes the scope of this fix a bit larger.

@therealkenc
Copy link
Collaborator

The dysfunct mmap functionality breaks everything that depends on the correct mmap behavior, including bup or building Android from source.

At base the problem is stated in the SetEndOfFile function....

If CreateFileMapping is called to create a file mapping object for hFile, UnmapViewOfFile must be called first to unmap all views and call CloseHandle to close the file mapping object before you can call SetEndOfFile.

This has been dysfunct differently abled for 25 years. Repeating the sentiment here, and reiterating that this guy was right (he just framed the ask incorrectly), fixing this In Windows means you don't need patches like this for lmdb on Windows (and only for Windows).

The reason Lineage builds don't work is because no one does Lineage builds on Windows. Then someone does an upstream patch like this (which note has no #ifdef _WIN32 because it doesn't need one) and you never hear of it again. Academically, if you look closely at the lmdb work-around, it could be done in a way that doesn't require an #ifdef too. The Linux side of the #ifdef could set the file size to sizehi instead of 0 and close the handle too. Maybe someone will.

Also, I really think this should be considered a bug, not a feature request.

Alright. But by the chosen definition of not-dysfunct, it is a Windows bug contrast lack of Windows feature too. In extremis, ditto ha ha but serious.

@thorsteneb
Copy link
Author

Wow. That rabbit hole went deep fast. Thanks, @therealkenc .

fixing this In Windows

That'd be amazing. Your theory that WSL works because 23 years of Cygwin workarounds means not all code triggers #1529 is -- oddly compelling.

You are saying that allowing SetEndOfFile with handles open would then lend itself to also allowing other actions with handles open, such as renaming, and deleting. Do I get that right?

@benhillis , thanks for looking into it! From your perspective, are allowing SetEndOfFile/delete/rename while handles are open and allowing mapping of a zero-length-file things to be tackled in Windows, or through some form of layer in WSL?

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 23, 2018

allowing SetEndOfFile with handles open would then lend itself to also allowing other actions with handles open, such as renaming, and deleting. Do I get that right?

Nah I didn't mean to imply that in any narrow sense. Better to call the two issues (#1529 and mmap's problems) analogous. No magic bullet here. [ed] To illustrate the bigger picture, see #2915 (message) which doesn't conflate with Windows' "pinned handles" xplat problem. You could fix it in WSL, but that doesn't help win32 NodeJS being broken. How do you eat an elephant? One bite at a time.

@thorsteneb
Copy link
Author

I've looked into this and #902 and they share the commonality that both rely on functionality that the Windows kernel does not support which makes the scope of this fix a bit larger.

@benhillis Is it realistic to tackle this and #902 as part of 20H1?

@daviesalex
Copy link

daviesalex commented Feb 19, 2019

Edit: this statement is not correct. I can not reproduce this with https://github.com/WhitewaterFoundry/WLE. I'm going to try to do more bisecting here to see if we can figure out what the difference is between working RPM and not working RPM distributions are. My apologies (and thanks to @DarthSpock for pointing this out)

Edit 2: we found that different versions of RHEL have problems. What is interesting is that we were originally trying on an older version, but the folks at WhitewaterFoundry/Pengwin-Enterprise#20 have reproduced it on a higher version. If I get some time i'll see if I can bisect out the OS versions to prove if it really is that (i.e. a change in RPM or one of its dependencies such as bdb),

--

This behavior has the effect of preventing people running most RPM based distributions such as RHEL/Fedora/CentOS/Scientific Linux under WSL.

@therealkenc
Copy link
Collaborator

Have you tried WSLFedoraRemix?

I would be mildly curious to know whether #2852 (rpm ‑‑initdb) reproduces on WSLFedoraRemix. That issue never had usable reproduction steps for whatever it was the OP was doing on their Fedora (for some value of Fedora), but rpm --initdb works on Ubuntu natch. Curious, but not curious enough to actually install WSLFedoraRemix and look. Maybe it's been patched like Alex suggests. Maybe not. Dunno.

@sirredbeard
Copy link
Contributor

sirredbeard commented Feb 26, 2019

It affects WLinux Enterprise distributions (RHEL, SL, CentOS) as well. See WhitewaterFoundry/Pengwin-Enterprise#20.

@therealkenc
Copy link
Collaborator

@benhillis

mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found
I'm trying to determine if this is related to my "fix" or another issue.

Ben, at the time back in August 2018 did you manage to track that fail any? Understandable if you didn't, but whatever you found (if anything) might save Hayden some effort.

@ytrezq
Copy link

ytrezq commented Apr 9, 2019

Hey ! It’s starting to makes really long for not being able to update Fedora or install any other packages…

@therealkenc
Copy link
Collaborator

therealkenc commented May 8, 2019

Was hoping for some joy on the OP repro in 18890, but 'fraid not.

233   execve("/usr/bin/make", ["make", "test"], 0x7fffcece5ae0 /* 22 vars */) = 0
[blah blah...]
238   openat(AT_FDCWD, "./testdb/data.mdb", O_RDWR|O_CREAT, 0664) = 4
238   fstatfs(4, {f_type=0x53464846, f_bsize=4096, f_blocks=125688063, f_bfree=84551925, f_bavail=84551925, f_files=999, f_ffree=1000000, f_fsid={val=[1, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOATIME}) = 0
238   pread64(4, "", 152, 0)            = 0
238   mmap(NULL, 10485760, PROT_READ, MAP_SHARED, 4, 0) = -1 ENOEXEC (Exec format error)
238   munmap(0x7ffb5ddd0000, 2101248)   = 0

I'm testing a fix for this now, but it looks like lmdb is hitting another error after this:
mtest.c:113: mdb_cursor_get(cursor, &key, &data, MDB_LAST): MDB_NOTFOUND: No matching key/data pair found

Didn't make it that far. If that initial Aug 2018 fix attempt is still in the can, and doesn't break anything obvious, it might be worth a git push to see if it gets any further.

Copy link
Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants