Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couple of test failures due to mmap changes #212

Closed
deepakjois opened this issue Sep 11, 2017 · 7 comments
Closed

Couple of test failures due to mmap changes #212

deepakjois opened this issue Sep 11, 2017 · 7 comments
Assignees
Labels
kind/bug Something is broken. kind/maintenance Maintenance tasks, such as refactoring, with no impact in features.
Milestone

Comments

@deepakjois
Copy link
Contributor

deepakjois commented Sep 11, 2017

This seems to be happening because the AppVeyor instance does not have enough virtual address space to map the value log. I am guessing this should not prevent Badger from working on a Windows machine with enough RAM, but I don’t have one around to test.

screenshot from 2017-09-11 11-08-52

@fjl
Copy link
Contributor

fjl commented Sep 11, 2017

Maybe it could fall back to file IO in this case?

@deepakjois
Copy link
Contributor Author

deepakjois commented Sep 11, 2017

This seems like a problem specific to the AppVeyor instance, and may not happen for users trying out Badger on Windows. But I don’t have a Windows machine with me at the moment, to confirm.

Moreover, falling back to File I/O would also mean maintaining two different code paths. Windows is not a primary target for us at the moment. We try to make sure that the code builds so that people can try out Badger on that platform. Maintaining a separate code path with File I/O fallback just for Windows doesn’t sound like a great option.

Hopefully, we can figure out a workaround to get the build fixed.

@manishrjain manishrjain modified the milestone: v1.0 Sep 11, 2017
@manishrjain
Copy link
Contributor

We could set the max value size to be the same as the size of log file (a parameter in options). Then we can set the write mmap to be twice the size of the log file. That way, we know that we won't exceed that limit. And it would also work on Appveyor.

@manishrjain manishrjain added the kind/maintenance Maintenance tasks, such as refactoring, with no impact in features. label Sep 15, 2017
@manishrjain manishrjain added the kind/bug Something is broken. label Sep 18, 2017
@manishrjain
Copy link
Contributor

IPFS is failing because of this. Let's fix this asap.

@deepakjois
Copy link
Contributor Author

I don’t have a Windows machine, so I am using AppVeyor as a remote build bot to investigate. I created a small project. Here is the build history.

  • When I mmap the file with it’s exact size, it works fine.
  • When I mmap it with math.MaxUint32 or 500*1024*1023 (500MB), it fails.

There is probably an issue in the way we mmap on Windows. This code was probably never actually exercised on Windows (in the tables package we load to RAM by default).

I will investigate further.

@deepakjois
Copy link
Contributor Author

deepakjois commented Sep 19, 2017

So after a lot of trial and error, I made some changes which work well in my test project replicating this problem.

But the changes are still causing test failures (screenshot below):

  • TestWrite is failing with : truncate:The requested operation cannot be performed on a file with a user-mapped section open.. This refers to the truncate operation during value log replay. It seems that we cannot use truncate on windows while a file is mmap-ed. A fix for this would require changes in the way we do value log replay. I can’t really say if it is a trivial change or not, without further investigation.

  • TestConcurrentWrite is failing because it just takes too long to run. This indicates that there is some sort of race condition, but it only happens when we use mmap. We need to understand how Windows mmap-ing works more deeply to dig into this.

Sigh… @manishrjain I hate to say it, but it seems like #235 might be the easiest way to get around this, after all.

screenshot from 2017-09-19 14-42-21

@manishrjain
Copy link
Contributor

Based on our last chat, a good solution here would be to not mmap until replay has been completed. Also, to preallocate the write file to 2xvalue log size, and then truncate it when done.

deepakjois added a commit that referenced this issue Sep 21, 2017
mmap on Windows works differntly from mmap on Linux.

* We cannot mmap to a size beyond the size of the file. So we need to
truncate to that size and then map.

* We cannot truncate a file while a section of it has been mapped.

This has implications for how we manage value logs. This change fixes
mmap-ing on Windows with the following changes:

* When badger.NewKV() function is called, which in turn calls
valueLog.Open() method, we don’t immediately mmap the writable log
files (read-only log files are not a problem). We mmap the writable log
after replay has happened.

* Whenever the valueLog.iterate() method is called, and it determines
that a file needs to be truncated - we need to either make sure that
there are no open mmaps, or we need to munmap and then re-map the file
after truncation.

* Whenever a writable log file is mmap-ed, its size increases to
ValueLogFileSize * 2 (because of truncation in y.Mmap() function). We
need to ensure that once it has filled up, it is truncated back to the
actual size before it is closed and opened as a read-only file. We also
need to truncate if back if the value log is closed before it fills up.

* Fix a few tests on AppVeyor that were running out of memory by
reducing ValueLogFileSize option value.

Fixes #212.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. kind/maintenance Maintenance tasks, such as refactoring, with no impact in features.
Development

No branches or pull requests

3 participants