Population Density File loading can be sped up considerably #223
Comments
Nice work on this. I have a work item in my notes to visit this area of the code (but focus is elsewhere right now). Have you experience of memory-mapped files? I suspect that when handling such large files the overhead of the system calls contributes enough that there's more low-hanging fruit there.
Indeed - thanks for the work on this, which looks very helpful. See if you can do some reading on git/github, as we need you to be in the flow of pushing changes, getting feedback, etc... (Branch removed - see comment later)
I did speculate as to whether the I/O performance might be worth looking at in Issue #197, since there are clearly things that could be done to improve it, but it wasn't clear to me whether that was a worthwhile direction. The question really is what the I/O pattern looks like. Since I've not got a machine here large enough to actually run the code at a sensible scale, I've not been able to do my own tests, and have so far confined my reviews to inspection of the code and a few very basic tests.

If only a single pass is needed to read the data, then a read syscall is probably most efficient. If you have enough RAM that the data can stay (at least mostly) in RAM, and you are going to access the data randomly, then mmap becomes more interesting. Page faults (even minor ones) do have a cost, so that's why a single sweep is often more efficient with read. mmap can also become more complicated to deal with if the file changes size while it is mapped, so care is needed in that case, but I suspect that is not the case here.

fread is buffered by the C library in addition to the OS, so a plain read will potentially be faster if you don't need the C library buffer. You can also turn the C library buffering on and off with setvbuf and friends.

Also, if you know that you want to read the whole file, then fadvise (or, for mmap, madvise) can be useful to turn readahead up and ensure that I/O is done in large chunks. Large databases will use O_DIRECT, which imposes certain alignment constraints on the data but is a lot faster if you are willing to do all the memory handling yourself (which is not too tricky if you know that everything will fit in RAM) and if you can read in large, page-aligned blocks. If you are looking at O_DIRECT then you probably also want to be doing I/O in chunks of at least 4 MB, and maybe as large as 16 MB or more, depending on what is most efficient for the filesystem and underlying storage.
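For reference, a minimal sketch of the plain-read pattern described above, assuming a POSIX system that provides posix_fadvise (Linux does; not all platforms do). The file is read front to back in large chunks after a sequential-access hint. The helper name, chunk handling, and error handling are illustrative only and are not taken from the repository.

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Illustrative helper: slurp an entire file with plain read() in large,
// sequential chunks. POSIX-only sketch; error handling kept minimal.
static std::vector<char> ReadWholeFile(const char* path)
{
    std::vector<char> data;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return data;
    // Tell the kernel we intend one front-to-back pass, so it can read ahead.
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
    {
        data.resize(static_cast<size_t>(st.st_size));
        size_t off = 0;
        while (off < data.size())
        {
            ssize_t n = read(fd, data.data() + off, data.size() - off);
            if (n <= 0) { data.clear(); break; }   // error or unexpected EOF
            off += static_cast<size_t>(n);
        }
    }
    close(fd);
    return data;
}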
Just following up on the code side - for copyright/licensing reasons, we need changes to start with a Pull Request from the original author, and then the ongoing conversation continues from there. So I can't make a branch/PR containing your code, I'm afraid. Thanks for your thoughts and efforts with this - I hope you can get started with github, and get some of these helpful suggestions looked at.
Thanks, I created my first memory-mapped file back in the days of Windows NT SP2, nearly 25 years ago. It would have been my first choice here, but since this is targeting different platforms, I wasn't aware of a platform-independent solution.

As for savings, unless I'm missing something, a memory-mapped file isn't going to provide much speed-up. The measurement code proves that we can read from a normal file and process the whole string in under a second. The file reading does some pointer manipulation and maths to prepare the string, but it's in the outer loop, and that, along with the actual file reads (I think I included them), came to 0.3 seconds.

I tried rewriting the integer conversion inline using binary arithmetic/assumptions and that didn't give much. Almost all the time is spent in the double conversion, and specifically the first and second double conversion. To speed this up any further would require a rewrite of that, and I don't think the speed-up would be worth the loss in readability. I mean, I managed to get another 4 seconds with some pretty hard-core optimisations, some of which don't massively affect the code, but some made the code quite horrible. (See attached.)

So if there are potential speed gains in memory-mapped files, I don't think they will be significant, although I would still prefer to use them on a file this size because it would lower the memory usage and make the code a lot more readable!
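As a rough illustration of what a double-conversion rewrite could look like (this is not the code in the attachment), C++17's std::from_chars parses doubles without locale handling and is typically faster than sscanf/strtod, provided the standard library is recent enough to support floating-point from_chars. A hedged sketch, with the helper name and buffer handling assumed:

#include <charconv>
#include <system_error>

// Illustrative parser: read whitespace-separated doubles from a text buffer
// already held in memory. Returns the number of values written to 'out'.
static int ParseDoubles(const char* p, const char* end, double* out, int maxVals)
{
    int count = 0;
    while (p < end && count < maxVals)
    {
        // Skip whitespace by hand; from_chars does not do this itself.
        while (p < end && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')) p++;
        if (p >= end) break;
        auto res = std::from_chars(p, end, out[count]);
        if (res.ec != std::errc()) break;   // not a number: stop parsing
        p = res.ptr;
        count++;
    }
    return count;
}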
I'll see what I can do. I'm quite busy but I understand the issue and will do my best. Thanks
Yes, in fact I started with using setvbuf, but a lot of CRTs do not use their internal buffer if the buffer you give them is larger. So I concluded that the call to turn off buffering may actually have been making the code slower, and deleted the line.

As it stands, the ability to process the file in 0.5 seconds, as the measurement code proves, suggests that there is not much else to gain in the file handling, and profiling shows almost all of the delay is spent in the double conversion process. I shaved a few more seconds off with some further optimisations, but not in a way that should really form part of a maintainable code base. There are probably other bottlenecks that will deliver more without writing ugly code.
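For context, this is the sort of call being discussed: handing the stream a large user buffer with setvbuf immediately after fopen (it must come before the first read). Whether the CRT actually uses a caller-supplied buffer is implementation-dependent, as noted above. The helper name, file mode, and 1 MB size below are assumptions, not repository code:

#include <cstdio>
#include <vector>

// Illustrative: open a file with a 1 MB user-supplied stream buffer.
// Passing _IONBF instead of _IOFBF would switch C-library buffering off.
// The buffer must outlive the stream, which is why the caller owns it.
static FILE* OpenWithBigBuffer(const char* path, std::vector<char>& streamBuf)
{
    streamBuf.resize(1 << 20);                  // 1 MB, chosen arbitrarily
    FILE* dat = fopen(path, "rb");
    if (dat != NULL)
        setvbuf(dat, streamBuf.data(), _IOFBF, streamBuf.size());   // must precede any read
    return dat;
}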
I measured the Population Density file reading, and the code in SetupModel.cpp is inefficient.
Making an extra buffered pass over the file just to establish the number of lines is not something to be done with files of this size:
rewind(dat);
P.BinFileLen = 0;
while (fgets(buf, sizeof(buf), dat) != NULL) P.BinFileLen++;
if (ferror(dat)) ERR_CRITICAL("Error while reading density file\n");
// Read each line, and build the binary structure that corresponds to it
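A sketch of the kind of single-sweep replacement being suggested here: count newlines in large fread blocks instead of calling fgets once per line. This is illustrative only, not the code from the attached zip; the block size is arbitrary.

#include <cstdio>
#include <vector>

// Illustrative replacement for the fgets loop above: count lines by scanning
// large fread blocks for '\n'. One pass, no per-line call overhead.
// Note: unlike fgets, this counts only newline characters, so a final line
// without a trailing '\n' would need special handling.
static long CountLines(FILE* dat)
{
    rewind(dat);
    std::vector<char> block(1 << 20);           // 1 MB blocks, size is arbitrary
    long lines = 0;
    size_t n;
    while ((n = fread(block.data(), 1, block.size(), dat)) > 0)
        for (size_t i = 0; i < n; i++)
            if (block[i] == '\n') lines++;
    if (ferror(dat)) return -1;                 // caller decides how to report
    return lines;
}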
Likewise, the code that reads the data into the array is suboptimal too.
I used it as an example of optimisation, and in doing so I rewrote both the measuring and the reading.
On my device the measuring part is down from over 14 seconds to 0.5 seconds. The reading of the file is down from 63 seconds to under 30 seconds. Together the total time went down from around 77 seconds to under 30 seconds.
There is clearly a faster method than the one the code is currently employing. I've attached what I did so you can refer to it, but feel free to come up with a solution you're happier with. The measurement code took me less than 10 minutes, so it shouldn't take too long to roll another solution. (I would put this on a branch so it can be seen there, but I'm afraid I don't really know how to use GitHub, sorry.)
SpeedupExample.zip