Use windowed read/write in median_smoothing #1674
Conversation
See the issue description in this forum comment: https://community.opendronemap.org/t/post-processing-after-odm/16314/16?u=adrien-anton-ludwig

TL;DR: Median smoothing used windowing to go through the array but still read it entirely into RAM. Now the full potential of windowing is exploited to read/write by chunks (see the sketch below).

Tests

I tested it on a very small dataset of 18 images on an external HDD. Here are the results:

I also used cksum to compare the resulting files, which were exactly the same. Do not hesitate if you have ideas for further testing. 😉
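For illustration, here is a minimal sketch of what chunked (windowed) read/write can look like with rasterio. This is not the actual `median_smoothing` code from the PR: the function name, the use of `scipy.ndimage.median_filter`, and the edge-padding logic are assumptions made for the example.

```python
import rasterio
from rasterio.windows import Window
from scipy import ndimage

def median_smooth_windowed(src_path, dst_path, radius=2):
    """Median-filter a single-band raster block by block instead of
    loading the whole array into RAM."""
    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        with rasterio.open(dst_path, "w", **profile) as dst:
            for _, window in src.block_windows(1):
                # Pad each block by `radius` pixels so the filter has context
                # at block edges; the padding is cropped off before writing.
                col_off = max(int(window.col_off) - radius, 0)
                row_off = max(int(window.row_off) - radius, 0)
                col_end = min(int(window.col_off + window.width) + radius, src.width)
                row_end = min(int(window.row_off + window.height) + radius, src.height)
                padded = Window(col_off, row_off, col_end - col_off, row_end - row_off)

                data = src.read(1, window=padded)
                smoothed = ndimage.median_filter(data, size=2 * radius + 1)

                # Crop back to the original block before writing it out.
                r0 = int(window.row_off) - row_off
                c0 = int(window.col_off) - col_off
                out = smoothed[r0:r0 + int(window.height), c0:c0 + int(window.width)]
                dst.write(out, 1, window=window)
```

Only one padded block is held in memory at a time, so peak memory is bounded by the block size rather than the raster size.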
Uh oh. Sorry for this.
Sorry for this HUGE mistake. 😅
Okay, after further testing this morning I came upon errors when reading a big file:
The little dataset I used to test yesterday calls …; I need to explore this further. If you have any idea, I'm all ears.
Since you're doing writes from parallel threads, perhaps there needs to be a mutex. I'm not sure if that would affect performance. This would be a cool improvement.
Indeed, it works using a single worker. I will look into this. Thanks!
The last commit fixes the race conditions while reading/writing by using locks. I could not test performance for now as I don't have any big project with the temporary meshing files. I guess I could test it on something else since it is not too specific. My main concern about performance is that each read overlaps with the previous and next one by a few pixels (half the kernel size).
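For reference, the locking pattern is roughly the threaded example from rasterio's concurrency documentation: reads and writes on the shared dataset handles are serialized with locks while the filtering itself runs in parallel. This is an illustrative sketch, not the actual commit, and the per-window helper is invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import rasterio
from scipy import ndimage

def run_threaded(src_path, dst_path, radius=2, workers=4):
    """Filter windows in parallel while serializing access to the shared
    rasterio dataset handles, which are not safe for concurrent use."""
    read_lock = threading.Lock()
    write_lock = threading.Lock()

    with rasterio.open(src_path) as src, \
         rasterio.open(dst_path, "w", **src.profile) as dst:

        def smooth_window(window):
            with read_lock:
                data = src.read(1, window=window)
            # Only the filtering runs concurrently.
            smoothed = ndimage.median_filter(data, size=2 * radius + 1)
            with write_lock:
                dst.write(smoothed, 1, window=window)

        windows = [w for _, w in src.block_windows(1)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces completion and re-raises any worker exception.
            list(pool.map(smooth_window, windows))
```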
The previous approach did not alter the original pixel values until the whole process was done. Now that each thread reads/writes directly, you shouldn't use the input file to save the results during processing (later processing might read altered values and hence give incorrect results); instead, write to a new file and rename it after all processes are done. Parallel reading in read-only mode shouldn't be an issue, but writing should be sequential since the windows overlap.
Oh, you're totally right! I remember thinking about the best way to manage the temporary files from one iteration to another... and the idea of doing it in place came to my mind. I felt really smart at the time, but now that you point it out it's obviously not a good idea. 😅 Should I keep the input file intact until the output is created? I feel like this is the better way to go, but it should be consistent with the rest of the pipeline, so I would like your opinion. This implies having 2 temporary files when doing more than one iteration, thus taking up more disk space. But I suppose it's fine since it would be the equivalent of the input size, which is far less than the individual tiles or the …
If we go this way, should the 2 temporary files be renamed between the iterations to always have a clear name indicating which corresponds to the latest pass? Let me know what you think and have a nice weekend! 😉
I think keeping the input file intact is better; it also helps when the process fails in the middle, guaranteeing the source data is only changed when the process finishes successfully. How the temporary files are named doesn't matter to me, they are just temporary files to store the data, similar to how we store the data in memory. Just make sure to clean them up at the end.
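A sketch of that bookkeeping, under the assumptions above: the temp-file names and the `smooth_pass` callback are hypothetical; the point is just that the input stays untouched until every pass has finished, after which the result replaces it and the leftovers are deleted.

```python
import os

def iterative_smoothing(input_path, num_passes, smooth_pass):
    """Run `smooth_pass(src, dst)` num_passes times, ping-ponging between
    two temporary files so the original input is never modified mid-run."""
    tmp_a = input_path + ".pass_a.tif"  # hypothetical temp names
    tmp_b = input_path + ".pass_b.tif"

    src, dst = input_path, tmp_a
    for _ in range(num_passes):
        smooth_pass(src, dst)  # e.g. one windowed median pass
        # The file just written becomes the source of the next pass.
        src, dst = dst, (tmp_b if dst == tmp_a else tmp_a)

    # Only replace the original once all passes succeeded, then clean up.
    os.replace(src, input_path)
    for tmp in (tmp_a, tmp_b):
        if os.path.exists(tmp):
            os.remove(tmp)
```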
When using rasterio's "r+" open mode, the file is updated correctly while open but completely wrong once saved.
The last 2 commits avoid reading altered data. The first commit was my first attempt but it doesn't work; I pushed it to get feedback, if possible. I don't understand why, but when using rasterio's "r+" mode to open the file, its data is completely wrong once saved. It is even bigger than the original file. The second commit fixes it, but I find the workaround quite ugly. I am open to suggestions.
Hello again! 👋 I finally had the time to test the performance impact of this PR. So, I used a dataset of 120 images with a resolution of 5472x3648. I stopped the process at the meshing step and timed the execution of …
Results are averaged over 10 runs for 1 iteration and 5 runs for 10 iterations. For information, here are the sizes of the input and outputs:
and the resolution of … Once again, I used … Apart from the elegance of the workaround described in my last comment, I think the PR is ready to be merged. Do not hesitate to tell me if you have concerns. 😉
Thanks @Adrien-LUDWIG 👍 I've run some tests on a smaller dataset; runtime is similar to what you found, except in my tests the original code runs about 10-15% faster (which is what I would expect). This can be merged, but I'm trying to understand the use case and tradeoff: was the goal to reduce memory usage? Did you have ODM crash (run out of memory) at the median smoothing step during processing?
Yes, exactly. ODM crashed at the median smoothing step because it tried to allocate too much RAM (352 GB for my dataset), whereas the rest of the processing didn't require more than 120 GB (well, 16 GB of RAM and ~100 GB of swap). The previous computations on …