Use windowed read/write in median_smoothing #1674
Conversation
See the issue description in this forum comment: https://community.opendronemap.org/t/post-processing-after-odm/16314/16?u=adrien-anton-ludwig

TL;DR: Median smoothing used windowing to go through the array but still read it entirely into RAM. Now the full potential of windowing is exploited to read/write by chunks (see the sketch below).

Tests

I tested it on a very small dataset of 18 images on an external HDD. Here are the results:

I also used cksum to compare the resulting files, which were exactly the same. Do not hesitate if you have ideas for further testing. 😉
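For illustration, here is a minimal sketch of what chunked (windowed) read/write can look like with rasterio. This is not the actual `median_smoothing` code from the PR: the function name, the use of `scipy.ndimage.median_filter`, and the edge-padding logic are assumptions made for the example.

```python
import rasterio
from rasterio.windows import Window
from scipy import ndimage

def median_smooth_windowed(src_path, dst_path, radius=2):
    """Median-filter a single-band raster block by block instead of
    loading the whole array into RAM."""
    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        with rasterio.open(dst_path, "w", **profile) as dst:
            for _, window in src.block_windows(1):
                # Pad each block by `radius` pixels so the filter has context
                # at block edges; the padding is cropped off before writing.
                col_off = max(int(window.col_off) - radius, 0)
                row_off = max(int(window.row_off) - radius, 0)
                col_end = min(int(window.col_off + window.width) + radius, src.width)
                row_end = min(int(window.row_off + window.height) + radius, src.height)
                padded = Window(col_off, row_off, col_end - col_off, row_end - row_off)

                data = src.read(1, window=padded)
                smoothed = ndimage.median_filter(data, size=2 * radius + 1)

                # Crop back to the original block before writing it out.
                r0 = int(window.row_off) - row_off
                c0 = int(window.col_off) - col_off
                out = smoothed[r0:r0 + int(window.height), c0:c0 + int(window.width)]
                dst.write(out, 1, window=window)
```

Only one padded block is held in memory at a time, so peak memory is bounded by the block size rather than the raster size.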
Uh oh. Sorry for this.
Sorry for this HUGE mistake. 😅
Okay, after further testing this morning I came upon errors when reading a big file:
The little dataset I used to test yesterday calls …; I need to explore this further. If you have any idea, I'm all ears.
Since you're doing writes from parallel threads, perhaps there needs to be a mutex. I'm not sure if that would affect performance. This would be a cool improvement.
Indeed, it works using a single worker. I will look into this. Thanks!
The last commit fixes the race conditions while reading/writing by using locks. I could not test performance for now as I don't have any big project with the temporary meshing files. I guess I could test it on something else since it is not too specific. My main concern about performance is that each read overlaps with the previous and next one by a few pixels (half the kernel size).
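For reference, the locking pattern is roughly the threaded example from rasterio's concurrency documentation: reads and writes on the shared dataset handles are serialized with locks while the filtering itself runs in parallel. This is an illustrative sketch, not the actual commit, and the per-window helper is invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import rasterio
from scipy import ndimage

def run_threaded(src_path, dst_path, radius=2, workers=4):
    """Filter windows in parallel while serializing access to the shared
    rasterio dataset handles, which are not safe for concurrent use."""
    read_lock = threading.Lock()
    write_lock = threading.Lock()

    with rasterio.open(src_path) as src, \
         rasterio.open(dst_path, "w", **src.profile) as dst:

        def smooth_window(window):
            with read_lock:
                data = src.read(1, window=window)
            # Only the filtering runs concurrently.
            smoothed = ndimage.median_filter(data, size=2 * radius + 1)
            with write_lock:
                dst.write(smoothed, 1, window=window)

        windows = [w for _, w in src.block_windows(1)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces completion and re-raises any worker exception.
            list(pool.map(smooth_window, windows))
```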
The previous approach did not alter the original pixel values until the whole process was done. Now that each thread reads/writes directly, you shouldn't use the input file to save the results during processing (later processing might read altered values and hence give incorrect results); instead, write to a new file and rename it after all processes are done. Parallel reading in read-only mode shouldn't be an issue, but writing should be sequential since the windows overlap.
Oh, you're totally right! I remember thinking about the best way to manage the temporary files from one iteration to another... and the idea of doing it in place came to my mind. I felt really smart at the time, but now that you point it out it's obviously not a good idea. 😅 Should I keep the input file intact until the output is created? I feel like this is the better way to go, but it should be consistent with the rest of the pipeline, so I would like your opinion. This implies having 2 temporary files when doing more than one iteration, thus taking up more disk space. But I suppose it's fine since it would be the equivalent of the input size, which is far less than the individual tiles or the …
If we go this way, should the 2 temporary files be renamed between the iterations to always have a clear name indicating which corresponds to the latest pass? Let me know what you think and have a nice weekend! 😉
I think keeping the input file intact is better; it also helps when the process fails in the middle, guaranteeing the source data is only changed when the process finishes successfully. How the temporary files are named doesn't matter to me, they are just temporary files to store the data, similar to how we store the data in memory. Just make sure to clean them up at the end.
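A sketch of that bookkeeping, under the assumptions above: the temp-file names and the `smooth_pass` callback are hypothetical; the point is just that the input stays untouched until every pass has finished, after which the result replaces it and the leftovers are deleted.

```python
import os

def iterative_smoothing(input_path, num_passes, smooth_pass):
    """Run `smooth_pass(src, dst)` num_passes times, ping-ponging between
    two temporary files so the original input is never modified mid-run."""
    tmp_a = input_path + ".pass_a.tif"  # hypothetical temp names
    tmp_b = input_path + ".pass_b.tif"

    src, dst = input_path, tmp_a
    for _ in range(num_passes):
        smooth_pass(src, dst)  # e.g. one windowed median pass
        # The file just written becomes the source of the next pass.
        src, dst = dst, (tmp_b if dst == tmp_a else tmp_a)

    # Only replace the original once all passes succeeded, then clean up.
    os.replace(src, input_path)
    for tmp in (tmp_a, tmp_b):
        if os.path.exists(tmp):
            os.remove(tmp)
```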
When using rasterio's "r+" open mode, the file is updated correctly while open but completely wrong once saved.
The last 2 commits avoid reading altered data. The first commit was my first attempt but it doesn't work; I pushed it to get feedback, if possible. I don't understand why, but when using rasterio's "r+" mode to open the file, its data is completely wrong once saved. It is even bigger than the original file. The second commit fixes it, but I find the workaround quite ugly. I am open to suggestions.
Hello again! 👋 I finally had the time to test the performance impact of this PR. So, I used a dataset of 120 images with a resolution of 5472x3648. I stopped the process at the meshing step and timed the execution of …
Results are averaged over 10 runs for 1 iteration and 5 runs for 10 iterations. For information, here are the sizes of the input and outputs:
and the resolution of … Once again, I used … Apart from the elegance of the workaround described in my last comment, I think the PR is ready to be merged. Do not hesitate to tell me if you have concerns. 😉
Thanks @Adrien-LUDWIG 👍 I've run some tests on a smaller dataset; runtime is similar to what you found, except in my tests the original code runs about 10-15% faster (which is what I would expect). This can be merged, but I'm trying to understand the use case and tradeoff: was the goal to reduce memory usage? Did you have ODM crash (run out of memory) at the median smoothing step during processing?
Yes, exactly. ODM crashed at the median smoothing step because it tried to allocate too much RAM (352 GB for my dataset), whereas the rest of the processing didn't require more than 120 GB (well, 16 GB of RAM and ~100 GB of swap). The previous computations on …