
Request: Run flow accumulation in batches for larger DEMs #221

Closed
InsolublePancake opened this issue Jan 17, 2022 · 7 comments

@InsolublePancake

I have a workflow that uses several Whitebox tools (FillDepressions, D8Pointer, D8FlowAccumulation, ExtractStreams, and a few others that process the streams). I have large DEMs that are too big to process in one go due to memory limitations, but which could otherwise be chunked and run quite easily, except that chunking doesn't make sense for the flow accumulation step.
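For reference, here is roughly the pipeline, sketched with the `whitebox` Python frontend (the file paths and the stream threshold are just placeholders):

```python
import whitebox

wbt = whitebox.WhiteboxTools()
wbt.set_working_dir("/data/dem")  # placeholder path

wbt.fill_depressions("dem.tif", "dem_filled.tif")
wbt.d8_pointer("dem_filled.tif", "d8_pointer.tif")
wbt.d8_flow_accumulation("dem_filled.tif", "flow_accum.tif", out_type="cells")
wbt.extract_streams("flow_accum.tif", "streams.tif", threshold=1000.0)
```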

I wonder whether it would be possible to do this by running flow accumulation on one chunk of the DEM, then 'seeding' the adjacent chunk with the accumulation values along its shared edge.
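Purely to illustrate the seeding idea (this is not an existing WBT feature), a numpy sketch for two vertically adjacent chunks, assuming WBT's default D8 pointer codes where 4 = SE, 8 = S, and 16 = SW are the directions that drain into the chunk below:

```python
import numpy as np

def edge_seeds(accum_a, pntr_a):
    """Carry flow accumulation from chunk A's southern edge into chunk B.

    Returns a row of 'seed' values that would be added as initial loads
    along chunk B's northern edge before accumulating chunk B.
    """
    ncols = accum_a.shape[1]
    seeds = np.zeros(ncols)
    edge_accum = accum_a[-1, :]  # accumulation along A's last (southern) row
    edge_pntr = pntr_a[-1, :]    # D8 directions along the same row
    for col in range(ncols):
        d = edge_pntr[col]
        if d == 8:                           # due south -> same column in B
            seeds[col] += edge_accum[col]
        elif d == 4 and col + 1 < ncols:     # southeast -> one column right
            seeds[col + 1] += edge_accum[col]
        elif d == 16 and col > 0:            # southwest -> one column left
            seeds[col - 1] += edge_accum[col]
    return seeds
```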

Any consideration of this would be greatly appreciated. Thanks!

@jfbourdon
Contributor

We also have to run DEMs that are too large, and we deal with that by using Isobasins (on a lower-resolution DEM) to split the DEM in a hydrological way. Using GDAL, we can then crop the original DEM into the chunks we need to pass to WBT, and we reconnect the streams together afterwards. However, when using this method (Isobasins on a lower-resolution DEM), you will need to add a buffer to each chunk, as the basin perimeters might not fit exactly when applied to the higher-resolution DEM. Some cleanup might also be needed.
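As a rough sketch of the cropping step with GDAL's Python bindings (the paths, basin bounds, and buffer width are just example values):

```python
from osgeo import gdal

def crop_chunk(src_dem, dst_chunk, basin_bounds, buffer_m=200.0):
    """Crop src_dem to a basin's bounding box expanded by buffer_m metres."""
    xmin, ymin, xmax, ymax = basin_bounds
    gdal.Translate(
        dst_chunk,
        src_dem,
        projWin=[xmin - buffer_m, ymax + buffer_m,   # upper-left x, y
                 xmax + buffer_m, ymin - buffer_m],  # lower-right x, y
    )

# example call with made-up coordinates
crop_chunk("dem_1m.tif", "chunk_03.tif", (452000, 5230000, 468000, 5246000))
```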

Being able to process rasters too large to fit in memory by letting WBT do the chunking would be great. It could be tricky for some operations like FillDepressions and BreachDepressions, where neighboring values (outside of the chunk) have an effect. I suppose that a buffer on each chunk would be necessary, but even then some differences between a chunked and non-chunked raster could remain. If ever implemented, the buffer size would need to be set via a parameter to ensure reproducibility.

@jblindsay
Owner

What you are asking for is a fundamental change to the flow accumulation approach used in WBT and, at an even lower level, to the way it reads and writes raster data. I'm not convinced that this is a realistic change, as it would impact a far greater surface area than just the flow accumulation tools. There are existing tools and libraries that are geared towards flow accumulation on massive DEMs. The approach that I have adopted is intended to provide good performance for the large proportion of users working with more moderately sized DEMs.

@InsolublePancake
Author

OK, fair enough if it's too difficult to implement. Can you recommend any of the libraries you allude to? We need to generate catchments and drainage lines from a DEM or a pointer dataset. Whitebox looked so promising for us!

@jblindsay
Owner

How large are the DEMs that you are trying to process and what are the memory limits on your system? Also, specifically which tools were you using in WBT for your flow accumulation workflow and which one(s) raised the out-of-memory error? I can certainly try to reduce the memory requirements of these tools further, but ultimately they will always need to read the entirety of the DEM into memory, given the way the tools are designed.

@jfbourdon
Contributor

TauDEM is certainly a possible alternative, but I don't think that you really need to look elsewhere. There are ways to circumvent this memory constraint with WBT. We used WBT to produce 1 m resolution rasters (breached DEMs, flow direction, flow accumulation, etc.) for over 400 000 km². Chunking your source DEM is the solution, or at least it's the solution we chose.

We are doing it by pre-processing a very large (>1000 km²) DEM at a lower resolution (say 5 m) using GDAL, then breaching/filling this coarse DEM before using Isobasins to split it into manageable chunks. We then process these chunks (plus a buffer) at 1 m resolution. Merging everything together afterward is not without some challenge, but it is very doable.
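A rough sketch of that pre-processing with GDAL and the `whitebox` Python frontend (the paths and the target basin size are just examples; pick a size that suits your memory budget):

```python
import whitebox
from osgeo import gdal

# resample the 1 m source DEM to 5 m
gdal.Warp("dem_5m.tif", "dem_1m.tif", xRes=5, yRes=5, resampleAlg="bilinear")

wbt = whitebox.WhiteboxTools()
# breach (or fill) the coarse DEM before delineating basins
wbt.breach_depressions("dem_5m.tif", "dem_5m_breached.tif")
# split into roughly equal-area basins; size is a target cell count
wbt.isobasins("dem_5m_breached.tif", "isobasins_5m.tif", size=4000000)
```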

@InsolublePancake
Author

@jblindsay we recently ran a DEM that was 2 GB compressed and 22 GB uncompressed, and we have larger areas that we would like to model. I trialled chunking the DEM, processing, then recombining, but most of the efficiency gained is lost and it introduces potential for error.
A lot of our DEM in this recent project was no-data because the region included lots of islands and coastline. I suppose that if there were efficiencies to be gained there, you would have already applied them.

@InsolublePancake
Author

@jfbourdon thanks for your suggestions; TauDEM looks promising and I will look into it.
Chunking is the approach I was considering, though it feels a bit Heath Robinson to chunk and recombine like this. Resampling and then running Isobasins is a good idea. I'll have a go with this, thanks.
