-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
image load failures due to bad abort support in geotiff #604
Comments
I am unable to reproduce. Are you using the demo site? This was a known issue (to me) but was able to fix it by updating |
Specifically, if you're not using the demo site, make sure to set up |
Here's a screencast of getting the AggregateError on the demo site. |
I will look into this then. I suspect it has to do with a recent PR we made there. This used to be a problem but I was nearly 100% certain I had fixed it in a local copy of geotiff we had - we then merged that copy into the real thing but it's definitely possible something was lost in translation. Yes, that section of the code is a mess - any help there would be appreciated. It's not completely unapproachable, just takes some debugging. |
As an aside, I think there may be a CORS issue with Firefox and your bucket. EDIT: I am actually having trouble even with |
Regarding blockedsource.js, the code looks very confused, but maybe I don't understand the intent. So, my analysis could be wrong. The first major issue is that it inspects the cache, then releases the cpu (by use of await), then pulls from the cache. This can't work reliably because the cache is dynamic, it changes over time. The values cached before await may not be there after await. Every use of await introduces another race condition. The code should instead pull what it can from the cache at the top of fetch(), and hold on to those values. That should be the only use of cache in fetch(). It should hold on to the requests it makes, rather than expecting them to appear in cache later, because, again, the cache is dynamic: every addition to cache can remove something from cache, so you can't wait on multiple things and then expect them to all be there. The second issue is the abort code. I suspect the intent of the API is that the same tile will not be fetched twice if requested more than once in a short time interval. Without reference counting the number of requests on the same tile, I don't believe there's a reliable way to support abort, because with no count there's no way to know if the tile is still needed. The current code attempts to work around this by retrying aborted tiles, but this introduces races on this.abortedBlockIds. I suspect the desired functionality requires many fewer state variables than the current implementation. I could try reworking it, but I would probably do a clean rewrite rather than trying to fix what's there. |
I would be open to that @acthp - I held some similar suspicions of the code but could not have wrote it clearly like you did. Thanks so much - if you don't implement it, I will probably take a stab at it but community help is always appreciated! As for the abort code, this comes from |
Ok, I'll see what I can put together in the next few days. Sorry about the CORS issue on that image. If you want to test it I believe localhost:8080 is allowed. I can open it up more if you need it. |
Ah, we use port 3000 for development. I'll circle back to this on Friday to check in. Thanks! |
Have you had a chance to look at this @acthp? Otherwise I'll look at it early next week. |
Yes, though I'm mostly focused on writing tests atm. I think there are three groups of bugs: races on the cache, races on the abort, and a suspected off-by-one that leads to an exception when the request range matches the block size. Trying to shore up the tests to make sure there's not more. |
Amazing, thank you! The last one sounds horrifying, been in a similar place with this library! |
Hi all, I'm curious what the current state of this issue is? I am working on using parts of VIV code (specifically the loader/ImageLayer) and I've run into a similar issue that I currently suspect has the same root cause. If it is helpful I can share/explain what I am running into. |
@Dev-Lan I rewrote the fetch code in geotiff, which resolved the various races we were experiencing. Code is here: https://github.com/ucscXena/geotiff.js/commits/ucsc-beta I haven't kept it up to date. I may come back to it in the future, but currently it's not clear if we will continue with viv, due to performance concerns. |
@Dev-Lan Could you expand on the issue? @acthp Could you also expand on your performance concerns, even if you are using a fork? Is this about tiff or the library? We do support OME-NGFF. I am wondering if it is time to discuss a plan for "adopting" the |
There was also geotiffjs/geotiff.js#336 which may help alleviate your issues. Have you tried this? It purports to solve some issues. |
This shouldn't be a fix, because we should be setting the cache size to be infinite, but maybe it has a side-effect? |
@ilan-gold Re: performance, I haven't been in this code for awhile so my recollection is probably wrong in places. But as I remember it, the main issue we've had is lag in rendering tiles. We've also had problems with the main thread being blocked, freezing the whole UI. When I looked into it, it appeared from back-of-the-envelope calculations that it was fetching a lot more data than it should need to render a given view. It canceled requests much less frequently than I would expect. For example, when zooming and panning rapidly I would expect to see cancellations for views that were merely passed through, but I rarely saw cancelled requests. So, most of the time was spent waiting for tiles that weren't used. To bring the data sizes down we tried some compressed formats, but that would lock up the UI for seconds at a time. It appears that the decompress is happening on the main thread. Also, the javascript decompression performance was poor, and pretty much negated any potential gain from the compression. While troubleshooting these issues I felt that the code complexity was quite a bit beyond what was necessary for the problem, which made performance analysis difficult. Some of that is inherited from the deckgl and geotiff APIs. E.g. if I recall correctly the tile requests are driven from deckgl, so analysis of tile fetch and cancel require digging into that code. Geotiff implements BlockedSource, but the underlying image is already tiled, so there are two different chunking mechanisms working on top of each other. I'm not sure this is necessary. We think it might be easier for us to build a system that has just a few abstractions, and fetches tiles that are individual, pre-computed, compressed image files so the browser can render them natively in the C++. If we do try to continue rendering directly from the pyramid-style image files, at minimum we would put the decompress on a separate thread, and would likely move the code to webassembly. Again, sorry for any inaccuracies. I hope to get back to this in a few weeks. |
I'm experiencing two of these, though potentially exposed in a slightly different way. I have a tiff stack of low resolution time-varying data. Panning/zooming does not trigger any fetches, but scrubbing through time does.
Right now, this is my biggest issue. This is somewhat random, but can be triggered by 2 quick updates causing 2 fetches for images. The whole page becomes unresponsive.
This is a smaller issue since I can work around it with the UI, but was surprising. Again, if I scrub through say 5 frames quickly, if deckgl or fetching can't keep up I would expect subsequent changes to abort/override previous changes, and get a single visual update with the 5th frame. Instead, when I'm done scrubbing all 5 frames show in sequence. Like I said, this is fixable with a debounce or throttle, but I did try adding It is also definitely possible I'm using something wrong or at least unexpectedly. I'm using vue, and have had the most success so far using plain js deckgl, and using the loaders and I'm layer classes from viv. My code is at https://github.com/visdesignlab/aardvark. 90% of the relevant code is at https://github.com/visdesignlab/aardvark/blob/main/src/components/ImageViewer.vue, and a little bit is in https://github.com/visdesignlab/aardvark/blob/main/src/stores/imageViewerStore.ts I can also provide videos or meet virtually if that would be useful in any way. |
Hi again! I'm getting back into my project using this code. I was wondering if there have been any developments on this issue? As is I can still easily cause the whole page to freeze from loading subsequent frames too quickly. I have tried swapping out the geotiff code from the fork in the previous comment, but haven't been successful in getting it to build.
|
As I see it there are two issues:
Something you can do is use What @acthp suggests in his last comment (C++ with precomputed tiles) sort of goes against what this repo does so if you really need that performance, then that might be the route to go. But if you're pre-computing compressed tiles, I suspect that our repo's performance will actually be fine since the computation cost of decompression will go down substantially as to not break anything. |
P.S @Dev-Lan if you're issue is just bundling a |
Right now, I'm just trying to fix the first issue. I certainly would appreciate any advice if you have it regarding the bundling. Of my three approaches described above, I think 1 is a dead-end. For both option 2 and 3 I am using a fork I created from VIV that references the ucsc Option 2: load viv monorepo from my fork If I look at the viv package in Option 3: get subpackage for loaders only.
Let me know if any additional info would be helpful. |
I don't currently have plans to continue with this. Regarding #2, in addition to deck.gl over-fetching, we also saw over-fetching due to inappropriate tile sizes. All of the images we've tested so far have required rebuilding due to various encoding errors, e.g. mixing up dimensions, botching the tiles, etc. This is pretty typical for research data, and it undermines the utility of being able to directly render from these formats. So, we're building conventional pyramids of png files during our image data ingest, and rendering with deck.gl. I will update this thread if I'm able to reduce the deck.gl over-fetching. re: build, I had to fork viv to inject the patched geotiff. Diffs are here: master...ucscXena:viv:ucscxena There's probably a better way, but I don't know what it is. |
Thanks for the link, this is a helpful reference! I'll give this another shot. Edit: So far no luck :'( |
I ended up just making a local copy of just the loaders package from viv and updating it to use This does help, but I can still trigger a "hard crash" of the webpage, where the page becomes unresponsive, and I need to close the tab. |
Alright, "hopefully," last update. I found something that appears to have resolved my issue. I think @ilan-gold's suggestion for using a pool finally clicked. Specifying a pool in the options when I call
|
Yay! |
@ilan-gold is passing a |
I think perhaps the reason for this was the workers on the heatmap but that is something of a moot point now since it almost certainly will not need to use workers anymore for spatial data (heatmaps for spatial data tend to be smaller). |
Describe the bug
Image tile loading fails after some use of the UI. Some tiles fail to load, and zoom/pan does not resolve the issue.
I can avoid the bug from viv by removing the abort signal in the calls to geotiff. This might be a good workaround for viv until geotiff is fixed. However, the blockedsource.js file in geotiff looks extremely broken. I suspect there are several more race conditions due to incorrect use of async/await, but the abort code is the easiest one to hit from avivator.
To Reproduce
Load ome tiff in avivator. Load a new channel. While channel is loading, zoom and pan the image. Eventually geotiff will throw an AggregateError, and some tiles will never load.
Expected behavior
Tile loading continues to work
Screenshots
Environment:
The text was updated successfully, but these errors were encountered: