
Viewer can be sluggish for one of the most common types of series we have (chest CT) #307

Closed
wlongabaugh opened this issue Sep 18, 2020 · 14 comments

@wlongabaugh (Member)

@fedorov @pieper If users hit our landing page and click on the featured CT scan, they are looking at a series with 280 slices. It takes a long time to load, so if they start scrolling, they are stuck looking at a frozen "Loading..." page for a while. Perhaps not the greatest of first impressions? The featured PET scan also has a ton of slices (262), though it is not quite as frozen.

Best case scenario would be a snazzy image with fewer slices showing up as the first thing you see.

@pieper (Member) commented Sep 18, 2020

Yes, there is definitely something wrong with the frame fetching behavior. If possible, I'd prefer to fix the underlying issue, since 280 slices is not an unusual case.

Here's what I see:

  1. open the console to the Network tab
  2. open the study url https://dev-viewer.canceridc.dev/viewer/1.3.6.1.4.1.14519.5.2.1.6279.6001.224985459390356936417021464571
  3. grab the slice scroll tab on the right side and pull it to the bottom of the window to look at slice 280.

The result is a very long (~5 second) delay with the "Loading" message before the slice appears even though the network tab is very active.

At the beginning, slices are downloading within about 100 ms, so if things were working correctly I should be able to see any interactively selected slice within about 100 ms.

But instead it appears that the queue is being swamped with fetches for slices that were triggered as I scrolled past, and those requests are not canceled even though I'm no longer on those slices. In the end so many slice fetches are queued up that some of the accesses take over 7 seconds.

@swederik do you agree this is an issue we can fix? I'm kind of curious to take a look myself, but I'm sure you know the code a lot better than I do.

@swederik

There are definitely things we can do to improve it but I think there will be some tradeoffs (e.g. dropping in-progress requests, which would be a waste of data and server resources). I spoke to James and he's going to look into it and see if there's anything obviously wrong.

@pieper (Member) commented Sep 21, 2020

From what I could see there are a lot of pending requests generated when dragging the scrollbar that become "stale" when the scroll bar moves on. We should be able to identify and drop those. If we can really get an arbitrary slice in 100ms then there is no reason we should ever have more than 100ms latency between scrolling to a location and seeing the corresponding slice. Anything else is a bug IMHO.
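
A minimal sketch of that kind of stale-request dropping, assuming a plain fetch-based loader (the real viewer goes through the OHIF/cornerstone image loader, and the function names here are hypothetical):

```ts
// Hypothetical loader-side cancellation: abort any in-flight frame fetch
// that is not for the slice the user is currently looking at.
const inFlight = new Map<number, AbortController>();

async function showSlice(sliceIndex: number, frameUrl: string): Promise<void> {
  // Drop every pending request that has gone stale.
  for (const [index, controller] of inFlight) {
    if (index !== sliceIndex) {
      controller.abort();
      inFlight.delete(index);
    }
  }

  const controller = new AbortController();
  inFlight.set(sliceIndex, controller);
  try {
    const response = await fetch(frameUrl, { signal: controller.signal });
    const pixelData = await response.arrayBuffer();
    renderSlice(sliceIndex, pixelData);
  } catch (err) {
    if ((err as Error).name !== 'AbortError') throw err; // aborted fetches are expected
  } finally {
    inFlight.delete(sliceIndex);
  }
}

// Placeholder for the viewer's actual render call.
function renderSlice(index: number, pixels: ArrayBuffer): void {
  console.log(`slice ${index}: ${pixels.byteLength} bytes`);
}
```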

@fedorov (Member) commented Sep 23, 2020

I was recording a video for the tutorial, and it took about 45 seconds to load MPR for a chest CT. I am going to cut that piece out and add a "45 seconds later" message.

@pieper (Member) commented Sep 23, 2020

It would be good to know if this is an issue with the client, the proxy, or the google healthcare api. My evidence points at the proxy (sorry @wlongabaugh).

There are 280 slices in the CT study linked from the main page, and if you watch the network tab image below you can see that per-slice times range anywhere from 79 ms to almost 9 seconds. At 100 ms/slice you should get 10 slices per second, so 280 slices should take 28 seconds worst case with non-overlapped access. A 45-second load means whatever we are doing is roughly 1.6x worse than even that serial worst case.

If I use the sandbox to hit Google DICOMweb directly, I can load a 500-slice study in about 10 seconds; looking at the network tab, the worst case is about 400 ms per slice and most are below 50 ms.
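
For reference, this is roughly how a direct timing check against the DICOMweb service looks (the project/dataset identifiers and UIDs below are placeholders, and a valid OAuth token is assumed):

```ts
// Hypothetical direct-access timing: fetch one frame from the Google Healthcare
// DICOMweb endpoint and report the per-slice latency in milliseconds.
const base =
  'https://healthcare.googleapis.com/v1/projects/PROJECT/locations/LOCATION' +
  '/datasets/DATASET/dicomStores/STORE/dicomWeb';
const frameUrl = `${base}/studies/STUDY_UID/series/SERIES_UID/instances/SOP_UID/frames/1`;

async function timeFrameFetch(token: string): Promise<number> {
  const start = performance.now();
  const response = await fetch(frameUrl, {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: 'multipart/related; type="application/octet-stream"',
    },
  });
  await response.arrayBuffer(); // force the body to be fully read
  return performance.now() - start;
}
```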

@wlongabaugh are you able to look at the proxy logs together with the network tab in the browser to see what's going on? Maybe a scaling issue or some other overload of the proxy?

[Screenshot: network tab timings]

@s-paquette (Member)

@wlongabaugh Is this fixed by #311?

@wlongabaugh (Member, Author)

@s-paquette Alas, no, since the selected series include the many-slice series in question here.

@wlongabaugh (Member, Author) commented Sep 24, 2020

Data point one is that we only keep three instances running at all times, and a large influx of requests requires new instances to be spun up. These are the instances coming online to handle the first tab's series. A request that brings an instance online takes about 7-8 seconds to respond. At about $10/day per instance, it would cost roughly $36K/year to keep ten instances at the ready. It might be cheaper on App Engine Flex, I don't know. This spin-up time is not something you will see when hitting Google directly:

[Screenshot: InstanceSpinUp]

@wlongabaugh (Member, Author)

Data point two is that I agree the myriad OPTIONS calls were taking too long; the code that responds to them ran later than it needed to, and I am deploying that fix.

@wlongabaugh (Member, Author)

@pieper When I choose a large series from sandbox-000, I see about 1-second load times per slice (see below). I am also seeing, for some reason, far fewer CORS OPTIONS calls to the server. For the IDC featured series, I see over 600 requests to bring down 280 slices, with about half of those being OPTIONS calls. I have made them a little more efficient, but this is still slowing things down.
[Screenshot: Sandbox000-1]

@wlongabaugh (Member, Author)

I needed to set "Access-Control-Max-Age" in the CORS OPTIONS response so the browser caches the preflight result, which cut the number of requests to the server roughly in half. (This is deployed on dev.) Still looking at other possible optimizations.
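
Roughly the shape of the preflight response being described (illustrative sketch only; the actual proxy is a Python App Engine service, and the 600-second value below is just an example):

```ts
// Sketch of a CORS preflight handler that sets Access-Control-Max-Age so the
// browser caches the preflight result instead of sending OPTIONS per slice.
import * as http from 'http';

const server = http.createServer((req, res) => {
  if (req.method === 'OPTIONS') {
    res.writeHead(204, {
      'Access-Control-Allow-Origin': req.headers.origin ?? '*',
      'Access-Control-Allow-Methods': 'GET, OPTIONS',
      'Access-Control-Allow-Headers': 'Authorization, Accept',
      'Access-Control-Max-Age': '600', // cache the preflight for 600 seconds
    });
    res.end();
    return;
  }
  // ...the real proxy forwards GETs to the DICOMweb backend here...
  res.writeHead(502);
  res.end();
});

server.listen(8080);
```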

@pieper (Member) commented Sep 24, 2020

Sounds like progress.

> Data point one is that we only keep three instances running at all times, and a large influx of requests requires new instances to be spun up.

Should we limit the client to a max of 3 simultaneous requests?
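
Something along these lines on the client side would do it (a hypothetical sketch; the viewer's actual request pool presumably lives in the cornerstone image loader):

```ts
// Hypothetical cap on concurrent slice fetches so a scroll burst doesn't
// swamp the proxy and trigger App Engine instance spin-up.
const MAX_CONCURRENT = 3;
let active = 0;
const waiting: Array<() => void> = [];

async function withSlot<T>(task: () => Promise<T>): Promise<T> {
  while (active >= MAX_CONCURRENT) {
    await new Promise<void>((resolve) => waiting.push(resolve));
  }
  active++;
  try {
    return await task();
  } finally {
    active--;
    waiting.shift()?.(); // wake the next queued request, if any
  }
}

// Usage: at most three frame fetches in flight at any time.
// const pixels = await withSlot(() => fetch(frameUrl).then((r) => r.arrayBuffer()));
```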

pieper added a commit to ImagingDataCommons/ThrottleProxy that referenced this issue Sep 24, 2020
Related to ImagingDataCommons/IDC-WebApp#307, it seems that the autoscaling is introducing a time lag on some requests.

According to [the app engine docs](https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed) we can autoscale on CPU and latency which may provide better performance.

After discussion with the OHIF team, @swederik suggested this change.

@wlongabaugh (Member, Author) commented Oct 1, 2020

The series is not going to be changed for MVP; this is now a proxy/viewer combination performance issue that is post-MVP.

@wlongabaugh self-assigned this on Oct 1, 2020
@fedorov changed the title from "Featured landing page image has a very large first series (280 slices)" to "Viewer can be sluggish for one of the most common types of series we have (chest CT)" on Oct 2, 2020
@fedorov (Member) commented Jun 22, 2021

There were improvements to the viewer over the past few months, and performance is now considerably better.

@fedorov closed this as completed on Jun 22, 2021