Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[core] Blocking the Map Thread when destroying TileData objects #3724

Closed
wants to merge 5 commits into from

Conversation

tmpsantos
Copy link
Contributor

We create and destroy TileData objects constantly, specially when zooming in/out fast, and we might block and destroying them. This issue is noticeable on iOS (tested on a iPad Mini 2 Retina) and can be reproduced two ways:

  • Pinch the map fast, zooming in and out and you will see the map hanging sometimes.
  • Move the map to a place somewhat far from the current location and click on the top right button to trigger a "fly to" animation, that should hang a few times (can be made worse with the map on full tilt).

While the first scenario is hardly a real life use case, the second is. The root cause is that we cannot cancel worker tasks once they are started (see #2644).

@tmpsantos tmpsantos added the bug label Jan 28, 2016
@1ec5 1ec5 added this to the ios-v3.1.0 milestone Jan 28, 2016
@1ec5 1ec5 added the performance Speed, stability, CPU usage, memory usage, or power usage label Jan 28, 2016
@tmpsantos
Copy link
Contributor Author

👀 @jfirebaugh @kkaefer

@tmpsantos
Copy link
Contributor Author

Also easy to see using the "Start World Tour" option on the iOS app.

@jfirebaugh
Copy link
Contributor

This adds additional complexity to an area where reasoning about threading behavior and object lifetimes is already difficult. I would much prefer we continue to investigate and eliminate the direct causes of blocking, rather than delaying TileData destructors.

@tmpsantos
Copy link
Contributor Author

This adds additional complexity to an area where reasoning about threading behavior and object lifetimes is already difficult. I would much prefer we continue to investigate and eliminate the direct causes of blocking, rather than delaying TileData destructors.

The alternative I see would be change how we use workers. We would always transfer the ownership of what is pass on the parameters to the worker, so cancellation would still need a mutex for flipping a flag, but executing the task on the worker would not acquire that mutex. The worker would only acquire the mutex for posting back the callback to the Map Thread or not if it was cancelled.

Canceling an ongoing task would be a matter of let the task go on and self destruct at the end. Only destroying the worker itself would block.

Good thing is these objects momentarily owned by the worker would be destroyed on the worker thread in this case, offloading the destruction job from the Map Thread, but I don't think this will make a difference. The advantage I see is cancelling a task can be considered effectively non-blocking (not truly because the cancellation flag mutex).

Bad thing is we need to enforce a safe design pattern that I don't think it can be done programmatically like it is today. We would only pass as arguments to the worker std::unique_ptr or std::shared_ptr. The later only to immutable objects.

@1ec5 1ec5 removed this from the ios-v3.1.0 milestone Jan 29, 2016
@1ec5
Copy link
Contributor

1ec5 commented Jan 29, 2016

Pulling off the 3.1.0 milestone given #3731.

@kkaefer
Copy link
Contributor

kkaefer commented Feb 1, 2016

Thinking some more about this, I think we should actually implement what @tmpsantos is suggesting in #3724 (comment). Having a custom deleter will add unnecessary complexity and essentially add a custom deferred shared pointer destruction that could be very complicated to debug.

This is what our workers currently use:

  • layers from the stylesheet (these are already copied ✓)
  • actual tile data (they are moved ✓)
  • placement configuration (already copied ✓)
  • tile ID, source ID, map mode (already copied ✓)
  • SpriteStore, GlyphAtlas, GlyphStore, these are currently references and shared among all workers and the main thread. In JS, these live on the main thread, and web workers get a copy when initializing and communicate with the main thread to mutate them. We should do this in native as well, and make it easier to communicate across thread boundaries.

@jfirebaugh
Copy link
Contributor

Agreed @kkaefer; that's been our long term goal since #667.

tmpsantos and others added 5 commits April 14, 2016 18:14
Add a `::tryCancel()` method to the WorkTask so objects
that own a WorkTask can try to cancel and if it will block,
postpone the cancellation.

On debug mode, the RunLoop will print a warning when we are
blocking trying to cancel tasks on the main thread which we
should never do.
We all the time are creating and destroying TileData objects and
when we try to destroy, we first cancel any ongoing WorkRequest.

When canceling a WorkRequest, there are three scenarios:

- We cancel before the RunLoop queue of the Worker Thread
  executes the task. This should not block the Map Thread.

- We cancel after the work was already executed (but we didn't
  pick the callback yet). This should not block either, the
  bad thing is the work is wasted, but we can live with that.

- We cancel but there is a work ongoing. This is the worse case
  scenario, because we block until the work is completed by the
  Worker Thread. This is a design choice because the Worker might
  share objects with the producer thread. It gets worse if for
  the particular device, parsing is slow.

This new method `::tryCancel` will return false if canceling is
not possible without blocking, so we can defer.
Emit notifications when the work is completed. This will be
used by the next patch.
We cannot simply delete the TileData at will because
it might block. The custom deleter makes sure that
it will only delete the object when it is non-blocking.
@jfirebaugh jfirebaugh force-pushed the 3724-blocking_when_destroying_tiledata branch from a74e11b to e3a4a03 Compare April 15, 2016 01:34
@jfirebaugh jfirebaugh mentioned this pull request Apr 15, 2016
@jfirebaugh
Copy link
Contributor

Closing; I think we should investigate alternative solutions here per above comments.

@jfirebaugh jfirebaugh closed this Apr 15, 2016
@jfirebaugh jfirebaugh deleted the 3724-blocking_when_destroying_tiledata branch April 15, 2016 19:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug performance Speed, stability, CPU usage, memory usage, or power usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants