This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

De-mutex GlyphAtlas; expose glyph information to workers via message. #8341

Merged
merged 2 commits into from
Apr 4, 2017


ChrisLoer
Contributor

This PR ports the gl-js GlyphAtlas architecture (roughly) to gl-native. Instead of using a mutex-protected GlyphAtlas shared across all worker threads, we restrict access to the GlyphAtlas to the main thread, and use a message passing interface for worker threads to request glyph information from the atlas.

Working on Harfbuzz/Freetype code motivated me to make these changes -- as the shaping code grew more complex/expensive, holding onto the GlyphAtlas mutex became more of a point of resource contention. Also, reasoning about mutexes is hard.

I haven't done any performance benchmarking on this PR yet. In theory it should reduce contention, but it also introduces some extra copying and the extra round trip of always having to send a message to the main thread for glyphs even if they're already locally available.

I'm putting up the PR early to get feedback on the approach. If the approach looks good, we can also use it for SpriteAtlas, and at that point the worker threads should (?) be running independent of shared state the same way they do in gl-js. If we're doing the same changes for both atlases, we could also update all the "symbol dependency" code to share one common pathway. Right now this new pathway is just kind of glued on to the symbolDependenciesChanged code from #6928.

I started out by porting gl-js callbacks more-or-less directly to C++ lambdas, but tracking ownership quickly got... complicated. Instead, I made GlyphAtlas directly responsible for tracking who needs to get sent what data when glyphs become available. There's more code than I'd like added to GlyphAtlas, but hopefully it's a simplification overall.
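A minimal, single-threaded sketch of the bookkeeping described above, not the code in this PR. It assumes glyphs are fetched in 256-code-point ranges and uses simplified stand-in types; `SketchAtlas`, `CountingRequestor`, `rangeOf` and `onRangeLoaded` are hypothetical names. In the real design the `onGlyphsAvailable` call would be a message delivered to a worker thread rather than a direct call.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-ins for the mbgl types (assumptions, not the real definitions).
using FontStack = std::string;
using GlyphID = char16_t;
using GlyphRange = std::pair<uint16_t, uint16_t>;
using GlyphDependencies = std::map<FontStack, std::set<GlyphID>>;

// Assumption: glyphs are loaded in blocks of 256 code points.
inline GlyphRange rangeOf(GlyphID id) {
    const uint16_t start = static_cast<uint16_t>((id / 256) * 256);
    return {start, static_cast<uint16_t>(start + 255)};
}

struct GlyphRequestor {
    virtual ~GlyphRequestor() = default;
    virtual void onGlyphsAvailable(const GlyphDependencies&) = 0;
};

// Hypothetical requestor that only counts notifications.
struct CountingRequestor : GlyphRequestor {
    int calls = 0;
    void onGlyphsAvailable(const GlyphDependencies&) override { ++calls; }
};

class SketchAtlas {
public:
    // Main thread only, so no mutex is needed.
    void getGlyphs(GlyphRequestor& requestor, GlyphDependencies deps) {
        std::set<Key> missing;
        for (const auto& dependency : deps) {
            for (GlyphID id : dependency.second) {
                const Key key{dependency.first, rangeOf(id)};
                if (!loaded.count(key)) missing.insert(key);
            }
        }
        if (missing.empty()) {
            requestor.onGlyphsAvailable(deps); // everything is already here
            return;
        }
        for (const auto& key : missing) waiting[key].insert(&requestor);
        pending[&requestor] = Pending{std::move(deps), std::move(missing)};
    }

    // Called when a glyph range finishes loading (for example a network reply).
    void onRangeLoaded(const FontStack& stack, GlyphRange range) {
        const Key key{stack, range};
        loaded.insert(key);
        auto it = waiting.find(key);
        if (it == waiting.end()) return;
        const std::set<GlyphRequestor*> requestors = std::move(it->second);
        waiting.erase(it);
        for (GlyphRequestor* r : requestors) {
            auto p = pending.find(r);
            if (p == pending.end()) continue;
            p->second.missing.erase(key);
            if (p->second.missing.empty()) {
                // Last outstanding range for this requestor: notify it once.
                GlyphDependencies deps = std::move(p->second.deps);
                pending.erase(p);
                r->onGlyphsAvailable(deps);
            }
        }
    }

    // A requestor deregisters before it is destroyed.
    void removeGlyphs(GlyphRequestor& requestor) {
        pending.erase(&requestor);
        for (auto& entry : waiting) entry.second.erase(&requestor);
    }

private:
    using Key = std::pair<FontStack, GlyphRange>;
    struct Pending {
        GlyphDependencies deps;
        std::set<Key> missing;
    };
    std::set<Key> loaded;
    std::map<Key, std::set<GlyphRequestor*>> waiting;
    std::map<GlyphRequestor*, Pending> pending;
};
```

A requestor that asks for glyphs from two unloaded ranges is notified exactly once, after the second range arrives.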

/cc @kkaefer @jfirebaugh @lucaswoj @mollymerp

@ChrisLoer ChrisLoer added needs discussion performance Speed, stability, CPU usage, memory usage, or power usage refactor text rendering labels Mar 10, 2017
@mention-bot

@ChrisLoer, thanks for your PR! By analyzing this pull request, we identified @jfirebaugh, @1ec5 and @ansis to be potential reviewers.

@ChrisLoer ChrisLoer force-pushed the cloer_demutex_glyph_atlas branch 2 times, most recently from c8c20a5 to e40429b Compare March 10, 2017 17:20
@ChrisLoer ChrisLoer added the ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold label Mar 10, 2017
Contributor

@jfirebaugh jfirebaugh left a comment

Approach looks good. My suggestions are focused on simplifying the datastructures and algorithms used in GlyphAtlas.

}

GeometryTile::~GeometryTile() {
    glyphAtlas.removeGlyphs(reinterpret_cast<uintptr_t>(this));
Contributor

@jfirebaugh jfirebaugh Mar 10, 2017

uintptr_t was a hack to avoid a circular dependency. You can change the signature to GlyphAtlas::removeGlyphs(GlyphRequestor&) and remove reinterpret_cast.

}

void GeometryTile::getGlyphs(GlyphDependencies glyphDependencies) {
    glyphAtlas.getGlyphs(reinterpret_cast<uintptr_t>(this), std::move(glyphDependencies), *this);
Contributor

Here too, remove the reinterpret_cast<uintptr_t>(this) argument in favor of GlyphRequestor&/*this. Probably makes sense to swap the argument order so that it's GlyphAtlas::getGlyphs(GlyphRequestor&, GlyphDependencies).


util::WorkQueue workQueue;
Contributor

Remove #include <mbgl/util/work_queue.hpp> along with this.

tileDependencies.emplace(std::piecewise_construct,
                         std::forward_as_tuple(tileUID),
                         std::forward_as_tuple(missing, std::move(glyphDependencies), &requestor));
for (auto fontStack : missing) {
Contributor

Can this loop be merged with the generation of missing above?

};
std::unordered_map<uintptr_t,TileDependency> tileDependencies;
typedef std::pair<FontStack,GlyphRange> PendingGlyphRange;
std::map<PendingGlyphRange, std::set<uintptr_t>> pendingGlyphRanges;
Contributor

Instead of adding a new (FontStack, GlyphRange)-keyed map, can we add a std::set<GlyphRequestor*> to GlyphPBF and perform a FontStack → GlyphRange → GlyphPBF lookup via GlyphAtlas::entries and Entry::ranges?
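One way to read this suggestion, as a hedged sketch rather than the code that was eventually merged: each `GlyphPBF` carries its own set of waiting requestors, reached through the existing font stack and range maps. The types are simplified stand-ins, and `addPendingRequestor` and `takeRequestors` are hypothetical helper names.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-ins (assumptions, not the real mbgl definitions).
using FontStack = std::string;
using GlyphRange = std::pair<uint16_t, uint16_t>;

struct GlyphRequestor {};

struct GlyphPBF {
    bool parsed = false;
    std::set<GlyphRequestor*> requestors; // who to notify once this range is parsed
};

struct Entry {
    std::map<GlyphRange, GlyphPBF> ranges;
};

using Entries = std::map<FontStack, Entry>;

// FontStack -> GlyphRange -> GlyphPBF lookup; registers a waiting requestor.
void addPendingRequestor(Entries& entries, const FontStack& fontStack,
                         const GlyphRange& range, GlyphRequestor& requestor) {
    entries[fontStack].ranges[range].requestors.insert(&requestor);
}

// Marks the range as parsed and hands back (and clears) the requestors to notify.
std::set<GlyphRequestor*> takeRequestors(Entries& entries, const FontStack& fontStack,
                                         const GlyphRange& range) {
    GlyphPBF& pbf = entries[fontStack].ranges[range];
    pbf.parsed = true;
    std::set<GlyphRequestor*> result;
    result.swap(pbf.requestors);
    return result;
}
```

With this layout there is no separate pending-range map to keep in sync with `entries`.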

for (const auto& fontStack : glyphDependencies) {
    for (auto glyphID : fontStack.second) {
        GlyphRange range = getGlyphRange(glyphID);
        if (!hasGlyphRange(fontStack.first,range)) {
Contributor

This loop should perform direct lookups in entries rather than calling hasGlyphRange. That way, the entries.find(fontStack) step can be hoisted out of the inner loop.
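A hedged sketch of the hoisting suggested here. The `entries` map is looked up once per font stack instead of once per glyph, and the loop variable is named `dependency` because it is a (font stack, glyph set) pair. `findMissing` is a hypothetical free function, `Entry` is reduced to a set of loaded ranges, and the 256-glyph range size is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-ins (assumptions, not the real mbgl definitions).
using FontStack = std::string;
using GlyphID = char16_t;
using GlyphRange = std::pair<uint16_t, uint16_t>;
using GlyphDependencies = std::map<FontStack, std::set<GlyphID>>;
using GlyphRangeDependencies = std::map<FontStack, std::set<GlyphRange>>;

struct Entry {
    std::set<GlyphRange> ranges; // ranges that are already loaded
};

// Assumption: ranges are blocks of 256 code points.
GlyphRange getGlyphRange(GlyphID id) {
    const uint16_t start = static_cast<uint16_t>((id / 256) * 256);
    return {start, static_cast<uint16_t>(start + 255)};
}

GlyphRangeDependencies findMissing(const std::map<FontStack, Entry>& entries,
                                   const GlyphDependencies& glyphDependencies) {
    GlyphRangeDependencies missing;
    for (const auto& dependency : glyphDependencies) {
        const FontStack& fontStack = dependency.first;
        // Hoisted: one lookup per font stack rather than one per glyph.
        const auto entry = entries.find(fontStack);
        for (GlyphID glyphID : dependency.second) {
            const GlyphRange range = getGlyphRange(glyphID);
            if (entry == entries.end() || !entry->second.ranges.count(range)) {
                missing[fontStack].insert(range);
            }
        }
    }
    return missing;
}
```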

void GlyphAtlas::getGlyphs(uintptr_t tileUID, GlyphDependencies glyphDependencies, GlyphRequestor& requestor) {
    // Figure out which glyph ranges need to be fetched
    GlyphRangeDependencies missing;
    for (const auto& fontStack : glyphDependencies) {
Contributor

The loop variable is a pair whose first member is a FontStack, so this name is misleading.

@ChrisLoer
Contributor Author

I'm moving forward with making matching changes in SpriteAtlas, which allows us to get rid of most of the "symbolDependenciesChanged" logic.

One performance issue I hadn't considered until now is that we won't be able to start on prepare calls until all glyphs/sprites are available. Before, if one SymbolLayout depended on a network request, we could go ahead and do prepare calls for everything else while we waited for the reply -- and then when the network request finished, we'd have less work to do to get the tile ready.

If we think we need to keep the incremental preparation capability, I'll need to change the onGlyphsAvailable logic to be more granular. If we think it's not that important, then we can simplify GeometryTileWorker even more (attemptPlacement just becomes place, we stop having to keep track of SymbolLayout::state, etc.).

/cc @mourner

@jfirebaugh
Contributor

My guess is that incremental preparation is not critical. It looks like GL JS does not use it. But I don't think it's ever been benchmarked or otherwise empirically measured.

we stop having to keep track of SymbolLayout::state

The other thing that SymbolLayout::state is used for is to avoid re-doing preparation when re-doing placement, i.e. when the map is rotated. This is probably more valuable to preserve (although again, no empirical data on this).
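To make that second use concrete, here is a small sketch under assumed names. `SketchSymbolLayout` and its `Pending`/`Prepared`/`Placed` states are illustrative and are not taken from the mbgl source. Preparation runs at most once, while placement can be repeated, for example after a rotation.

```cpp
// Hypothetical stand-in for SymbolLayout, reduced to its state tracking.
class SketchSymbolLayout {
public:
    enum class State { Pending, Prepared, Placed };

    // Stands in for the costly preparation work; runs at most once.
    void prepare() {
        if (state != State::Pending) return; // already prepared, skip
        ++prepareCount;
        state = State::Prepared;
    }

    // Stands in for placement, which is repeated when the placement config changes.
    void place() {
        if (state == State::Pending) return; // nothing prepared yet
        ++placeCount;
        state = State::Placed;
    }

    State state = State::Pending;
    int prepareCount = 0;
    int placeCount = 0;
};
```

Dropping the state would mean either repeating the preparation on every placement or finding another way to remember that it has been done.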

@jfirebaugh
Contributor

Pushed a few more suggested simplifications in d3c91db. Further possible refactoring I see:

  • Inline GlyphAtlas::addGlyph into addGlyphs (avoiding some repetitive entries[fontStack] lookups).
  • Combine the Entry::glyphValues and GlyphSet::sdfs maps (both keyed by glyph ID). This could be a part of a larger refactor:
  • Merge GlyphSet and GlyphPBF into GlyphAtlas. These auxiliary classes don't have a clear role. It makes more sense to merge them, and then consider how to extract GlyphAtlas::Entries as a standalone class instead (mainly for testing).

@ChrisLoer
Contributor Author

I do not have the ASCII diagramming skills necessary to include the updated GeometryTileWorker state transition diagram in the comments, but as part of the refactoring I drew up as detailed a version as I could:

[Image: GeometryTileWorker state transition diagram]

I'm much less familiar with the SpriteAtlas code than the GlyphAtlas code, so it would be nice to get some eyes on this from someone who knows annotations/etc. I had some open questions:

  • Is it safe to assume that all access to SpriteAtlas is on the main thread now that I've removed the access from GeometryTileWorker? It appears that way to me, but I'm not very familiar at all with annotations.
  • I started development with the idea that we'd have to track individual icon dependencies, but it looks like we only really ask "is this SpriteAtlas loaded". If there's no value to tracking individual dependencies, I can strip out some of that code.
  • GeometryTileWorker assumes it can be dependent on any number of sprite atlases, but it looks like usually there are only two (one for the annotation manager, one for the style). If that's true, that might allow for some simplification -- and also if we knew what sprite atlases were needed before we started layout, we could tell the worker ahead of time if they were already loaded and skip the getIcons round trip.

@ChrisLoer
Contributor Author

I'm a bit baffled by the bitrise failure on text-font/chinese: http://mapbox.s3.amazonaws.com/mapbox-gl-native/render-tests/9980/index.html

It looks like four symbols in that test get partially distorted. It doesn't seem to be obviously timing related since the same four symbols had the problem three builds in a row. The problem doesn't show up when I run the test locally (Xcode 8.2, macOS 10.12.3), or on any of the other platforms that run the node tests... Maybe some kind of difference in how the platform handles some of the math?

@ChrisLoer
Contributor Author

I did some basic benchmarking by running the macosapp in release mode, with the map centered over greater Asia and using a style that loads {name} instead of {name_en} (so the test was designed to have lots of glyph dependencies). I recorded milliseconds from initialization of the Map object to the time of the last update call to the map, doing four runs on master and four runs on this branch. I also re-did the tests with the cache deleted between each run.

| | Master | De-Mutexed |
| --- | --- | --- |
| With Cache | 1107 ms | 888 ms |
| No Cache | 1818 ms | 1717 ms |

These numbers make me feel fairly confident that this branch isn't introducing a major performance regression, although there are definitely theoretical cases where these changes could slow us down.

@ChrisLoer ChrisLoer force-pushed the cloer_demutex_glyph_atlas branch 2 times, most recently from 162c97a to 3dce1ca Compare March 22, 2017 19:28
@jfirebaugh
Contributor

The test failure might be an instance of #6863 -- it's hard to tell exactly because the icon is partially obscured by text.

@ChrisLoer
Contributor Author

Interesting... yeah it looks sort of similar -- although the screenshots in #6863 look like a sort of alternating pixel-on/pixel-off pattern, whereas in this case a whole bunch of pixels just seem to be shifted one pixel right.

@ChrisLoer
Contributor Author

The behavior looks very similar using basketballs instead of dogs (just to change the index into the sprite texture): http://mapbox.s3.amazonaws.com/mapbox-gl-native/render-tests/10009/index.html

@ChrisLoer
Contributor Author

And it still happens even with bigger basketballs: http://mapbox.s3.amazonaws.com/mapbox-gl-native/render-tests/10010/index.html

It kind of looks like the vertices for the lower right triangle are ending up shifted one pixel right from the vertices for the upper left triangle.

@ChrisLoer ChrisLoer force-pushed the cloer_demutex_glyph_atlas branch 3 times, most recently from b340de5 to 8570a88 Compare March 23, 2017 03:58
@ChrisLoer
Contributor Author

I did an experiment with the fragment shader doing its own rounding... the results (http://mapbox.s3.amazonaws.com/mapbox-gl-native/render-tests/10015/index.html) show lots of little changes for icons, but for the text-font/chinese test the broken behavior is totally unaffected.

@ChrisLoer
Contributor Author

On the other hand, making the symbol icon vertex shader round gl_Position to integer values solves the split-icon problem in text-font/chinese (at the expense of making lots of small changes all over the map). http://mapbox.s3.amazonaws.com/mapbox-gl-native/render-tests/10019/index.html

@@ -143,24 +154,46 @@ void GeometryTileWorker::setPlacementConfig(PlacementConfig placementConfig_, ui
}
}

void GeometryTileWorker::onGlyphsAvailable(GlyphPositionMap glyphs) {
    glyphPositions = std::move(glyphs);
Contributor

Add assert(waitingForGlyphs);

}

void GeometryTileWorker::onIconsAvailable(IconAtlasMap icons_) {
    icons = std::move(icons_);
Contributor

Add assert(waitingForIcons);

case Idle:
if (hasPendingSymbolDependencies()) {
case NeedPlacement:
if (!hasPendingSymbolDependencies()) {
Contributor

setPlacementConfig needs this guard as well for case Idle. Or alternatively (my preference): remove the places coalesce is currently called in favor of inlining it at the end of attemptPlacement. attemptPlacement already has an early exit for the hasPendingSymbolDependencies case, so it can then be called unconditionally.

Along with this, I also suggest factoring the followup to redoLayout that's currently repeated in four places:

            if (!hasPendingSymbolDependencies()) {
                coalesce();
            } else {
                state = NeedPlacement;
            }

Move that into redoLayout while relying on the fact that attemptPlacement now itself calls coalesce:

void GeometryTileWorker::redoLayout() {
    ...

    if (hasPendingSymbolDependencies()) {
        state = NeedPlacement;
    } else {
        attemptPlacement();
    }
}

Contributor Author

Those are good suggestions. Making the changes caused some problems with a part of the logic I hadn't fully worked through before (that is: what happens when messages arrive but the placement config or data isn't set yet). I addressed those and everything looked great aside from the state transition diagram getting more complicated... but now I'm seeing an intermittent memory corruption problem. So that's reassuring.

I'll take another stab at this tomorrow.

[Image: GeometryTileWorker state transition diagram, second revision]

Contributor

Can you push these changes?

@ChrisLoer
Contributor Author

ChrisLoer commented Mar 28, 2017

@jfirebaugh or @kkaefer, maybe you can help me figure out how to handle this tricky case:

  • Placement starts, but has symbol dependencies so it can't finish right away
  • setData is called, so GeometryTileWorker moves into NeedLayout state
  • Symbol dependencies for the initial placement arrive

The previous behavior was that since we were in the NeedLayout state, we would redo layout, and then move on to placement. Re-doing layout might cause there to be new symbol dependencies, but since we'd be looking at the same tile it would be likely that there were no new dependencies (and since the worker threads could access the atlases directly, they could finish placement without having to yield).

Now that symbol dependencies are being satisfied asynchronously, the behavior is a little different. The simplest approach is just to keep doing pretty much the same thing:

  • Symbol dependencies arrive while we're in NeedLayout
  • We trigger redoLayout, which will cause us to ask for symbol dependencies again and throw out the ones that just arrived.
  • When those dependencies arrive, we finish placement.

The problem with this approach is that it opens up room for starvation -- where we keep getting new layout requests and we never finish making placement calls. The starvation possibility technically existed before but it would be very hard to trigger since each layout call would have to introduce a new symbol dependency.

An alternative approach I've been trying is:

  • Symbol dependencies arrive while we're in NeedLayout
  • Do placement, send result to main thread
  • Trigger redoLayout (and thus another placement)

But the problem with this approach is that if someone's calling Map::renderStill, they'll get a result after the first placement happens, but if they had triggered a layout change, they will expect that change to be in their result.

After writing this all down, it occurs to me there's a third approach that probably makes sense here -- hold on to symbol dependencies in the GeometryTileWorker between calls and only ask the atlases for more dependencies if there are new ones. That gets us back to pretty close to the original behavior (which still technically has a starvation case, but we're probably OK with it) -- and is probably a good performance optimization too. Thanks rubber duckies!
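A hedged sketch of that third approach. The worker remembers which glyphs it has already received and asks the atlas only for the difference. `WorkerGlyphCache` and `missingFrom` are hypothetical names, and the cache tracks glyph IDs only, whereas the real worker would hold the glyph data itself.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

// Simplified stand-ins (assumptions, not the real mbgl definitions).
using FontStack = std::string;
using GlyphID = char16_t;
using GlyphDependencies = std::map<FontStack, std::set<GlyphID>>;

class WorkerGlyphCache {
public:
    // Returns only the glyphs that have not been received yet.
    GlyphDependencies missingFrom(const GlyphDependencies& needed) const {
        GlyphDependencies missing;
        for (const auto& dependency : needed) {
            const auto have = held.find(dependency.first);
            if (have == held.end()) {
                // Nothing held for this font stack: every glyph is missing.
                if (!dependency.second.empty()) missing.insert(dependency);
                continue;
            }
            std::set<GlyphID> diff;
            std::set_difference(dependency.second.begin(), dependency.second.end(),
                                have->second.begin(), have->second.end(),
                                std::inserter(diff, diff.end()));
            if (!diff.empty()) missing.emplace(dependency.first, std::move(diff));
        }
        return missing;
    }

    // Merges newly arrived glyphs into what the worker already holds.
    void onGlyphsAvailable(const GlyphDependencies& arrived) {
        for (const auto& dependency : arrived) {
            held[dependency.first].insert(dependency.second.begin(),
                                          dependency.second.end());
        }
    }

private:
    GlyphDependencies held;
};
```

If `missingFrom` returns an empty map after a redoLayout, the worker can go straight to placement without another round trip to the main thread.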

@jfirebaugh
Contributor

I'm not terribly worried about the problems with either of the first two approaches:

  • The starvation scenario relies on continuously receiving layout requests, an edge case which seems pretty unlikely.
  • The Map::renderStill case also seems unlikely -- static rendering makes very limited use of runtime styling. That said, correlationID is the intended mechanism to avoid a GeometryTile being marked as fully loaded until the most recently sent layout or placement request has been completed. Does that cover this scenario?

@ChrisLoer
Contributor Author

The Map::renderStill case triggered a test failure in AnnotationTest/nonImmediateAdd. The correlationID didn't help because the tile worker just holds on to the latest correlation ID it receives, and in the "do placement first" case it sends a result for the old placement using the new correlation ID.
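The mislabeling described above can be reproduced in a few lines. This is an illustrative model only. `SketchWorker`, `PlacementResult` and the two `place...` methods are hypothetical, and capturing the ID at layout time is shown for contrast, not as the fix adopted in this PR.

```cpp
#include <cstdint>
#include <string>
#include <utility>

struct PlacementResult {
    std::string data;        // which data the placement was computed from
    uint64_t correlationID;  // which request the result claims to answer
};

class SketchWorker {
public:
    void setData(std::string data, uint64_t id) {
        pendingData = std::move(data);
        correlationID = id; // the worker only remembers the latest ID
    }

    // Layout snapshots the data that a later placement will use.
    void startLayout() {
        layoutData = pendingData;
        layoutCorrelationID = correlationID;
    }

    // Problematic: result from the old layout, tagged with the newest ID.
    PlacementResult placeWithLatestID() const {
        return {layoutData, correlationID};
    }

    // For contrast: tag the result with the ID captured at layout time.
    PlacementResult placeWithCapturedID() const {
        return {layoutData, layoutCorrelationID};
    }

private:
    std::string pendingData;
    std::string layoutData;
    uint64_t correlationID = 0;
    uint64_t layoutCorrelationID = 0;
};
```

If the main thread treats a result carrying the newest ID as final, the first variant lets a tile look fully loaded while it still shows the old data.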

I'm happy with the solution of holding on to symbols between layout calls -- it's an easy and sensible optimization. It does make me realize there's a kind of funny case where a tile with lots of changing text (on some map hooked up to a live data feed?) wouldn't garbage collect glyphs until the tile was removed. But nothing here changes that behavior.

The changes I just pushed pass all my tests locally, but I don't have a fully up-to-date state transition diagram for them and I think there might still be some edge cases hiding out around what happens when you have symbol-dependent layout happening before the initial call to setPlacementConfig.

@ChrisLoer
Contributor Author

OK, I'm back to having convinced myself this is correct. There were in fact some edge cases I wasn't handling (although I think they would have been hard to hit). Here is the updated state transition diagram:

[Image: GeometryTileWorker state transition diagram, third revision]

@ChrisLoer ChrisLoer force-pushed the cloer_demutex_glyph_atlas branch 2 times, most recently from 01405cb to 0494d2f Compare March 31, 2017 22:12
@ChrisLoer ChrisLoer removed needs discussion ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold labels Mar 31, 2017
@ChrisLoer
Contributor Author

@jfirebaugh / @kkaefer This is just about ready to go from my point of view, but I'm still waiting for further input/perspective from you guys. I'd like to get it in soon-ish if it's reasonable so I can stop worrying about keeping the PR in sync with master.

@jfirebaugh
Contributor

GeometryTileWorker has gotten too complex. How can we simplify it?

  • Why is firstPlacement necessary? How could we remove it?
  • Is there an alternative to the coalescing mechanism that would allow avoiding unnecessary work in a simpler way?
  • Should we remove the initial two step layout/placement step, and not render anything in a tile until all layers including symbol layers are ready? (This is what GL JS does.)

@ChrisLoer
Contributor Author

I thought that it would be a simplification to get rid of the canPrepare step and replace it with just "if your symbols have arrived, do all your prepare calls". However, that meant we needed to track extra state to make sure the symbols that were arriving matched the symbols that had just been requested. That cascaded into the special firstPlacement state. So... not a simplification after all. A better approach may be to move back towards the previous state transition logic, but just treat each symbolDependenciesChanged call as if it's updating a local copy of the glyph atlas -- it's actually easy to do because we don't have to worry about stale glyphs being removed over the life of a GeometryTileWorker (they don't get removed from the main thread glyph atlas until the GeometryTile is destroyed). I'll give that a try and see how it works.

I don't think the two-step rendering actually adds that much complication on the side of GeometryTileWorker -- it's just a matter of sending an extra onLayout message to the main thread. Complexity aside, I think it may be a loss in terms of smooth rendering. It's nice enough if you're showing a map for the first time to show the terrain first and then have the symbols come in soon after. But if you're underzooming or overzooming, it's kind of jarring -- because once the terrain data for the new (zoomed) tile comes in, you render the new terrain (making symbols disappear), and then have the symbols pop back in soon after. It would be better to stay at the over/underzoomed tile until the symbols are available.

It seems to me the extra complication comes from:

  • On gl-native you can update layers without re-parsing tile data. Not sure if this is an important optimization, but it seems useful.
  • On gl-native you can create a tile without having any placement configuration. This seems like something we could get rid of.

@ChrisLoer
Contributor Author

I went to remove the code I had that tracked individual icon dependencies, but in doing so realized we have a problem with Map::removeImage: if you remove an image that's in use, the worker threads will produce garbage references into the SpriteAtlas. This looks like a problem in the current version of master, but in master the problem will resolve (i.e. the removed icons will disappear instead of being garbage) on the next placement, whereas on this branch the problem won't resolve until the next layout.

I'm not sure what the right solution is here:

  • removeImage could clear all the tiles and re-trigger layout... at the cost of ugly flicker.
  • We could track dependencies and error if removeImage is called on an image that's still in use (or maybe better than tracking tile dependencies would be to just require that the current style doesn't reference the image before you can remove it?)
  • We could track dependencies and hold on to removed images until the last referencing tile was destroyed... but this is a little weird. Should the icon disappear on tiles that get laid out after the remove call, but still be present on tiles that haven't had layout re-triggered? What should happen if you paired removeImage and addImage to try to update an image?
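To make the third option concrete, here is a hedged sketch of deferred removal with a per-image user count. `SketchImageStore`, `retain`, `release` and `isStored` are hypothetical names and do not reflect the SpriteAtlas API.

```cpp
#include <cstddef>
#include <map>
#include <string>

class SketchImageStore {
public:
    // Adding an image also cancels a pending removal of the same name.
    void addImage(const std::string& name) { images[name].removed = false; }

    // Called when a tile's layout starts referencing an image.
    bool retain(const std::string& name) {
        auto it = images.find(name);
        if (it == images.end() || it->second.removed) return false; // new layouts no longer see it
        ++it->second.users;
        return true;
    }

    // Called when a referencing tile is destroyed.
    void release(const std::string& name) {
        auto it = images.find(name);
        if (it == images.end() || it->second.users == 0) return;
        if (--it->second.users == 0 && it->second.removed) images.erase(it);
    }

    // Removes immediately if unused, otherwise defers until the last release.
    void removeImage(const std::string& name) {
        auto it = images.find(name);
        if (it == images.end()) return;
        if (it->second.users == 0) {
            images.erase(it);
        } else {
            it->second.removed = true;
        }
    }

    // True while the image is still stored, even if its removal is pending.
    bool isStored(const std::string& name) const { return images.count(name) != 0; }

private:
    struct Image {
        std::size_t users = 0;
        bool removed = false;
    };
    std::map<std::string, Image> images;
};
```

The sketch also shows why the questions above are awkward. Tiles laid out before removeImage keep the image, tiles laid out afterwards do not, and a removeImage followed by addImage simply cancels the removal without replacing any pixels.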

It looks like @ivovandongen touched on a closely related problem at #6735... and I see there's a proposal for a separate updateImage there.

@jfirebaugh
Contributor

I think the right fix is a combination of the first and third points, but let's defer this for now.

@ChrisLoer
Contributor Author

  • I pushed the changes that bring GeometryTileWorker's state transitions much closer to their original behavior.
  • I left the code for tracking individual icon dependencies in the branch, even though the code's not hooked up to anything useful now -- with the idea that it will be useful later for addressing the removeImage issue.
  • I like the idea of avoiding the intermediate render for tiles without symbols, but that doesn't need to come into this PR.
  • I'm ambivalent about removing the optional<PlacementConfig> behavior -- now that we've moved back to the original state transitions, it doesn't add much complexity, and it seems unlikely that rendering with the default PlacementConfig constructor is ever really going to be what the user of the class expects/wants (for instance, if you called setData and immediately after that called setPlacementConfig, would you expect placement to happen twice, the first time with a throwaway default config?).

Contributor

@jfirebaugh jfirebaugh left a comment

Phew! Nice work.

 - Expose glyph and icon information to workers via message interface.
 - Glyph/SpriteAtlas track which tiles have outstanding requests
    and send messages to them when glyphs/icons become available.
 - Remove obsolete "updateSymbolDependentTiles" pathway
 - Symbol preparation for a tile now depends on all glyphs becoming
    available before it can start.
 - Start tracking individual icons needed for a tile, although we don't
    do anything with the information yet.
 - Introduce typedef for GlyphID
@ChrisLoer
Contributor Author

Hallelujah! Thanks for all the help, @jfirebaugh. For anyone reading this PR in the future, ignore all the state transition diagrams in the conversation here. The end result matches what was in the code comment all along, although I've included an updated graphic that spells it out in a little more detail.

[Image: GeometryTileWorker state transition diagram, final version]

@ChrisLoer ChrisLoer merged commit ad29489 into master Apr 4, 2017
@ChrisLoer ChrisLoer deleted the cloer_demutex_glyph_atlas branch April 4, 2017 18:33
@ChrisLoer
Contributor Author

Uploading an updated version of the GeometryTileWorker diagram here, just to have it hosted by GitHub for inclusion in the wiki:

[Image: GeometryTileWorker state transition diagram, version uploaded for the wiki]

Labels
performance Speed, stability, CPU usage, memory usage, or power usage refactor text rendering
3 participants