Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in Cesium3DTileset / Reconsider readyPromise pattern #10909

Closed
mramato opened this issue Nov 17, 2022 · 12 comments · Fixed by #11059
Closed

Race condition in Cesium3DTileset / Reconsider readyPromise pattern #10909

mramato opened this issue Nov 17, 2022 · 12 comments · Fixed by #11059

Comments

@mramato
Copy link
Contributor

mramato commented Nov 17, 2022

  • Paste this into a new file called aTrivialSpec.js into packages/engine/Specs/Scene/.
import { Cesium3DTileset, DeveloperError } from "../../index.js";
import createScene from "../../../../Specs/createScene.js";

fdescribe("trivialSpec", function () {
  let scene;
  beforeAll(function () {
    scene = createScene();
  });

  afterAll(function () {
    scene.destroyForSpecs();
  });

  it("a trivial test", function () {
    const tileset = new Cesium3DTileset({ url: "test.invalid" });
    expect(function () {
      throw new DeveloperError();
    }).toThrowDeveloperError();
    tileset.destroy();
  });
});
  • Run the unit tests
  • Most of the time it will fail/hang with the below error
$ npm -- run test --browsers ChromeHeadless

> [email protected] test
> gulp test --browsers ChromeHeadless

[20:29:15] Using gulpfile ~/Git/CesiumGS/cesium/gulpfile.js
[20:29:15] Starting 'test'...

  trivialSpec
    ✓ a trivial test
Chrome Headless 107.0.5304.110 (Linux x86_64) ERROR
  An error was thrown in afterAll
  Unhandled promise rejection: Request has failed. Status Code: 404 thrown
Chrome Headless 107.0.5304.110 (Linux x86_64) ERROR
  An error was thrown in afterAll
  Unhandled promise rejection: Request has failed. Status Code: 404 thrown
0.068 secs: trivialSpec a trivial test
0 secs: Renderer/Buffer fails to destroy: WebGL 2
0 secs: Renderer/Buffer fails to copy a large array view: WebGL 2
0 secs: Renderer/Buffer fails to provide an array view: WebGL 2
0 secs: Renderer/Buffer destroys: WebGL 2
0 secs: Renderer/Buffer copyFromBuffer with index buffers: WebGL 2
0 secs: Renderer/Buffer copyFromBuffer with vertex buffers: WebGL 2
0 secs: Renderer/Buffer copyFromBuffer throws when readBuffer is the same buffer and copy range overlaps: WebGL 2
0 secs: Renderer/Buffer copyFromBuffer throws with one index buffer and the other is not an index buffer: WebGL 2
0 secs: Renderer/Buffer copyFromBuffer throws with invalid sizeInBytes: WebGL 2

Chrome Headless 107.0.5304.110 (Linux x86_64): Executed 1 of 13836 (skipped 156) ERROR (0.02 secs / 0.068 secs)

[20:29:18] Finished 'test' after 2.51 s
Chrome Headless 107.0.5304.110 (Linux x86_64) ERROR
  Disconnected , because no message in 120000 ms.
Chrome Headless 107.0.5304.110 (Linux x86_64) ERROR
  Disconnected , because no message in 120000 ms.
0.068 secs: trivialSpec a trivial test
0 secs: Renderer/BuiltinFunctions has czm_transpose (4x4)
0 secs: Renderer/BuiltinFunctions has czm_transpose (3x3)
0 secs: Renderer/BuiltinFunctions has czm_transpose (2x2)
0 secs: Renderer/AutomaticUniforms can declare automatic uniforms
0 secs: Renderer/AutomaticUniforms has czm_viewport
0 secs: Renderer/AutomaticUniforms has czm_viewportOrthographic
0 secs: Renderer/AutomaticUniforms has czm_viewportTransformation
0 secs: Renderer/AutomaticUniforms has czm_globeDepthTexture
0 secs: Renderer/AutomaticUniforms has czm_model
  • It seems to happen more often with npm -- run test --browsers ChromeHeadless, but since it's a race condition it's hard to say why.

I'm fairly certain what's happening is that Cesium3DTileset has an async code path that keeps running, even after destroy is called (and therefore after the test is finished). In 99.99% of the times I've seen this error from karma/jasmine, that's always been the case in general.

@mramato
Copy link
Contributor Author

mramato commented Nov 17, 2022

I already found one race condition, https://github.com/CesiumGS/cesium/blob/main/packages/engine/Source/Scene/Cesium3DTileset.js#L978 is missing a if (that.isDestroyed()) { return;} check, similar to what is in the other blocks in that chain. However, fixing that did not fix this problem for me.

@ggetz
Copy link
Contributor

ggetz commented Nov 17, 2022

I'm fairly certain what's happening is that Cesium3DTileset has an async code path that keeps running, even after destroy is called (and therefore after the test is finished).

That seems likely. We'll take a look at this further and see if we can get to the root of the rogue code path.

In the meantime, is it possible to await the tileset's ready promise as a workaround?

@mramato
Copy link
Contributor Author

mramato commented Nov 18, 2022

That will certainly work around the trivial reproduction case above, but not sure how much work it will take to update the actual codebase I'm working with.

In general, the entire readyPromise approach is actually a bad abstraction that's may become a larger problem in modern systems, with unhanded promise detection detecting a failed readyPromise as a critical error unless someone awaits it.

I wish there was a way to synchronously create a tileset, like it just doesn't load anything until you call loadTileset or something. That would make writing tests much easier.

@mramato
Copy link
Contributor Author

mramato commented Nov 18, 2022

Awaiting the promise in the real code did fix part of the problem, but I then discovered Viewer has the same issue.

    fit("crash jasmine", function () {
      const viewer = new Viewer(document.createElement("div"));
      viewer.destroy();
    });
    ✗ crash jasmine
        Global error spy was not uninstalled. (Did you forget to await the return value of spyOnGlobalErrorsAsync?)
            at <Jasmine>

Chrome Headless 106.0.5249.119 (Linux x86_64) ERROR
  An error was thrown in afterAll
  Unhandled promise rejection: Request has failed. thrown

Cesium seems to have unhandled promise failures sprinkled throughout the code, largely because of the flawed readyPromise convention that does not match modern Promise best practices.

@mramato
Copy link
Contributor Author

mramato commented Nov 18, 2022

The Viewer issue seems to be related to creating the default world imagery provider and is probably related to the existence of imageryProvider.readyPromise, which also does not have it's rejection handled.

I think I can work around this one in my tests by telling the Viewer not to create imagery, but it's still a nasty underlying bug that should be addressed.

@ggetz
Copy link
Contributor

ggetz commented Nov 18, 2022

In general, the entire readyPromise approach is actually a bad abstraction that's may become a larger problem in modern systems, with unhanded promise detection detecting a failed readyPromise as a critical error unless someone awaits it.

Cesium seems to have unhandled promise failures sprinkled throughout the code, largely because of the flawed readyPromise convention that does not match modern Promise best practices.

I do agree that this is a fundamental problem that needs fixing. I think this is alluded to in a few other issues, but does not have its own, so we can use this one to track it.

@ggetz ggetz changed the title Race condition in Cesium3DTileset Race condition in Cesium3DTileset / Reconsider readyPromise pattern Nov 18, 2022
@ggetz
Copy link
Contributor

ggetz commented Dec 6, 2022

In general, the entire readyPromise approach is actually a bad abstraction that's may become a larger problem in modern systems, with unhanded promise detection detecting a failed readyPromise as a critical error unless someone awaits it.

I wish there was a way to synchronously create a tileset, like it just doesn't load anything until you call loadTileset or something. That would make writing tests much easier.

This will necessitate a breaking change in Cesium3DTileset such that instantiating a tileset will no longer start loading resources. And the same goes for the the other parts of the API which employ a similar pattern, such as the imagery providers and terrain providers. Correct? I don't see a way to deprecate that gracefully.

@mramato
Copy link
Contributor Author

mramato commented Dec 6, 2022

I don't see a way to deprecate that gracefully.

  • The normal pattern for this is that you can don't actually take the url as a constructor parameter. You take whatever the url would return.
  • You then add a static fromUrl or similarly named helper function to actually load from a url. (This returns an instance).
  • You can gracefully deprecate this by having the deprecation be trigger by passing a url to the constructor (and having the old behavior) but if they don't pass the url, they get the new behavior.

This should work for most/all cases in Cesium that are following this pattern.

Basically constructors should never be async or do async work. See IonResource as a simple example of the above pattern.

@mramato
Copy link
Contributor Author

mramato commented Dec 6, 2022

An alternative, perhaps shorter-term, fix would be to just make sure that every promise is either exposed as part of the API or handled internally; but my personal bias would be the suggestion I had above.

@ggetz
Copy link
Contributor

ggetz commented Dec 6, 2022

Thanks, I agree the first option is preferable as we've used it elsewhere in the codebase.

I think I misunderstood because of the mention of the new load function. So the desired end result here would be a fromUrl method for the initial step to load the JSON file, then in addition a loadTileset function which would actually start loading tiles once it is explicitly called. The existing behavior of both loading the JSON and starting to load the tileset content would be deprecated, and only executed when passing in the url parameter.

@javagl
Copy link
Contributor

javagl commented Dec 6, 2022

When reviewing the handling of promise chains and catch usage, one might want to have a look at the Viewer#updateZoomTarget function. It has two very similar code blocks for the zoom target being a Cesium3DTileset or a TimeDynamicPointCloud, with the latter omitting a

      .catch(() => {
        cancelZoom(viewer);
      });

that is present in the former. (Omitting this may or may not be intentional, but that could be part of the review - maybe the code block could be moved into a method to handle both target types equally...?)

@javagl
Copy link
Contributor

javagl commented Mar 31, 2023

I'm not deeply involved in the process and state here (and I assume that the readyPromises have only been deprecated until now, and are supposed to be removed afterwards). But should the point from my last comment be moved into an own issue?

Specifically, the current Viewer.updateZoomTarget contains these two pieces of code:

Cesium Viewer Zoom Difference

First, it should be moved into a updateZoomTargetForBoundingSphere function.
Second: The missing cancelZoom for the TimeDynamicPointCloud looks suspicious.
(It might be intentional, but that seems unlikely...)

@ggetz ggetz moved this from In Progress to Issue/PR closed in CesiumJS Issue/PR backlog Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Issue/PR closed
Development

Successfully merging a pull request may close this issue.

3 participants