Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Don't perform async initialization on calling thread #5746

Merged
merged 5 commits into from
Nov 10, 2021

Conversation

pkoenig10
Copy link
Member

@pkoenig10 pkoenig10 commented Nov 3, 2021

Goals (and why):

Enable faster server initialization when using initializeAsync

In large internal authentication service, we've noticed that some installations take up to 4 minutes for the server to be initialized. Logs indicate that CassandraClientPoolImpl initialization is happening before the server is initialized and takes a majority of those 4 minutes.

Implementation Description (bullets):

Before this PR, when initializeAsync is true, AtlasDB attempts to initialize components synchronously on the calling thread. If the initialization takes a long time (like it does for CassandraClientPoolImpl), then this will block other initialization work.

After this PR, when initializeAsync is true, AtlasDB will schedule the initialization to run immediately on the executor thread. This way other initialization can continue while the AtlasDB initialization happens in the background.

This seems to be more intuitive behavior for a flag named initializeAsync.

Testing (What was existing testing like? What have you done to improve it?):

I've updated AsyncInitializerTest to verify this behavior. I've also refactored this test class to make it a bit more readable.

@changelog-app
Copy link

changelog-app bot commented Nov 3, 2021

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Initialization always happens asynchronously when initializeAsync is true.

Check the box to generate changelog(s)

  • Generate changelog entry

@pkoenig10
Copy link
Member Author

Maybe we don't want this since it will introduce some additional overhead in the typical case.

Digging into metrics a bit more, the slow initialization issue seems to be isolated to a couple installations, so this may be more of an environment issue.

@jeremyk-91
Copy link
Contributor

I was OOTO from last week Friday. Will get to this at some point this week.

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I have some comments on one of the tests, but in general I agree that this model is preferable. I'd like a quick second check from @gmaretic who had worked on this feature at least in a reviewing capacity back when it was implemented, in case there are any clients that might actually have depended on this specific behaviour.

}

private static final class AlwaysFailingInitializerAssert
extends AbstractObjectAssert<AlwaysFailingInitializerAssert, AlwaysFailingInitializer> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ sometimes I feel we should use this more often in Atlas...

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks!

TimestampService timestampService = factory.createManagedTimestampService(kvs, Optional.empty(), true);

assertThat(timestampService.isInitialized()).isFalse();
assertThatThrownBy(timestampService::getFreshTimestamp).isInstanceOf(NotInitializedException.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, looks like this assertion failed on CircleCI :/ would it make sense to pass the executor for async init in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately the race condition here looks hard to avoid

Copy link
Contributor

@jeremyk-91 jeremyk-91 Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I realise this kind of went out of the scope of the original change, if you'd prefer someone on the team to do this instead, just let me know)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I noticed this when making this change. I was surprised the KVS test hasn't flaked before.

I'm happy to fix this. Like you said, the race condition here is hard to avoid. Do you have a suggestion? These tests don't seem that useful to me, I'd probably just remove them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can remove just the not initialized assertions.

@bulldozer-bot bulldozer-bot bot merged commit 4e73663 into develop Nov 10, 2021
@bulldozer-bot bulldozer-bot bot deleted the pkoenig10/async branch November 10, 2021 16:19
@svc-autorelease
Copy link
Collaborator

Released 0.469.0

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants