Defer security auto-configuration on initial node startup #82574

albertzaharovits · 2022-01-13T19:30:32Z

The new security autoconfiguration in version 8 prints information to the terminal. This information is required in order to access the node (eg the generated elastic user password). But this contends for screen lines with the normal log output of a starting node, to the extent that the security autoconfiguration can pass by unnoticed by the user.

For example, it can be easily missed in the case of a small-ish terminal window (32x110 px):

But it is observable on a big one (66x222 px):

This PR addresses this concern by:

making sure that auto-configuration is triggered only after the node is started (ClusterPlugin#onNodeStarted). Auto-configuration requires the http bind address, which is guaranteed to be available only after this point.
additionally deferring the auto-configuration by a fixed time amount (Thread.sleep(9000)), even after all the depended resources have finished initializing. This is ugly but the alternative to wait for other components to do their thing (eg installing index templates, or downloading of the geo-ip database), even though auto-configuration does not depend on then, is not a better alternative either.

Here are the results for the small terminal (32x110 px):

And for the bigger one (66x222 px):

Related: #82364

elasticmachine · 2022-01-13T19:33:40Z

Pinging @elastic/es-security (Team:Security)

albertzaharovits · 2022-01-13T19:34:23Z

It is deemed a bug and targeting v8 past FF because it impacts user experience which is one of the core goals of Security ON by default.

tvernum

I left some comments.
I think the change is good, but having to do it by blocking/sleeping threads isn't ideal.

If there's reasons why it needs to be this way, let me know, but I think we can shuffle things a little bit and avoid having threads waiting on signals/timers.

tvernum · 2022-01-14T06:03:41Z

...ity/src/main/java/org/elasticsearch/xpack/security/InitialNodeSecurityAutoConfiguration.java

+                        nodeStartedSignal.await();
+                        // a lot of stuff runs when a node just started, and the autoconfiguration is not time-critical
+                        // and nothing else depends on it; be a good sport and wait a couple
+                        Thread.sleep(9000);


Is there are reason for sleep rather than a scheduled thread?
It's minor issue but it's better not to use up one on the generic threads sleeping.

I overlooked the existence of the threadpool.schedule method.
I have pushed e0a33f3 .

In the above I actually forgot to remove the sleep, I've later pushed 459d6a9 .

tvernum · 2022-01-14T06:08:17Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/Security.java

@@ -1282,6 +1288,11 @@ public void onIndexModule(IndexModule module) {
        );
    }

+    @Override
+    public void onNodeStarted() {
+        this.nodeStartedSignal.countDown();


Is there a reason we signal rather than just moving the call to InitialNodeSecurityAutoConfiguration.maybeGenerateEnrollmentTokensAndElasticCredentialsOnNodeStartup into this method?

I presume it's because we'd need to push all the arguments somewhere, but with this approach we've got a generic thread sitting waiting on a countdown when it could just be deferred until the right part of the lifecycle.

I understand that we're trying to do the smallest thing here, so maybe it's OK for now, but I think it would be better to make InitialNodeSecurityAutoConfiguration a real object, and store it in a field, and then call execute() on it from onNodeStarted.

Is there a reason we signal rather than just moving the call to InitialNodeSecurityAutoConfiguration.maybeGenerateEnrollmentTokensAndElasticCredentialsOnNodeStartup into this method?

securityIndexManager.onStateRecovered must be invoked before the cluster state recovers, whcih means somewhere in the Node constructor.

To be clearer, there are technically two conditions that must be met to trigger this last part of auto-configuration:

the cluster state is just recovered and does not contain the .security index (which might later be created by other processes)

the node is started

In practice, in my tests, these happen almost simultaneously.

In order to detect 1), a Security Index manager listener has to be registered before cluster state recovery starts, otherwise it might miss the notification for the state recovery.

I understand that we're trying to do the smallest thing here, so maybe it's OK for now, but I think it would be better to make InitialNodeSecurityAutoConfiguration a real object, and store it in a field, and then call execute() on it from onNodeStarted.

I have replaced the CountdownLatch with a ListenableFuture, 4902844 .
This way no thread is blocked, instead a listener is registered, which can be called by anyone on any thread because itself only schedules an action.

jkakavas · 2022-01-14T07:20:06Z

How did we decide on the 9 seconds wait time ? It’s not that I have a bette suggestion or another number in mind but whether or not this is enough is mainly dependant on how much time the geo-up download stuff takes and this varies with each user’s connection speed , server availability etc
On the other hand, we don’t want to wait too much because we don’t want to risk the user losing focus from the terminal assuming “it’s done”

last time we discussed this in the project meeting , I remember folks being in general not pro a fixed amount of delay because of not knowing what amount makes sense and instead wanted to focus on making the template installation output and the geo ip output smaller by consolidating messages ie
#81640

Since geoip is unpredictable but largely resolved, I’m happy if 9s pulls us through the template output in a normal case (but I suspect this is also cpu/ram dependant) and we can remove / revisit when template output is adjusted

in general I’m +1 , just wanted to give some additional context. I can take a proper loook on Monday too when I’ll have a access to a laptop again but happy to merge this before if Tim approves

albertzaharovits · 2022-01-14T10:22:03Z

How did we decide on the 9 seconds wait time ?

I have chosen it.

I am familiar with that story, but I don't know what conclusion to take from it.

jkakavas · 2022-01-14T11:23:53Z

I have chosen it.

I realise that , I was just curious as to the why more than the who :) as in why 9, not 5 or 12
For instance, I’d go for the minimum amount of seconds that gets us after the template output. Could 9 be any shorter ?
I’m not trying to argue for the sake of argument, just trying to understand and maybe help tweak this a bit in favor of users ( if that can even be done )

I am familiar with that story, but I don't know what conclusion to take from it.

The reason I brought it up is that we had discussed it and decided not to do it (add a standard delay) but now we’re doing it. I’m fine with it, as I was when putting this forward as an option originally and we can revisit it in the future.

albertzaharovits · 2022-01-14T16:14:31Z

I realise that , I was just curious as to the why more than the who :) as in why 9, not 5 or 12
For instance, I’d go for the minimum amount of seconds that gets us after the template output. Could 9 be any shorter ?
I’m not trying to argue for the sake of argument, just trying to understand and maybe help tweak this a bit in favor of users ( if that can even be done )

I've not been very methodical when choosing the value. Usually only one or two downloaded geo dbs (there are 3 in total, 6 log lines) produce log lines before the auto-conf details are printed. I was content that the information was visible on the smallish terminal size.

last time we discussed this in the project meeting , I remember folks being in general not pro a fixed amount of delay because of not knowing what amount makes sense and instead wanted to focus on making the template installation output and the geo ip output smaller by consolidating messages ie

I'm not sure which is more reasonable: to merge the log lines or to delay the autoconf info. I don't have any favorite between the two. So I picked the one that was floated first when we discussed this issue last.

tvernum · 2022-01-18T03:50:33Z

wanted to focus on making the template installation output and the geo ip output smaller by consolidating messages

The low hanging fruit there is gone. The geo-ip messages have been reduced, but it's still too long for our needs and there's no chance of making them shorter in 8.0

tvernum

LGTM.
I suspect we'll want to revisit this (somewhere around 8.2?) but this change makes progress for 8.0

) The new security autoconfiguration in version 8 prints information to the terminal. This information is required in order to access the node (eg the generated elastic user password). But this contends for screen lines with the normal log output of a starting node, to the extent that the security autoconfiguration can pass by unnoticed by the user. This PR addresses this concern by: * making sure that auto-configuration is triggered only after the node is started (ClusterPlugin#onNodeStarted). Auto-configuration requires the http bind address, which is guaranteed to be available only after this point in time. * Additionally deferring the auto-configuration by a fixed time amount (9 sec), even after all the depended resources have finished initializing.

…82733) The new security autoconfiguration in version 8 prints information to the terminal. This information is required in order to access the node (eg the generated elastic user password). But this contends for screen lines with the normal log output of a starting node, to the extent that the security autoconfiguration can pass by unnoticed by the user. This PR addresses this concern by: * making sure that auto-configuration is triggered only after the node is started (ClusterPlugin#onNodeStarted). Auto-configuration requires the http bind address, which is guaranteed to be available only after this point in time. * Additionally deferring the auto-configuration by a fixed time amount (9 sec), even after all the depended resources have finished initializing.

Done

f60e490

albertzaharovits self-assigned this Jan 13, 2022

elasticsearchmachine added the v8.1.0 label Jan 13, 2022

albertzaharovits added :Security/Security Security issues without another label >bug v8.0.0 and removed v8.1.0 labels Jan 13, 2022

elasticmachine added the Team:Security Meta label for security team label Jan 13, 2022

albertzaharovits requested a review from tvernum January 13, 2022 19:34

tvernum reviewed Jan 14, 2022

View reviewed changes

albertzaharovits added 2 commits January 14, 2022 12:26

Schedule

e0a33f3

Merge branch 'master' into print-enrollment-saliently

b792d95

albertzaharovits added 2 commits January 14, 2022 16:59

Meh

459d6a9

Listener

4902844

albertzaharovits requested a review from tvernum January 14, 2022 16:14

tvernum approved these changes Jan 18, 2022

View reviewed changes

albertzaharovits added the v8.1.0 label Jan 18, 2022

albertzaharovits merged commit 05ddae8 into elastic:master Jan 18, 2022

albertzaharovits deleted the print-enrollment-saliently branch January 18, 2022 10:45

jkakavas mentioned this pull request Jan 28, 2022

InitialNodeSecurityAutoConfiguration should wait for onNodeStarted() #81160

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Defer security auto-configuration on initial node startup #82574

Defer security auto-configuration on initial node startup #82574

albertzaharovits commented Jan 13, 2022 •

edited

Loading

elasticmachine commented Jan 13, 2022

albertzaharovits commented Jan 13, 2022

tvernum left a comment

tvernum Jan 14, 2022

albertzaharovits Jan 14, 2022

albertzaharovits Jan 14, 2022

tvernum Jan 14, 2022

albertzaharovits Jan 14, 2022

albertzaharovits Jan 14, 2022

albertzaharovits Jan 14, 2022

jkakavas commented Jan 14, 2022

albertzaharovits commented Jan 14, 2022

jkakavas commented Jan 14, 2022

albertzaharovits commented Jan 14, 2022

tvernum commented Jan 18, 2022

tvernum left a comment

Defer security auto-configuration on initial node startup #82574

Defer security auto-configuration on initial node startup #82574

Conversation

albertzaharovits commented Jan 13, 2022 • edited Loading

elasticmachine commented Jan 13, 2022

albertzaharovits commented Jan 13, 2022

tvernum left a comment

Choose a reason for hiding this comment

tvernum Jan 14, 2022

Choose a reason for hiding this comment

albertzaharovits Jan 14, 2022

Choose a reason for hiding this comment

albertzaharovits Jan 14, 2022

Choose a reason for hiding this comment

tvernum Jan 14, 2022

Choose a reason for hiding this comment

albertzaharovits Jan 14, 2022

Choose a reason for hiding this comment

albertzaharovits Jan 14, 2022

Choose a reason for hiding this comment

albertzaharovits Jan 14, 2022

Choose a reason for hiding this comment

jkakavas commented Jan 14, 2022

albertzaharovits commented Jan 14, 2022

jkakavas commented Jan 14, 2022

albertzaharovits commented Jan 14, 2022

tvernum commented Jan 18, 2022

tvernum left a comment

Choose a reason for hiding this comment

albertzaharovits commented Jan 13, 2022 •

edited

Loading