Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] TransportConfigUpdateAction causing a race condition during node bootstrap leading to 403/500 errors #3204

Closed
shikharj05 opened this issue Aug 18, 2023 · 22 comments · Fixed by #3810
Assignees
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@shikharj05
Copy link
Contributor

What is the bug?
Partial configuration is loaded/cached in security plugin objects/memory in case of a race condition during OpenSearch process bootstrap on a node. For e.g. this can lead to missing role mappings in-memory causing invalid 403s to the requests landing on the buggy node.
Details:
During a node bootstrap, initOnNodeStart is kicked-off after Rest layer is ready. However, a node joins the cluster and replica shard for security index is assigned after transport channel is ready and cluster manager is discovered. Hence, if a TransportConfigUpdateAction triggers an update for a particular config (let's say securityconfig), DynamicConfigFactory object gets initialized and this condition is never true leading to partial config in memory.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a multi-node cluster. From a healthy node, execute something like below in a continuous loop
curl -XPATCH "http://localhost:9200/_plugins/_security/api/securityconfig" -H 'Content-Type: application/json' -d'
[{
    "op": "replace", "path": "/config/dynamic/authc/basic_internal_auth_domain/transport_enabled", "value": "true"
  }
]'
  1. Restart OS process on a different node, (you can enable debug logs on ConfigurationLoaderSecurity7 to check the config being pushed.
  2. Since step 1 is running in a continuous loop, chances are transport channel action initializes the object with Config Type only and hence, other configs will be skipped.

A temporary fix here is to either flush the cache using flush API OR send a dummy config update request.

What is the expected behavior?
Do not partially initialize objects.

What is your host/environment?

  • OS: Linux
  • Version - All
  • Plugins - Security plugin

Do you have any screenshots?
NA

Do you have any additional context?

A fix here would be to check the state of DynamicConfigFactory here before calling reload.

@shikharj05 shikharj05 added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Aug 18, 2023
@cwperks
Copy link
Member

cwperks commented Aug 18, 2023

Thank you for filing this issue @shikharj05!

The proposed fix is on the right track, but I think a call to dynamicConfigFactory.isInitialized() would be susceptible to the same bug because DynamicConfigFactory sets initialized to true after any config update. See the implementation in DynamicConfigFactory here.

Instead of relying on the initialized value from DynamicConfigFactory, I think it would be possible to also add initialized as a field within the ConfigurationRepository where the cache is initially seeded from the security index.

public class ConfigurationRepository {
  private final AtomicBoolean initialized = new AtomicBoolean();

  private ConfigurationRepository(...) {
    bgThread = new Thread(() -> {
      ...create security index if not exists and upload all yaml files
      ...if index already exists then query the security index and seed the cache
     
      while (!dynamicConfigFactory.isInitialized()) {
         ...loop that waits for the cache to be seeded

         // *Note:* dynamicConfigFactory.isInitialized() will be set to true if TransportConfigUpdateAction occurs before initial seed
      }
     
      initialized.set(true);
      LOGGER.info("Node '{}' initialized", clusterService.localNode().getName());
    });
  }

  public boolean isInitialized() {
    return initialized.get();
  }
}

and over in TransportConfigUpdateAction, the nodeOperation method can be updated to:

public class TransportConfigUpdateAction {
    @Override
    protected ConfigUpdateNodeResponse nodeOperation(final NodeConfigUpdateRequest request) {
        if (configurationRepository.isInitialized()) {
          configurationRepository.reloadConfiguration(CType.fromStringValues((request.request.getConfigTypes())));
          backendRegistry.get().invalidateCache();
        }
        return new ConfigUpdateNodeResponse(clusterService.localNode(), request.request.getConfigTypes(), null);
    }
}

@shikharj05
Copy link
Contributor Author

Thanks for the quick response @cwperks !

The proposed fix is on the right track, but I think a call to dynamicConfigFactory.isInitialized() would be susceptible to the same bug because DynamicConfigFactory sets initialized to true after any config update. See the implementation in DynamicConfigFactory here.

Right, however - the method is called only in two cases-

  1. Config update propagated using Transport action which updates the required types only. (which is fine in case of an already up-to-date configuration and only new configuration needs to be updated)
  2. During node bootup which always initializes all configuration types.

Your fix works too! Should we maintain two flags or just rely on the same one?

@cwperks
Copy link
Member

cwperks commented Aug 18, 2023

Oh, you're right. I'm mistaken. A single flag would be better.

I see now that if the TransportConfigUpdateAction.nodeOperation call to reloadConfiguration is wrapped in if (dyanmicConfigFactory.isInitialized()) { ... } that the only place that can initialize the dynamicConfigFactory is within this call in the constructor of ConfigurationRepository which is created within createComponents on node bootstrap. The logic in the constructor of ConfigurationRepository is used to create the security index from the files or read from the security index if it already exists.

@DarshitChanpura
Copy link
Member

I like the idea of waiting until ConfigRepo sets DynamicConfigFactory to be available. Is the solution to add a getter for DCF once ConfigRepository is available, so that initialized flag can be accessed?

@shikharj05
Copy link
Contributor Author

I like the idea of waiting until ConfigRepo sets DynamicConfigFactory to be available. Is the solution to add a getter for DCF once ConfigRepository is available, so that initialized flag can be accessed?

I don't think we need to keep the TransportConfigUpdateAction.nodeOperation waiting if DCF is not initialized - this is because when later DCF gets initialized during node bootup, it will use local client and hence local replica(up-to-date since node is part of the cluster) to create the configuration cache/CR object and be up-to-date with latest configuration.

@expani
Copy link

expani commented Aug 21, 2023

    private void reloadConfiguration0(Collection<CType> configTypes, boolean acceptInvalid) {
        final Map<CType, SecurityDynamicConfiguration<?>> loaded = getConfigurationsFromIndex(configTypes, false, acceptInvalid);
        configCache.putAll(loaded);
        notifyAboutChanges(loaded);
    }

Let's consider a scenario where the startup thread has loaded the configurations from index. Method call to getConfigurationsFromIndex is complete. The other 2 lines of updating cache and notifying changes are not yet done (
maybe due to CPU Thread Context Switch but there can be tons of reasons )

A config update request comes to the node and sees that DCF is not yet initialised and skips the update flow.

Startup thread gets scheduled for CPU again and finishes initialisation of in-memory security config state.

We will miss the config update in this case. So, TransportConfigUpdateAction should wait for DCF to initialise completely before reloading the configuration again.

The wait can be done in TransportConfigUpdateAction either using a while loop by waiting for DCF.isInitialized() ( Consumes CPU Cycles unnecessarily ) OR a CountDownLatch can be used in DCF and await can be exposed via another method called DCF.waitForInit

@stephen-crawford stephen-crawford removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Aug 21, 2023
@stephen-crawford
Copy link
Contributor

[Triage] Thank you for filing this issue @shikharj05. This issue appears to have a clear path of action forward so marking triaged.

@stephen-crawford stephen-crawford added the triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. label Aug 21, 2023
@DarshitChanpura
Copy link
Member

Update:

While trying to implement a CountDownLatch in place of isInitialized, I ran some preexisting tests to verify the new behavior. In doing so, I discovered that ConfigRepository's bgThread.start() which inherently calls the while loop is never triggered.(This was because the default init is set to false. Checkout initOnNodeStart() for details). The call directly passes goes to TransportConfigUpdateAction nodeOperations().

After manually modifying the test setup to trigger bgThread.start(), I encountered latch TimeoutException. This is due to following:

  1. dcfLatch.await() waits on countdown to reach 0.
  2. this countdown reaches 0 only via DCF's onChange() method call.
  3. This onChange method is triggered only via notifyAboutChanges() which is called in reloadConfiguration0().

This tells that onChange needs to be called atleast once to set DynamicConfigFactory as initialized. I'm currently unsure of how to achieve this prior to TransportConfigUpdateAction's nodeOperations() call.

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Sep 13, 2023

A possible solution could be to set the countdown latch to 0 in DCF constructor indicating that this is initialized. However, I'm not sure about the effects of doing so.

For this:

@Override
    protected ConfigUpdateNodeResponse nodeOperation(final NodeConfigUpdateRequest request) {
        try {
            dynamicConfigFactory.waitForInit();
        } catch (InterruptedException e) {
            // thread interrupted while waiting for DCF to be initialized
            throw new OpenSearchException("Error while waiting for DynamicConfigFactory to be initialized: ", e);
        }
        configurationRepository.reloadConfiguration(CType.fromStringValues(request.request.getConfigTypes()));
        backendRegistry.get().invalidateCache();
        return new ConfigUpdateNodeResponse(clusterService.localNode(), request.request.getConfigTypes(), null);
    }

to work, DCF latch needs to be counted down in the constructor.

@shikharj05
Copy link
Contributor Author

shikharj05 commented Sep 14, 2023

A possible solution could be to set the countdown latch to 0 in DCF constructor indicating that this is initialized. However, I'm not sure about the effects of doing so.

One of the side-effects I see is that the nodeOperation method might get stuck while waiting for DCF to be initialized. We need to understand different methods on how DCF gets initialized - for example, what's the behavior in this else block here-

} else {
LOGGER.info(
"Will not attempt to create index {} and default configs if they are absent. Will not perform background initialization",
securityIndex
);

From what I can see in the above condition, DCF is never initialized on NodeBootStrap and is only initialized (although partially) via TransportAction i.e. updates only. Should we keep this behavior consistent?

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Sep 14, 2023

From what I can see in the above condition, DCF is never initialized on NodeBootStrap and is only initialized (although partially) via TransportAction i.e. updates only. Should we keep this behavior consistent?

If we keep as is then the latch await will be stuck in an infinite wait (or timeout) as DCF is never initialized before onChange call. This will only leave an option to keep the code as is for nodeOperation.

@DarshitChanpura DarshitChanpura removed their assignment Nov 17, 2023
@stephen-crawford stephen-crawford self-assigned this Nov 17, 2023
@stephen-crawford
Copy link
Contributor

@DarshitChanpura and @cwperks, were either one of you able to replicate Shikhar's issue? I am looking at this now but so far have not encountered it. I am going to try to run with docker one more time before swapping to a manual multinode setup in-case docker prevents this issue from arising. I was not able to directly kill the opensearch process with docker since the container host lacks most command line features--I bounced the container instead. So perhaps this is preventing the failure (though I want to say this should not be the case--opensearch restart is opensearch restart).

I figured I would check if either of you had a specific configuration setup that worked for you when you were looking at this. Thanks!

@cwperks
Copy link
Member

cwperks commented Nov 21, 2023

Another place this could be fixed is directly in DynamicConfigFactory.onChange which is called in both instances where the configCache can be invalidated/populated. The 2 instances are:

  1. On bootstrap where all config types are loaded into configCache
  2. On TransportConfigUpdateAction where one or more CTypes are invalidated/re-populated into configCache

In particular, there is an issue on this line where it expects the configCache to already be populated or return empty.

In the problematic case where TransportConfigUpdateAction occurs before the new node in the cluster is initialized, then configCache will be empty and this method returns SecurityDynamicConfiguration.empty()

When that returns empty (i.e. config.getImplementingClass() == null) then DynamicConfigFactory.onChange should be a noop and not set isInitialized to true.

@stephen-crawford
Copy link
Contributor

With @cwperks' help, I have a repo where you can see the issue occur by following these steps:

  1. Clone or copy the files from https://github.com/scrawfor99/OS-docker-demo
  2. Download the build/distributions/opensearch-security-2.11.1.0-SNAPSHOT.jar from here: stephen-crawford@fcdfb3c
  3. Change the paths in docker-compose to mount the security plugin artifact you just downloaded
  4. Spin up a cluster using docker-compose up
  5. Use docker desktop or the CLI to exec into the shell of one of the two containers
  6. Navigate into the plugins/opensearch-security/tools directory of the container and run
./securityadmin.sh -cd ../../../config/opensearch-security/ -icl -nhnv \
  -cacert ../../../config/root-ca.pem \
  -cert ../../../config/kirk.pem \
  -key ../../../config/kirk-key.pem
  1. Exit the docker container and use your terminal to run this command as a script:
while true;do curl -ai -u "admin:admin" -k -X PATCH https://localhost:9200/_plugins/_security/api/securityconfig -H 'Content-Type: application/json' -d'[{"op": "replace", "path": "/config/dynamic/authc/basic_internal_auth_domain/transport_enabled", "value": "true"}]'; done
  1. While that is executing open a new terminal and use it to restart the second docker container associated with port 9201 on your own machine (this should be node 2).

You should now see the error of the node coming back up but failing to connect to the cluster and the script will throw 500s.

@stephen-crawford
Copy link
Contributor

stephen-crawford commented Nov 22, 2023

For fixes, I have tried modifying DyanmicConfigFactory to have this:

 config = cr.getConfiguration(CType.CONFIG);
        if (config.getImplementingClass() == null) {
            return;
        }

at line 173 (the top of the onChange method.

Sadly, that does not actually work so going to try the above.

@cwperks
Copy link
Member

cwperks commented Nov 22, 2023

@scrawfor99 my bad, it would need to go in the else block here, but Shikhar's solution is more succinct and clearer.

@stephen-crawford
Copy link
Contributor

So with Shikhar's change, the nodes fail to come up even with running the securityadmin tool. They just go into 503s saying Security is not initialized. I am going to try @cwperks revised idea and if that does not work look into the locks discussed by DC.

@stephen-crawford
Copy link
Contributor

I have tried several versions of both @cwperks and @shikharj05's suggestions and not yet been able to fix the issue. Here is an example of the logs even with the changes.

23-11-28 12:13:05     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceivedDecorate(SecuritySSLRequestHandler.java:221) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:317) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:169) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:774) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) [opensearch-index-management-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:05     at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) [opensearch-performance-analyzer-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:05     at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:471) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-11-28 12:13:05     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-11-28 12:13:05     at java.lang.Thread.run(Thread.java:833) [?:?]
2023-11-28 12:13:05 [2023-11-28T17:13:05,962][INFO ][stdout                   ] [opensearch-node2] Reloading configuration
2023-11-28 12:13:05 [2023-11-28T17:13:05,962][INFO ][stdout                   ] [opensearch-node2] Reloading configuration for configTypes
2023-11-28 12:13:05 [2023-11-28T17:13:05,962][INFO ][stdout                   ] [opensearch-node2] Reloading Configuration 0 for config Types
2023-11-28 12:13:05 [2023-11-28T17:13:05,964][INFO ][stdout                   ] [opensearch-node2] Change detected at top of onChange in DCM
2023-11-28 12:13:05 [2023-11-28T17:13:05,964][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: CONFIG
2023-11-28 12:13:05 [2023-11-28T17:13:05,964][INFO ][stdout                   ] [opensearch-node2] There was an entry in the configCache so returning a deep clone
2023-11-28 12:13:05 [2023-11-28T17:13:05,965][INFO ][stdout                   ] [opensearch-node2] The returned config at top of onChange is of implementing class class org.opensearch.security.securityconf.impl.v7.ConfigV7 and entries {config=Config [dynamic=Dynamic [filtered_alias_mode=warn, kibana=Kibana [multitenancy_enabled=true, private_tenant_enabled=true, default_tenant=, server_username=kibanaserver, opendistro_role=null, index=.kibana], http=Http [anonymous_auth_enabled=false, xff=Xff [enabled=false, internalProxies=192\.168\.0\.10|192\.168\.0\.11, remoteIpHeader=X-Forwarded-For]], authc=Authc [domains={jwt_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=0, http_authenticator=HttpAuthenticator [challenge=false, type=jwt, config={signing_key=base64 encoded HMAC key or public RSA/ECDSA pem key, jwt_header=Authorization, jwt_clock_skew_tolerance_seconds=30}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via Json Web Token], ldap=AuthcDomain [http_enabled=false, transport_enabled=false, order=5, http_authenticator=HttpAuthenticator [challenge=false, type=basic, config={}], authentication_backend=AuthcBackend [type=ldap, config={enable_ssl=false, enable_start_tls=false, enable_ssl_client_auth=false, verify_hostnames=true, hosts=[localhost:8389], userbase=ou=people,dc=example,dc=com, usersearch=(sAMAccountName={0})}], description=Authenticate via LDAP or Active Directory], basic_internal_auth_domain=AuthcDomain [http_enabled=true, transport_enabled=true, order=4, http_authenticator=HttpAuthenticator [challenge=true, type=basic, config={}], authentication_backend=AuthcBackend [type=intern, config={}], description=Authenticate via HTTP Basic against internal users database], proxy_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=3, http_authenticator=HttpAuthenticator [challenge=false, type=proxy, config={user_header=x-proxy-user, roles_header=x-proxy-roles}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via proxy], clientcert_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=2, http_authenticator=HttpAuthenticator [challenge=false, type=clientcert, config={username_attribute=cn}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via SSL client certificates], kerberos_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=6, http_authenticator=HttpAuthenticator [challenge=true, type=kerberos, config={krb_debug=false, strip_realm_from_principal=true}], authentication_backend=AuthcBackend [type=noop, config={}], description=null]}], authz=Authz [domains={roles_from_another_ldap=AuthzDomain [http_enabled=false, transport_enabled=false, authorization_backend=AuthzBackend [type=ldap, config={}], description=Authorize via another Active Directory], roles_from_myldap=AuthzDomain [http_enabled=false, transport_enabled=false, authorization_backend=AuthzBackend [type=ldap, config={enable_ssl=false, enable_start_tls=false, enable_ssl_client_auth=false, verify_hostnames=true, hosts=[localhost:8389], rolebase=ou=groups,dc=example,dc=com, rolesearch=(member={0}), userrolename=disabled, rolename=cn, resolve_nested_roles=true, userbase=ou=people,dc=example,dc=com, usersearch=(uid={0})}], description=Authorize via LDAP or Active Directory]}]]]}
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ACTIONGROUPS
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: INTERNALUSERS
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ROLES
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ROLESMAPPING
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: TENANTS
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: NODESDN
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: WHITELIST
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ALLOWLIST
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: WHITELIST
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ALLOWLIST
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: AUDIT
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][INFO ][stdout                   ] [opensearch-node2] At top of if statement since implementing class is ConfigV7
2023-11-28 12:13:05 [2023-11-28T17:13:05,966][ERROR][o.o.s.c.ConfigurationRepository] [opensearch-node2] org.opensearch.security.securityconf.DynamicConfigFactory@26d1a8a2 listener errored: StaticResourceException[Unable to load static roles]
2023-11-28 12:13:05 org.opensearch.security.configuration.StaticResourceException: Unable to load static roles
2023-11-28 12:13:05     at org.opensearch.security.securityconf.DynamicConfigFactory.onChange(DynamicConfigFactory.java:255) ~[opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.configuration.ConfigurationRepository.notifyAboutChanges(ConfigurationRepository.java:414) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:403) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:386) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:130) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:52) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:328) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:324) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceivedDecorate(SecuritySSLRequestHandler.java:221) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:317) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:169) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:774) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:05     at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) [opensearch-index-management-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:05     at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) [opensearch-performance-analyzer-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:05     at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:471) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:05     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-11-28 12:13:05     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-11-28 12:13:05     at java.lang.Thread.run(Thread.java:833) [?:?]
2023-11-28 12:13:06 [2023-11-28T17:13:06,316][INFO ][stdout                   ] [opensearch-node2] Reloading configuration
2023-11-28 12:13:06 [2023-11-28T17:13:06,317][INFO ][stdout                   ] [opensearch-node2] Reloading configuration for configTypes
2023-11-28 12:13:06 [2023-11-28T17:13:06,317][INFO ][stdout                   ] [opensearch-node2] Reloading Configuration 0 for config Types
2023-11-28 12:13:06 [2023-11-28T17:13:06,319][INFO ][stdout                   ] [opensearch-node2] Change detected at top of onChange in DCM
2023-11-28 12:13:06 [2023-11-28T17:13:06,319][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: CONFIG
2023-11-28 12:13:06 [2023-11-28T17:13:06,319][INFO ][stdout                   ] [opensearch-node2] There was an entry in the configCache so returning a deep clone
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The returned config at top of onChange is of implementing class class org.opensearch.security.securityconf.impl.v7.ConfigV7 and entries {config=Config [dynamic=Dynamic [filtered_alias_mode=warn, kibana=Kibana [multitenancy_enabled=true, private_tenant_enabled=true, default_tenant=, server_username=kibanaserver, opendistro_role=null, index=.kibana], http=Http [anonymous_auth_enabled=false, xff=Xff [enabled=false, internalProxies=192\.168\.0\.10|192\.168\.0\.11, remoteIpHeader=X-Forwarded-For]], authc=Authc [domains={jwt_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=0, http_authenticator=HttpAuthenticator [challenge=false, type=jwt, config={signing_key=base64 encoded HMAC key or public RSA/ECDSA pem key, jwt_header=Authorization, jwt_clock_skew_tolerance_seconds=30}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via Json Web Token], ldap=AuthcDomain [http_enabled=false, transport_enabled=false, order=5, http_authenticator=HttpAuthenticator [challenge=false, type=basic, config={}], authentication_backend=AuthcBackend [type=ldap, config={enable_ssl=false, enable_start_tls=false, enable_ssl_client_auth=false, verify_hostnames=true, hosts=[localhost:8389], userbase=ou=people,dc=example,dc=com, usersearch=(sAMAccountName={0})}], description=Authenticate via LDAP or Active Directory], basic_internal_auth_domain=AuthcDomain [http_enabled=true, transport_enabled=true, order=4, http_authenticator=HttpAuthenticator [challenge=true, type=basic, config={}], authentication_backend=AuthcBackend [type=intern, config={}], description=Authenticate via HTTP Basic against internal users database], proxy_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=3, http_authenticator=HttpAuthenticator [challenge=false, type=proxy, config={user_header=x-proxy-user, roles_header=x-proxy-roles}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via proxy], clientcert_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=2, http_authenticator=HttpAuthenticator [challenge=false, type=clientcert, config={username_attribute=cn}], authentication_backend=AuthcBackend [type=noop, config={}], description=Authenticate via SSL client certificates], kerberos_auth_domain=AuthcDomain [http_enabled=false, transport_enabled=false, order=6, http_authenticator=HttpAuthenticator [challenge=true, type=kerberos, config={krb_debug=false, strip_realm_from_principal=true}], authentication_backend=AuthcBackend [type=noop, config={}], description=null]}], authz=Authz [domains={roles_from_another_ldap=AuthzDomain [http_enabled=false, transport_enabled=false, authorization_backend=AuthzBackend [type=ldap, config={}], description=Authorize via another Active Directory], roles_from_myldap=AuthzDomain [http_enabled=false, transport_enabled=false, authorization_backend=AuthzBackend [type=ldap, config={enable_ssl=false, enable_start_tls=false, enable_ssl_client_auth=false, verify_hostnames=true, hosts=[localhost:8389], rolebase=ou=groups,dc=example,dc=com, rolesearch=(member={0}), userrolename=disabled, rolename=cn, resolve_nested_roles=true, userbase=ou=people,dc=example,dc=com, usersearch=(uid={0})}], description=Authorize via LDAP or Active Directory]}]]]}
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ACTIONGROUPS
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: INTERNALUSERS
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ROLES
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ROLESMAPPING
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: TENANTS
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: NODESDN
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: WHITELIST
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ALLOWLIST
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: WHITELIST
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,320][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: ALLOWLIST
2023-11-28 12:13:06 [2023-11-28T17:13:06,321][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,321][INFO ][stdout                   ] [opensearch-node2] The CTYPE is requested in getConfiguration is: AUDIT
2023-11-28 12:13:06 [2023-11-28T17:13:06,321][INFO ][stdout                   ] [opensearch-node2] Returning an empty Security Dynamic Configuration since there was no matching cache entry
2023-11-28 12:13:06 [2023-11-28T17:13:06,321][INFO ][stdout                   ] [opensearch-node2] At top of if statement since implementing class is ConfigV7
2023-11-28 12:13:06 [2023-11-28T17:13:06,321][ERROR][o.o.s.c.ConfigurationRepository] [opensearch-node2] org.opensearch.security.securityconf.DynamicConfigFactory@26d1a8a2 listener errored: StaticResourceException[Unable to load static roles]
2023-11-28 12:13:06 org.opensearch.security.configuration.StaticResourceException: Unable to load static roles
2023-11-28 12:13:06     at org.opensearch.security.securityconf.DynamicConfigFactory.onChange(DynamicConfigFactory.java:255) ~[opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.configuration.ConfigurationRepository.notifyAboutChanges(ConfigurationRepository.java:414) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:403) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:386) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:130) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.action.configupdate.TransportConfigUpdateAction.nodeOperation(TransportConfigUpdateAction.java:52) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:328) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:324) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceivedDecorate(SecuritySSLRequestHandler.java:221) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:317) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:169) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:774) [opensearch-security-2.11.0.0.jar:2.11.1.0-SNAPSHOT]
2023-11-28 12:13:06     at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) [opensearch-index-management-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:06     at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) [opensearch-performance-analyzer-2.11.0.0.jar:2.11.0.0]
2023-11-28 12:13:06     at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:471) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
2023-11-28 12:13:06     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
2023-11-28 12:13:06     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
2023-11-28 12:13:06     at java.lang.Thread.run(Thread.java:833) [?:?]

From what I have found the issue is that the statement:

 if (config.getImplementingClass() == null) {
                System.out.println("Config is null");
                return;
            }

will never trigger. Basically, it is never going to be null because even on the empty configurations created on line 63 of the SecurityDynamicConfiguration class, they are assumed to be v7. Going to look into figuring out a better fx

@cwperks
Copy link
Member

cwperks commented Dec 6, 2023

So with Shikhar's change, the nodes fail to come up even with running the securityadmin tool.

@scrawfor99 Does securityadmin produce an error? If so, what is the error message?

@cwperks
Copy link
Member

cwperks commented Dec 6, 2023

@scrawfor99 I opened a PR against my local fork here. There needs to be a different flag that TransportConfigUpdateAction can use that signals to it when its ok to start accepting transport requests. The securityadmin script can be used to initialize security and relies on running these TransportConfigUpdateActions to initialize security. Those use-cases need to be accounted for.

@stephen-crawford stephen-crawford removed their assignment Dec 20, 2023
@peternied peternied self-assigned this Jan 5, 2024
@peternied
Copy link
Member

@cwperks I'll see about getting this merged and backported

@cwperks
Copy link
Member

cwperks commented Jan 5, 2024

@peternied Thank you for your help on this issue :).

stephen-crawford pushed a commit that referenced this issue Jan 5, 2024
… has completed (#3810)

### Description

Introduces another variable on DynamicConfigFactory called
`bgThreadComplete` that behaves differently than the `initialized`
variable. `bgThreadComplete` is a flag that signals to
TransportConfigUpdateAction that it can start accepting updates.

There are 2 ways the security index can be created from scratch:

1. If `plugins.security.allow_default_init_securityindex` is set to
**true** it will create the security index and load all yaml files
2. If `plugins.security.allow_default_init_securityindex` is set to
**false**, the security index is not created on bootstrap and requires a
user to run securityadmin to initialize security. When securityadmin is
utilized, the cluster does depend on
[TransportConfigUpdateAction](https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/SecurityAdmin.java#L975-L977)
to initialize security so there still needs to be an avenue where this
can update the config before `initialized` is set to **true**

This PR sets `bgThreadComplete` to **false** on node startup and
explicitly sets it to **true** once its ok for
TransportConfigUpdateAction to start accepting transport actions. In
case 2) up above, it can be set to **true** before DynamicConfigFactory
is `initialized` so that it can accept requests from securityadmin.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

### Issues Resolved

- Resolves #3204

### Check List
- [X] New functionality includes testing
- [ ] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
cwperks added a commit to cwperks/security that referenced this issue Mar 12, 2024
… has completed (opensearch-project#3810)

Introduces another variable on DynamicConfigFactory called
`bgThreadComplete` that behaves differently than the `initialized`
variable. `bgThreadComplete` is a flag that signals to
TransportConfigUpdateAction that it can start accepting updates.

There are 2 ways the security index can be created from scratch:

1. If `plugins.security.allow_default_init_securityindex` is set to
**true** it will create the security index and load all yaml files
2. If `plugins.security.allow_default_init_securityindex` is set to
**false**, the security index is not created on bootstrap and requires a
user to run securityadmin to initialize security. When securityadmin is
utilized, the cluster does depend on
[TransportConfigUpdateAction](https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/SecurityAdmin.java#L975-L977)
to initialize security so there still needs to be an avenue where this
can update the config before `initialized` is set to **true**

This PR sets `bgThreadComplete` to **false** on node startup and
explicitly sets it to **true** once its ok for
TransportConfigUpdateAction to start accepting transport actions. In
case 2) up above, it can be set to **true** before DynamicConfigFactory
is `initialized` so that it can accept requests from securityadmin.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

- Resolves opensearch-project#3204

- [X] New functionality includes testing
- [ ] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
(cherry picked from commit 045d4ef)
dlin2028 pushed a commit to dlin2028/security that referenced this issue May 1, 2024
… has completed (opensearch-project#3810)

Introduces another variable on DynamicConfigFactory called
`bgThreadComplete` that behaves differently than the `initialized`
variable. `bgThreadComplete` is a flag that signals to
TransportConfigUpdateAction that it can start accepting updates.

There are 2 ways the security index can be created from scratch:

1. If `plugins.security.allow_default_init_securityindex` is set to
**true** it will create the security index and load all yaml files
2. If `plugins.security.allow_default_init_securityindex` is set to
**false**, the security index is not created on bootstrap and requires a
user to run securityadmin to initialize security. When securityadmin is
utilized, the cluster does depend on
[TransportConfigUpdateAction](https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/SecurityAdmin.java#L975-L977)
to initialize security so there still needs to be an avenue where this
can update the config before `initialized` is set to **true**

This PR sets `bgThreadComplete` to **false** on node startup and
explicitly sets it to **true** once its ok for
TransportConfigUpdateAction to start accepting transport actions. In
case 2) up above, it can be set to **true** before DynamicConfigFactory
is `initialized` so that it can accept requests from securityadmin.

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

- Resolves opensearch-project#3204

- [X] New functionality includes testing
- [ ] New functionality has been documented
- [X] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.
Projects
None yet
6 participants