Add some settings to specify a remote Openshift cluster URL #111

Closed
wants to merge 1 commit into from

@barkbay barkbay commented Nov 2, 2017

Hi,

Our ElasticSearch cluster is hosted on a remote K8S cluster. Therefore we have to be able to set the URL of the Openshift cluster that the plugin queries when it checks the authorizations of a user.
This PR allows you to specify a remote Openshift cluster URL while still letting the Kubernetes plugin discover the topology of the ElasticSearch cluster the usual way.
I would love to hear your thoughts.

Thanks

@@ -64,6 +67,9 @@ public OpenshiftRequestContextFactory(final Settings settings, final RequestUtil
ConfigurationSettings.DEFAULT_OPENSHIFT_OPS_PROJECTS);
this.kibanaPrefix = settings.get(ConfigurationSettings.KIBANA_CONFIG_INDEX_NAME, ConfigurationSettings.DEFAULT_USER_PROFILE_PREFIX);
this.kibanaIndexMode = settings.get(ConfigurationSettings.OPENSHIFT_KIBANA_INDEX_MODE, UNIQUE);
this.openshiftMasterUrl = settings.get(ConfigurationSettings.OPENSHIFT_MASTER, ConfigurationSettings.DEFAULT_MASTER);
this.openshiftCaPath = settings.get(ConfigurationSettings.OPENSHIFT_CA_PATH, null);
Contributor

What is the default CA path?

Author

There is no default CA path. If this parameter is not set by the user then the CA detected or set by the K8S plugin is not overwritten.

if (openshiftCaPath != null) {
    builder.withCaCertFile(openshiftCaPath);
}

Contributor

Are there additional types of exceptions that can be thrown and need to be handled? For example, if the user specifies an incorrect openshiftCaPath, will the API throw a FileNotFound, a PermissionDenied, or some other type of exception? I want to make sure that a user who made a typo, or who did not set the right file permissions, can easily identify the problem.

Author

If the CA path is incorrect, the following exception is thrown:

io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:137)
	at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:41)
	at io.fabric8.openshift.client.DefaultOpenShiftClient.<init>(DefaultOpenShiftClient.java:174)
	at io.fabric8.openshift.client.DefaultOpenShiftClient.<init>(DefaultOpenShiftClient.java:170)
	at io.fabric8.elasticsearch.plugin.OpenshiftClientFactory.create(OpenshiftClientFactory.java:50)
[....]
Caused by: java.io.FileNotFoundException: /tmp/junit2122595593726896985/ca.crt.does_not_exist (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:55)
	at io.fabric8.kubernetes.client.internal.CertUtils.createTrustStore(CertUtils.java:61)
	at io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:113)
	at io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:107)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:68)
	... 29 more

In case of bad permissions:

io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:57)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:137)
	at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:41)
	at io.fabric8.openshift.client.DefaultOpenShiftClient.<init>(DefaultOpenShiftClient.java:174)
	at io.fabric8.openshift.client.DefaultOpenShiftClient.<init>(DefaultOpenShiftClient.java:170)
	at io.fabric8.elasticsearch.plugin.OpenshiftClientFactory.create(OpenshiftClientFactory.java:50)
[...]
Caused by: java.io.FileNotFoundException: /tmp/you_cant_read_it.crt (Permission denied)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:55)
	at io.fabric8.kubernetes.client.internal.CertUtils.createTrustStore(CertUtils.java:61)
	at io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:113)
	at io.fabric8.kubernetes.client.internal.SSLUtils.trustManagers(SSLUtils.java:107)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:68)
	... 29 more
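
Both traces above end in a java.io.FileNotFoundException wrapped in a KubernetesClientException, and only the message suffix ("No such file or directory" versus "Permission denied") tells the two cases apart. As a rough sketch of how a misconfigured path could be reported more directly, the check below inspects the file before any client is built. CaPathCheck and describeProblem are hypothetical names used for illustration only; they are not part of this PR or of the fabric8 client.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper, not part of the plugin: checks a CA path before the client is built. */
final class CaPathCheck {

    private CaPathCheck() {
    }

    /**
     * Returns a human-readable problem description, or an empty Optional when the
     * path is unset (the client defaults apply) or points at a readable regular file.
     */
    static Optional<String> describeProblem(String caPath) {
        if (caPath == null) {
            // Not configured: keep whatever CA the client detected on its own.
            return Optional.empty();
        }
        Path path = Paths.get(caPath);
        if (!Files.exists(path)) {
            return Optional.of("CA file does not exist: " + path);
        }
        if (!Files.isRegularFile(path)) {
            return Optional.of("CA path is not a regular file: " + path);
        }
        if (!Files.isReadable(path)) {
            return Optional.of("CA file is not readable (check permissions): " + path);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pass a path as the first argument; with no argument the path counts as unset.
        String caPath = args.length > 0 ? args[0] : null;
        System.out.println(describeProblem(caPath).orElse("OK"));
    }
}
```

A caller could log the returned description, or fail fast with it, before handing the path to the client builder, so that a typo and a permission problem produce distinct messages.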

@richm
Contributor

richm commented Nov 2, 2017

Please also add some unit tests for this new feature

@jcantrill
Collaborator

I'm not certain this change is required at all, since you can alter all of these values using env variables [1]. Going further, if we are going to accept these changes, I would prefer using if checks to build up the config instead of providing defaults and setting them. I think we should keep preferring the client defaults, as we do now, instead of redefining them.

[1] https://github.com/fabric8io/kubernetes-client

@barkbay
Author

barkbay commented Nov 3, 2017

Not sure we can use env variables since the K8S client is already used inside a local Kubernetes cluster.
In other words we have to manage two different K8S clusters within the same JVM.
How would we handle this use case with env variables?
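
A minimal sketch of the situation, under stated assumptions: process-wide defaults (such as values derived from env variables) are shared by every client in the JVM, so a second client that must reach a different cluster needs explicit per-client overrides, applied only when they are set. ClientConfigSketch, the map keys, and the URLs and paths below are hypothetical and only stand in for the real client configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not the plugin's code: per-client settings layered over shared defaults. */
final class ClientConfigSketch {

    private ClientConfigSketch() {
    }

    /**
     * Starts from the defaults that every client in the JVM shares and overrides a
     * key only when an explicit, non-null value was configured for this client.
     */
    static Map<String, String> build(Map<String, String> sharedDefaults, String masterUrl, String caPath) {
        Map<String, String> config = new HashMap<>(sharedDefaults);
        if (masterUrl != null) {
            config.put("masterUrl", masterUrl);
        }
        if (caPath != null) {
            config.put("caCertFile", caPath);
        }
        return config;
    }

    public static void main(String[] args) {
        // Illustrative process-wide defaults; the values are made up for this sketch.
        Map<String, String> shared = new HashMap<>();
        shared.put("masterUrl", "https://kubernetes.default.svc");

        // Client used for topology discovery: no overrides, so the shared defaults win.
        Map<String, String> discovery = build(shared, null, null);
        // Client used for authorization checks: points at the remote cluster.
        Map<String, String> authz = build(shared, "https://openshift.example.com:8443", "/etc/openshift/ca.crt");

        System.out.println(discovery.get("masterUrl")); // prints https://kubernetes.default.svc
        System.out.println(authz.get("masterUrl"));     // prints https://openshift.example.com:8443
    }
}
```

Because the overrides are applied with if checks on a copy of the defaults, the first client keeps whatever the usual discovery mechanism provides, which is also the behaviour asked for in the review comment above.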

@barkbay
Author

barkbay commented Nov 4, 2017

I'm working on a new PR that will add some unit tests and preserve the client defaults instead of redefining them: https://github.com/fabric8io/openshift-elasticsearch-plugin/compare/master...barkbay:external_openshift.diff

I will update this PR once I have had time to run some e2e integration tests against our clusters.

@jcantrill
Collaborator

@barkbay Please clarify the use case you have:

  1. Elasticsearch running on a Kubernetes Cluster
  2. Logs from an Openshift cluster are written to ES on Kubernetes Cluster

This LGTM other than what appears to be an unexpected deployment topology. Please also rebase and squash the commits.

@barkbay
Author

barkbay commented Nov 12, 2017

A picture is worth a thousand words ;)
[Image: Architecture]
Today our ElasticSearch cluster is deployed with Ansible but we are working on a new deployment using recent Kubernetes features like statefulset and local volumes.
I will rebase and squash the commits.

@barkbay barkbay force-pushed the external_openshift_rc1 branch from e135ef0 to 0156dec Compare November 12, 2017 13:06
@jcantrill
Collaborator

@barkbay FYI, we have been slow to adopt statefulsets due to internal discussions regarding their ability to correctly support the ES use case.
I will put together a PR against openshift/origin-aggregated-logging to run these changes through our CI tests.

@barkbay
Author

barkbay commented Nov 13, 2017

@jcantrill Could you tell me a little bit more about the drawbacks of deploying ES with statefulsets?
Thank you for the PR.

@portante

@barkbay, from https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/, the section on "Deployment and Scaling Guarantees":


  • For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
  • When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
  • Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
  • Before a Pod is terminated, all of its successors must be completely shutdown.

The StatefulSet should not specify a pod.Spec.TerminationGracePeriodSeconds of 0. This practice is unsafe and strongly discouraged. For further explanation, please refer to force deleting StatefulSet Pods.

When the nginx example above is created, three Pods will be deployed in the order web-0, web-1, web-2. web-1 will not be deployed before web-0 is Running and Ready, and web-2 will not be deployed until web-1 is Running and Ready. If web-0 should fail, after web-1 is Running and Ready, but before web-2 is launched, web-2 will not be launched until web-0 is successfully relaunched and becomes Running and Ready.

If a user were to scale the deployed example by patching the StatefulSet such that replicas=1, web-2 would be terminated first. web-1 would not be terminated until web-2 is fully shutdown and deleted. If web-0 were to fail after web-2 has been terminated and is completely shutdown, but prior to web-1’s termination, web-1 would not be terminated until web-0 is Running and Ready.


Pods are created and terminated in a specific order. This means that if a given pod fails to come up, or fails to terminate, then we have a problem.

How do statefulsets handle the case where we have to apply maintenance to one member out of order? Disk failure, etc.?

Do statefulsets allow dynamic scaling? For example, oc scale dc es --replicas=3? If we consider the case where the replica set had 5 to start with, doesn't this make it very easy to drop state quickly?

For Elasticsearch, how do statefulsets know when it is okay to remove those two pods? When data is properly replicated? How have we told Elasticsearch that the expected cluster size is now 3 instead of 5?

@smarterclayton

@barkbay
Author

barkbay commented Nov 14, 2017

Pods are created and terminated in a specific order because of the default OrderedReady pod management policy.
With Parallel pod management, the second pod does not wait for the first to be ready:

Parallel Pod Management
Parallel pod management tells the StatefulSet controller to launch or terminate all Pods in parallel, and to not wait for Pods to become Running and Ready or completely terminated prior to launching or terminating another Pod.

With two replicas and the OrderedReady pod management policy, the second pod is not created:

{17-11-14 8:11}124-100:~ root# kubectl get pods -n elastic1 -o wide                                                                                                                                                                                 
NAME                         READY     STATUS    RESTARTS   AGE       IP               NODE 
[...]
es-data-0                    0/1       Pending   0          2m        <none>           <none> 

With the Parallel pod management policy:

{17-11-14 8:35}124-100:~ root# kubectl get pods -n elastic1 -o wide                                                                                                                                                                                
NAME                         READY     STATUS    RESTARTS   AGE       IP               NODE                              
[...]                        
es-data-0                    0/1       Pending   0          14m       <none>           <none>                            
es-data-1                    1/1       Running   0          14m       10.233.109.236   124-103

124-100:~ root# kubectl uncordon 124-101
node "124-101" uncordoned

124-100:~ root# kubectl get pods -n elastic1 -o wide
NAME                         READY     STATUS            RESTARTS   AGE       IP               NODE                      
[...]                 
es-data-0                    0/1       PodInitializing   0          18m       10.233.98.152    124-101                   
es-data-1                    1/1       Running           0          18m       10.233.109.236   124-103 
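
The difference between the two listings can be summarised with a toy model. The sketch below is not the StatefulSet controller's code and is not tied to any Kubernetes API; PodCreationModel, its Policy enum, and the idea of passing in a set of "stuck" ordinals are hypothetical simplifications that only reproduce the creation behaviour seen above, where es-data-0 stays Pending.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of pod creation order; a simplification, not the real controller logic. */
final class PodCreationModel {

    enum Policy { ORDERED_READY, PARALLEL }

    private PodCreationModel() {
    }

    /**
     * Returns the pods that get created for a StatefulSet with the given number of
     * replicas, assuming the pods whose ordinals are in stuckOrdinals never become Ready.
     */
    static List<String> createdPods(String name, int replicas, Policy policy, Set<Integer> stuckOrdinals) {
        List<String> created = new ArrayList<>();
        for (int ordinal = 0; ordinal < replicas; ordinal++) {
            created.add(name + "-" + ordinal);
            if (policy == Policy.ORDERED_READY && stuckOrdinals.contains(ordinal)) {
                // Successors are only created once this pod is Running and Ready.
                break;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Set<Integer> stuck = Collections.singleton(0); // es-data-0 stays Pending
        System.out.println(createdPods("es-data", 2, Policy.ORDERED_READY, stuck)); // prints [es-data-0]
        System.out.println(createdPods("es-data", 2, Policy.PARALLEL, stuck));      // prints [es-data-0, es-data-1]
    }
}
```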

For Elasticsearch, how do statefulsets know when it is okay to remove those two pods?

Nothing specific to ES or K8S here. IMHO dropping more than one node at the same time is a very bad idea with any distributed storage system (ES, Cassandra, Zookeeper, Ceph).

@barkbay barkbay force-pushed the external_openshift_rc1 branch 3 times, most recently from f8fe852 to c6b98f5 Compare July 12, 2018 09:17
@barkbay barkbay force-pushed the external_openshift_rc1 branch from c6b98f5 to fc96d27 Compare October 9, 2018 14:11
@barkbay
Author

barkbay commented Oct 16, 2018

Hi,

Could you please have another look at this PR? We have to maintain our own build pipeline for this very small patch, so we would be glad to have it merged into the plugin source code.

Thanks.

@jcantrill
Collaborator

/ok-to-test

@fusesource-ci
Contributor

License check failed: run mvn -N license:format to update all licenses, commit, squash & force push please.

Collaborator

@jcantrill jcantrill left a comment

minor nits

@@ -25,6 +25,7 @@
import org.apache.logging.log4j.Logger;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.ElasticsearchSecurityException;
import org.elasticsearch.common.inject.Inject;
Collaborator

Remove. DI is done manually rather than using a library to wire dependencies.

@@ -89,6 +89,10 @@
static final String OPENSHIFT_ACL_ROLE_STRATEGY = "openshift.acl.role_strategy";
static final String DEFAULT_ACL_ROLE_STRATEGY = "user";

static final String OPENSHIFT_MASTER = "openshift.master";
Collaborator

Please set to 'openshift.master.url' to be consistent with the config setting

@fusesource-ci
Contributor

License check failed: run mvn -N license:format to update all licenses, commit, squash & force push please.

@barkbay barkbay force-pushed the external_openshift_rc1 branch from 90a6109 to 271132c Compare December 7, 2018 13:28
@fusesource-ci
Contributor

Tests failed.

@barkbay barkbay force-pushed the external_openshift_rc1 branch from 271132c to cdb07a7 Compare December 7, 2018 13:34
@barkbay barkbay force-pushed the external_openshift_rc1 branch from cdb07a7 to ab1178a Compare December 7, 2018 17:39
@fusesource-ci
Contributor

Tests failed.

@barkbay barkbay force-pushed the external_openshift_rc1 branch from ab1178a to 529c4c1 Compare December 7, 2018 17:42
@@ -64,6 +67,15 @@ public PluginSettings(final Settings settings) {
this.opsIndexPatterns = new HashSet<String>(Arrays.asList(settings.getAsArray(OPENSHIFT_KIBANA_OPS_INDEX_PATTERNS, DEFAULT_KIBANA_OPS_INDEX_PATTERNS)));
this.expireInMillis = settings.getAsLong(OPENSHIFT_ACL_EXPIRE_IN_MILLIS, new Long(1000 * 60));

this.masterUrl = settings.get(OPENSHIFT_MASTER);
Contributor

Does settings.get return null if there is no such setting?

Author

@barkbay barkbay force-pushed the external_openshift_rc1 branch from 529c4c1 to 627d18c Compare December 10, 2018 10:35
@barkbay barkbay force-pushed the external_openshift_rc1 branch from 627d18c to 03168f5 Compare December 10, 2018 10:48
@barkbay
Author

barkbay commented Dec 12, 2018

Do you think that this PR can be merged for the next release?

Contributor

@richm richm left a comment

lgtm - @jcantrill what say you?

@jcantrill
Collaborator

[merge]

@fusesource-ci
Contributor

Merge failed.

@jcantrill
Collaborator

@barkbay Please look at the merge test failures. I'm not certain why they were not caught in the test job.

@jcantrill
Collaborator

closed as stale
