NETOBSERV-255: FLP multiple deployments #78
Conversation
required:
- address
- topic
type: object
For the case of using FLP as ingress, we will have multiple FLP instances consuming from the same topic. We need to make sure we have some control over how Kafka distributes the data it sends to those FLPs. Going forward, this will be very important once we implement the stateful connection-tracking capabilities. There might be several different configurations depending on load, number of consumers, etc., and those might need to be exposed and made configurable for users.
Yes, the Kafka configuration block is meant to change, but I would prefer to change it again once we have connection tracking, so we can test the different configurations and choose what we need to expose.
Would you be fine with that approach?
Yes, agreed, it doesn't have to be in this PR ... maybe just open a ticket somewhere so we remember that we need to improve that later.
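(Not part of this PR, just to illustrate the distribution concern: if the FLP instances join a shared Kafka consumer group, the broker spreads the topic's partitions across them, so the partition count and the producer-side partitioning key become the main knobs to expose. A minimal sketch, assuming a client such as segmentio/kafka-go; broker address, topic, and group names below are purely illustrative.)

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Every FLP replica uses the same GroupID, so Kafka assigns each replica a
	// subset of the topic's partitions. Once connection tracking is stateful,
	// the producer-side partitioning key (e.g. the flow 5-tuple) matters too,
	// so that related flows always land on the same consumer.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"}, // hypothetical, would come from the CRD "address" field
		Topic:   "network-flows",        // hypothetical, would come from the CRD "topic" field
		GroupID: "flp-transformers",     // shared group => partition-based distribution
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Println("reader closed:", err)
			return
		}
		log.Printf("partition=%d offset=%d bytes=%d", msg.Partition, msg.Offset, len(msg.Value))
	}
}
```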
api/v1alpha1/flowcollector_types.go
@@ -46,6 +46,10 @@ type FlowCollectorSpec struct {
	// Loki contains settings related to the loki client
	Loki FlowCollectorLoki `json:"loki,omitempty"`

	// Kafka configurations, if empty the operator will deploy an all-in-one FLP
	// +optional
	Kafka *FlowCollectorKafka `json:"kafka,omitempty"`
We need an explicit flag to enable/disable Kafka rather than relying on a nil pointer. This is because of how the CRD is handled in the OLM/console form: it doesn't allow removing a section. We discovered that the hard way for eBPF: https://issues.redhat.com/browse/NETOBSERV-319
Good to know, thanks. I will add an enable flag.
Enable flag added
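For reference, a minimal sketch of what the explicit flag could look like in the CRD types; field names and markers are illustrative, not necessarily what the PR ends up with (Address and Topic mirror the required fields shown in the CRD snippet above):

```go
package v1alpha1

// FlowCollectorKafka defines the desired Kafka configuration.
type FlowCollectorKafka struct {
	// Enable explicitly turns Kafka mode on or off, instead of relying on the
	// presence or absence of the section (which the OLM/console form cannot remove).
	// +kubebuilder:default:=false
	Enable bool `json:"enable,omitempty"`

	// Address of the Kafka server.
	Address string `json:"address"`

	// Topic is the Kafka topic to read from / write to.
	Topic string `json:"topic"`
}
```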
I haven't reviewed everything yet, will continue next week
@@ -32,40 +32,54 @@ const (
	startupPeriodSeconds = 10
)

const (
	ConfSingle        = "allInOne"
	ConfKafkaIngestor = "kafkaIngestor"
Actually, don't we say Ingester? :)
	ConfKafkaIngestor:    "-kingestor",
	ConfKafkaTransformer: "-ktransform",
I would remove the "k"
"-ingester"
"-transformer" (missing -er)
	}
}

func buildClusterRole() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{
			Name: constants.FLPName,
-			Labels: buildAppLabel(),
+			Labels: buildAppLabel(""),
		},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
On cluster roles: they actually need to be split, since they are not the same for the ingester and the transformer.
Only the transformer needs to watch pods/services/nodes/replicasets.
And I guess only the ingester needs to use hostnetwork (I'm actually not sure why we need that, @mariomac can you remind me?).
We can create two clusterrole objects (even in the case of a single FLP, if that makes it simpler) and deal with cluster role attribution via clusterrolebindings; see the sketch below.
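A rough sketch of what that split could look like (a hypothetical shape only: the exact rules would be carried over from the existing role, and the names/labels would follow whatever convention the builder already uses):

```go
import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Only the ingester needs the hostnetwork SCC, to bind the node port that OVS sends to (#29).
func buildClusterRoleIngester() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "flowlogs-pipeline-ingester"}, // assumed name
		Rules: []rbacv1.PolicyRule{{
			APIGroups:     []string{"security.openshift.io"},
			Verbs:         []string{"use"},
			Resources:     []string{"securitycontextconstraints"},
			ResourceNames: []string{"hostnetwork"},
		}},
	}
}

// Only the transformer enriches flows with k8s metadata, so only it needs to watch these resources.
func buildClusterRoleTransformer() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "flowlogs-pipeline-transformer"}, // assumed name
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{""},
			Verbs:     []string{"get", "list", "watch"},
			Resources: []string{"pods", "services", "nodes"},
		}, {
			APIGroups: []string{"apps"},
			Verbs:     []string{"get", "list", "watch"},
			Resources: []string{"replicasets"},
		}},
	}
}
```

Each role would then be attached to the matching service account through its own ClusterRoleBinding, and the all-in-one deployment could simply receive both bindings.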
Also, I don't know why we need horizontalpodautoscalers here: it's not FLP itself that creates the HPA, is it?
As far as I know, hostNetwork: true is only required by the eBPF agent, so as long as it is associated with another cluster role, the hostnetwork rule here shouldn't be necessary.
Don't we need host network for the daemonset deployment, so we can bind the node port that is used by OVS?
Yeah, I retrieved it: #29
So that's an ingester thing.
Regarding "it will be automatically bound to the host port 9999 unless you explicitly set another hostPort property": I am very surprised about this. Do you have documentation about it? What kind of cluster are you using for this test? The official k8s documentation does not mention such an automatic bind and, on the contrary, states the following: "Note that the containers are not using port 80 on the node, nor are there any special NAT rules to route traffic to the pod."
You are right, the documentation says that you need to specify a hostPort in the container (but it still doesn't require setting hostNetwork): https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#communicating-with-daemon-pods
So maybe this is an "undocumented" or just "undefined" behavior. For more safety, I'll replace the containerPort in my example daemonset with hostPort.
Self-correction: I can't find this behavior described in the official K8s documentation, but I found some other sources describing it:
- "you can omit the hostPort (or set it to 0) while specifying a containerPort and your container automatically receives a port in the ephemeral port range for your container instance operating system and Docker version." https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_PortMapping.html
- "The container port will be exposed to the external network" https://docs.bridgecrew.io/docs/bc_k8s_25
For more safety, I'd set hostPort to the same value as containerPort, but it should still work without setting hostNetwork: true.
JIRA created, thanks for clearing that up! https://issues.redhat.com/browse/NETOBSERV-362
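To make that conclusion concrete, here is a sketch of the container port declaration for the ingester daemonset, with hostPort pinned to the same value as containerPort and no hostNetwork on the pod spec (the helper, port name, and protocol choice are assumptions, not code from this PR):

```go
import corev1 "k8s.io/api/core/v1"

// buildIngesterPorts is a hypothetical helper illustrating the agreed setting.
func buildIngesterPorts(port int32) []corev1.ContainerPort {
	return []corev1.ContainerPort{{
		Name:          "flows",            // illustrative name
		Protocol:      corev1.ProtocolUDP, // assuming OVS exports IPFIX over UDP
		ContainerPort: port,
		HostPort:      port, // explicit, so the node port used by OVS is well defined
	}}
}
```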
@@ -40,7 +40,7 @@ func NewFlowsConfigController(client reconcilers.ClientHelper,
// Reconcile reconciles the status of the ovs-flows-config configmap with
// the target FlowCollector ipfix section map
func (c *FlowsConfigController) Reconcile(
-	ctx context.Context, target *flowsv1alpha1.FlowCollector) error {
+	ctx context.Context, target *flowsv1alpha1.FlowCollector, flpServiceName string) error {
I think it's possible to avoid the OVS reconciliation here: if we keep the same service regardless of whether it's flp or flp-ingester, it would avoid the OVS reconfig.
As you prefer, if you think it can make things simpler.
Both deployments now use the same service, but since the refactoring was already done, I kept passing an argument to the OVS reconciliation instead of relying on a constant.
Is that fine for you?
@@ -75,7 +75,7 @@ func (m *NamespacedObjectManager) CleanupNamespace(ctx context.Context) {
	ref.SetNamespace(namespace)
	log.Info("Deleting old "+obj.kind, "Namespace", namespace, "Name", obj.name)
	err := m.client.Delete(ctx, ref)
-	if err != nil {
+	if client.IgnoreNotFound(err) != nil {
nice!
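For anyone reading along: client.IgnoreNotFound simply turns NotFound errors into nil, so cleanup no longer reports an error for objects that were never created. A minimal illustration (the helper name is made up):

```go
import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteIfExists reports only "real" delete failures: if the object is already
// gone, Delete returns a NotFound error and IgnoreNotFound maps it to nil.
func deleteIfExists(ctx context.Context, c client.Client, obj client.Object) error {
	return client.IgnoreNotFound(c.Delete(ctx, obj))
}
```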
Tested and it looks good; a couple of comments, mostly on naming and splitting the cluster roles.
I pushed a few new commits:
@@ -358,24 +372,27 @@ func (b *builder) service(old *corev1.Service) *corev1.Service {
	}
	// In case we're updating an existing service, we need to build from the old one to keep immutable fields such as clusterIP
	newService := old.DeepCopy()
	newService.Spec.Selector = b.selector
	newService.Spec.SessionAffinity = corev1.ServiceAffinityClientIP
Oh, actually I guess this session affinity would be needed even without your work on Kafka, right?
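For illustration, a hypothetical standalone version of that builder logic (the helper name is made up): with ClientIP affinity, flows exported from a given node keep reaching the same FLP pod, which would presumably be useful even without Kafka:

```go
import corev1 "k8s.io/api/core/v1"

// configureFLPService mirrors the diff above: reuse the old service to keep
// immutable fields, then pin each client IP to a single backend pod.
func configureFLPService(old *corev1.Service, selector map[string]string) *corev1.Service {
	svc := old.DeepCopy()
	svc.Spec.Selector = selector
	svc.Spec.SessionAffinity = corev1.ServiceAffinityClientIP
	return svc
}
```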
While testing I noticed the transformer pod was stuck in a ...
Also, I'm seeing a couple of errors in the logs, which aren't too concerning (on the first run, it's trying to delete things that don't exist), but we should try to fix them, maybe via another PR / JIRA (I'm not sure if there's an easy fix or if it needs some refactoring).
Yes, the goal of this PR is to prepare the Kafka configuration by deploying FLP twice when Kafka is enabled; for now both deployments have the same configuration and only the ingester is useful. With the next PR, the transformer will have its own config and will no longer listen on this port.
I missed these errors, thanks. They are due to calling cleanNamespace without having a previous namespace (the namespace field is empty on the first iteration). I don't see any easy fix either; are you fine with addressing this in a new PR?
Sure, we can do that, I'll create another ticket.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: OlivierCazade
This PR reworks the FLP reconciler into two different ones: