
NETOBSERV-578: removing arrays from pipeline interfaces #319

Merged
merged 14 commits into netobserv:main on Oct 5, 2022
Conversation

@mariomac commented Oct 3, 2022

This PR removes the batching logic in the Kafka ingester and processes flows one by one on each pipeline stage.

While the CPU usage doesn't seem to noticeably decrease:

[image: CPU usage comparison]

it greatly improves memory consumption:

[image: memory usage comparison]

(It also simplifies some parts of the code.)

@mariomac mariomac added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2022
github-actions bot commented Oct 3, 2022

New image: ["quay.io/netobserv/flowlogs-pipeline:1e0449c"]. It will expire after two weeks.

// copy input entry before transform to avoid alteration on parallel stages
outputEntry := inputEntry.Copy()

// TODO: for efficiency and maintainability, maybe each case in the switch below should be an individual implementation of Transformer
for _, rule := range n.Rules {
switch rule.Type {
case api.TransformNetworkOperationName("AddRegExIf"):
mariomac (Author) commented:
The logic behind this change is that the result of this invocation is only known at runtime, while the value of api.OpAddRegexIf is known at compile time.

That allows a much more efficient implementation of the switch, which is especially important given that it is invoked for each flow, for each rule.
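The difference can be illustrated with a small Go sketch (the constant names and values here are hypothetical, not the actual flowlogs-pipeline API): a switch over compile-time string constants can be optimized by the compiler, whereas calling a name-lookup function in every case forces each branch value to be computed at runtime, on every flow.

```go
package main

import "fmt"

// Compile-time constants: the compiler knows every case value up front
// and can optimize the switch dispatch. (Hypothetical values.)
const (
	OpAddRegExIf = "add_regex_if"
	OpAddIf      = "add_if"
)

// applyRule dispatches on the rule type using constants known at compile
// time, instead of calling a name-lookup function in each case.
func applyRule(ruleType string) string {
	switch ruleType {
	case OpAddRegExIf:
		return "applied AddRegExIf"
	case OpAddIf:
		return "applied AddIf"
	default:
		return "unknown rule"
	}
}

func main() {
	fmt.Println(applyRule(OpAddRegExIf)) // prints "applied AddRegExIf"
}
```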

Comment on lines 16 to 25
-	batchSizeSummary = operational.DefineMetric(
+	flowsProcessed = operational.DefineMetric(
 		"ingest_flows_processed", // This is intentionally named to emphasize its utility for flows counting, despite being a batch size distribution
-		"Provides number of flows processed, batches processed, and batch size stats (in number of flows)",
-		operational.TypeSummary,
+		"Provides number of flows processed",
+		operational.TypeCounter,
 		"stage",
 	)
 	batchSizeBytesSummary = operational.DefineMetric(
 		"ingest_batch_size_bytes",
 		"Ingested batch size distribution, in bytes",
 		operational.TypeSummary,
mariomac (Author) commented:
@jotak this PR reverts part of your previous work, since we no longer batch at ingest time.

Another option would be to keep this batchSize summary and apply it at the gRPC interceptor, so we know the batch sizes from the client. For the Kafka use case, the summary would report that every batch has size 1.

mariomac (Author) commented:

(Another option: just keep different metrics: flowsProcessed and ingestBytes for Kafka ingest, and batchSizeSummary and batchSizeBytesSummary for gRPC ingest.)

jotak (Member) commented:

The idea behind these metrics was to get stats such as the effective size of the agent's batches. As you said, it doesn't work with Kafka because kafkago hides this detail, but it's still interesting for gRPC. Since it measures the ingesters' input and not their output, I think removing the batches on output shouldn't affect them.

About using different metric names with or without Kafka: the drawback is that it's harder to consume, since we'd need to create more Prometheus queries / Grafana dashboards. We need to make a compromise somewhere anyway, and personally I'd rather stick with the previous metrics.

mariomac (Author) commented:

Sounds reasonable. Bringing them back.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2022
@mariomac commented Oct 3, 2022

It seems this PR is not possible unless we reimplement the connection tracking stage.

@ronensc do you have an estimate of how difficult it would be to change the function:

func (ct *conntrackImpl) Extract(flowLogs []config.GenericMap) []config.GenericMap

to:

func (ct *conntrackImpl) Extract(flowLog config.GenericMap) config.GenericMap

Currently it seems it's not able to detect endConnections.

@KalmanMeth (Collaborator) commented:

Not only the connection tracking, but also the aggregation and timebased-topk stages would need changes. In each of these we save some state and do some computation after batches of flow logs. If the flow logs now come one at a time, we have to ensure that this computation still runs every so often, and not after each call to Extract().

@mariomac mariomac closed this Oct 3, 2022
@mariomac commented Oct 4, 2022

@jotak @ronensc @KalmanMeth @eranra reopening this PR, as I found a compromise solution: I added an intermediate batcher for those parts of the API that really need to work in batches, such as connection tracking and timebased-topk.

This improves overall performance and memory usage, especially in FLP configurations that don't use extractors. At the same time, it leaves room for a later reimplementation of the extractors to avoid batching.

@mariomac mariomac reopened this Oct 4, 2022
}
// TODO: replace batcher by rewriting the different extractor implementations
// to keep the status while processing flows one by one
utils.Batcher(utils.ExitChannel(), b.batchMaxLen, b.batchTimeout, in,
mariomac (Author) commented:

I didn't provide unit tests for Batcher, as it is already exercised as an internal component in the conntrack integration tests.

A collaborator commented:

Nice idea to build the Batcher in this way for all the stages that need it!

@jpinsonneau (Collaborator) left a comment:

LGTM; currently deployed and working as expected on the LLC cluster.

I'll let it run and compare again after some hours 👍
Thanks @mariomac

@mariomac mariomac merged commit 364c6c9 into netobserv:main Oct 5, 2022
@mariomac mariomac deleted the change-interfaces branch October 5, 2022 12:15
4 participants