enrich-kinesis #481

Merged
merged 32 commits into from
Mar 4, 2022

Conversation

@benjben benjben changed the base branch from master to develop July 22, 2021 13:43
@benjben benjben changed the base branch from develop to feature/enrich_kinesis July 22, 2021 13:43
@benjben benjben force-pushed the feature/enrich_kinesis branch 2 times, most recently from 3223fea to 6438310 Compare August 16, 2021 14:24
@benjben benjben force-pushed the feature/enrich-kinesis branch from c166e5d to 7368210 Compare August 19, 2021 15:37
@benjben benjben force-pushed the feature/enrich-kinesis branch from bc33c83 to 982471a Compare September 2, 2021 12:30
@benjben benjben force-pushed the feature/enrich_kinesis branch from abd6c67 to 1408928 Compare September 2, 2021 15:24
@benjben benjben force-pushed the feature/enrich-kinesis branch from 982471a to 42c4153 Compare September 2, 2021 15:32
@benjben benjben force-pushed the feature/enrich-kinesis branch from 6c75729 to 9b8df00 Compare September 13, 2021 13:56
@benjben benjben force-pushed the feature/enrich-kinesis branch 3 times, most recently from 582cf4e to 4c17829 Compare October 5, 2021 16:37
Contributor

@istreeter istreeter left a comment

I've only given this a quick look, but it looks very good indeed 🎉

Contributor

@spenes spenes left a comment

Looks good to me in general 👍 Left a few small comments in there.

@benjben benjben force-pushed the feature/enrich-kinesis branch 2 times, most recently from 2204cb7 to 4f76c85 Compare October 13, 2021 17:13
@chuwy chuwy self-requested a review October 24, 2021 16:43
Contributor

@chuwy chuwy left a comment

I just noticed I had an unsubmitted review from a few weeks ago, but most of it has been addressed already. Just one nit is left. Otherwise, looks great!

@@ -12,22 +12,32 @@
*/
package com.snowplowanalytics.snowplow.enrich.common.fs2.io
Contributor

The io package name shadows the namespace of dependencies such as io.circe. This confuses the IDE when importing io.* in tests.
I suggest renaming it to inout or some other name that is not used as a top-level domain.
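
A minimal sketch of the shadowing described above. The package and object names (`com.example.fs2`, `io.thirdparty`, `Sinks`, `Codec`) are hypothetical stand-ins, not code from this PR; `io.thirdparty` plays the role of a dependency such as io.circe.

```scala
// Stand-in for a third-party library that lives under the top-level `io` package.
package io.thirdparty {
  object Codec { val origin = "top-level io" }
}

package com.example.fs2 {
  // A local package named `io`, like the one under review.
  package io {
    object Sinks { val origin = "com.example.fs2.io" }
  }

  object Demo {
    // Inside com.example.fs2, a bare `io` resolves to the local package,
    // so `io.thirdparty.Codec` would not compile here.
    val local: String = io.Sinks.origin

    // The top-level package is only reachable through the `_root_` prefix.
    val root: String = _root_.io.thirdparty.Codec.origin
  }
}
```

Renaming the local package (to inout, for example) would make the bare `io` prefix refer to the top-level package again, so the `_root_` workaround would no longer be needed.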

@benjben benjben force-pushed the feature/enrich_kinesis branch from 1a97585 to 7a6dcfb Compare November 19, 2021 14:00
@benjben benjben force-pushed the feature/enrich-kinesis branch 3 times, most recently from d68be1a to 5778418 Compare December 1, 2021 19:52
@@ -0,0 +1,55 @@
{
Contributor

How many collector payloads do you think could be held in memory at the same time with these default settings, if you're processing, say, 5 Kinesis shards per enrich instance?

I count 190,000 (admittedly a worst-case scenario), and a further 10,000 per additional shard.

Here's how:

Each polling batch is 10,000 records. The fs2-aws library uses a buffer that holds 10 batches. Plus there can be an extra batch per shard on this line waiting to be enqueued.

Then you have the batch you're currently processing, plus three extra batches because of the prefetches here, here and here
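
The count above can be reproduced with a small sketch. The figures come from this comment; the object and value names are illustrative only and do not correspond to actual configuration keys.

```scala
object WorstCaseBuffering {
  val recordsPerBatch   = 10000 // size of each polling batch
  val bufferedBatches   = 10    // batches held by the fs2-aws buffer
  val inFlightBatches   = 1     // the batch currently being processed
  val prefetchedBatches = 3     // one extra batch per prefetch

  /** Worst-case number of collector payloads held in memory for a given shard count. */
  def maxRecordsInMemory(shards: Int): Int = {
    val waitingToEnqueue = shards // at most one extra batch per shard
    recordsPerBatch * (bufferedBatches + waitingToEnqueue + inFlightBatches + prefetchedBatches)
  }
}
```

With 5 shards this gives 10,000 × (10 + 5 + 1 + 3) = 190,000 records, and each additional shard adds one more batch of 10,000.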

This is just to explain why I've been raising concerns about memory, especially for large collector payloads. Understanding the scope of this problem will be difficult, because we would need to explore regular processing and also error-handling scenarios, e.g. Kinesis problems or enrichment problems. The latter are harder to predict or simulate.

Contributor Author

Very interesting!

I agree with your way of counting. There has been only one small change: I removed the .prefetch and now use a value >1 for sink.concurrency here, so it's now concurrency - 1 additional batches instead of three.

Because of the enqueuing for each shard I'm afraid the memory used by the app is unbounded. So we need to make sure that our auto-scaling is good enough so that new instances are spawned before the app gets a chance to go OutOfMemory. I'll look at our scaling strategy in Terraform. But I think that whatever the strategy, it would be great to use the average record size of a customer to determine the memory allocated to the JVM. /cc @oguzhanunlu @jbeemster
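
A rough sketch of how an average record size could feed into a memory estimate for a fixed number of shards. It assumes the counting from this thread, reads the concurrency - 1 figure as extra batches replacing the three prefetched ones, and counts raw payload bytes only (no JVM object overhead). All names and the example inputs are hypothetical.

```scala
object HeapEstimate {
  val recordsPerBatch = 10000 // size of each polling batch
  val bufferedBatches = 10    // batches held by the fs2-aws buffer

  /** Worst-case records in memory: buffer + one batch per shard + current batch + (concurrency - 1). */
  def maxRecords(shards: Int, sinkConcurrency: Int): Long =
    recordsPerBatch.toLong * (bufferedBatches + shards + 1 + (sinkConcurrency - 1))

  /** Raw payload bytes held in memory, given an average record size in bytes. */
  def payloadBytes(shards: Int, sinkConcurrency: Int, avgRecordBytes: Long): Long =
    maxRecords(shards, sinkConcurrency) * avgRecordBytes
}
```

For example, 5 shards, a sink concurrency of 3 and an assumed average record size of 2,000 bytes give 180,000 records and 360,000,000 bytes of payload. Because the count grows with every shard an instance consumes, this is only a bound for a fixed shard count.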

@benjben benjben force-pushed the feature/enrich-kinesis branch from 6dc77b5 to 6b6f05b Compare December 8, 2021 18:26
@benjben benjben force-pushed the feature/enrich-kinesis branch 2 times, most recently from e7dd10b to fed1b4d Compare December 10, 2021 13:11
@benjben benjben force-pushed the feature/enrich_kinesis branch from 7a6dcfb to 1f1c938 Compare January 3, 2022 18:44
@benjben benjben force-pushed the feature/enrich-kinesis branch from ea9b2db to d008234 Compare January 3, 2022 20:47
benjben and others added 26 commits March 4, 2022 13:39
@benjben benjben force-pushed the feature/enrich-kinesis branch from c0815ee to 78c3a84 Compare March 4, 2022 12:40
@benjben benjben changed the base branch from feature/enrich_kinesis to develop March 4, 2022 12:40
@benjben benjben merged commit ebaa79f into develop Mar 4, 2022
@benjben benjben deleted the feature/enrich-kinesis branch March 7, 2022 10:04
8 participants