Retroactive User Recognition at scale without redis #565

viggin543 · 2021-09-29T12:32:44Z

Problem

According to the docs, the current implementation stores all anonymous events in Redis.
This has two significant downsides:

A single point of failure (one Redis writer can only handle so much traffic)
Huge RAM cost

As you pointed out in the documentation:

REDIS RAM = 1 Event Size * Events per month

1 event ~= 2 Kbyte
10 000 000 events per month ~= 20GB RAM per month

10M events / month is really not that much.
It's ~231 events per minute, or ~3.9 events per second (assuming a 30-day month).

Large-scale tracking load is measured in thousands of events per second.

At that point Redis RAM consumption will explode.
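To put numbers on this, here is a rough back-of-the-envelope sketch using the formula quoted from the docs. The 2 KB event size comes from the docs; the 30-day month and the 1000 events/s load are assumptions picked for illustration.

```python
# Back-of-the-envelope estimate of Redis RAM needed per month of retained events.
EVENT_SIZE_BYTES = 2 * 1000               # ~2 KB per event, as quoted from the docs
SECONDS_PER_MONTH = 30 * 24 * 60 * 60     # assumption: 30-day month


def monthly_ram_gb(events_per_month: float) -> float:
    """REDIS RAM = 1 event size * events per month, expressed in GB."""
    return events_per_month * EVENT_SIZE_BYTES / 1e9


def events_per_second(events_per_month: float) -> float:
    """Average sustained rate that a monthly volume corresponds to."""
    return events_per_month / SECONDS_PER_MONTH


def events_per_month_at(rate_per_second: float) -> float:
    """Monthly volume produced by a sustained per-second rate."""
    return rate_per_second * SECONDS_PER_MONTH


small = 10_000_000
print(monthly_ram_gb(small))                 # 20.0 GB
print(round(events_per_second(small), 2))    # 3.86 events/s

large = events_per_month_at(1000)            # hypothetical load: 1000 events/s
print(monthly_ram_gb(large))                 # 5184.0 GB, i.e. ~5 TB per month
```

So at a sustained 1000 events/s, keeping a month of anonymous events in RAM would need on the order of 5 TB.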

Solution

Retroactive User Recognition can be implemented as a background task, so there is no need to store those events in a hot cache like Redis.
Instead they can be stored in any cloud storage as files (under a path containing the user's anonymous id).
This solves the RAM consumption problem.
Once a user is identified,
a background process can update the records with the identified user's details.

In this scenario Redis will only contain the coordinating info (and can be updated asynchronously).
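A minimal sketch of what I mean, using a local directory as a stand-in for cloud storage. The function names, the `event_id` / `anonymous_id` / `user_id` fields and the one-file-per-event layout are all made up for illustration, not Jitsu's actual schema.

```python
import json
import tempfile
from pathlib import Path


def store_anonymous_event(root: Path, event: dict) -> Path:
    """Write one JSON file per event under <root>/<anonymous_id>/ (hypothetical layout)."""
    user_dir = root / event["anonymous_id"]
    user_dir.mkdir(parents=True, exist_ok=True)
    path = user_dir / f"{event['event_id']}.json"
    path.write_text(json.dumps(event))
    return path


def backfill_identified_user(root: Path, anonymous_id: str, user_id: str) -> list:
    """Background task: load every stored event for anonymous_id and attach user_id."""
    user_dir = root / anonymous_id
    if not user_dir.is_dir():
        return []
    updated = []
    for path in sorted(user_dir.glob("*.json")):
        event = json.loads(path.read_text())
        event["user_id"] = user_id
        updated.append(event)
    return updated


def demo() -> list:
    """Store two anonymous events, then backfill them once the user is identified."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_anonymous_event(root, {"event_id": "e1", "anonymous_id": "anon-1", "type": "pageview"})
        store_anonymous_event(root, {"event_id": "e2", "anonymous_id": "anon-1", "type": "click"})
        store_anonymous_event(root, {"event_id": "e3", "anonymous_id": "anon-2", "type": "pageview"})
        return backfill_identified_user(root, "anon-1", "user-42")


if __name__ == "__main__":
    for event in demo():
        print(event)
```

The only lookup needed is "list everything under the anonymous id prefix", which object stores handle well, and nothing has to stay in RAM between the anonymous events and the identify call.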


vklimontovich · 2021-09-30T00:02:33Z

The whole user recognition flow relies on the fact that we can pull a user by anonymous_id relatively quickly. Besides Redis, the following storages will work:

S3. Unlike a regular file system, it can handle dirs with an unlimited number of files
LevelDB (or similar)

Here are the caveats:

I'm not sure we can write to S3 in real-time
With any local DB, we need to deal with sharding/merging. If an LB is sitting between the user and Jitsu, events of the same user might arrive at different machines

I believe the best way to deal with that would be a) sending data to S3 with Jitsu, and b) writing a Spark job that processes the data and sends the updated events to Jitsu. The real-time aspect of Jitsu will be lost, but it will still do a better job than writing an in-house merger
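Roughly, the batch step would look like the sketch below. This is plain Python standing in for the Spark job, and the `anonymous_id` / `user_id` field names are assumptions for illustration. The point is that a batch job sees events from all machines at once, so the sharding problem goes away.

```python
def resolve_identities(events):
    """Build anonymous_id -> user_id from any event that carries both."""
    identities = {}
    for event in events:
        if event.get("user_id"):
            identities[event["anonymous_id"]] = event["user_id"]
    return identities


def backfill(events):
    """Return updated copies of the anonymous events whose user is now known."""
    identities = resolve_identities(events)
    updated = []
    for event in events:
        if not event.get("user_id") and event["anonymous_id"] in identities:
            updated.append({**event, "user_id": identities[event["anonymous_id"]]})
    return updated


# Events for the same visitor landed on two different machines behind the LB.
machine_a = [
    {"event_id": "e1", "anonymous_id": "anon-1", "user_id": None},
    {"event_id": "e3", "anonymous_id": "anon-2", "user_id": None},
]
machine_b = [
    {"event_id": "e2", "anonymous_id": "anon-1", "user_id": "user-42"},
]

updated_events = backfill(machine_a + machine_b)
print(updated_events)
```

Only `e1` is re-sent: `e2` was already identified, and `anon-2` has no match yet.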

viggin543 · 2021-09-30T15:09:42Z

@vklimontovich
With a DB like ClickHouse that does not support UPDATE statements, I see your point.

But with a DB like Redshift or Snowflake, this can be achieved by storing only the unique anonymous user ids in Redis.
And on user recognition:

finish loading all anonymous user events
run an update statement ( that will set user details on the anonymous events )
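Something like this, sketched with an in-memory SQLite table as a stand-in for Redshift/Snowflake. The table and column names are made up, and the real warehouse SQL may need adjusting.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_id TEXT PRIMARY KEY, anonymous_id TEXT, user_id TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("e1", "anon-1", None, None),
        ("e2", "anon-1", None, None),
        ("e3", "anon-2", None, None),
    ],
)
conn.commit()


def recognize(conn, anonymous_id, user_id, email):
    """Set user details on all still-anonymous events of one anonymous id."""
    cursor = conn.execute(
        "UPDATE events SET user_id = ?, email = ? "
        "WHERE anonymous_id = ? AND user_id IS NULL",
        (user_id, email, anonymous_id),
    )
    conn.commit()
    return cursor.rowcount  # number of events that were backfilled


updated_rows = recognize(conn, "anon-1", "user-42", "user@example.com")
print(updated_rows)
```

Redis then only needs the set of anonymous ids that still have unidentified events, not the events themselves.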

Or am I missing something?

I believe that retroactive user recognition is inherently something that should not be real-time.

"retroactive" -> after the fact

That's my intuition.

vklimontovich · 2023-12-20T20:01:39Z

It's time to revisit this given the new architecture of Jitsu Next and the fact that we use Mongo as the underlying storage for user recognition. Here's a preliminary design:

We should still use Mongo as the Identity Graph store, and maybe store a list of <message id, timestamp>, but not the whole event
Once a match has been found, the user-recognition function should send a specially formatted event that instructs Bulker to update certain fields in a table
Bulker should support an update operation by certain criteria or by a set of messages

We should think through the design in detail. Generally speaking, it's easy to do if the database allows pulling events by id
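A rough sketch of the flow, with an in-memory dict standing in for the Mongo collection. The shape of the update event (`op`, `table`, `message_ids`, `set`) is purely hypothetical and not an existing Bulker format.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityStore:
    """In-memory stand-in for Mongo: anonymous_id -> [(message_id, timestamp)]."""
    pending: dict = field(default_factory=dict)

    def record(self, anonymous_id, message_id, timestamp):
        # Only the message reference is kept, never the whole event.
        self.pending.setdefault(anonymous_id, []).append((message_id, timestamp))

    def pop(self, anonymous_id):
        # Return and forget all references for this anonymous id.
        return self.pending.pop(anonymous_id, [])


def on_identify(store, anonymous_id, user_id, table="events"):
    """Build a hypothetical update instruction once a match has been found."""
    refs = store.pop(anonymous_id)
    if not refs:
        return None
    ordered = sorted(refs, key=lambda ref: ref[1])  # oldest message first
    return {
        "op": "update",                      # hypothetical marker for Bulker
        "table": table,
        "message_ids": [message_id for message_id, _ in ordered],
        "set": {"user_id": user_id},
    }


store = IdentityStore()
store.record("anon-1", "m2", 1700000200)
store.record("anon-1", "m1", 1700000100)
store.record("anon-2", "m3", 1700000300)

instruction = on_identify(store, "anon-1", "user-42")
print(instruction)
```

Storage per anonymous event shrinks to an id and a timestamp, and the destination only needs to support updating rows by message id.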

vklimontovich · 2024-11-15T18:57:40Z

Closing, we got rid of Redis already and moved to Mongo

viggin543 added the Feature label Sep 29, 2021

vklimontovich added the Jitsu Server label Sep 30, 2021

vklimontovich added ⏳Postpone and removed 🚀 Jitsu Server labels Dec 20, 2023
