
Online store latency (redis) is growing over time #3597

Open · RadionBik opened this issue Apr 13, 2023 · 11 comments

@RadionBik

Expected Behavior

The online-store latency remains stable and does not grow over time. We assume that the amount of data in Redis is not growing.

Current Behavior

[Chart: median online-store latency over the past 30 days]

The chart above shows the growth of the median latency over the past 30 days.

Steps to reproduce

We use a custom aiohttp Python service to relay requests to the online store, where we invoke the native Feast client. The service is completely stateless, and I don't expect it to be the source of the problem.
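For context, here is a minimal sketch of such a relay handler, assuming a FeatureStore built from the RepoConfig shown further below; the route, the request body schema, and the example feature references are hypothetical:

from aiohttp import web
from feast import FeatureStore

store = FeatureStore(config=config)  # `config` is the RepoConfig shown below

async def get_online_features(request: web.Request) -> web.Response:
    body = await request.json()
    # Relay the request to the online store via the native Feast client.
    # Note: get_online_features() is synchronous, so it blocks the event loop
    # for the duration of the Redis round trip.
    response = store.get_online_features(
        features=body["features"],        # e.g. ["driver_stats:conv_rate"]
        entity_rows=body["entity_rows"],  # e.g. [{"driver_id": 1001}]
    )
    return web.json_response(response.to_dict())

app = web.Application()
app.add_routes([web.post("/online-features", get_online_features)])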

Here is our Feast config:

from feast.infra.offline_stores.bigquery import BigQueryOfflineStoreConfig
from feast.infra.online_stores.redis import RedisOnlineStoreConfig
from feast.repo_config import RepoConfig

# get_redis_connection_string() and REDIS_TTL_SECONDS are our own helper/constant.
config = RepoConfig(
    project='feast',
    registry='gs://our-bucket/feast/registry_file_3.db',
    provider='gcp',
    online_store=RedisOnlineStoreConfig(
        connection_string=get_redis_connection_string(),
        redis_type='redis',
        key_ttl_seconds=REDIS_TTL_SECONDS,
    ),
    offline_store=BigQueryOfflineStoreConfig(
        dataset='feature_store',
        project_id='project-id',
    ),
    entity_key_serialization_version=2,
)

As you can see, we use the default registry cache TTL (600 seconds).
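For reference, a hedged sketch of how the registry cache TTL could be configured explicitly by passing a RegistryConfig instead of a plain path string; the field names are as of Feast ~0.28, and the values shown just restate the current setup:

from feast.repo_config import RegistryConfig

registry_config = RegistryConfig(
    path='gs://our-bucket/feast/registry_file_3.db',
    cache_ttl_seconds=600,  # the default; lower values refresh the cached registry more often
)
# This object can then be passed as RepoConfig(registry=registry_config, ...).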

Specifications

  • Version: 0.28.0
  • Platform: amd64
  • Subsystem: debian 10

Possible Solution

We noticed that changing the path to the file-based registry (i.e. effectively re-creating it) eliminates the latency growth and brings it back to normal (today's chart):
[Chart: today's latency, back to normal after re-creating the registry]

Therefore, a solution might be related to fixing the registry caching mechanism.

Let me know if further details are needed!

@RadionBik
Author

I have noticed that the timestamps of incremental materialization runs are stored in the registry and are sent to the client as well. We run incremental materialization every 15 minutes, so over a month that yields 30 * 24 * 4 = 2880 timestamps per feature view, which might explain the gradual increase in response time. I am not sure this is the reason, but I decided to share the info here.
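To illustrate the hypothesis, a hedged sketch of how the accumulated intervals could be counted; it assumes the materialization_intervals attribute on feature views is populated from the registry (as in Feast ~0.28) and reuses the config object shown above:

from feast import FeatureStore

store = FeatureStore(config=config)

# Each incremental materialization run appends an interval to the registry,
# so at a 15-minute cadence these counts should grow by about 96 per day.
for fv in store.list_feature_views():
    intervals = getattr(fv, "materialization_intervals", [])
    print(f"{fv.name}: {len(intervals)} materialization intervals")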

@adchia
Collaborator

adchia commented Apr 21, 2023

Thanks for filing! Do you mean that you changed the file-based registry to have TTL=0 and it was OK?

@RadionBik
Author

I have never adjusted the registry cache TTL; it has always been set to the default value.

@nturusin

nturusin commented May 5, 2023

Hi guys. Any news on this? This issue forces us to update the registry file in production every couple of weeks to reset the latency.

@nturusin

Hi @adchia
Do you have any plans to fix this soon?

@RadionBik
Author

An update from us:
Having disabled incremental materialization for a week, we do NOT notice increases in latency anymore, which confirms my hypothesis above. As you can see on the chart below, we recreated the registry and disabled incremental materialization around the 25th of May:

[Chart: latency after re-creating the registry and disabling incremental materialization around May 25]

Unfortunately, neither the cause nor its resolution is obvious to me at the moment.

@stale

stale bot commented Oct 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@RadionBik
Author

We found that Feast's usage logging caused the problem; we disabled it and everything runs without leaks now.

@stale stale bot removed the wontfix ("This will not be worked on") label on Jan 9, 2024
@tokoko
Collaborator

tokoko commented Mar 23, 2024

@RadionBik hey, thanks for investigating this. Can you clarify the last comments? Did you find that both the incremental materialization payload and Feast usage logging were the culprits here?

@RadionBik
Author

I might have left the last comment in the wrong issue.

So the current status is:

  • We disabled Feast's usage telemetry to help with a memory leak problem we had (not this issue); see the note below.
  • We do not use incremental materialization anymore because of the increasing latency problem we reported in this issue.

Hope this clarifies the situation a bit
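For completeness, a hedged sketch of how Feast usage telemetry can be switched off; FEAST_USAGE is the documented environment variable, while where it gets set is an assumption about the deployment:

import os

# Must be set before Feast is imported by the service process
# (e.g. in the container environment or the entrypoint script).
os.environ["FEAST_USAGE"] = "False"

from feast import FeatureStore  # imported after the toggle on purpose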

@tokoko
Collaborator

tokoko commented Mar 28, 2024

@RadionBik thanks for the clarification
