💡 Feature Request - Detect resiliency anti-pattern for CosmosDB multi-write+Bounded staleness #259

Open
davihern opened this issue Jul 4, 2024 · 5 comments
Assignees
Labels
Area: Resource Guidance 📝 Improvements or additions to documentation

Comments

@davihern

davihern commented Jul 4, 2024

Describe the solution you'd like

Cosmos DB has a documented anti-pattern: an account configured with multi-region writes and Bounded Staleness consistency.
https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels#bounded-staleness-consistency

When both settings are configured, the WARA tool should raise a warning with the description: "Bounded Staleness in a multi-write account is an anti-pattern. This level would require a dependency on replication lag between regions, which shouldn't matter if data is read from the same region it was written to."
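
A minimal sketch of the requested check, assuming the account is available as a dictionary shaped like the ARM resource for `Microsoft.DocumentDB/databaseAccounts`, with the two settings under `properties.enableMultipleWriteLocations` and `properties.consistencyPolicy.defaultConsistencyLevel`. The function name and the sample accounts are hypothetical, and this is not how WARA itself is implemented.

```python
from typing import Any, Dict, Optional

# Warning text taken from the description proposed in this issue.
WARNING = (
    "Bounded Staleness in a multi-write account is an anti-pattern. "
    "This level would require a dependency on replication lag between regions, "
    "which shouldn't matter if data is read from the same region it was written to."
)


def check_bounded_staleness_multi_write(account: Dict[str, Any]) -> Optional[str]:
    """Return the warning if the account combines both settings, else None.

    `account` is assumed to look like an ARM resource payload (hypothetical input).
    """
    props = account.get("properties") or {}
    multi_write = props.get("enableMultipleWriteLocations") is True
    policy = props.get("consistencyPolicy") or {}
    level = str(policy.get("defaultConsistencyLevel", ""))
    bounded = level.lower() == "boundedstaleness"
    return WARNING if (multi_write and bounded) else None


# Hypothetical sample accounts for illustration only.
flagged = {
    "name": "cosmos-a",
    "properties": {
        "enableMultipleWriteLocations": True,
        "consistencyPolicy": {"defaultConsistencyLevel": "BoundedStaleness"},
    },
}
single_write = {
    "name": "cosmos-b",
    "properties": {
        "enableMultipleWriteLocations": False,
        "consistencyPolicy": {"defaultConsistencyLevel": "BoundedStaleness"},
    },
}

for acct in (flagged, single_write):
    result = check_bounded_staleness_multi_write(acct)
    print(acct["name"], "WARN" if result else "ok")
```

Only the combination is flagged; an account with either setting alone passes, which matches the intent of the request.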

@FallenHoot
Contributor

Relying on Bounded Staleness could indeed be considered an anti-pattern. The suitability of Bounded Staleness depends on the application's requirements for data consistency. If an application necessitates that reads always reflect the most recent writes across all regions, then Bounded Staleness, which allows for some lag, would not be appropriate.

The choice of consistency level should align with the application's needs:

  • Session consistency is ideal for applications that need read-your-own-writes guarantees within a single client session.
  • Consistent Prefix is suitable for applications that can handle some staleness but require ordered reads.
  • Eventual consistency is best for applications where low latency and high throughput are prioritized over read consistency.

This all goes back to the RTO/RPO. In essence, the decision on consistency levels should be tailored to the specific demands of your application.

The best way to explain it is with a video game analogy.
Imagine you're playing a video game with friends online, and you all are in different parts of the world. Now, the game's fun only if everyone sees the same game world at the same time, right? Bounded Staleness is like a setting that says it's okay if some friends see a few seconds of delay in the game world. It's fine for some games, but not for others where you need to see changes instantly.

So, if your game (or app) needs everyone to see the updates immediately, no matter where they are, Bounded Staleness isn't the best setting. If it is enough for each player to always see their own moves right away, Session consistency fits: within a session you read your own writes, while other players may see them slightly later.

If your game can handle a little delay but needs the updates in order, then Consistent Prefix is like a setting that makes sure no one sees events out of sequence, even if they see them a bit late.

And if it's okay for the game to update at different times for everyone, as long as it eventually gets updated, that's like Eventual consistency. It's the chill mode where the game doesn't stress about everyone being perfectly in sync.

So, it all depends on what kind of game you're playing: some need to be super in-sync, and some can be laid-back about updates. It really depends on how you want the application to perform.

@davihern
Author

davihern commented Jul 4, 2024

I meant the combination of having multi-write regions AND bounded staleness. As the official documentation states: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels#bounded-staleness-consistency

image

There may be some corner cases where a customer wants multi-write in order to have data geo-replicated as soon as possible while limiting reads and writes to a single region (keeping the secondary write region as a failover for DR scenarios). But it is worthwhile for WARA to raise an alert so the configuration can be reviewed: is this really what the customer needs, or are there better alternatives?

@oZakari
Collaborator

oZakari commented Jul 11, 2024

Hi @kovarikthomas, could you take a look at this one to determine how your team wants to handle this recommendation?

@oZakari oZakari added the Area: Resource Guidance 📝 Improvements or additions to documentation label Jul 11, 2024
@ejhenry
Contributor

ejhenry commented Aug 5, 2024

@kovarikthomas are you able to review and comment on this issue?

@davihern is your ask here to add a new APRL recommendation for the described configuration, or modify an existing recommendation?

@kovarikthomas
Contributor

Hi @TheovanKraay - Govind suggested you might be able to help here.

For context - there is a team within CSU that's building a set of ARG (resource graph) queries that aim to identify suboptimal resource configurations in customers' environments. They are asking whether a check for Bounded Staleness consistency + multi-region writes should be included, since we call it an anti-pattern in our docs. I guess the question is primarily whether there are any valid scenarios where someone would want to configure an account this way, or if it is always a bad idea. Thanks!!
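
For illustration, the kind of ARG query being discussed might look like the sketch below, held here as a Python string. The property paths (`properties.enableMultipleWriteLocations`, `properties.consistencyPolicy.defaultConsistencyLevel`) are assumed from the ARM schema for Cosmos DB accounts, the projected columns are an arbitrary choice, and the query has not been validated against a live tenant.

```python
# Hypothetical Azure Resource Graph (KQL) query text for the proposed check.
# Assumption: the two settings are exposed under `properties` as named below.
ARG_QUERY = """
resources
| where type =~ 'microsoft.documentdb/databaseaccounts'
| where tobool(properties.enableMultipleWriteLocations) == true
| where tostring(properties.consistencyPolicy.defaultConsistencyLevel) =~ 'BoundedStaleness'
| project id, name, resourceGroup, subscriptionId
""".strip()

print(ARG_QUERY)
```

Whether matches should be reported as a hard finding or as a prompt for review depends on the answer to the question above about valid scenarios.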


5 participants