Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactions & Pipeline: the current issue and ideas to fix it #681

Closed
gvolpe opened this issue Mar 31, 2022 · 11 comments · Fixed by #685
Closed

Transactions & Pipeline: the current issue and ideas to fix it #681

gvolpe opened this issue Mar 31, 2022 · 11 comments · Fixed by #685
Labels
bug Something isn't working help wanted Extra attention is needed pinned

Comments

@gvolpe
Copy link
Member

gvolpe commented Mar 31, 2022

Introduction

I've had this on my mind for a very long time, so let me see if I can communicate the issue so that others may help.

The current implementation of transactions is almost right, but it can't make any guarantees about commands running in order, because that's up to the runtime system (i.e. the scheduler). Thus, I've been recommending people not to use it.

Note this issue also applies to pipelining, as these share the same core implementation (see runner.scala).

Problem

Say we have a transaction mixing SET and GET commands, just as the following (taken from examples):

commandsApi.use { redis =>
  val ops =
    redis.set(key1, "sad") :: redis.set(key2, "windows") :: redis.get(key1) ::
        redis.set(key1, "nix") :: redis.set(key2, "linux") :: redis.get(key1) :: HNil

  //type Res = Unit :: Unit :: Option[String] :: Unit :: Unit :: Option[String] :: HNil
  val prog =
    RedisTransaction(cmd)
      .exec(operations)
      .flatMap {
        case _ ~: _ ~: res1 ~: _ ~: _ ~: res2 ~: HNil =>
          IO.println(s"res1: $res1, res2: $res2")
      }
      .onError {
        case TransactionAborted =>
          IO.println("[Error] - Transaction Aborted")
        case TransactionDiscarded =>
          IO.println("[Error] - Transaction Discarded")
        case _: TimeoutException =>
          IO.println("[Error] - Timeout")
        case e =>
          IO.println(s"[Error] - ${e.getMessage}")
      }

  prog *> IO.println("keep doing stuff...")
}

Most of the time, this will work, but it's unreliable. And the reason lies deep down at the core of the Java implementation, and how redis4cats models it.

A transaction consists of any set of supported commands that are queued (not run) between a call to MULTI followed by either EXEC or DISCARD. Just like database transactions are modeled via BEGIN / COMMIT / ROLLBACK.

This library models this problem as a Resource, where the acquisition means MULTI, and the release means either EXEC or DISCARD, depending on the result of the inner use block.

Because all the Java commands are basically CompletableFutures lifted into F via Async, once evaluated after a MULTI call, it will never yield a result as these are queued on Redis waiting for the transaction to be committed or rolled back.

So we fork every command via .start in order (see runner). Once the Resource hits the release part, fibers are joined or canceled to avoid leaks, depending on the transaction result.

The MAIN ISSUE with this approach, is that we cannot guarantee the commands will run in order, even if we fork them in order. This is because the execution of every fiber is up to the scheduler, not to us. It is part of the runtime system.

Another implication of transactions, is that every command needs to be run in the same thread, otherwise it would be detected as part of a different transaction (or not even a transaction). To try and mitigate this issue, we introduce RedisExecutor, which simply operates on a fixed thread-pool of size one. This is wrong on so many levels, because it affects EVERYTHING, not only transactions.

This means that by default, the library will run all your commands on a single thread, affecting the overall throughput of your application. It can be configured to a higher value, but it is 1 by default.

Solutions (mainly ideas)

Custom Scheduler

Last night (for some odd reason) this issue appeared in my brain, and made me think about schedulers. Since the main issue is the scheduling of fibers, it seems all we need is a custom scheduler used for Redis transactions, no?

It is very low level, but the issue lays right there on the runtime system, so we need to tweak the JVM scheduler to ensure commands are run sequentially.

Is this doable? I don't know, but I'm laying down the idea here. Perhaps, at some point in the future, I'll give it a try if nobody does first.

Sync API

Another idea that should work — albeit less ideal — is to re-implement transactions using the underlying synchronous API, in which we don't need to deal with CompletableFutures.

I'm pretty sure this one would work, but it would involve a lot of code duplication, and who really wants to do that work? Definitely not me.

Remove support

Not really a solution, but I'd rather delete all this code instead of having a broken implementation. I guess this should be the last resort.

How can you help?

This issue is quite fundamental, so it would require knowledgeable people who like to tinker with code and runtimes a lot, but I believe there's also a lot to learn from the experience of diving deep into this interesting feature.

Another simple issue in which anyone can help is in updating the documentation for transactions and pipelining. We should add a big fat warning sign at the top, telling people not to use it until this problem is resolved (we can link this issue).

If you would like to help, simply comment below when you're ready to take on the issue. I don't really have much time at the moment, but I can probably provide a bit of guidance. Come chat on Matrix / Gitter.

@gvolpe gvolpe added bug Something isn't working help wanted Extra attention is needed pinned labels Mar 31, 2022
@kyri-petrou
Copy link
Collaborator

I'm not very knowledgable so feel free to ignore me or tell me off if any of my suggestions are terrible / don't make sense 😄

  1. Since Redis 2.6.5 you don't have to call DISCARD on error within a transaction. If any command errors, Redis will keep ignoring all commands within the transaction until EXEC is called, at which point it will raise an error. Would that help simplify the design in any way? (ref)
  2. Since all of the commands are being forked to a separate fiber already, would it make sense to create an executor / execution context with a threadpool of 1 at the start of that transaction, use .startOn(ec) instead of .start and shut it down at the end of it or would that be too expensive / bad?

Again, feel free to ignore me or tell me off if any of these doesn't make sense, just thought they might help spark an idea or something

@joroKr21
Copy link

It sounds like modelling the individual operations with IO might not be appropriate. E.g. Doobie has a special ConnectionIO for that purpose. Or maybe IO[IO[A]] so that you can start them independently from waiting for the result. Or just some free data structure RedisOp that you interpret later.

@gvolpe
Copy link
Member Author

gvolpe commented Mar 31, 2022

@kyri-petrou

  1. That's good to know, but doesn't change much, besides removing the call to DISCARD.
  2. I don't think that would work, as it is pretty much what we are doing now. The issue is that multiple calls to startOn(ec) does not guarantee the running order (that is up to the scheduler). The idea is good, though, we could use startOn(ec), where ec needs a custom scheduler.

@joroKr21

I believe it is doable to keep things within the same effect. I'm aware of Doobie's design using the Free Monad, that could definitely work (and I have considered it in the past), but it's the complexity and maintenance burden of such model that puts me off tbh. Also UX is compromised (see Doobie vs Skunk).

@joroKr21
Copy link

Maybe you can use IO.syncStep if you can make sure that starting an operation is synchronous.

@gvolpe
Copy link
Member Author

gvolpe commented Apr 18, 2022

@kyri-petrou your suggestion seems to work pretty well (see #685). Still on early stages but already showing promising results.

I think we'll still have the issue of not being able to guarantee the running order, but I guess that's a compromise we can make in favor of reliable execution of transactions.

@kyri-petrou
Copy link
Collaborator

kyri-petrou commented Apr 18, 2022

@gvolpe that's good news! I'm not too familiar with the cats / scala inners, but for running order would it make any difference if you used .foldLeft over the list instead of .traverse here? (with the necessary refactoring to make that work)

https://github.com/profunktor/redis4cats/pull/685/files#diff-c658d7aade2cc8ce11a81a5895799e8b68b7536fa836dd8745e2990aeabcf010R47

@gvolpe
Copy link
Member Author

gvolpe commented Apr 19, 2022

@kyri-petrou both traverse and foldLeft will go through the list in order. My ordering concern goes deep into the scheduler: i.e. what happens when I use start / startOn? If I start multiple operations in order, is the order of execution respected? I'd need to investigate this further, but my understanding was that the JVM scheduler doesn't make any guarantees about the order of execution when a task is "submitted" to it.

@kyri-petrou
Copy link
Collaborator

@gvolpe ah okay, I see your concern. I'm afraid I know very little on CE / JVM scheduling to have an opinion

What gave me the idea for my suggestion above came from Lettuce documentation and how they use the async API for transactions: https://github.com/lettuce-io/lettuce-core/wiki/Transactions#transactions-using-the-asynchronous-api

I might be extrapolating here but it seems to suggest that as long as you call their async api in order in a single thread it will be executed in order? Again my understanding is not too great so there might be missing other aspects I'm not aware of like the CE scheduling, but maybe this helps in some way

@gvolpe
Copy link
Member Author

gvolpe commented Apr 20, 2022

@kyri-petrou I think your understanding is correct. After all, you can only execute one transaction at a time per RedisCommands, and so far, it seems to work reliably. Perhaps the issue was always something else and I never really got it 🤷🏽

I will go ahead and merge it, I'm happy enough with the current UX, and I want to fix pipelining next.

@gvolpe
Copy link
Member Author

gvolpe commented Apr 23, 2022

FYI: #689 improves UX by a large margin 💪🏽

@kyri-petrou
Copy link
Collaborator

FYI: #689 improves UX by a large margin 💪🏽

😮 that's some great stuff there, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed pinned
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants