-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protocol support: RESP3 #2396
Protocol support: RESP3 #2396
Conversation
@shacharPash you might find this interesting... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mgravell & @NickCraver - hope this finds you well. I'm sure you guys are swamped (my email is an almost continuous stream of updates from this repo 😄 ). @uglide asked me to take a look at this PR. I went through it all and tested it out a bit. Obviously, there are still a lot of TODOs and question marks on the subscriber/push side of the house, but I went through what's here.
Some comments
- Looks like something is blocking the connection logic from returning before the end of the
connectTimeout
. I noticed this while debugging a unit test in Rider, but was able to reproduce it in a console app pretty simply (it might only occur in environments that are more thread constrained) - It looks like the ResultProcessors are going to need some love to get the main command set compatible with RESP3 (try defaulting the tests to RESP3 and running the SortedSet tests and you'll see what I'm talking about). It's not totally shot - the vast majority of things still work, but it's definitely not isolated to sorted sets. I think we might need to do something to make all the tests run with RESP3 as well as RESP2 - might need to think about a way to do that effectively without just adding theories/inline data everywhere.
- IMO defaulting connections with a specified
version
of Redis 6.0+ to RESP3 is a breaking change. Even after we fix all the ResultProcessors, anyone who uses the ad-hoc API is going to have to handle the RESP3 rather than the RESP2 result structures. Take a look at this test case - in the context of Resp3Tests.cs to see what I'm talking about. I'd say keep RESP3 as opt-in only until 3.x - I'm a little iffy on the
Type
,Resp2Type
,Resp3Type
thing - I guess it's only for internals and ad-hoc users again but I wonder if it might just be better to leave it asType
andResp3Type
withResp3Type
perhaps being a combination ofType
with some bits above the current mask bit? If RESP3 is opt-in-only, we could probably just shove everything intoType
? I don't really have a great answer to this, but those are some initial thoughts and I worry that having the three properties makes it all a bit muddled.
Hopefully, it goes without saying that @uglide, @chayim, and @shacharPash's offer of aide here still stands. Obviously, the parts that require a more intimate knowledge of connections (e.g. pub/sub) are probably best left to the maintainers, but mending the result processors and straightening out the test framework, are stuff I'm sure we could help with. For my part, happy to pick up a shovel :)
[InlineData("someserver,version=5.9,protocol=2,$HELLO=", false)] | ||
[InlineData("someserver,version=5.9,protocol=2,$HELLO=BONJOUR", false)] | ||
// specify a post-6 version; attempt by default | ||
[InlineData("someserver,version=6.0", true)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the case I'm a little worried about, I'm not really sure how many users actually specify the version in their connecitonString
/ConfigurationOptions
, but this is the only case where they would be inadvertently switched over to RESP3 without asking for it on an upgrade.
public class ProtocolDependentConnectionCache : IDisposable // without this, test perf is intolerable | ||
{ | ||
private IInternalConnectionMultiplexer? resp2, resp3; | ||
internal IInternalConnectionMultiplexer GetConnection(TestBase obj, bool useResp3, [CallerMemberName] string caller = "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try to debug with useResp3
set to true (doesn't happen on RESP2), I see it hang/deadlock during the connection. It's most likely an issue with the Wait
deeper in the test framework, but could speak to a larger issue with connections. Just try debugging this method:
[Fact]
public void BreakResp3Debug()
{
Fixture.GetConnection(this, true); // add breakpoint here and step over
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW it appears to come through alright in Redis:
1691084023.574314 [0 172.17.0.1:42706] "HELLO" "3" "SETNAME" "BreakResp3Debug"
1691084023.578812 [0 172.17.0.1:42706] "CLIENT" "SETNAME" "BreakResp3Debug"
1691084023.579017 [0 172.17.0.1:42706] "CLIENT" "ID"
1691084023.580443 [0 172.17.0.1:42706] "INFO" "replication"
1691084023.580458 [0 172.17.0.1:42706] "INFO" "server"
1691084023.580498 [0 172.17.0.1:42706] "CLUSTER" "NODES"
1691084023.581427 [0 172.17.0.1:42706] "GET" "__Booksleeve_TieBreak"
1691084023.582093 [0 172.17.0.1:42706] "ECHO" "\x00\xe9b$\x02\xa6\xc8B\x95\x1bz\xc39v\xfb\xb2"
1691084023.596301 [0 172.17.0.1:42706] "INFO" "replication"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might actually be something to do with the connection timeout - you can try just connecting in a console app too and you'll see it, it takes the whole time provided by the connectTimeout
and then it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm investigating the stall problem; I didn't have that locally when I last saw it, but I'm on a clean machine and I do repro that now; this is clearly a major thing - I'll figure it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only see it when running console/debug/many parallelized tests, feels like task/thread contention.
@slorello89 brilliant feedback; I'm going to carve out some time to work through those and respond and/or act |
@slorello89 my primary focus right now is fixing the connect hang, but I'm very interesting in what feature difference you're seeing (you mention sorted sets, etc); the design was basically "users shouldn't notice any difference unless they ask for it very explicitly via |
@mgravell ☝️ FYI forgot to tag you. |
@mgravell, looks like there's an extra array embedded in the result set: RESP2
RESP3
|
Like I said, I think this is something that doesn't have to be visible to a user, but the Result Processors are making erroneous assumptions about the shape of the Result at the protocol level - in the case of As I was saying - nitty-gritty protocol-level stuff like this is something that we can definitely give you a hand with. Save you the hassle of having to go line by line over every single Result processor/data structure. |
Found this myself now, but yes: that shape change is unanticipated; I was expecting that However, I'm going to say that IMO the documentation of You're right, we're going to need to do a command-by-command look here; and we can't use the documentation, because it doesn't say; I have, however, got the tests now runnable in a with/without RESP3 side-by-side way that looks like this: public class Resp2SortedSetTests : SortedSetTests
{
public Resp2SortedSetTests(ITestOutputHelper output, ProtocolDependentFixture fixture) : base(output, fixture, false) { }
}
public class Resp3SortedSetTests : SortedSetTests
{
public Resp3SortedSetTests(ITestOutputHelper output, ProtocolDependentFixture fixture) : base(output, fixture, true) { }
}
public abstract class SortedSetTests : ProtocolFixedTestBase
{ ... tests here ...} |
I suspect I now agree with you on the topic of auto-enabling RESP3; the problem here is that even if we handle all the shape changes correctly, if someone uses note that this doesn't apply to Lua as far as I know - it only applies to actually, the problem gets even bigger than that; any library tool (I'm thinking of things like "asp.net distributed cache", or similar) now needs to be reviewed to see "do I issue ad-hoc commands? if so, can those commands suddenly change shape based on the RESP level? and do I 100% control the connection configuration, or could the external application code request RESP3 without me knowing about it?" oof; problems I'll say again: radically changing the result shape here was IMO a mistake; "array" => "map" ? fine; but this... so much oof |
Hey @mgravell let me follow up with what's going on with those docs. That's exactly what I'm worried about RE Execute[Async], StackExchange.Redis effectively exposes the protocol, so if you change the protocol without a code/config change from users, you change the underlying result structures in a way you'll never be able to fully map out. That's basically the definition of a breaking change, so I'd say having automatic RESP3 is 100% a 3.x thing. I'm aware that's not something you really want to do at this point (I know there are a lot of other features you want to put into 3.x when one day it comes out :) ) which is why I think Resp3 as opt-in is the correct course here. If Resp3 were opt-in, I would not be too worried about third-party libraries, for one thing, the more prominent ones don't use Execute[Async] - Microsoft.Extensions.Caching.StackExchangeRedis (which you were referring to) for is just scripts and some hash functions, that covers caching and session state in aspnet core, then the old WRT weird shape of ZRANGE, the .NET dev in me wants to agree with you that "hey we are mapping members to scores so that should be a dictionary", but the more I think about it, I think I agree with the resulting structure, maps are called out in the spec as "an unordered collection of key-value pairs", ZRANGES with scores are definitely not that, they are ordered pairs of member and score. |
I disagree. I'm pretty conversant in redis (from V2 days onwards) and RESP - I read and commented on the early RESP 3 spec the day it came out; I was part of the folks convincing Antirez not to go RESP3-only. I have implemented RESP3 parsers at least 3 times, and: I found out about this linear-to-jagged shape change today. I will acknowledge that I may have overlooked something, but IMO I'm a pretty reasonable litmus test that if I didn't know about this: it isn't obvious from the documentation. If it is there and I didn't register it: it probably isn't emphasized enough with examples. How could someone possibly know about this? The docs for ZRANGE don't mention this, and even if they did: what would cause someone to look there if they aren't actively touching their ZRANGE code but instead, just changing the hsndshake? The RESP3 docs do not, as far as I know, call this out - rather, they seem to imply that the new data types may be used to convey additional context, and optionally metadata (although IIRC that is never actually done). |
I Wil agree with you on the unordered part. What's weird about this particular scenario (ZRANGE linear to jagged) is that it doesn't actually use any RESP3 feature; the jagged array concept is entirely representable in RESP2 - in fact, with the exception of the encoding of the score: the exact response used in the RESP3 version: would work identically in RESP2 (and the score can be encoded as a simple string in RESP2). I think that's why it is so unexpected: it isn't "here's a change to use RESP3 features"; it is "hey, since we're using RESP3, let's represent the data in a different shape for unrelated reasons". Those unrelated reasons might relate to data purism or similar, but they clearly aren't related to RESP3 (since that shape works just fine in RESP2). As such, the thought process of someone updating to RESP3 can't be "I understand the RESP3 features, so I can anticipate what things might have been tweaked to use those features", but rather it needs to be "I need to read the minds of the server folks to predict what arbitrary data shape changes they might have chosen to make for whatever reasons, just because it was a milestone". And that is not a fair demand. Sorry if I'm sounding grumpy. If I've misunderstood, in very happy to re-evaluate and reconsider my position, and offer sincere apologies. But if I'm close to being right, then: no, nobody can predict those changes, and IMO the only correct solution is for the redis documentation to explicitly document each and every one of these, in two places:
|
Additional ; I checked the spec history; "map" was ordered, until Jan this year and Redis 6.2; if we're going to count ordered vs unordered as a significant reason not to use map for ZRANGE, then IMHO this it was an unconscionable spec change to rewrite "map" from ordered to unordered. I also wonder (but have not looked) when the ZRANGE jaggification happened, because I'm guessing the spec said "ordered" when it was, which kind of renders the "we can't use map because map is unordered": moot But: that ship has sailed. We are where we are. The spec has been changed in a minor to redefine map, and ZRANGE has been jaggified. I don't propose we can change those things now, but: oof, it is a bit of a mess :( |
Hey @mgravell - I think we are agreeing here. I 💯 agree with you, these differences need to be properly documented (which is why I led off with I'll try to track it down). I think we really need @uglide's input here, he's spearheading the client end of our RESP3 efforts, maybe he can add some color here as I might be missing something. My point before was that, given our experience here, we should flag for users in the This all started btw because I was looking at what you referred to as "minor" differences in the spec see that gist I referred to earlier, I was actually looking at was the result types in the assertions (actually, almost that exact minor differences -> Btw - I think they had a similar discussion about the ZRANGE result structure a couple of years ago (before my time) - it doesn't look like it was actually because of the spec (the spec change wouldn't happen for another year as you pointed out, and that seems to be completely unrelated). |
@slorello89 thanks for all the input; commit 4ffef63 adds generalized (i.e. not command-specific) support for scenarios where we are handling interleaved data that has been jaggified; fortunately |
unrelated, nice usage of C# pattern matching to make this code easier: - if (el.Resp2TypeArray != ResultType.Array || el.ItemsCount != 2) return false;
+ if (el is not { Resp2TypeArray: ResultType.Array, ItemsCount: 2 }) return false; |
@slorello89 as an update, I've added tests for a much wider range of tests; the other failure category I've found is |
@slorello89 started discussion here: redis/redis#12466 |
Hey @mgravell - I've been asking around a lot the last couple of days to see what I could pull together for you, the best resource I've found comes from @chayim - for background, Chayim is the engineering lead in charge of managing the clients team, so they've been doing most of the RESP2-> RESP3 upgrades for the clients Redis maintains, as you might imagine, they've been contending with these issues as well. He had a document his team pulled together from their collective wisdom that is a possibly incomplete list of the response changes when migrating from RESP2 -> RESP3. I've taken out the key table that we're most interested in, I've also shared a copy of the document in google drive - but I don't think anything else is super relevant to us here (I'll add the caveats from the document above the table here so it's clear). @uglide is going to work on compiling a comprehensive list. Caveats from the document
Table
|
That's a great start - it looks like my generalised jagged detection caught a good number of similar changes, then. I think I'm going to have to go "shape detection" on XREAD, as it is server version dependent. Fun. |
@mgravell @NickCraver I've opened redis/redis-doc#2511 to track the work on getting the public docs together for the Result Structure mapping. |
…-centric - introduce `Resp2Type` and `Resp3Type` shims (`Resp2Type` has reduced types) - mark existing `Type` as `[Obsolete]`, and proxy to `Resp2Type` for compat - deal with null handling differences - deal with `Boolean`, which works very differently (`t`/`f` instead of `1`/`0`) remove RedisResult.Type from Shipped.txt - handle RESP3 types - handle [+|-]{inf|nan} - avoid alloc when parsing doubles made the RedisResult constructor non-public (fix accidental API) remove RawResult.Type; fix broken loop format incorrect attribute check - nomenclature: MultiBulk => Array - efficiency: use bit-packing to RESP2 type conversion is a bit mask fix RawResult.HasValue fix null array return (EmptyMultiBulk is no longer helpful) add a ToString to Replica (other bits were local test setup issues) simplify switch in TryParseDouble - protocol configuration parsing - avoid inbuilt equality/comparison operations on Version - rules for when to try resp3 configuration documentation words fix ConfigurationOptions.Clone actually connect via RESP3 demand redis6 in the RESP3 connect test remove unnecessary directives Lua results more tests and tweaks to fix tests simplify handshake fallback connect move SETNAME back into HELLO message; add lots of documentation about *why* DEBUG PROTOCOL tests; some failures to look at fix protocol tests add missing "hide me" attribs add docs and release notes tyop redundant re-enable to get server-maintenance notifications - ConnectWithBrokenHello is inconclusive if not a v6 server - allow non-RESP3 tests on non-v6 servers "DEBUG PROTOCOL" tests are inconclusive on non-v6 fix TryConnect (CLIENT ID) not always available save all counting is hard expose IAsyncEnumerable on ChannelMessageQueue (#2402) * expose IAsyncEnumerable on ChannelMessageQueue fix #2400 * PR number * move ChannelMessageQueue.GetAsyncEnumerator to shipped LUA conversions version Lua RESP conversions true/false handling depends on setresp(3) revert "if" split in ResultProcessor use enum for RedisProtocol reinstate parameterless RedisResult .ctor fix resp 2/3 inversion snafu from enumification fix resp dependent connection reuse issue add failing Execute test re RESP2 vs RESP3 delta ValuePairInterleavedProcessorBase should auto-handle responses that have become jagged in RESP3 pattern match is easier to read here - move IsResp3; that is a PhysicalConnection thing, not a ServerEndPoint thing - allow RawResult to know whether it is RESP3; involved moving some flags (which removes a bit hack we were using, so: yay) - make the interleave un-jaggedify only apply on RESP3 add more RESP3 API change tests compensate for XREAD having a different shape in RESP3 disable implicit RESP3 based on target server version # Conflicts: # docs/Configuration.md # docs/ReleaseNotes.md # src/StackExchange.Redis/ConfigurationOptions.cs # src/StackExchange.Redis/ConnectionMultiplexer.cs # src/StackExchange.Redis/Interfaces/IConnectionMultiplexer.cs # src/StackExchange.Redis/ServerEndPoint.cs # tests/StackExchange.Redis.Tests/PubSubTests.cs # tests/StackExchange.Redis.Tests/TestBase.cs
- fix cluster tests (false positive: we *expect* zero subscriber connections in RESP3) - fix flakey Lua test (unrelated to resp 3) - separate the Me() key in protocol dependent tests
…rtion unless explicitly requested
…die) - fix tests that assumed Create returned a real ConnectionMultiplexer
@slorello89 for info, the handshake delay, RESP3 command deltas, and various other things are now all: done |
…t the fixture hold broken connections
…esting This is a simpler approach to duplicating tests, but also fixes many - when changing things I noticed several base classes were incorrect so we silently dropped many tests in the PR inadvertently.
@mgravell I simplified the test for RESP2/3 greatly, but RawResult needs some merge love after the pointer changes - can you take a look please? (have to switch to an outage, not sure when I'll get back to this) |
Will try to do tomoz |
We have to get off AppVeyor and on to something else, testing against 3.1 in the mainline is kinda bananapants at this point.
These were colliding on RESP2/RESP3 but also using an entirely different system...let's simplify!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a full review pass and minor doc tweaks and such here - looking good!
Added one comment to look at in RedisResult where we had:
internal static bool TryCreate(PhysicalConnection connection, in RawResult result, [NotNullWhen(true)] out RedisResult? redisResult)
{
try
{
if (result.Resp3Type == ResultType.Null)
{
Console.Write("hi");
}
...sanity check the removal, in case it was a throw-here case or something?
switch (result.Type) | ||
if (result.Resp3Type == ResultType.Null) | ||
{ | ||
Console.Write("hi"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgravell Was this just to debug something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to assume so and delete here - leaving comment to double check if this is some cases we maybe wanted to throw on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor docs questions, everything else looks good. For good measure, I ran the test suites from Redis OM and NRedisStack against the RESP 3 updates, no issues aside from a couple of Obsolete warnings I needed to suppress.
I think what's in: redis/redis-doc#2513 is pretty comprehensive insofar as response changes are concerned, so there shouldn't be any more surprises on that front.
Resp2Type
andResp3Type
shims (Resp2Type
has reduced types); existing code using[Obsolete] Type
usesResp2Type
for minimal code impactType
as[Obsolete]
, and proxy toResp2Type
for compatBoolean
, which works very differently (t
/f
instead of1
/0
)[+|-]{inf|nan}
when parsing doubles (explicitly called out in the RESP3 spec)HELLO
handshakestreamed RESP3omitting; not implemented by server; can revisit