API Proposal: DbCommandSet -- allows multiple DbCommands to be transmitted to server in single message #28794
The above proposal is offered in the friendliest way possible as an alternative to #28633 (this general idea was discussed in that thread but didn't fit the direction being taken for that proposal). Both have the same high-level goal but differ in the details of how they achieve that goal.

Differences

#28633's approach flattens the result sets returned by the various commands so that they are accessed sequentially using a single data reader. For example, imagine a command set consists of a command returning one result set, a command returning two result sets and another command returning one result set. #28633 will expose the results for this set via a single data reader which is used to access all four result sets.

With this proposal, results are grouped by command. Using the example scenario from the preceding paragraph, first a data reader containing the first command's single result set is provided, then a reader for the second command's two result sets, followed by a reader for the third command's single result set.

The "flatten results from each command into a single data reader" approach of #28633 streamlines use-cases such as executing a set of select statements where each returns a fixed number of result sets and where errors are not expected, as well as executing sets of insert statements where all are expected to succeed together or the entire set should die on first failure. However, not every scenario where command sets would be useful aligns with this streamlining. For example, the ability to work with the results from each command separately comes in handy in the following situations:
In summary, by grouping results by command, this API proposal operates at a slightly lower level than #28633. By operating at this slightly lower level, it's possible to support situations that flattened results either don't accommodate or don't accommodate as gracefully. It also more closely mirrors how results can be returned by the database server (e.g. when independent commands are executed in a single message, SQL Server segments the various result sets by command). Flattening can easily be implemented on top of this proposal, for those use cases where flattening is ideal.
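The last point—that flattening can be layered on top—can be sketched roughly as follows. This is a hypothetical wrapper: the DbCommandSetResults type and the GetReader/NextCommandSetResults members come from this proposal's draft, and nothing here is a shipping API.

```csharp
// Rough sketch only: a flattened, #28633-style view built over the
// per-command results object proposed here. Type and member names
// are taken from this proposal's draft, not from any real library.
public class FlattenedResults
{
    private readonly DbCommandSetResults _results;
    private DbDataReader _current;

    public FlattenedResults(DbCommandSetResults results)
    {
        _results = results;
        _current = results.GetReader(); // reader for the first command
    }

    // Mirrors DbDataReader.NextResult(), but crosses command boundaries.
    public bool NextResult()
    {
        if (_current.NextResult())
            return true;                     // next resultset, same command
        if (!_results.NextCommandSetResults())
            return false;                    // no commands left
        _current.Dispose();
        _current = _results.GetReader();     // first resultset of next command
        return true;
    }

    public bool Read() => _current.Read();
    // Remaining DbDataReader members would delegate to _current similarly.
}
```

A consumer wanting the #28633 experience would wrap the per-command results once and then loop with Read()/NextResult() exactly as with a single concatenated command.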
@bgribaudo, sorry it took so long to answer - I only just recently realized this issue existed (always a good idea to mention relevant people to make sure they're aware).

First, it's really great to see this: regardless of what I think of the proposal or what happens to it, it shows that there's real interest and passion around this subject, and it makes me confident that we're on the right track trying to evolve the API. So thanks for engaging.

Having gone through the issue, my opinion is still the same as described in #28633. I will explain why I don't think this is the right API, point by point, but that's just my opinion and I really am open to having my mind changed (I'm also not making any decisions alone on this). In a nutshell:
In other words, I don't think your proposal is wrong or useless - it does solve real problems - but it does so by adding complexity for everyone, whereas the problems it solves seem rare and solvable via other means.

Added complexity

First, to be sure we're all in sync, here's what consuming two resultsets would look like according to #28633:

DbCommandSet dbSet = ...; // Set up a command set
using (var reader = dbSet.ExecuteReader())
{
// Regular, familiar pattern for accessing DbDataReader
while (reader.Read()) {
// Process first resultset
}
reader.NextResult();
while (reader.Read()) {
// Process second resultset
}
}

And here's what it would look like according to this proposal:

DbCommandSet dbSet = ...; // Set up a command set
using (var results = dbSet.Execute())
{
using (var reader1 = results.GetReader()) {
while (reader1.Read()) {
// Process first resultset
}
}
results.NextCommandSetResults();
using (var reader2 = results.GetReader()) {
while (reader2.Read()) {
// Process second resultset
}
}
}
To me, this added API complexity (which again, applies to everyone regardless of whether they're using variable resultsets) outweighs the value this proposal brings, even if I do agree that it does bring value.

Rareness of the problem

First, it's obviously hard to prove, but I have a strong feeling that in 99% of cases, users execute batches of a fixed number of commands, returning a fixed and known number of resultsets. This comes from long experience working with users - variable resultsets have been a very minor edge-case - and #28633 was designed with this in mind. However, I could be biased (I come from the PostgreSQL world), and I'm open to hearing otherwise.

Second, once a structured batching API is introduced, there shouldn't really be a reason for users to continue using the legacy, concatenation batching "API". This isn't to say it shouldn't work anymore - providers can continue supporting concatenation batching even within a structured batch ("batching within batching") - but at the moment I simply don't see a compelling reason for users to do this. If that's correct, that would mean that the variable resultset problem is in effect restricted to stored procedures only.

Finally, even stored procedures returning variable resultsets seem to be a rare exception, rather than any sort of norm.

Other solutions

If there were no way at all to discern which resultset belongs to which command, I'd agree that we have a problem. However, as I wrote here, there are other solutions for this:
As variable resultsets are probably rare, the above seem to be perfectly reasonable ways to get the needed information. I'd personally want to understand what's insufficient about them before making the general API more complex for everyone. It's also worth remembering that this is only a batching API - it improves performance but there's always the option of not batching if absolutely necessary (although at the moment I'm really not aware of such a situation).

Answers to your specific points
I'm still skeptical of the idea that query/command handling can continue in a reliable and sane way after an error occurs - or that it's a desirable feature. Examples of these up to now have been very theoretical, it would be good to see a reasonable, real-world example here, to avoid the risk that we're discussing an exotic/contrived scenario. But more importantly, error handling doesn't seem to add an additional argument here - the only point seems to be again, the purported difficulty to discern which resultset belongs to which command. It seems that the solutions proposed above (via resultset query shape, via marker query) work just as well in error situations, unless I'm missing something. I don't see any reason to suppose that the reader will be killed upon error - that's a provider-specific implementation detail. If the provider does support handling commands in the batch after an error (again, I'm skeptical), then there's no reason it wouldn't allow you to continue using the same reader for that.
Already addressed above. Yes, the consumer would need to apply more advanced logic in this scenario - but it's possible and seems reasonable given the supposed rareness of the need. On the other hand, if we change to a non-flattened API everyone has to do more work, all the time, just to go over regular batch results.
The problem with this argument is that CommandBehavior seems to make sense only when given at execution time, before command(s) are sent to the database (or as part of them). In other words, by the time you pass

Note that for SingleRow and SingleResult it's usually possible to achieve the same thing in SQL (e.g. with

NOTE: When I wrote the above, I was referring to the overload of
It's true that your proposal provides a simple solution for getting the non-aggregated affected rows. However, we already have two other proposals which provide a solution to this. I do agree they're a bit less elegant, but on the other hand I also consider non-aggregated affected rows an advanced/rare scenario mainly to be used by ORMs and other layers, for features such as optimistic concurrency. I really don't suspect end consumers will have much need to directly use this API feature.
I don't really agree with this point, and think that from a compatibility/legacy point of view the situation is in fact the opposite. As I wrote above, lots of code already exists out there which makes use of concatenation batching. If #28633 is implemented as-is, then the existing code need only be modified to replace the concatenated DbCommand with a DbCommandSet - the readers both produce should be identical. However, if your proposal is implemented, then the consuming code must be modified since there's no longer a single reader as before, but multiple ones. In that sense your proposal makes it more complicated, and not easier, to switch to the new batching API.

It is true that if you have an existing method that accepts a DbDataReader, closes it with the assumption that it has only one resultset, and you want to start passing it a batch-produced reader, then yes, it would need to be modified to not close the reader. However, that seems reasonable for that very specific design - I'd prefer current uses of concatenation batching to be easier to port.

/cc @divega @ajcvickers
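For concreteness, the "marker query" workaround referred to in this thread might look something like the sketch below. It is illustrative only: the DbCommandSet shape follows the #28633 proposal, and CreateCommand is a hypothetical helper that builds a DbCommand from a SQL string.

```csharp
// Sketch: interleave a cheap SELECT with a known, fixed shape between real
// commands so command boundaries can be detected in the single flattened
// reader. API names (DbCommandSet, Add, ExecuteReader) follow the #28633
// proposal and are not a shipping API; CreateCommand is invented here.
var batch = new DbCommandSet(connection);
batch.Add(CreateCommand("SELECT ... FROM TableA"));
batch.Add(CreateCommand("SELECT 1 AS __marker"));   // boundary marker
batch.Add(CreateCommand("SELECT ... FROM TableB")); // may return 0..n resultsets

using (var reader = batch.ExecuteReader())
{
    var commandIndex = 0;
    do
    {
        if (reader.FieldCount == 1 && reader.GetName(0) == "__marker")
        {
            commandIndex++; // crossed into the next real command's results
            continue;
        }
        while (reader.Read())
        {
            // process row, attributed to commandIndex
        }
    } while (reader.NextResult());
}
```

Because the marker resultset has a shape the consumer controls, it can be recognized even when the surrounding commands return a variable number of resultsets.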
@roji Often when there is a common, simple experience that will work for most people, but then some things that may need to be tweaked or may need more access for advanced cases, we approach that by creating building blocks that allow the customization when needed, but then we also create convention/sugar experiences on top of that so that it doesn't become "harder for everyone". Do you think there is any room for this here? Of course, taking that approach too far leads to over-engineered and heavy solutions. For example, I'm not sure too many layers here is a great idea because of perf (including excessive allocations), but then again, maybe the low-level building blocks can be fast, and only the super sugary sugar is a bit more allocation-heavy. Regardless, just something to consider.
@ajcvickers it's certainly possible to think about two layers here - a lower-level one which doesn't flatten command results and an upper-level one which does. As you wrote, I'm also not keen to go down that path for reasons of complexity, perf, etc. But a crucial point is that at the moment, I'm not aware of anything which wouldn't be possible with the simpler API being proposed (#35135) and that would be made possible with this proposal; mechanisms do exist for discerning command edges when consuming resultsets, it's just that the basic API doesn't do it for you automatically. In that sense, this proposal seems to add convenience rather than functionality, making a more complex (but IMO quite rare) task easier to perform. This is why I don't think it's a good idea to complicate things, either by providing one more complex API (this proposal) or one simple and one more complex API. I really could be wrong here - I'm just waiting for the additional information to show it. If we feel that variable resultsets really are a very common scenario, then it may make sense to make it easier to consume them via this proposal. Or if we find some serious fault with the proposed alternative solutions for discerning edges (i.e. resultset shape, marker query) - some important scenario which isn't handled - then once again it would make sense to look further. But at this point I haven't seen one.
One additional important note about CommandBehavior. In your proposal there are two places where CommandBehavior can be specified:
The first one doesn't work as I explained above, since it occurs after everything has already been sent to the database, and so cannot really impact performance as it should. However, the second option doesn't suffer from this problem, and is definitely a possible improvement to the API. It's important to note that this is completely orthogonal to the rest of your proposal, i.e. it has nothing to do with flattened/non-flattened resultsets - the only change is to the DbCommandSet API itself. As such, I'm going to add it as an open question to #28633 and we can continue discussion there. We can keep this issue to focus more specifically on the resultset issue and the proposed DbCommandSetResult API.
@roji, thank you for taking the time to read this proposal and for the detailed comments. We are both pulling toward the same big picture goal. Taking a step back from implementation details, one question I've been thinking about is "what are the use cases for command sets?" There are many different ways to slice and dice when grouping and describing use cases. Here's one attempt….

Use Cases

ORM Scenario (a.k.a. the Template Command)

I get the sense that (one of) the main use case(s) motivating #28633 comes from the ORM world. Say, changes have been made to a set of objects which need to be persisted to the database. When those changes are written to the db, revised values for the affected rows should be returned from the db (e.g. so that object properties can be updated with values from database computed columns, from columns updated by triggers, etc.). Also, applying the changes from the objects should all either succeed or fail together. The query that needs to be executed might be something like:

The common way around this is to concatenate multiple statements together into a single command (

In this use case, the need is for:
In essence, a single command template is being applied multiple times with different arguments each time.

General Use Case

This includes all situations which do not exactly match the need list for the ORM Scenario/Template Command. :-) In essence, the developer decides that several distinct commands would be more efficiently executed if they were multiplexed together for transmission to the database server. For example:
Real-World Example

A few years ago, I worked on a .Net project where command sets would (presumably) have significantly helped performance. In a nutshell, the tool being built batch imported data: read from source file, apply transformations, then dynamically map to tables/columns in a SQL database (with mappings computed based both on the source file's headers and the database schema). Database errors from foreign key constraint violations were expected and needed to both be logged and cause the current source row to be skipped, but could not abort the entire import. Since errors needed to be tied to the specific source row, the data for each source row (which could target multiple tables) was inserted separately. Imagine a SQL batch something like:

A great candidate for command sets—though definitely not an ORM Scenario/Command Template situation (multiple commands in the statement, continue on failure, etc.). One idea which we didn't get to implement at the time but which came up and had real-world applicability was for the tool to log the identity values of the just-inserted rows. To do this, the INSERTs would change to something like "INSERT TableA … OUTPUT INSERTED.IdColumn;…". If this were done and a foreign key constraint violation caused an insert in a particular batch to fail, the remaining inserts in that batch would have been skipped over by the server as it moved to start on the next batch in the set, leading to the result sets returned for that source row being fewer than the expected quantity (a.k.a. a variable number of results being returned).

Is this a reasonably fair way to articulate the use cases for command sets?
How results are returned

Describing use cases still leaves the big question we've been discussing about how results are returned. :-) Under either proposal, in the case of the ORM Scenario/Template Command, the

For the General Use Case, returning the results separated by command batch maintains symmetry with the fact that separate command batches were passed in and allows the data readers to behave almost exactly identically to how they would if the commands were executed separately. Since the developer is consciously aware that they are multiplexing distinct command batches and is using an API from a section of .Net that works with lower-level data access functionality, I'd propose that it is not unduly burdensome to expose them to the same level of detail/responsibility on the result consumption side (that is, by returning results segmented by command batch).

All this said, even if it is not unduly burdensome, it is still fair to ask if this level of detail is necessary or if it could be eliminated for convenience purposes. If it is felt that it could be appropriate to segment by command batch, at least in some cases, but the concern with providing an API for this is not wanting to complicate the use case inspiring the original proposal (ORM Use Case/Template Command), what about providing a simplified template command API for that use case and then having a more general API that accommodates the other/more advanced uses folks might make of it?

An API tailored just for the idea of a template command could be streamlined even more than either of the current proposals. Instead of requiring the consumer to create a set of commands, why not let them define a command template, then pass an enumerable of enumerables of parameters (where each item in the outer enumerable corresponds with one invocation of the command and the items in the inner enumerables are the parameters for that particular invocation).
Executing the set would then prepare the template, then transmit the set of commands using the prepared reference (instead of paying the transport byte cost to re-transmit the full SQL statement once for each invocation).
In other words, instead of trying to have a one-size-fits-all API that makes the simple more complex than it needs to be (the ORM Use Case) and the complex less accommodating than it could be (the General Use Case), what about splitting the two and so bypassing the problem? :-) (Behind the scenes, the implementation of template command could certainly leverage the general purpose API.)
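The template-command idea described above could take a shape along these lines. This is purely a sketch: none of these types or members exist in ADO.NET, and the names are invented here for illustration.

```csharp
// Hypothetical shape for the "command template" API described above:
// one SQL statement, a sequence of parameter sets, one invocation per set.
// Everything here is illustrative, not a proposed-final surface.
public abstract class DbCommandTemplate
{
    public string CommandText { get; set; }

    // Outer sequence: one element per invocation of the template.
    // Inner sequence: the parameters for that particular invocation.
    // The provider would prepare CommandText once and then transmit one
    // invocation per parameter set, referencing the prepared statement
    // rather than re-sending the SQL text each time.
    public abstract DbDataReader ExecuteReader(
        IEnumerable<IEnumerable<DbParameter>> parameterSets);
}
```

A flattened reader fits naturally here, since every invocation shares the same statement and thus the same resultset shape.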
@roji, some thoughts on specific comments you graciously shared. All of these are primarily given in the context of/from the perspective of the General Use Case.
Actually, only commands (a.k.a. batches) would have their results segmented into separate data readers. However, if a single command returns multiple result sets (e.g. because several SQL statements are concatenated together), nothing changes. Each of those result sets would still be returned by the same data reader (identical to how
Conceptually, I look at it as a symmetrical input-to-output flow: “consumer passes a set of distinct commands to the multiplexer, consumer receives back a set of distinct command results from the multiplexer.” :-)
I'm not so sure. :-) In simpler situations where using a single DbCommand satisfies all of a consumer's needs, I suspect there's a good chance consumers will keep concatenating. After all, in the more basic scenarios, using a command set adds complexity without offering a measurable advantage (imagine a train of thought of: why create a command set instance and separate command instances for each statement if putting them all in the same string works fine?). If consumers keep doing this, then some of those concatenated-statement commands will find their way into command sets. Also, there are advanced situations where command sets will be most helpful but multiple statements need to be placed in a single command batch within that set to achieve the desired effect (e.g. executing an insert into several related tables where the key from the first table needs to be used as a foreign key reference by the others).
In situations like the batch insert tool I mentioned above or the idea of making distinct calls to the 'look up item details' stored procs—in both cases, a failure with one command shouldn't kill the entire set. Currently, if SqlClient encounters an error before the first result set (e.g. from a variable DECLARE/SET statement's expression, from early in a stored proc's logic), the exception for the error is returned from (I'm assuming that API implementers will maintain similar behavior between the data readers returned by their
Maybe this varies client by client. With SqlClient, I believe only
@bgribaudo, thanks for your responses - it's very valuable to get this kind of feedback and the discussion is very important.

Use cases and a "SIMD" API

First, regarding use cases... I don't necessarily think that ORMs behave in some common way that's distinct from other non-ORM scenarios - different ORMs do different things in different situations, and a lot of what they do (possibly even all) overlaps with non-ORM usage. The "ORM Template command" scenario you described - where the same SQL is executed several times with different parameter sets - is indeed important; let's call it "SIMD" for brevity. EF Core, for one, makes heavy use of batching (when the user calls

I think that as you suggest, an API tailored for SIMD batching does make some sense. FWIW this is exactly the way the JDBC API works - see section 14.1.4 of the JDBC 4.3 specification. Unfortunately, in JDBC this comes at the expense of batching multiple heterogeneous statements in a single PreparedStatement - only SIMD is supported. Multiple heterogeneous statement batching is supported by the non-prepared Statement, but this means that no parameters are supported (nor preparation, of course). We're trying to do better with ADO.NET.

The important point about a SIMD-specific API is that it can easily be implemented on top of either one of our proposals: some API would accept a single SQL statement and multiple parameter sets, and would simply create a DbCommandSet with many commands, each having the same statement and one of the parameter sets. One could even merge the resultsets coming back from this into a single resultset for easier access, as you suggested (although rows affected for each command would still need to be translated). But at the end of the day, since this can be seen as a purely additional layer, I don't think this belongs in ADO.NET, which is a lower-level API that IMO should aim only to expose access to provider features which cannot simply be implemented on top.
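The layering argument in the last paragraph can be made concrete with a short sketch. This is a hypothetical helper: DbCommandSet/Add follow the #28633 shape, and CreateSimdSet is invented here, not a proposed API.

```csharp
// Sketch: a SIMD batching helper built purely on top of a general
// DbCommandSet - one command per parameter set, all sharing the same SQL.
// DbCommandSet/Add follow the #28633 proposal; CreateSimdSet is invented
// here for illustration.
public static DbCommandSet CreateSimdSet(
    DbConnection connection,
    string sql,
    IEnumerable<IEnumerable<DbParameter>> parameterSets)
{
    var set = new DbCommandSet(connection);
    foreach (var parameters in parameterSets)
    {
        var command = connection.CreateCommand();
        command.CommandText = sql;          // same statement every time
        foreach (var p in parameters)
            command.Parameters.Add(p);
        set.Add(command);                   // one batched command per set
    }
    return set;
}
```

Since nothing in this helper requires provider cooperation beyond what the general batching API already exposes, it illustrates why a SIMD surface could live above ADO.NET rather than inside it.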
Note that even if a SIMD batching API were introduced into ADO.NET alongside a more general, non-SIMD batching API, I still would want the latter to produce a single reader, for all the reasons already discussed (I still expect the vast majority of cases to not have variable numbers of resultsets).

Error handling

First, in some of your "general use case" scenarios you describe stored procedure calls that return an error when an item is not found. As a general rule, I'd not expect an actual database error - DbException/SqlException - to be raised in this case, but rather some other way of saying "not found" (the procedure could simply return an empty resultset, or a null, or whatever). Similarly, rather than relying on constraint violations to be generated as part of normal processing, it may be possible to instruct the database to simply ignore the INSERT in that case (PostgreSQL has

But the point is, as I wrote above in my response to errors, that there's nothing in either proposal to prevent this - a provider such as SqlClient can support continuing after error just fine. If we follow #28633, that would mean that the single reader returned from DbCommandSet is "resilient" with regards to errors; calling
One note - I'd recommend avoiding the use of the term "multiplexing" here - multiplexing usually means that more than one producer is producing SQL statements to be executed at more or less the same time, and all these go through the same connection - take a look at npgsql/npgsql#1982 for some thoughts on this in the context of the Npgsql provider specifically. I think I understand where this is coming from - SQL Server uses "batch" in a way that corresponds to multiple statements in a single command - so there's an ambiguity. So to avoid any doubt, when I write batching I'm referring to batching commands, not statements.

On specific points
Right, but the point is that if an application is already using concatenation batching - which is likely if the user is already interested in batching/perf - then #28633 makes porting to the new DbCommandSet API easier (since a single reader is maintained).
As I wrote above, I'm not proposing that we disallow concatenation batching - a provider is free to continue allowing it. At least for Npgsql and SqlClient, we're assuming that there are some significant perf advantages to command batching as opposed to concatenation batching, and hopefully users will understand that. In any case, this doesn't really seem to matter in the context of this discussion.
I definitely don't know anything about the SqlClient behavior (and don't have enough time to dive in), but if SingleResult and SingleRow don't actually reduce the data transferred over the wire, then I'd say they're largely useless... Skipping rows (or resultsets) beyond the first one can easily be done by the user themselves without any help from the driver (@Wraith2 or @divega may be able to provide more info on the actual SqlClient behavior). In any case, there are other providers out there where CommandBehavior does impact what is sent out, so the API needs to allow that. SequentialAccess is indeed a different case as it's not meant to have an effect on what data is transferred, only on how it's accessed.
I agree. :-) I was trying to allow for this by "If a reading-specific behavior is passed when a command is added or an execution-level behavior is passed during reading, an exception should be thrown." In essence, each provider is responsible for deciding which behaviors are appropriate at which point and should then throw if a behavior is specified when it was inappropriate. As an alternative, the API could be simplified so that specifying behavior is only allowed when adding a command to the set (e.g. remove the option of specifying it when requesting a reader).
Understood. I don't think it's a good way forward to allow this to be specified late if many main use-cases wouldn't be supported by it. More importantly, at least in my mind I'm not convinced that a multiple reader approach (as proposed in this issue) is the right way forward, and this question of where to specify the command behavior (execution-time or reader-get-time) is only relevant within this proposal - in #28633 only one reader is returned anyway. Regardless, the value/desirability of allowing multiple command behaviors within a single batch is still relevant within the scope of #28633.
@bgribaudo as things currently stand I don't think it would make sense to go with this proposal, for the reasons detailed above - do you plan on continuing the discussion here? Note that #28633 still has an open question of allowing behavior to be specified per command, and that I've adopted your suggestion for a lightweight DbCommand-replacement in that proposal.
Hi @roji! Thanks for taking the time to discuss this topic. I agree that the cases we've been talking about aren't essential to the major use case for the command sets idea (though I'd submit they are valid minor use cases of that idea) and that there seems to always be some kind of workaround (generating placeholder result sets, schema detection, etc.) to make the minor use cases workable. (Actually, I don't think we've ever disagreed on either of these points—just on how to factor in the significance of the minor use cases.)

For the whopping grand 2 cents it's worth, when one of the minor cases comes into play, I still don't care for how #28633 necessitates extra development work and possibly database/transport costs (e.g. when placeholder queries/result sets are involved) when the database server is already providing the needed information for free—it's just not exposed by the API. Without access to that info, it's not always possible to provide generic workarounds for these minor cases, meaning that each developer who encounters one of them needs to become vested in the details of the situation and then reinvent the wheel to solve it. In contrast, with something like this proposal, the challenges associated with the minor use cases (almost completely) go away because it's possible to run just about any valid query in a command set without modification or special handling.

For the major use case, which doesn't need result sets segmented by command, possibly a couple of helper methods or extension methods could be provided that return flattened results—this way, those who want the flattening have it while those who need the lower-level details don't each have to reinvent the wheel. Again, for the two cents it's worth… :-)
Thanks for the response @bgribaudo, I appreciate the proposal and the arguments. I think we're more or less in sync about everything - the difference is in how much complexity I consider the non-flattened API to add (i.e. a bit more than you) and the rareness of the cases where I consider non-flattened resultsets to be useful (again, probably quite a bit more than you :)). It's always possible to allow all options by providing multiple layers for users to choose from, but that also has its complexity price, as users have to wrestle with understanding when and why the low-level layer is useful, etc. I think we should really aim for as much simplicity as possible without sacrificing actual functionality. In some way, ADO.NET already suffers from being a "low-level API" that doesn't look like it was conceived for end-users (as opposed to ORMs and other layers). Hopefully we can change that somewhat. In any case, thanks again for engaging and for all of your ideas! I'll go ahead and close this for now, although if you wish to continue the conversation please don't hesitate.
Thanks for the good conversation, @roji!
Likewise @bgribaudo, hope we'll do it again soon!
This is offered in the friendliest way possible as an alternative to #28633 (this general idea was discussed in that thread but didn't fit the direction being taken for that proposal). Both have the same high-level goal but differ in the details of how they achieve that goal.
Need/Problem Statement
Within the realm of SQL-based relational database engines:
SELECT * FROM TableA; SELECT * FROM TableB
). The first two scenarios are supported by ADO.Net's DbCommand. However, no public ADO.Net API currently exists which allows multiple DbCommands to be grouped into a single wire protocol level message. This proposal suggests one way to add this functionality.
Design Strategy
Commands are passed in as independent units. Results are returned in similar fashion—as independent units, one unit per command passed in.
In general, the consumer's experience in retrieving results should be (almost) identical to the experience they'd have if they executed each command separately. For example, the consumer should be able to easily determine which result set (or result sets, as a single command can possibly return several) came from which command, affected row counts should be determinable on a per-command basis and the reader-level optimizations achieved by methods like ExecuteScalar as well as CommandBehavior options should be available on a per-command basis (as different optimizations may apply to different commands in the set).

API Proposal Draft
Notes

CommandBehavior may optionally be passed when adding commands to the set and when reading results. Depending on the database server, some behaviors (e.g. KeyInfo, SchemaOnly) affect the query executed, so need to be specified before execution (e.g. when the command is added to the set). Other behaviors may only be of interest to the data reader (for example, so it can optimize reading: SequentialAccess, SingleResult, SingleRow) and so only need to be passed at the time of reading. If a reading-specific behavior is passed when a command is added or an execution-level behavior is passed during reading, an exception should be thrown.

Example
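A consumption sketch for this draft follows. The member names (Add with a CommandBehavior, Execute, GetReader, NextCommandSetResults) track the draft above; the command setup, exact signatures, and the placement of RecordsAffected on the results object are assumptions.

```csharp
// Sketch of the per-command consumption model in this proposal.
// Behaviors that shape the query (e.g. SingleRow) are supplied when the
// command is added; results come back segmented per command.
DbCommandSet set = ...; // set up against an open connection
set.Add(selectCommand, CommandBehavior.SingleRow); // execution-affecting behavior
set.Add(insertCommand);

using (var results = set.Execute())
{
    using (var reader = results.GetReader())
    {
        while (reader.Read())
        {
            // first command's (single) row
        }
    }

    results.NextCommandSetResults();
    // Affected rows would be available per command rather than aggregated
    // (exact member is an assumption of this sketch).
    int rowsInserted = results.RecordsAffected;
}
```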