server: always enable sql_instances maintenance #95355

Merged · 2 commits · Jan 21, 2023

Conversation


@dt dt commented Jan 17, 2023

Fixes #95571.

Previously, the system.sql_instances table was only maintained by SQL servers operating in "pod" mode, i.e. not in mixed KV and SQL process nodes, where KV-level liveness and gossip provide an alternative means of node discovery that the SQL layer can use when searching for other SQL instances. However, this inconsistency makes it difficult to write correct SQL-level code for remote-node discovery and interaction: in some cases such code needs to consult the instances list, and in others the KV liveness store, which, combined with the complexities of doing so around initialization, dependency injection, etc., can become hard to maintain.

Additionally, such a design precludes a cluster where some SQL instances are in mixed KV nodes and some are not, as the non-KV nodes would have no way to discover the KV ones. Such deployments are not currently possible but could be in the future.

Instead, this change enables maintenance of the sql_instances table by all SQL servers, whether running in their own processes or embedded in a KV storage node process. This paves the way for making the means of discovery of SQL servers uniform across all SQL server types: they will all be able to simply consult the instances list to find any other SQL servers, regardless of where those SQL servers are running.

A follow-up change could simplify DistSQLPhysicalPlanner, specifically the SetupAllNodesPlanning method that has two different implementations due to the previous inconsistency in the available APIs.

Release note: none.
Epic: CRDB-14537
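
As a rough illustration of the uniform discovery path this enables (a sketch only; the types and method names below are simplified stand-ins, not the actual CockroachDB APIs), SetupAllNodesPlanning-style code could enumerate peers from a single source regardless of deployment mode:

	// Sketch with invented types: with every SQL server registered in
	// system.sql_instances, one discovery path serves all deployment modes.
	package example

	import "context"

	// InstanceInfo is a simplified stand-in for a sql_instances row.
	type InstanceInfo struct {
		InstanceID int32
		Addr       string
		Locality   string
		SessionID  string
	}

	// InstanceReader is a stand-in for the cached reader over the table.
	type InstanceReader interface {
		GetAllInstances(ctx context.Context) ([]InstanceInfo, error)
	}

	// allSQLServers lists every registered SQL server, whether it runs in its
	// own process or inside a KV storage node; no KV-liveness branch needed.
	func allSQLServers(ctx context.Context, r InstanceReader) ([]InstanceInfo, error) {
		return r.GetAllInstances(ctx)
	}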

@dt dt requested review from andreimatei, knz and ajwerner January 17, 2023 14:35
@dt dt requested a review from a team as a code owner January 17, 2023 14:35

@ajwerner ajwerner left a comment

This isn't so straightforward, is it? The existing instance infrastructure is going to lease an ID. That ID is not going to be the same as the node ID. That's going to be very confusing. I think we need an implementation of instance storage which always uses the node ID or something like that.


dt commented Jan 17, 2023

That ID is not going to be the same as the node ID. That's going to be very confusing.

I... guess? I donno, I feel like we need a pretty good reason to make it behave differently and I'm not sure this is one?
Also, if you imagine a world where we have some fixed pool of mixed KV storage and SQL node processes and then an elastic pool of SQL instances on top of that -- either in separate processes that get started and stopped for load reasons, or maybe additional single-use in-process SQL servers we spin up for admin tasks, which is something we've thought we might do for UA -- it seems likely that you could end up with node ID / instance ID conflicts, so the idea that they're always equivalent is impossible anyway.

It seems like we should just say kv node IDs are kv node IDs, and are meaningful in the realm of kv nodes, and sql server ids are sql server ids and are meaningful in the world of sql servers -- and the sql_instances table is the source of truth for who has which ID -- and that they have nothing to do with each other?


knz commented Jan 17, 2023

That ID is not going to be the same as the node ID. That's going to be very confusing.

It's fine. That is what we were planning to do.

@ajwerner

Let's let @knz weigh in. I'm fine with making them different, but it leads to some questions. In particular, it may become possible for the instance ID to change throughout the life of the process. With some more work, we could make it so that that doesn't really happen, but that's going to take some real work.


dt commented Jan 17, 2023

I guess I can make it the same for now if it lets us just merge this sooner rather than later and then have that debate in isolation. Let me see how that'll look.

@dt dt force-pushed the sql-instances branch 3 times, most recently from 1761922 to e033ed3 on January 17, 2023 15:30

knz commented Jan 17, 2023

it may become possible for the instance ID to change throughout the life of the process

No, it's the other way around. We need to inject the node ID as instance ID explicitly. This is also what #84602 is about.


dt commented Jan 17, 2023

Okay, IDs are fixed to nodeIDs for now with a TODO suggesting we remove that at a later point. RFAL.

@knz knz left a comment

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @dt)


pkg/server/server_sql.go line 505 at r1 (raw file):

	if isMixedSQLAndKVNode {
		cfg.podNodeDialer = cfg.nodeDialer

I believe this special case shouldn't be needed. Is it?


pkg/server/server_sql.go line 813 at r1 (raw file):

		nodeDialer = cfg.nodeDialer
		// TODO(dt): any reason not to just always use the instance reader? And just
		// pass it directly instead of making a new closure here?

I don't understand your question. How does the instance reader know about the decommissioned bit?


pkg/server/server_sql.go line 837 at r1 (raw file):

			instanceIDs := make([]roachpb.NodeID, len(instances))
			for i, instance := range instances {
				instanceIDs[i] = roachpb.NodeID(instance.InstanceID)

How does this code know how to skip over decommissioned nodes?
(The answer to this question is probably what answers the one above too.)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r1 (raw file):

			// and avoiding conflicts. This is true today but may not be in more
			// complex deployments.
			if nodeID != 0 {

Please do not rely on a sentinel value as a conditional; instead, use a fixInstanceId bool parameter or something equivalently verbose, with an explanation on the function docstring.

@dt dt left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @dt, and @knz)


pkg/server/server_sql.go line 813 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I don't understand your question. How does the instance reader know about the decommissioned bit?

The TODO is just to revisit why there are two implementations; I am inclined to agree that they appear different (one skips decom) so I didn't eliminate them now, just left a TODO to evaluate it later.


pkg/server/server_sql.go line 837 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

How does this code know how to skip over decommissioned nodes?
(The answer to this question is probably what answers the one above too.)

No idea; I didn't write or move this, but if it is good enough for sql pods, it seems like it should be good enough for non-pods, which was my point above. If it isn't good enough for pods, then we should fix that, but that's not really what this PR is about.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Please do not rely on a sentinel value as a conditional; instead, use a fixInstanceId bool parameter or something equivalently verbose, with an explanation on the function docstring.

This signature was getting cumbersome enough that I decided I'd rather just split it into two different public API methods (and then I don't really mind the sentinel value if it is strictly package-internal to their common implementation helper). A sketch of that shape is below.
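
For illustration only -- the type names here are simplified stand-ins, not the real instancestorage signatures -- the split could look roughly like this, with the sentinel confined to the package-internal helper:

	// Simplified stand-in types; the real code lives in
	// pkg/sql/sqlinstance/instancestorage and has richer signatures.
	package instancestoragesketch

	import "context"

	type nodeID int32

	type instanceInfo struct {
		ID   int32
		Addr string
	}

	type storage struct{} // kv handle, codec, etc. elided

	// CreateInstance claims the next free instance ID (standalone SQL pods).
	func (s *storage) CreateInstance(ctx context.Context, addr string) (instanceInfo, error) {
		return s.createInstanceRow(ctx, addr, 0 /* no fixed ID: allocate one */)
	}

	// CreateNodeInstance registers under the KV node's own ID (mixed KV/SQL nodes).
	func (s *storage) CreateNodeInstance(ctx context.Context, addr string, id nodeID) (instanceInfo, error) {
		return s.createInstanceRow(ctx, addr, id)
	}

	// createInstanceRow is the shared helper; the zero sentinel never escapes
	// the package, so callers only see the two explicit entry points.
	func (s *storage) createInstanceRow(ctx context.Context, addr string, fixed nodeID) (instanceInfo, error) {
		if fixed != 0 {
			return instanceInfo{ID: int32(fixed), Addr: addr}, nil
		}
		// ...allocate the lowest free ID from the table (elided)...
		return instanceInfo{ID: 1, Addr: addr}, nil
	}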

@dt dt left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/server/server_sql.go line 505 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I believe this special case shouldn't be needed. Is it?

I don't really know; it was here before, and I just pulled the instance storage stuff out of the conditional and left the rest as-is, since I didn't want to change anything not strictly required here given the proximity to stability and my desire to get this done ASAP and move on to the feature work it is blocking.

@andreimatei andreimatei left a comment

Previously the system.sql_instances table was only maintained by SQL servers that were operating in "pod" mode, i.e. not in mixed KV and SQL process nodes, where KV-level liveness and gossip provides an alternative means of node discovery that can be used by the SQL layer when searching for other SQL instances. However this inconsistency makes writing correct remote-node discovery and interaction SQL-level code difficult: in some cases such code needs to consult the instances list, and in some the KV liveness store, which when combined with complexities of doing so around initialization, dependency-injection, etc can become hard to maintain.

My opinion is that it seems better to not give shared-process tenants rows in system.sql_instances. I know not everybody seems to agree, but that seems to me like the simpler answer. If there are no rows in that table, there are no questions about them. For example, there's no question about whether the instance ID should be the same as the node ID (btw, my opinion on that one is that they should definitely be the same; making them different would be a big regression to me). Another big question is when these rows should be created. This dovetails with an area I've generally felt needs some thought/decision: when do shared-process tenant servers get created? I think things are in flux; I think @knz wants to create them dynamically, the first time a SQL or HTTP or RPC connection comes along to a node for a particular tenant. But, then, if we also do the creation of the sql_instance row at that time (like I think your PR does, right?), DistSQL would not find any instances around it. Which would be different from the non-UA clusters, for the worse.
Yet another question is when these rows should be deleted. We're already forced to deal with node status (alive, dead, decommissioned, etc.) -- and I'd note that we haven't been known to do a great job at it -- so why reinvent it for sql instances?

So, I for one would rather we push on making better abstractions for listing nodes that better hide whether sql_instances is used or not. I see you've run into some difficulties there, and I remember everything being messy, but I imagine that we could make it better.

this inconsistency

There are other inconsistencies between shared-process tenants and SQL pods, particularly around how they deal with their "liveness/sql sessions". SQL pods heartbeat a liveness session, and the instance ID is tied to that session id. If there's no session ID, then do we leave the session_id field of sql_instances empty? I think that's ugly and forces the code dealing with these instance IDs to understand the different types of clusters.
There's a related discussion about whether shared-process tenants should also get sessions, in the name of consistency. I, for one, would be even more against that than against giving them sql_instances.

Additionally such a design precludes a cluster where some SQL instances are in mixed KV nodes and some are not, as the non-KV nodes would have no way discover the KV ones. Such deployments are not currently possible but could be in the future.

Indeed. I discussed this a bit in the past with Werner and Rafa, and nobody seemed to care about such clusters. So, I feel like we can leave this hanging.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


dt commented Jan 17, 2023

like I think your PR does, right?

No, I'm registering at server startup, whether I'm a pod or in-proc. I'm registering using the node ID if I'm in-proc, and the next available ID if I'm a pod (since a single cluster never mixes these, conflicts are not a concern).

More broadly, on the other points in this paragraph, I don't really see this line of argument about it being complex/inconsistent/confusing to reason about when there is or isn't a row or whatever -- if anything, I think this change makes it simpler: if you started a sql server, i.e. a server that I might want to know about when I'm scheduling a distsql flow, then you put it in the table that I'll look in to find the sql servers, and do so linked to the session that'll let me know that that server is still alive and well. That's it, end of story, with no "well, if it is..." cases we have to handle specially.

It doesn't matter if that sql server is in a kv proc or sql pod to me -- that's irrelevant to this question of what are the sql servers, where (locality-wise) are they, and how can I contact them to ask them to run my flows.

If there's no session ID, then do we leave the session_id field of sql_instances empty

all sql servers -- in-proc or standalone -- get sessions. The jobs system, for example, depends on this, as job leases are tied to those sessions. Having this be uniform is good and I don't want to change that. Perhaps we could go further and unify the idea of a sql liveness session with a sql_instance row, so they become one and the same. But that's for another day. For now I just want to make sql-process discovery -- including the sessions in use by those processes -- uniform across in-proc and pod.

@andreimatei andreimatei left a comment

like I think your PR does, right?

No, I'm registering at server startup, whether I'm a pod or in-proc. I'm registering using the node ID if I'm in-proc, and the next available ID if I'm a pod (since a single cluster never mixes these, conflicts are not a concern).

I'm not sure, but I think "server startup" is not what you think it is. You've changed newSQLServer(), right? That doesn't get called when the node starts; it gets called later, dynamically, when the serverController has decided to instantiate a particular tenant.
Right?

If there's no session ID, then do we leave the session_id field of sql_instances empty

all sql servers -- in-proc or standalone -- get sessions. The jobs system, for example, depends on this, as job leases are tied to those sessions. Having this be uniform is good and I don't want to change that. Perhaps we could go further and unify the idea of a sql liveness session with a sql_instance row, so they become one and the same. But that's for another day. For now I just want to make sql-process discovery -- including the sessions in use by those processes -- uniform across in-proc and pod.

You're right. I had forgotten how stuff works.
Still, apart from that, for shared-process tenants I think it's a bad idea to tie the instance id to a particular session.
So when you do this:

		instance, err = s.sqlInstanceStorage.CreateNodeInstance(
			ctx, session.ID(), session.Expiration(), s.cfg.AdvertiseAddr, s.cfg.SQLAdvertiseAddr, s.distSQLServer.Locality, nodeId)

I don't think we should be referencing the session. We don't want this instance ID to ever "expire"; there's no reason for it to expire. If it were to expire, various things that want a non-expired session would stop working for no good reason. It just so happens that the primary user of the instance ID (the unique_rowid() function) does not bother to check the expiration status, but that's simply a bug for serverless (#90459). (Btw, for stand-alone tenants, the process is shut down when the session expires precisely because the instance ID has become invalid; we don't do that shutdown for shared-process tenants.)
So if we do give instance IDs to shared-proc tenants, I think their ids should be decoupled from the session (but I maintain my opinion for now that we shouldn't create instance IDs at all).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


dt commented Jan 17, 2023

various things that want a non-expired session would stop working for no good reason

Can you elaborate? Why did it expire? If it expires, it implies that this server has stopped being able to reach and update liveness, which implies it is no longer healthy and functioning. We tie things to a liveness session specifically because we want to know just that -- is that node, or whatever we tied to it, still alive and healthy? If an in-proc SQL server has stopped being able to update its liveness session, then I probably don't really want it to keep being the coordinator for my job or to assign any new flow processors to it, since something is clearly wrong with it.


dt commented Jan 17, 2023

My motivation here is specifically that I need a mechanism for discovering all the sql servers on which I could schedule a flow, where they are located, and their current session, so that I can tie the transfer of work to them to that session -- if they go away while the transfer is happening, I don't want to leave the transferred work orphaned, so I explicitly want to transfer it with a specific liveness session attached.


dt commented Jan 17, 2023

We have exactly the thing I need -- the list of all sql servers, where they are located, how to contact them and their associated liveness session - already implemented in the sql_instances table and have all the code to maintain it. And we know it all works and we'll keep maintaining it because we rely on it for MT clusters.

But then I discover that this is only run for MT servers, not in-proc servers. There is no persisted directory of in-proc servers and their current liveness session, so I'd need to go add a new RPC to ask each server for its session, and even getting the list of sql servers now involves hitting raw kv keys instead of a system table. Having this be wildly different just because of the process the server lives in seems prone to making it harder to use, with more code and more bugs.

I also don't really see "some things don't check liveness" as a reason not to have the list. like, sure, then we should fix them! but surely having two different ways to check liveness and knowing which you need to check is much worse isn't it?

@andreimatei andreimatei left a comment

Can you elaborate? Why did it expire? If it expires, it implies that this server has stopped being able to reach and update liveness, which implies it is no longer healthy and functioning.

I also don't really see "some things don't check liveness" as a reason not to have the list. like, sure, then we should fix them! but surely having two different ways to check liveness and knowing which you need to check is much worse isn't it?

In shared-proc tenants, for the purposes of unique_rowid(), whether or not the sql server's session is expired does not matter. We're able to generate unique IDs just fine whether or not we're able to heartbeat the session record. We're also able to generate unique IDs just fine even if we can't heartbeat the node's liveness record, or even if the local storage device is unresponsive, etc. We shouldn't tie together things that don't need to be tied.
The reason why we have to tie unique_rowid to the session expiration in standalone tenants is that there, some other sql server might take your instance ID away, at which point you don't have the ability to generate unique ids any more.

My motivation here is specifically that I need a mechanism for discovering all the sql servers on which I could schedule a flow, where they are located, and their current session, so that I can tie the transfer of work to them to that session -- if they go away while the transfer is happening, I don't want to leave the transferred work orphaned, so I explicitly want to transfer it with a specific liveness session attached.

I'm not generally against the use of the sql sessions. You can use them as far as I'm concerned. I'm not entirely convinced they needed to exist for single-tenant and shared-proc tenant servers, since it seems to me that they very much overlap in functionality with the node liveness record and its epoch, but whatever. So I take back my comment about how I don't like shared-proc tenants having them; I had forgotten that even single-tenant processes have them.
The instance IDs, though, are another matter in my opinion.

We have exactly the thing I need -- the list of all sql servers, where they are located, how to contact them and their associated liveness session - already implemented in the sql_instances table and have all the code to maintain it.

Except I think we don't. Does my note about how I don't think that the sql_instances table is populated at node start time, like I think you assume it is, give you pause?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


dt commented Jan 17, 2023

Does my note about how I don't think that the sql_instances table is populated at node start time, like I think you assume it is, give you pause?

For mixed kv and sql nodes, it looks like it is called at start, isn't it? pkg/server/server.go:891 ? and those are the only ones I'm suggesting changing -- we already enable this for any non-mixed servers.


dt commented Jan 17, 2023

I'm not entirely convinced they needed to exist for single-tenant and shared-proc tenant servers, since it seems to me that they very much overlap in functionality with the node liveness

Sure, but why make SQL users deal with the complexity of using one of two different systems depending on where they're running, when they could just always use one that works in both? Why use KV node listing to find sql servers when they're colocated, and use instance table when they're not, vs just always using the instance table? If there's some argument that the instance table, or sql liveness, isn't good enough and we should use the better kv system when it is available instead, that seems like a problem for our pods where it is their only choice. So if it has to be good enough for pods to use, why not just use it all the time? Fewer branches and conditionals and special cases means less code, easier feature development, and fewer bugs.

@ajwerner

Here's my take: while node IDs and epochs are similar and overlapping, they are not identical and they have different APIs. It's nice to not have to build adapters everywhere -- and -- for better or for worse, we have sql sessions in all the processes. After some listening, I find David's point about wanting to discover these sessions compelling. I also find arguments for uniformity compelling.

What I don't love is that in the short term we'll get confusing new IDs in play. I don't want to have to do the mapping of instance to node IDs while we don't need to. Does anybody expect a world with a mixture of kv nodes and sql pods of the same tenant any time soon?

My proposal is that we indeed leverage the sql instance subsystem in kv nodes to propagate the information in the same way, but we don't do the same leasing dance -- we just plop the node ID into the instance ID slot and store the session. That way it'll behave uniformly on the reading path, but it'll unify the two concepts explicitly in the case where the server is in a kv node.


andreimatei commented Jan 17, 2023

Does my note about how I don't think that the sql_instances table is populated at node start time, like I think you assume it is, give you pause?

For mixed kv and sql nodes, it looks like it is called at start, isn't it? pkg/server/server.go:891 ? and those are the only ones I'm suggesting changing -- we already enable this for any non-mixed servers.

I believe the code you point to is only relevant for the sql server corresponding to the system tenant.
All the other tenants are created dynamically, here.

I'm not entirely convinced they needed to exist for single-tenant and shared-proc tenant servers, since it seems to me that they very much overlap in functionality with the node liveness

Sure, but why make SQL users deal with the complexity of using one of two different systems depending on where they're running, when they could just always use one that works in both? Why use KV node listing to find sql servers when they're colocated, and use instance table when they're not, vs just always using the instance table? If there's some argument that the instance table, or sql liveness, isn't good enough and we should use the better kv system when it is available instead, that seems like a problem for our pods where it is their only choice. So if it has to be good enough for pods to use, why not just use it all the time? Fewer branches and conditionals and special cases means less code, easier feature development, and fewer bugs.

In the case of sql instance IDs, I think the differences between these IDs are enough to not make it a good idea for shared-proc tenants to use them:

  1. in dedicated tenants, the IDs are dynamic, whereas in shared-proc we want the node ID (at least I do)
  2. in dedicated tenants, the IDs can be revoked and are tied to a session, whereas in shared-proc they're never revoked and don't have to be tied to a session (and shouldn't be tied, in my view).
  3. in dedicated tenants, we want to shut down the process when the ID is revoked whereas in shared-proc we don't need and don't want that
  4. in dedicated tenants, there's an outside process for deleting dead IDs; for shared-proc we'd have to invent this process inside crdb (whereas we already have it for node statuses and node liveness records).
  5. similarly for creating these IDs; again, it does not look to me like you're getting what you want because, at least the way things currently stand, the instance IDs get written dynamically and that table will be empty when I think you want it to be full

So ultimately it's all subjective and there are tradeoffs, etc., but my opinion is that we'd end up with worse code by trying to hide the differences in instance IDs between shared-proc and dedicated-tenant pods while using the system.sql_instances table, versus not using it for shared-proc.

Btw, as far as I'm concerned, I don't think you particularly need my nod of approval here, in particular if you get something that both Rafa and Werner are happy with. I don't feel any ownership here, fyi.
But, again, I think number 5 alone seems like a big problem for you.

@ajwerner

There is one other thing to note: in a world with repaving, the node ID can get above 15 bits without too much work. Once that becomes true, all claims about unique_rowid() go out the window. I think that's bad. A solution would be to have instance ID diverge from node ID and use the same protocol.
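
To make the 15-bit concern concrete, here is a rough, simplified illustration (not the exact unique_rowid() encoding) of why a generating-server ID above 2^15-1 can collide with another server's IDs:

	// Simplified model only: real unique_rowid() differs in details, but the
	// low bits of the value are reserved for the generating server's ID.
	package example

	import "fmt"

	const idBits = 15

	func rowID(timestamp int64, serverID int32) int64 {
		return (timestamp << idBits) | int64(serverID&(1<<idBits-1))
	}

	func main() {
		// Server IDs 1 and 32769 share the same low 15 bits, so at the same
		// timestamp they would generate the same value: a collision.
		fmt.Println(rowID(1000, 1) == rowID(1000, 32769)) // true
	}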


dt commented Jan 18, 2023

After a lengthy discussion with @andreimatei and @ajwerner offline, we concluded a few things/observations to state explicitly:

  1. outside of the sql_instances table, there is no way to map a node to a session
  2. we cannot add a node id column to the sessions table because we acquire a session in order to write to the instances table
  3. we care a lot about sql servers having a unique ID because it is used in unique_rowid. we cannot just use a uuid sql server id since using just 15 bits of it for rowid isn't enough entropy if they're random -- we have to deliberately avoid conflicts. similarly sql instance IDs are reused so we don't run out of 15-bit IDs, but that means we need to be careful to track session and expiration and only use an ID while we know it is ours.
  4. if we fix sql server ids to be the same as node ids when those servers are in the same process as a kv node, we're guaranteed they are unique because node ids are unique
  5. an ephemeral sql instance will shut down if it loses the session on its claimed id, since someone else might now be using it; a kv-node-hosted sql server will not, since it has no such fear of its id being stolen given it is tied to the node id. it will just get a new liveness session id.
  6. but this means that if a kv-node-hosted sql server writes a row to sql instances with its current session id, it needs to also update that row if that session changes, since the whole point of writing it is to have a mapping of server ids to sessions (see the sketch after this summary).
  7. currently sql code discovers servers one of two ways: getting the kv-backed list from gossip or getting the sql_instances backed list from the instance storage cache. the latter is currently empty in non-MT configs.
  8. sql code currently switches on the tenant being system or not to decide which impl to use; thus a non-system tenant in a UA cluster would try to use the instances table (which prior to this change would be empty and/or crash)
  9. we all agree the sql code should stop doing this and just call the same method all the time. if we wanted to have that method have different implementations below SQL that read the table or not, we could but...
  10. we can't/don't want to implement the API that has session_id in it backed by gossip, since if there is more than one sql server in a node, each with their own sql session, you would need to... put them all in gossip? eww?
  11. the sql_instances table already does the thing we want -- list all the running sql servers, how to contact them, their localities and their sessions, if we just populate it unconditionally; the changes we'd need to make to gossip or something else to get the same result seem worse.
  12. there's a caveat above that the instances table will only contain running (started) sql servers; this is all the servers in pod-based MT, or all the servers in non-UA. however, currently in UA there may be potential sql servers not yet running, i.e. nodes that have not yet started them due to lazy init of the sql server on first connection / rpc. these unstarted servers are not in this list with a change like this, but a discovery mechanism based on gossip/listing nodes would include them -- do we want them?
    12a) something like jobs absolutely does not; an unstarted server is as good as no server, since it won't notice if someone writes a job for it.
    12b) a distsql flow might -- it could elect to schedule on it, but it might not want to if that proc would first have to pay its boot time.

I'm sure I'm missing some things since we chatted for the better part of two hours and this is a brief summary from recollection, but I think the big take-aways:
A) we do, in fact, like the instances table and the infra around it. we're going to make it correct in all sql servers, not just MT pods. This PR was a start there, but required an additional change to update the row if the session changes in a long-lived node (since added). The alternatives to a SQL table end up worse, as is making a wholly new table. All SQL code will just use the instance reader and messy conditionals go away.
B) as it stands this will only list running SQL servers, which is all of them for default 23.1 configs (non-UA or MT-pods). For UA, it will only include started servers, but our discussion suggested we should have another -- separate -- discussion of lazy starts.
C) In the further future, perhaps we should pull a unique-id-prefix-generator-reservation table into its own thing, so that at some point in sql startup, or even first use, we do some sort of lease of a 15-bit int for this function, and then we don't need to keep our sql server ids constrained to 15 bits while also unique, and could just use UUIDs, i.e. the session ID, and then sessions and instances unify into a single table. We need to do this "soon" anyway as node IDs could overflow 15 bits quickly with repaving, as @ajwerner points out.
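
For point 6 above, a minimal sketch of the session-refresh behavior (the types, channel wiring, and method names here are invented for illustration; the real change plumbs this through the server's sqlliveness machinery):

	// Toy sketch: a KV-node-hosted SQL server keeps its fixed instance ID
	// (== node ID) but rewrites its sql_instances row whenever it acquires a
	// new liveness session, since the row exists to map ID -> session.
	package example

	import (
		"context"
		"time"
	)

	type sessionInfo struct {
		ID         string
		Expiration time.Time
	}

	type rowWriter interface {
		PutInstanceRow(ctx context.Context, instanceID int32, sessionID string, exp time.Time) error
	}

	func maintainInstanceRow(ctx context.Context, w rowWriter, nodeID int32, sessions <-chan sessionInfo) error {
		for {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case s, ok := <-sessions:
				if !ok {
					return nil
				}
				// Re-Put the row under the same instance ID with the new session.
				if err := w.PutInstanceRow(ctx, nodeID, s.ID, s.Expiration); err != nil {
					return err
				}
			}
		}
	}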


dt commented Jan 18, 2023

Added a test showing the session ID being kept up to date after the session is expired, both when the instance row is removed in the interim and when it is not (they're the same, due to Put, but it seemed good to show both). A rough sketch of that shape is below.
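
For reference, a toy version of what such a test can look like (the fake storage and helper names here are invented; the actual test exercises the real instancestorage code):

	package example

	import "testing"

	// fakeStorage is an in-memory stand-in for the sql_instances table.
	type fakeStorage struct{ sessions map[int32]string }

	func newFakeStorage() *fakeStorage { return &fakeStorage{sessions: map[int32]string{}} }

	// createNodeInstance mimics a Put: it creates or overwrites the row.
	func (f *fakeStorage) createNodeInstance(id int32, session string) { f.sessions[id] = session }
	func (f *fakeStorage) deleteInstanceRow(id int32)                  { delete(f.sessions, id) }
	func (f *fakeStorage) sessionFor(id int32) string                  { return f.sessions[id] }

	func TestNodeInstanceSessionRefresh(t *testing.T) {
		const nodeID = int32(1)
		for _, removeRowFirst := range []bool{false, true} {
			s := newFakeStorage()
			s.createNodeInstance(nodeID, "session-1")
			if removeRowFirst {
				// Simulate the row having been cleaned up after expiration.
				s.deleteInstanceRow(nodeID)
			}
			// Re-registering after the old session expired must leave the row
			// pointing at the new session either way.
			s.createNodeInstance(nodeID, "session-2")
			if got := s.sessionFor(nodeID); got != "session-2" {
				t.Fatalf("expected session-2, got %q", got)
			}
		}
	}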


knz commented Jan 19, 2023

what in this PR needs to change in light of the recent mini-RFC?


dt commented Jan 19, 2023

what in this PR needs to change in light of the recent mini-RFC?

My read is nothing. This diff is about populating the tables when you are running and that RFC is about when you are or are not running, but doesn't change what we do here.


dt commented Jan 19, 2023

So after discussions today/yesterday/etc -- are we good here? Anything else to do?

@knz knz left a comment

Reviewed 2 of 4 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 176 at r4 (raw file):

	nodeID roachpb.NodeID,
) (instance sqlinstance.InstanceInfo, _ error) {

nit: empty line

dt added 2 commits January 20, 2023 21:27
@dt dt left a comment

TFTRs!

bors r=knz

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 176 at r4 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: empty line

fixed, thanks!


craig bot commented Jan 21, 2023

Build failed (retrying...):


craig bot commented Jan 21, 2023

Build succeeded:

Successfully merging this pull request may close these issues.

server: need to maintain sql_instances in all cases