feat: Expose gRPC service to handle provisioning #758
Conversation
Where did all the provisioner.ts code move to? Provisioning the logs & metadata.
The removed code was a provision hook for existing Indexers, and is no longer needed. We still provision logs/metadata for new indexers in the main
Most of my comments are conversational. Happy to proceed with the current PR as it's mainly establishing the foundation, but I'd love to get a clearer picture of where we're headed with the service, with schema editing and future QueryApi migrations in mind.
Also, having multiple gRPC dispatch destinations from the same server running on one port was a surprise. Very cool!
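For anyone unfamiliar with that pattern, here's a minimal sketch of how multiple services can share one @grpc/grpc-js server and port. The proto paths, package names, port, and handler objects below are placeholders, not the repo's actual layout:

```typescript
import * as grpc from '@grpc/grpc-js';
import * as protoLoader from '@grpc/proto-loader';

// Placeholder handler implementations; in the PR these would be the Runner
// and DataLayer service handlers.
declare const runnerHandlers: grpc.UntypedServiceImplementation;
declare const dataLayerHandlers: grpc.UntypedServiceImplementation;

const runnerProto = grpc.loadPackageDefinition(protoLoader.loadSync('runner.proto')) as any;
const dataLayerProto = grpc.loadPackageDefinition(protoLoader.loadSync('data-layer.proto')) as any;

const server = new grpc.Server();

// Registering both services on the same Server instance means a single port
// can dispatch to either service based on the requested method path.
server.addService(runnerProto.runner.Runner.service, runnerHandlers);
server.addService(dataLayerProto.data_layer.DataLayer.service, dataLayerHandlers);

server.bindAsync('0.0.0.0:50051', grpc.ServerCredentials.createInsecure(), (err) => {
  if (err != null) throw err;
  server.start(); // no-op on newer @grpc/grpc-js versions, required on older ones
});
```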
FAILED = 3;
}

message ProvisionResponse {
So my assumption is that the DataLayer service will also be the location where we intend to action schema editing. So, I just wanted to know if that's indeed what we would end up doing? If so, perhaps it makes sense to also have some standardized DataAction field which states the different things we might be doing (a rough sketch follows the list):
- Publishing new indexer
- Actioning schema update
- Migrating indexer from one version to another (e.g. An update like adding logs tables)
I believe our intended behavior in Coordinator would be different depending on whether there is a failure in any of the above cases.
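Purely to illustrate the suggestion above (this is not part of the PR's proto or code), such a field could be as simple as an enum on the request:

```typescript
// Hypothetical sketch only: names and values are illustrative.
enum DataAction {
  PROVISION_NEW_INDEXER = 0, // publishing a new indexer
  SCHEMA_UPDATE = 1,         // actioning a schema edit
  VERSION_MIGRATION = 2,     // e.g. an update like adding logs tables
}
```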
Yes, my intention is for this service to manage all "Data Layer" related tasks: provisioning, deprovisioning, deleting data, etc.
I'm not sure I follow what this DataAction field would do? Regardless, I'm usually against adding things we don't need right now. I'd prefer to do that when we add those features, as we'll probably get the shape wrong if we do it prematurely.
@@ -2,6 +2,53 @@ import crypto from 'crypto';
import { type StartExecutorRequest__Output } from '../generated/runner/StartExecutorRequest';
import { LogLevel } from '../indexer-meta/log-entry';

class BaseConfig {
Nice idea! We can easily enrich the original config for different purposes.
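To make the idea concrete, here's a rough sketch of the layering this PR describes; the field names are assumptions for illustration, not copied from the diff:

```typescript
// Sketch only: ProvisioningConfig carries just what the Data Layer needs,
// while IndexerConfig enriches it with execution-time details.
class BaseConfig {
  constructor (
    public readonly accountId: string,
    public readonly functionName: string
  ) {}

  // Shared derived values (e.g. a fully-qualified indexer name) live in one place.
  get fullName (): string {
    return `${this.accountId}/${this.functionName}`;
  }
}

class ProvisioningConfig extends BaseConfig {
  constructor (
    accountId: string,
    functionName: string,
    public readonly schema: string
  ) {
    super(accountId, functionName);
  }
}

// IndexerConfig extends ProvisioningConfig, so existing call-sites keep working.
class IndexerConfig extends ProvisioningConfig {
  constructor (
    accountId: string,
    functionName: string,
    schema: string,
    public readonly code: string
  ) {
    super(accountId, functionName, schema);
  }
}
```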
string function_name = 2;
}

enum ProvisioningStatus {
Will this status be used for any other data layer tasks we will need to do in the future, such as schema edits or migrating Indexers between versions? We can rename this to TaskStatus or something. Or, if this status is specific to provisioning, should we distinguish between retryable failure modes and ones requiring manual intervention? Such as failing to create a Postgres DB vs. Hasura failing to track foreign keys?
`TaskStatus` is a good idea, but I'm not quite sure if we'd re-use this for the other methods. I can rename it if/when it does get re-used :).
In regards to failure modes, yes, we should eventually have that. But I think that can also be added later. I wanted to go for a 1:1 refactor, just so I can have Coordinator control provisioning; we can build on the robustness later.
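Should failure modes get added later, a rough sketch of the kind of distinction being suggested (illustrative names only, not the PR's proto):

```typescript
// Illustrative only: one way a future status could separate retryable
// failures from ones needing manual intervention.
enum TaskStatus {
  UNSPECIFIED = 0,
  PENDING = 1,
  COMPLETE = 2,
  FAILED_RETRYABLE = 3, // e.g. a transient Postgres connection error
  FAILED_MANUAL = 4, // e.g. Hasura could not track foreign keys
}
```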
createDataLayerService(undefined, tasks).Provision(call, callback);
});

it('should return ALREADY_EXISTS if the task has already completed', (done) => {
What kind of scenario is this testing? Is this Coordinator trying to start the same task on its next loop, or Coordinator trying to start this task when it first connects to the DataLayer service? We are still going to store the state of an indexer's data in Redis as a source of truth, right?
Not really testing any specific scenario. It's unlikely this will happen, but it provides a fail-safe in case we do something wrong in Coordinator :)
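As a sketch of what that fail-safe test exercises: `createDataLayerService` is from the PR's diff, but the request field names, task shape, and error reporting via the callback are assumptions here.

```typescript
import { status } from '@grpc/grpc-js';

it('should return ALREADY_EXISTS if the task has already completed', (done) => {
  // A task for this indexer has already finished successfully.
  const tasks = {
    'someone.near:example_indexer': { pending: false, completed: true, failed: false },
  };

  const call = {
    request: { accountId: 'someone.near', functionName: 'example_indexer', schema: 'CREATE TABLE blocks ()' },
  } as any;

  const callback = (error: any): void => {
    // Calling Provision again for an already-provisioned indexer should be
    // rejected with the gRPC ALREADY_EXISTS status code.
    expect(error.code).toBe(status.ALREADY_EXISTS);
    done();
  };

  createDataLayerService(undefined, tasks).Provision(call, callback);
});
```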
public failed: boolean;
public pending: boolean;
public completed: boolean;
Why are there three different booleans exposed instead of one Status enum or something?
Oh yeah true, I could just use an enum. I started the refactor, but I think the current state actually reads better, i.e. we can just do `task.completed` or `task.pending`. But with an enum we need to pass the type around and do `task.status === TaskStatus.COMPLETED`. Which do you think is better? I'll merge as is, but can refactor later if you prefer the other :)
Gotcha. I guess it depends on what we end up doing with this list of statuses. If we want to add another one, it would be more difficult. Similarly, if we are passing them around, the parameter type will be a simple boolean, which is fine, but I think a `TaskStatus` more readily describes what this boolean actually is, without needing to encode that in the variable name itself. If we intend to make it part of another enum, that will also help. Basically, if the usage is simple, I'm also fine with multiple values. But if we want to do more with them or use them frequently, I feel an enum makes more sense.
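For reference, the two shapes being weighed here look roughly like this (illustrative only):

```typescript
// Current PR shape: independent booleans, read as task.pending / task.completed.
class ProvisioningTaskWithBooleans {
  public pending = true;
  public completed = false;
  public failed = false;
}

// Alternative: a single enum, read as task.status === TaskStatus.COMPLETED,
// which is easier to extend with new states and to pass around as a type.
enum TaskStatus {
  PENDING,
  COMPLETED,
  FAILED,
}

class ProvisioningTaskWithEnum {
  public status: TaskStatus = TaskStatus.PENDING;
}
```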
type ProvisioningTasks = Record<string, ProvisioningTask>;

const generateTaskId = (accountId: string, functionName: string): string => `${accountId}:${functionName}`;
This will ensure that DataLayer cannot spin up more than one task for one indexer. I believe this should make sense even going forward when we add more tasks.
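A hypothetical helper (not in the PR) showing the property this gives us, reusing the `ProvisioningTask` shape and `generateTaskId` from the diff above:

```typescript
interface ProvisioningTask {
  pending: boolean
  completed: boolean
  failed: boolean
}

type ProvisioningTasks = Record<string, ProvisioningTask>;

const generateTaskId = (accountId: string, functionName: string): string => `${accountId}:${functionName}`;

// Because the task ID is derived from the indexer's identity rather than
// generated randomly, a second Provision call for the same indexer finds the
// existing task instead of starting another one.
function getOrCreateTask (tasks: ProvisioningTasks, accountId: string, functionName: string): ProvisioningTask {
  const taskId = generateTaskId(accountId, functionName);
  tasks[taskId] ??= { pending: true, completed: false, failed: false };
  return tasks[taskId];
}
```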
tasks: ProvisioningTasks = {}
): DataLayerHandlers {
  return {
    CheckProvisioningStatus (call: ServerUnaryCall<CheckProvisioningStatusRequest__Output, ProvisionResponse>, callback: sendUnaryData<ProvisionResponse>): void {
Should this return whether a task exists or whether the indexer was ever provisioned? If it is the former, maybe we can rename this to CheckProvisioningTaskStatus?
The way I currently envision this service being used is that Coordinator reads the permanent Indexer provisioning status from Redis. It then starts a provisioning task if needed. We need to handle partial provisioning failures somehow, since not all provisioning errors are retryable and the service could also just restart in the middle of provisioning something successfully. Previously, Runner was doing this unilaterally by checking Postgres and Hasura. I'm wondering if that logic should still exist in DataLayer. It's obviously easier to write, but it would also be nice if we had that information stored in Redis so Coordinator could instead be the one to make that decision.
Then we need to extend this to have other tasks like I mentioned earlier. So, we should probably figure out a JSON or some other format to store in Redis that makes sense.
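As a purely hypothetical illustration of the kind of Redis record that could capture this (names and fields invented, not something the PR defines):

```typescript
// Illustrative shape only.
interface DataLayerState {
  provisioned: boolean
  lastTask?: {
    type: 'PROVISION' | 'SCHEMA_UPDATE' | 'MIGRATION'
    status: 'PENDING' | 'COMPLETE' | 'FAILED'
    failureReason?: string // lets Coordinator decide whether a retry makes sense
  }
}
```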
This only returns relevant information if the task exists, so good point, I will rename it to `CheckProvisioningTaskStatus`.
Perhaps we should be returning a random `taskId`, which Coordinator then uses to check the status. That would be clearer, but there wouldn't be any reason to have multiple tasks for a given indexer, so maybe not 🤔 I'll take a deeper look at this when I integrate with Coordinator.
> Runner was doing this unilaterally by checking Postgres and Hasura. I'm wondering if that logic should still exist in DataLayer

Yes, this still exists: if you call `Provision` on an Indexer which has already been provisioned, it will check Pg/Hasura and return `ALREADY_EXISTS`.
  return;
};

provisioner.fetchUserApiProvisioningStatus(provisioningConfig).then((isProvisioned) => {
So seeing this, it seems like the design is to have Coordinator call Provision on all indexers and have DataLayer check if the data layer already exists, caching the result as a successful run of the task. Do you think storing the status of an indexer's data in Redis makes sense over using our current function? My issue with the current function is that reading from Hasura is slow and prone to transient failures, especially when many requests to metadata are made. I think Hasura must be doing something behind the scenes to ensure metadata state across all instances is matching, which slows down any calls to it. It makes sense to have Hasura be our source of truth though. What do you think?
Yes, we should definitely store the provisioning state in Redis, but we should still have this here as a safeguard in case there is a bug in Coordinator which makes us call this again.
Even though we are the only consumers of this API, we may still make mistakes, so we should design it so it can't be abused 😅.
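A minimal sketch of that safeguard, assuming `fetchUserApiProvisioningStatus` from the diff above; the surrounding function and config shape are simplified for illustration:

```typescript
import { status } from '@grpc/grpc-js';

interface ProvisioningConfigLike {
  accountId: string
  functionName: string
  schema: string
}

interface ProvisionerLike {
  fetchUserApiProvisioningStatus: (config: ProvisioningConfigLike) => Promise<boolean>
}

// Returns an error-like object when the indexer is already provisioned, so the
// gRPC handler can fail the call with ALREADY_EXISTS instead of re-provisioning.
async function guardAgainstReprovisioning (
  provisioner: ProvisionerLike,
  config: ProvisioningConfigLike
): Promise<{ code: number, message: string } | null> {
  const isProvisioned = await provisioner.fetchUserApiProvisioningStatus(config);

  if (isProvisioned) {
    // Even if Coordinator mistakenly calls Provision again, the service refuses
    // to re-provision and surfaces ALREADY_EXISTS.
    return { code: status.ALREADY_EXISTS, message: 'Indexer already provisioned' };
  }

  return null; // Safe to start a new provisioning task.
}
```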
Also, sorry I took so long with the review. Really wanted to look at it during the trip, but when I saw the PR content, I wanted to be thorough with it. 😅
This PR adds the `DataLayerService`, which allows for provisioning the "Data Layer" of a QueryAPI Indexer. Data Layer in this case refers to the Postgres database and Hasura GraphQL endpoint. I wanted to move away from the more general "provisioning" term as it is a bit overloaded, and may get conflated with other provisioning steps related to Executors (which may become more involved with Firecracker). This PR doesn't replace the existing provisioning flow yet; I'll do that in a follow-up PR.

As provisioning can take some time, rather than keeping the request/connection open while we wait, the `Provision` command triggers the process and returns immediately. The `CheckProvisioningStatus` command will then be used to poll its completion.

On top of the new service, the following changes have been made:
- The `server/` directory has been slightly restructured to accommodate additional services, i.e. `DataLayerService`; as a result you'll see lots of `runner` files being moved around.
- `Provisioner` now takes `ProvisioningConfig`, which is a subset of `IndexerConfig`. On the gRPC side we don't have a full `IndexerConfig`, nor do we need it, so I created the subset to make it possible to provision with only the data we need. `IndexerConfig extends ProvisioningConfig`, so it's not a breaking change to the existing use-cases.
- `Provisioner`