-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass all specs to ingestion by file #154
Conversation
… whenever they are set, ignore what the feature specs say
Also worth noting, ImportJobSpec is not a proto for public consumption, just for communicating between core and ingestion. It is a great idea for us to find a way for users to define multiple feature specs and an import spec all in one yaml file, but that is a completely independent problem (and would suit a much more concise format than this one is). So let's stay on topic and not get caught on that. |
…uch change for one PR
…the importJobSpecs instead of options
…that to ingestion
…use stores with a test
/hold cancel |
@pradithya ready for review |
/retest |
I original include additional changes that limited to single data store for serving and warehouse, but I removed that to make this easier to review. Instead: the additional core changes include: Core can now also be configured with the following additional properties:
If present, these will be substituted when a feature spec is applied that does not provide any store id. The following core properties are now mandatory:
And This workspace must point to a directory in core and ingestion have read and write access. Core will create a subdirectory per job and provide that as the ingestion job's workspace directory, then write a importJobSpecs.yaml file to it. Ingestion will read that file to configure start the beam pipeline. Additionally, if the importErrorsSpecs proto does not include a errorsStorageSpec in it, ingestion will default to writing errors with type "file.json" to the it's workspace/errors directory. Question for reviewers. Core and ingestion overload the name workspace, for core it means the base path, for ingestion it means the {base path}/{jobId}. Is this acceptable or should we change configuration of core to be baseWorkspace? |
/hold updating the helm chart |
/hold there's an issue where core is not using google-cloud-nio to access the gcs filesystem. |
I'm getting these errors:
This does not make sense to ,me since I've added the google-cloud-nio library and it passes the tests. I can reproduce the error in the tests if I remove it from the pom.xml. :( Any suggestions for why this might not be included in build? |
out of paranoia I confirmed that it does exist in BOOT-INF/lib/google-cloud-nio-0.83.0-alpha.jar if I unzip the core jar. hmmmm... 🤔 |
Are these mandatory? I am not able to start core without setting them |
I got following error when trying to submit job and the workspace is in GCS
|
This is strange, because I got a similar error from core, when it was trying to write the specs, you got it ingestion when it was trying to read them. |
…res will fail validation if none is registered
They are both optional. Just fixed. |
For me, ImportJobSpec is successfully uploaded to GCS, but ingestion is unable to access it with above error. Note that, I have to change following line when passing the workspace
|
ingestion/src/main/java/feast/ingestion/config/ImportJobSpecsSupplier.java
Show resolved
Hide resolved
core/src/main/java/feast/core/job/direct/DirectRunnerJobManager.java
Outdated
Show resolved
Hide resolved
/hold cancel |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: zhilingc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Ingestion now takes a
--workspace
option which should be a path to a directory containing importJobSpecs.yaml file.This yaml file should align with a new
feast.types.ImportJobSpecs
protobuf which bundles all the used specs by an import job into a single proto.The workspace directory is also used to write out the errors if no errors storageSpec is provided.
This importJobSpecs.yaml is be generated by core before starting the job. It is not intended to be written by humans (thought it's useful for testing). It is intended to make it easier for the core api to make all the necessary (and only the necessary) specs available to ingestion. Without Ingestion needing to call back to core. This make deploying feast simpler. It also could make debugging what happened with a job a lot easier.
An alternative approach was to have spec in a separate file, but I think a single file this is easier and neater.
The proto looks like:
Here is an example of the corresponding yaml: