Pass all specs to ingestion by file #154

Merged on Mar 21, 2019 (28 commits)

tims commented Mar 6, 2019

Ingestion now takes a --workspace option, which should be the path to a directory containing an importJobSpecs.yaml file.
This YAML file should match the new feast.types.ImportJobSpecs protobuf, which bundles all the specs used by an import job into a single proto.
The workspace directory is also used to write out errors if no errorsStorageSpec is provided.

This importJobSpecs.yaml is generated by core before starting the job. It is not intended to be written by humans (though it's useful for testing). It is intended to make it easier for the core API to make all the necessary (and only the necessary) specs available to ingestion, without ingestion needing to call back to core. This makes deploying Feast simpler. It could also make debugging what happened with a job a lot easier.

An alternative approach was to have each spec in a separate file, but I think a single file is easier and neater.

The proto looks like:

message ImportJobSpecs {
  string jobId = 1;
  feast.specs.ImportSpec importSpec = 2;
  repeated feast.specs.EntitySpec entitySpecs = 3;
  repeated feast.specs.FeatureSpec featureSpecs = 4;
  repeated StorageSpec servingStorageSpecs = 5;
  repeated StorageSpec warehouseStorageSpecs = 6;
  StorageSpec errorsStorageSpec = 7;
}

Here is an example of the corresponding yaml:

importSpec: 
  type: file.csv
  options:
    path: /tmp/file.csv
  entities:
    - testEntity
  schema:
    entityIdColumn: id
    timestampValue: 2018-09-25T00:00:00.000Z
    fields:
      - name: timestamp
      - name: id
      - featureId: testEntity.day.testInt64
servingStorageSpecs:
  - id: TEST_SERVING
    type: serving.mock
    options: {}
warehouseStorageSpecs:
  - id: TEST_WAREHOUSE
    type: warehouse.mock
    options: {}
entitySpecs:
  - name: testEntity
    description: This is a test entity
    tags: []
featureSpecs:
  - id: testEntity.day.testInt64
    entity: testEntity
    granularity: DAY
    name: testInt64
    owner: [email protected]
    description: This is test feature of type integer
    uri: https://example.com/
    valueType: INT64
    tags: []
    options: {}

tims commented Mar 6, 2019

Also worth noting: ImportJobSpecs is not a proto for public consumption, just for communication between core and ingestion.

It is a great idea for us to find a way for users to define multiple feature specs and an import spec all in one YAML file, but that is a completely independent problem (and would suit a much more concise format than this one). So let's stay on topic and not get caught up on that.

tims commented Mar 8, 2019

/hold cancel

tims commented Mar 8, 2019

@pradithya ready for review

tims commented Mar 10, 2019

/retest

tims commented Mar 10, 2019

I originally included additional changes that limited Feast to a single data store each for serving and warehouse, but I removed them to make this easier to review.

Instead, the additional core changes are as follows.

Core can now also be configured with the following additional properties:

feast.store.serving.type = ${STORE_SERVING_TYPE:}
feast.store.serving.options = ${STORE_SERVING_OPTIONS:{}}
feast.store.warehouse.type = ${STORE_WAREHOUSE_TYPE:}
feast.store.warehouse.options = ${STORE_WAREHOUSE_OPTIONS:{}}
feast.store.errors.type = ${STORE_ERRORS_TYPE:}
feast.store.errors.options = ${STORE_ERRORS_OPTIONS:{}}

If present, these will be substituted when a feature spec is applied that does not provide a store id.
The serving and warehouse stores are registered on startup, the same way a user might register a store. So if you only want these stores, you never need to provide any store references in the feature specs.
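
The substitution rule above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `DefaultStoreSketch`, `effectiveStoreId`, and the store id `DEFAULT_SERVING` are hypothetical names, and the sketch assumes that a missing or empty store id in a feature spec falls back to the store configured through the `feast.store.*` properties.

```java
import java.util.Optional;

// Hypothetical helper illustrating the fallback; not part of Feast.
final class DefaultStoreSketch {

  /**
   * Returns the store id named in the feature spec, or the configured
   * default when the spec leaves it unset (null or empty).
   */
  static Optional<String> effectiveStoreId(String specStoreId, Optional<String> defaultStoreId) {
    if (specStoreId != null && !specStoreId.isEmpty()) {
      return Optional.of(specStoreId);
    }
    return defaultStoreId;
  }

  public static void main(String[] args) {
    Optional<String> configured = Optional.of("DEFAULT_SERVING"); // assumed default id
    System.out.println(effectiveStoreId("", configured));
    System.out.println(effectiveStoreId("MY_STORE", configured));
    System.out.println(effectiveStoreId(null, Optional.<String>empty()).isPresent());
  }
}
```

When no default is configured and the spec names no store, the result is empty, which corresponds to the case where a store still has to be registered explicitly.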

The following core property is now mandatory:

feast.jobs.workspace=${JOB_WORKSPACE}

feast.jobs.coreUri has been removed, as ingestion no longer needs to know the URI of core.

This workspace must point to a directory to which both core and ingestion have read and write access. Core will create a subdirectory per job, write an importJobSpecs.yaml file to it, and provide that subdirectory as the ingestion job's workspace directory. Ingestion will read that file to configure and start the Beam pipeline.
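
A minimal sketch of the core side of this handoff, assuming plain `java.nio.file` access to the workspace. `JobWorkspaceSketch`, `writeJobSpecs`, and `tempBaseWorkspace` are hypothetical names used for illustration, and the YAML is passed in as an already-serialized string rather than produced from the proto.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the per-job workspace layout; names are illustrative only.
final class JobWorkspaceSketch {

  static final String SPECS_FILE = "importJobSpecs.yaml";

  /** Creates {base}/{jobId}, writes the specs file into it, and returns the job workspace. */
  static Path writeJobSpecs(Path baseWorkspace, String jobId, String specsYaml) {
    Path jobWorkspace = baseWorkspace.resolve(jobId);
    try {
      Files.createDirectories(jobWorkspace);
      Files.write(jobWorkspace.resolve(SPECS_FILE), specsYaml.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return jobWorkspace;
  }

  /** Creates a throwaway base workspace, for trying the sketch locally. */
  static Path tempBaseWorkspace() {
    try {
      return Files.createTempDirectory("feast-workspace");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path jobWorkspace = writeJobSpecs(tempBaseWorkspace(), "job-1", "jobId: job-1\n");
    // This is the directory that would be passed to ingestion as --workspace.
    System.out.println(jobWorkspace);
  }
}
```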

Additionally, if the ImportJobSpecs proto does not include an errorsStorageSpec, ingestion will default to writing errors with type "file.json" to its workspace/errors directory.
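
The default described above could look roughly like this. It is a sketch under assumptions, not the PR's code: the nested `StorageSpec` class is a hypothetical stand-in for the proto message, `null` stands in for an unset proto field, and the `path` option key is invented for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the errors-store default; not part of Feast.
final class ErrorsStoreSketch {

  /** Stand-in for the StorageSpec proto, with only the fields needed here. */
  static final class StorageSpec {
    final String type;
    final Map<String, String> options;

    StorageSpec(String type, Map<String, String> options) {
      this.type = type;
      this.options = options;
    }
  }

  /**
   * Returns the provided errors spec, or a "file.json" spec pointing at
   * {jobWorkspace}/errors when none was provided (null here).
   * The "path" option key is an assumption made for this sketch.
   */
  static StorageSpec resolveErrorsSpec(StorageSpec provided, Path jobWorkspace) {
    if (provided != null) {
      return provided;
    }
    String errorsDir = jobWorkspace.resolve("errors").toString();
    return new StorageSpec("file.json", Collections.singletonMap("path", errorsDir));
  }

  public static void main(String[] args) {
    StorageSpec spec = resolveErrorsSpec(null, Paths.get("workspace", "job-1"));
    System.out.println(spec.type + " -> " + spec.options.get("path"));
  }
}
```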

Question for reviewers: core and ingestion overload the name "workspace". For core it means the base path; for ingestion it means {base path}/{jobId}. Is this acceptable, or should we change the core configuration property to baseWorkspace?

tims commented Mar 10, 2019

/hold

updating the helm chart

tims commented Mar 16, 2019

/hold

There's an issue where core is not using google-cloud-nio to access the GCS filesystem.

tims commented Mar 16, 2019

I'm getting these errors:

   extendedStackTrace: [17]    
   localizedMessage:  "Provider "gs" not installed"    
   message:  "Provider "gs" not installed"    
   name:  "java.nio.file.FileSystemNotFoundException"    
   extendedStackTrace: [
    0: {
     class:  "java.nio.file.Paths"      
     exact:  false      
     file:  "Paths.java"      
     line:  147      
     location:  "?"      
     method:  "get"      
     version:  "1.8.0_181"      
    }
    1: {
     class:  "feast.core.util.PathUtil"      
     exact:  false      
     file:  "PathUtil.java"      
     line:  33      
     location:  "classes!/"      
     method:  "getPath"      
     version:  "?"      
    }

This does not make sense to me, since I've added the google-cloud-nio library and it passes the tests. I can reproduce the error in the tests if I remove it from the pom.xml. :(

Any suggestions for why this might not be included in the build?

tims commented Mar 16, 2019

out of paranoia I confirmed that it does exist in BOOT-INF/lib/google-cloud-nio-0.83.0-alpha.jar if I unzip the core jar. hmmmm... 🤔
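
One possible explanation, not confirmed anywhere in this thread: `Paths.get(URI)` only consults the providers returned by `FileSystemProvider.installedProviders()`, which the JDK discovers through the system class loader. Jars nested under BOOT-INF/lib in a Spring Boot fat jar are loaded by a different class loader, so a provider can be present in the jar without being "installed". The sketch below falls back to a `ServiceLoader` lookup on an explicit class loader; `ProviderLookupSketch`, `findProvider`, and `toPath` are hypothetical names.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

// Hypothetical sketch of a class-loader-aware provider lookup; not part of Feast.
final class ProviderLookupSketch {

  /** Finds a provider for the scheme, also searching the given class loader. */
  static FileSystemProvider findProvider(String scheme, ClassLoader classLoader) {
    // Providers visible to the system class loader (what Paths.get(URI) uses).
    for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
      if (provider.getScheme().equalsIgnoreCase(scheme)) {
        return provider;
      }
    }
    // Fallback: providers declared in META-INF/services on the given class loader.
    for (FileSystemProvider provider : ServiceLoader.load(FileSystemProvider.class, classLoader)) {
      if (provider.getScheme().equalsIgnoreCase(scheme)) {
        return provider;
      }
    }
    return null;
  }

  /** Resolves a URI to a Path without relying solely on installed providers. */
  static Path toPath(URI uri) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    FileSystemProvider provider = findProvider(uri.getScheme(), loader);
    if (provider == null) {
      throw new IllegalArgumentException("No file system provider for scheme: " + uri.getScheme());
    }
    return provider.getPath(uri);
  }

  public static void main(String[] args) {
    // A local file URI is used here so the sketch runs without extra libraries.
    System.out.println(toPath(Paths.get("workspace").toUri()));
  }
}
```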

pradithya commented

> feast.store.serving.type = ${STORE_SERVING_TYPE:}
> feast.store.serving.options = ${STORE_SERVING_OPTIONS:{}}
> feast.store.warehouse.type = ${STORE_WAREHOUSE_TYPE:}
> feast.store.warehouse.options = ${STORE_WAREHOUSE_OPTIONS:{}}
> feast.store.errors.type = ${STORE_ERRORS_TYPE:}
> feast.store.errors.options = ${STORE_ERRORS_OPTIONS:{}}

Are these mandatory? I am not able to start core without setting them.

pradithya commented

I got the following error when trying to submit a job with the workspace in GCS:

Optional[Exception in thread "main" java.lang.UnsupportedOperationException: GCS objects aren't available locally
	at com.google.cloud.storage.contrib.nio.CloudStoragePath.toFile(CloudStoragePath.java:303)
	at feast.ingestion.util.ProtoUtil.decodeProtoYamlFile(ProtoUtil.java:37)
	at feast.ingestion.config.ImportJobSpecsSupplier.create(ImportJobSpecsSupplier.java:45)
	at feast.ingestion.config.ImportJobSpecsSupplier.get(ImportJobSpecsSupplier.java:57)
	at feast.ingestion.ImportJob.mainWithResult(ImportJob.java:121)
	at feast.ingestion.ImportJob.main(ImportJob.java:109)]
	at feast.core.job.direct.DirectRunnerJobManager.runProcess(DirectRunnerJobManager.java:113)

tims commented Mar 17, 2019

> I got the following error when trying to submit a job with the workspace in GCS:
>
> Optional[Exception in thread "main" java.lang.UnsupportedOperationException: GCS objects aren't available locally
> 	at com.google.cloud.storage.contrib.nio.CloudStoragePath.toFile(CloudStoragePath.java:303)
> 	at feast.ingestion.util.ProtoUtil.decodeProtoYamlFile(ProtoUtil.java:37)
> 	at feast.ingestion.config.ImportJobSpecsSupplier.create(ImportJobSpecsSupplier.java:45)
> 	at feast.ingestion.config.ImportJobSpecsSupplier.get(ImportJobSpecsSupplier.java:57)
> 	at feast.ingestion.ImportJob.mainWithResult(ImportJob.java:121)
> 	at feast.ingestion.ImportJob.main(ImportJob.java:109)]
> 	at feast.core.job.direct.DirectRunnerJobManager.runProcess(DirectRunnerJobManager.java:113)

This is strange: I got a similar error from core when it was trying to write the specs, whereas you got it from ingestion when it was trying to read them.

tims commented Mar 17, 2019

> feast.store.serving.type = ${STORE_SERVING_TYPE:}
> feast.store.serving.options = ${STORE_SERVING_OPTIONS:{}}
> feast.store.warehouse.type = ${STORE_WAREHOUSE_TYPE:}
> feast.store.warehouse.options = ${STORE_WAREHOUSE_OPTIONS:{}}
> feast.store.errors.type = ${STORE_ERRORS_TYPE:}
> feast.store.errors.options = ${STORE_ERRORS_OPTIONS:{}}
>
> Are these mandatory? I am not able to start core without setting them.

They are both optional. Just fixed.
A serving store still needs to be registered before feature specs can pass validation.

pradithya commented

> This is strange: I got a similar error from core when it was trying to write the specs, whereas you got it from ingestion when it was trying to read them.

For me, ImportJobSpecs is successfully uploaded to GCS, but ingestion is unable to access it and fails with the error above.

Note that I have to change the following line when passing the workspace:

    commands.add(option("workspace", workspace.toUri().toString()));
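
The `toFile()` frame in the stack trace above matches the documented behaviour of `java.nio.file.Path.toFile()`, which throws `UnsupportedOperationException` for any path that is not on the default file system. A minimal sketch of reading the specs through `java.nio.file.Files` instead, which delegates to the path's own provider, might look like this. `SpecReaderSketch`, `readSpecs`, and `writeTempWorkspace` are hypothetical names, and the workspace is assumed to arrive as a URI string, as in the line above, so that the scheme is carried explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of provider-agnostic spec reading; not part of Feast.
final class SpecReaderSketch {

  /** Reads importJobSpecs.yaml from a workspace given as a URI string. */
  static String readSpecs(String workspaceUri) {
    // Paths.get(URI) selects the file system provider from the URI scheme.
    Path workspace = Paths.get(URI.create(workspaceUri));
    Path specsFile = workspace.resolve("importJobSpecs.yaml");
    try {
      // Files.readAllBytes goes through the provider; specsFile.toFile() would
      // throw UnsupportedOperationException on a non-default file system.
      return new String(Files.readAllBytes(specsFile), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes a specs file into a fresh temp directory and returns the directory URI. */
  static String writeTempWorkspace(String yaml) {
    try {
      Path dir = Files.createTempDirectory("job-workspace");
      Files.write(dir.resolve("importJobSpecs.yaml"), yaml.getBytes(StandardCharsets.UTF_8));
      return dir.toUri().toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // A local temp directory stands in for the GCS workspace.
    String uri = writeTempWorkspace("jobId: job-1\n");
    System.out.print(readSpecs(uri));
  }
}
```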

tims commented Mar 19, 2019

/hold cancel

pradithya commented

/lgtm

zhilingc commented

/approve

feast-ci-bot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zhilingc

feast-ci-bot merged commit 6646402 into feast-dev:master on Mar 21, 2019
zhilingc mentioned this pull request on Mar 22, 2019