Add unique ingestion id for all batch ingestions #656

Merged (2 commits) on May 3, 2020

zhilingc
Collaborator

@zhilingc zhilingc commented Apr 26, 2020

What this PR does / why we need it:
This is a split-off of #612 that introduces the model changes made in that PR in a more digestible chunk.

Adds a unique identifier to every discrete batch load of data into Feast, so that the same set of data can be referenced again later on.

Does this PR introduce a user-facing change?:

Adds dataset_id field into BQ
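
For readers skimming the thread, the idea can be illustrated with a minimal sketch: one identifier is generated per batch load, stamped on every row of that load, and later used to select exactly those rows again. This is not the code from this PR. The helper names (`new_ingestion_id`, `tag_batch`, `select_ingestion`), the use of a random UUID, and the plain-dict row shape are all assumptions made for illustration. The field is named `ingestion_id` here to match the final PR title; at the time of this description it was still called `dataset_id`.

```python
import uuid
from typing import Dict, Iterable, List

Row = Dict[str, object]


def new_ingestion_id() -> str:
    """Return a fresh identifier for one batch load (hypothetical helper).

    A random UUID is an assumption; the PR does not say how ids are built.
    """
    return str(uuid.uuid4())


def tag_batch(rows: Iterable[Row], ingestion_id: str) -> List[Row]:
    """Return copies of the rows, each carrying the same ingestion_id."""
    return [{**row, "ingestion_id": ingestion_id} for row in rows]


def select_ingestion(rows: Iterable[Row], ingestion_id: str) -> List[Row]:
    """Return only the rows that were loaded under the given ingestion_id."""
    return [row for row in rows if row.get("ingestion_id") == ingestion_id]


if __name__ == "__main__":
    first_id = new_ingestion_id()
    second_id = new_ingestion_id()
    # Two separate batch loads end up in the same (in-memory) store.
    store = tag_batch([{"driver_id": 1}, {"driver_id": 2}], first_id)
    store += tag_batch([{"driver_id": 3}], second_id)
    # Re-reference only the first load.
    print(select_ingestion(store, first_id))
```

In a warehouse such as BigQuery, the same selection would presumably be a filter on the id column rather than a Python list comprehension.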

@zhilingc
Collaborator Author

/test test-end-to-end

@zhilingc
Collaborator Author

/test test-end-to-end-batch

@woop woop mentioned this pull request Apr 29, 2020
@woop
Member

woop commented Apr 29, 2020

I am personally happy to merge this at any time. I'll leave it open for a little bit.

@ches
Member

ches commented Apr 30, 2020

Fine from me functionally. I have the exact same reaction as @woop on #612 (comment) – the name "dataset" feels off to me for this, and my first inclination was also ingestion_id.

@zhilingc
Collaborator Author

zhilingc commented May 1, 2020

@ches I just felt it was more intuitive to provide a dataset id over an ingestion id, and in the old PR I originally offered the functionality for the user to be able to set it, but it was punted to when we integrate these ids into batch retrieval. But I can't say I'm too opinionated about this. If you guys think it should be ingestion_id, I'm happy to change it.

@woop
Member

woop commented May 1, 2020

> @ches I just felt it was more intuitive to provide a dataset id over an ingestion id, and in the old PR I originally offered the functionality for the user to be able to set it, but it was punted to when we integrate these ids into batch retrieval. But I can't say I'm too opinionated about this. If you guys think it should be ingestion_id, I'm happy to change it.

I think your previous concern was around the retrieval of statistics based on an ingestion filter being unintuitive? I can see that.

We are starting to overload this ingestion term as well, with IngestionJob in the Python SDK referring to the population jobs. There is a bit of a disconnect between those two ingestion flows right now, but I do think they can be one flow (ingest, populate) in the future, so keeping the names the same seems reasonable to me.

So anyway, my preference is ingestion_id, but I also do think dataset_id would work in some sense.

@zhilingc
Collaborator Author

zhilingc commented May 1, 2020

Renamed to ingestion_id and rebased on master.

@woop woop changed the title from "Add unique dataset id for all batch ingestions" to "Add unique ingestion id for all batch ingestions" May 2, 2020
@woop
Member

woop commented May 3, 2020

/lgtm

@woop woop added the kind/feature (New feature or request) label May 3, 2020
@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: woop, zhilingc

@feast-ci-bot feast-ci-bot merged commit 3d9bafd into feast-dev:master May 3, 2020