Python SDK create_dataset is actually creating dataset in BQ #201

budi · 2019-05-27T03:12:25Z

Expected Behavior

In the SDK, a "dataset" means a collection of features selected by the user. BigQuery uses the same term for a different concept, so the SDK should not create a new BigQuery dataset whenever this function is called. It should instead just create a view in the feast dataset.

Current Behavior

It creates a new BigQuery Dataset.

Steps to reproduce

Use the quickstart:

feature_set = FeatureSet(
    entity="ride",
    features=[
        "ride.log_trip_duration",
        "ride.distance_haversine",
        "ride.distance_dummy_manhattan",
        "ride.direction",
        "ride.month",
        "ride.day_of_month",
        "ride.hour",
        "ride.day_of_week",
        "ride.vi_1",
        "ride.vi_2",
        "ride.sf_n",
        "ride.sf_y",
    ],
)
# fs and STAGING_LOCATION are defined earlier in the quickstart.
dataset_info = fs.create_dataset(feature_set, "2016-06-01", "2016-08-01")
dataset = fs.download_dataset_to_df(dataset_info, staging_location=STAGING_LOCATION)

dataset.head()

Specifications

Version: latest
Platform: as suggested by the install guide
Subsystem: as suggested by the install guide

Possible Solution

Fix create_dataset to create a view in the feast dataset instead of a new BigQuery dataset
Write down the definition of a dataset
Rename to create_view?


woop · 2019-06-05T04:31:12Z

Hey @budi. This issue is a bit confusing.

From the client's point of view, a dataset should be a materialization of a feature set: basically a collection of rows for specific columns (features). We should not care how things are named in BQ (BQ table vs. dataset).

In this case I think it is important that we create a table as a snapshot of the data (not a BQ dataset) in order to make the dataset immutable. A view would not provide that. If the user wants new data, they should create a new dataset using the same features and time range query.
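
The difference between a view and a snapshot table can be seen in a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for BigQuery, so this is only an analogy and not the Feast or BigQuery implementation. The table name `ride`, the column `event_date`, the object names `dataset_view` and `dataset_snapshot`, and the half-open date range are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ride (id INTEGER, event_date TEXT, distance_haversine REAL)"
)
conn.executemany(
    "INSERT INTO ride VALUES (?, ?, ?)",
    [(1, "2016-06-10", 1.2), (2, "2016-07-04", 3.4)],
)

# Hypothetical query selecting one feature over a time range.
query = (
    "SELECT id, distance_haversine FROM ride "
    "WHERE event_date >= '2016-06-01' AND event_date < '2016-08-01'"
)

# A view re-runs the query on every read, so it tracks the source table.
conn.execute(f"CREATE VIEW dataset_view AS {query}")
# A table created from the query is a point-in-time copy of the result.
conn.execute(f"CREATE TABLE dataset_snapshot AS {query}")

# New data arrives in the source table after the dataset was created.
conn.execute("INSERT INTO ride VALUES (3, '2016-07-20', 5.6)")

view_rows = conn.execute("SELECT COUNT(*) FROM dataset_view").fetchone()[0]
snapshot_rows = conn.execute("SELECT COUNT(*) FROM dataset_snapshot").fetchone()[0]
print(view_rows, snapshot_rows)  # prints "3 2"
```

The view picks up the row inserted later, while the snapshot table still holds the two rows that existed when it was created. That is the immutability property discussed above: to see the new row, a new snapshot would be created from the same query.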

budi mentioned this issue May 30, 2019

fix create_dataset #208

Merged

feast-ci-bot closed this as completed in #208 Jun 12, 2019
