diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 5a82a190fe7..1173a693efb 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -76,7 +76,7 @@
   * [Azure Synapse + Azure SQL (contrib)](reference/data-sources/mssql.md)
 * [Offline stores](reference/offline-stores/README.md)
   * [Overview](reference/offline-stores/overview.md)
-  * [File](reference/offline-stores/file.md)
+  * [Dask](reference/offline-stores/dask.md)
   * [Snowflake](reference/offline-stores/snowflake.md)
   * [BigQuery](reference/offline-stores/bigquery.md)
   * [Redshift](reference/offline-stores/redshift.md)
@@ -119,7 +119,7 @@
 * [Feature servers](reference/feature-servers/README.md)
   * [Python feature server](reference/feature-servers/python-feature-server.md)
   * [\[Alpha\] Go feature server](reference/feature-servers/go-feature-server.md)
-  * [Offline Feature Server](reference/feature-servers/offline-feature-server)
+  * [Offline Feature Server](reference/feature-servers/offline-feature-server.md)
 * [\[Beta\] Web UI](reference/alpha-web-ui.md)
 * [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
 * [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
diff --git a/docs/reference/feature-servers/README.md b/docs/reference/feature-servers/README.md
index 124834f8a73..2ceaf5807f3 100644
--- a/docs/reference/feature-servers/README.md
+++ b/docs/reference/feature-servers/README.md
@@ -8,7 +8,6 @@ Feast users can choose to retrieve features from a feature server, as opposed to
 
 {% content-ref url="go-feature-server.md" %}
 [go-feature-server.md](go-feature-server.md)
-=======
 {% endcontent-ref %}
 
 {% content-ref url="offline-feature-server.md" %}
diff --git a/docs/reference/offline-stores/file.md b/docs/reference/offline-stores/dask.md
similarity index 87%
rename from docs/reference/offline-stores/file.md
rename to docs/reference/offline-stores/dask.md
index 4b76d9af904..d8698ba544b 100644
--- a/docs/reference/offline-stores/file.md
+++ b/docs/reference/offline-stores/dask.md
@@ -1,9 +1,8 @@
-# File offline store
+# Dask offline store
 
 ## Description
 
-The file offline store provides support for reading [FileSources](../data-sources/file.md).
-It uses Dask as the compute engine.
+The Dask offline store provides support for reading [FileSources](../data-sources/file.md).
 
 {% hint style="warning" %}
 All data is downloaded and joined using Python and therefore may not scale to production workloads.
@@ -17,18 +16,18 @@
 project: my_feature_repo
 registry: data/registry.db
 provider: local
 offline_store:
-  type: file
+  type: dask
 ```
 {% endcode %}
 
-The full set of configuration options is available in [FileOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.file.FileOfflineStoreConfig).
+The full set of configuration options is available in [DaskOfflineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.offline_stores.dask.DaskOfflineStoreConfig).
 
 ## Functionality Matrix
 
 The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
-Below is a matrix indicating which functionality is supported by the file offline store.
+Below is a matrix indicating which functionality is supported by the dask offline store.
 
-|                                   | File |
+|                                   | Dask |
 | :-------------------------------- | :-- |
 | `get_historical_features` (point-in-time correct join) | yes |
 | `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
@@ -36,9 +35,9 @@ Below is a matrix indicating which functionality is supported by the file offlin
 | `offline_write_batch` (persist dataframes to offline store) | yes |
 | `write_logged_features` (persist logged features to offline store) | yes |
 
-Below is a matrix indicating which functionality is supported by `FileRetrievalJob`.
+Below is a matrix indicating which functionality is supported by `DaskRetrievalJob`.
 
-|                                   | File |
+|                                   | Dask |
 | --------------------------------- | --- |
 | export to dataframe | yes |
 | export to arrow table | yes |
diff --git a/docs/reference/offline-stores/overview.md b/docs/reference/offline-stores/overview.md
index 4d7681e38c8..182eac65864 100644
--- a/docs/reference/offline-stores/overview.md
+++ b/docs/reference/offline-stores/overview.md
@@ -25,13 +25,13 @@ The first three of these methods all return a `RetrievalJob` specific to an offl
 
 ## Functionality Matrix
 
-There are currently four core offline store implementations: `FileOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
+There are currently four core offline store implementations: `DaskOfflineStore`, `BigQueryOfflineStore`, `SnowflakeOfflineStore`, and `RedshiftOfflineStore`.
 There are several additional implementations contributed by the Feast community (`PostgreSQLOfflineStore`, `SparkOfflineStore`, and `TrinoOfflineStore`), which are not guaranteed to be stable or to match the functionality of the core implementations.
 Details for each specific offline store, such as how to configure it in a `feature_store.yaml`, can be found [here](README.md).
 
 Below is a matrix indicating which offline stores support which methods.
 
-|                                   | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
+|                                   | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino |
 | :-------------------------------- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
 | `get_historical_features` | yes | yes | yes | yes | yes | yes | yes |
 | `pull_latest_from_table_or_query` | yes | yes | yes | yes | yes | yes | yes |
@@ -42,7 +42,7 @@ Below is a matrix indicating which offline stores support which methods.
 
 Below is a matrix indicating which `RetrievalJob`s support what functionality.
 
-|                                   | File | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
+|                                   | Dask | BigQuery | Snowflake | Redshift | Postgres | Spark | Trino | DuckDB |
 | --------------------------------- | --- | --- | --- | --- | --- | --- | --- | --- |
 | export to dataframe | yes | yes | yes | yes | yes | yes | yes | yes |
 | export to arrow table | yes | yes | yes | yes | yes | yes | yes | yes |
diff --git a/sdk/python/docs/source/feast.infra.contrib.rst b/sdk/python/docs/source/feast.infra.contrib.rst
index 7b2fa3cc9c5..1f46ff0abf8 100644
--- a/sdk/python/docs/source/feast.infra.contrib.rst
+++ b/sdk/python/docs/source/feast.infra.contrib.rst
@@ -4,14 +4,6 @@ feast.infra.contrib package
 Submodules
 ----------
 
-feast.infra.contrib.azure\_provider module
-------------------------------------------
-
-.. automodule:: feast.infra.contrib.azure_provider
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.contrib.grpc\_server module
 ---------------------------------------
diff --git a/sdk/python/docs/source/feast.infra.feature_servers.rst b/sdk/python/docs/source/feast.infra.feature_servers.rst
index 334b5859053..ca5203504df 100644
--- a/sdk/python/docs/source/feast.infra.feature_servers.rst
+++ b/sdk/python/docs/source/feast.infra.feature_servers.rst
@@ -7,8 +7,6 @@ Subpackages
 .. toctree::
    :maxdepth: 4
 
-   feast.infra.feature_servers.aws_lambda
-   feast.infra.feature_servers.gcp_cloudrun
    feast.infra.feature_servers.local_process
    feast.infra.feature_servers.multicloud
diff --git a/sdk/python/docs/source/feast.infra.offline_stores.rst b/sdk/python/docs/source/feast.infra.offline_stores.rst
index 052a114cfb3..c770e5c13b0 100644
--- a/sdk/python/docs/source/feast.infra.offline_stores.rst
+++ b/sdk/python/docs/source/feast.infra.offline_stores.rst
@@ -28,18 +28,18 @@ feast.infra.offline\_stores.bigquery\_source module
    :undoc-members:
    :show-inheritance:
 
-feast.infra.offline\_stores.duckdb module
------------------------------------------
+feast.infra.offline\_stores.dask module
+---------------------------------------
 
-.. automodule:: feast.infra.offline_stores.duckdb
+.. automodule:: feast.infra.offline_stores.dask
    :members:
    :undoc-members:
    :show-inheritance:
 
-feast.infra.offline\_stores.file module
----------------------------------------
+feast.infra.offline\_stores.duckdb module
+-----------------------------------------
 
-.. automodule:: feast.infra.offline_stores.file
+.. automodule:: feast.infra.offline_stores.duckdb
    :members:
    :undoc-members:
    :show-inheritance:
diff --git a/sdk/python/docs/source/feast.infra.registry.contrib.rst b/sdk/python/docs/source/feast.infra.registry.contrib.rst
index 44b89736adb..83417109b86 100644
--- a/sdk/python/docs/source/feast.infra.registry.contrib.rst
+++ b/sdk/python/docs/source/feast.infra.registry.contrib.rst
@@ -8,7 +8,6 @@ Subpackages
    :maxdepth: 4
 
    feast.infra.registry.contrib.azure
-   feast.infra.registry.contrib.postgres
 
 Module contents
 ---------------
diff --git a/sdk/python/docs/source/feast.infra.rst b/sdk/python/docs/source/feast.infra.rst
index a1dfc864926..b0046a2719e 100644
--- a/sdk/python/docs/source/feast.infra.rst
+++ b/sdk/python/docs/source/feast.infra.rst
@@ -19,22 +19,6 @@ Subpackages
 Submodules
 ----------
 
-feast.infra.aws module
-----------------------
-
-.. automodule:: feast.infra.aws
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-feast.infra.gcp module
-----------------------
-
-.. automodule:: feast.infra.gcp
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.infra\_object module
 --------------------------------
@@ -51,14 +35,6 @@ feast.infra.key\_encoding\_utils module
    :undoc-members:
    :show-inheritance:
 
-feast.infra.local module
-------------------------
-
-.. automodule:: feast.infra.local
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
 feast.infra.passthrough\_provider module
 ----------------------------------------
diff --git a/sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres.py b/sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres.py
index c4740a960ef..c3bbfd97bc7 100644
--- a/sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres.py
+++ b/sdk/python/feast/infra/offline_stores/contrib/postgres_offline_store/postgres.py
@@ -365,7 +365,9 @@ def build_point_in_time_query(
     full_feature_names: bool = False,
 ) -> str:
     """Build point-in-time query between each feature view table and the entity dataframe for PostgreSQL"""
-    template = Environment(loader=BaseLoader()).from_string(source=query_template)
+    template = Environment(autoescape=True, loader=BaseLoader()).from_string(
+        source=query_template
+    )
 
     final_output_feature_names = list(entity_df_columns)
     final_output_feature_names.extend(
diff --git a/sdk/python/feast/infra/offline_stores/file.py b/sdk/python/feast/infra/offline_stores/dask.py
similarity index 97%
rename from sdk/python/feast/infra/offline_stores/file.py
rename to sdk/python/feast/infra/offline_stores/dask.py
index af2570ebc08..4a63baf6467 100644
--- a/sdk/python/feast/infra/offline_stores/file.py
+++ b/sdk/python/feast/infra/offline_stores/dask.py
@@ -39,20 +39,20 @@ from feast.saved_dataset import SavedDatasetStorage
 from feast.utils import _get_requested_feature_views_to_features_dict
 
-# FileRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
+# DaskRetrievalJob will cast string objects to string[pyarrow] from dask version 2023.7.1
 # This is not the desired behavior for our use case, so we set the convert-string option to False
 # See (https://github.com/dask/dask/issues/10881#issuecomment-1923327936)
 dask.config.set({"dataframe.convert-string": False})
 
 
-class FileOfflineStoreConfig(FeastConfigBaseModel):
-    """Offline store config for local (file-based) store"""
+class DaskOfflineStoreConfig(FeastConfigBaseModel):
+    """Offline store config for dask store"""
 
-    type: Literal["file"] = "file"
+    type: Union[Literal["dask"], Literal["file"]] = "dask"
     """ Offline store type selector"""
 
 
-class FileRetrievalJob(RetrievalJob):
+class DaskRetrievalJob(RetrievalJob):
     def __init__(
         self,
         evaluation_function: Callable,
@@ -122,7 +122,7 @@ def supports_remote_storage_export(self) -> bool:
         return False
 
 
-class FileOfflineStore(OfflineStore):
+class DaskOfflineStore(OfflineStore):
     @staticmethod
     def get_historical_features(
         config: RepoConfig,
@@ -133,7 +133,7 @@ def get_historical_features(
         project: str,
         full_feature_names: bool = False,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         for fv in feature_views:
             assert isinstance(fv.batch_source, FileSource)
@@ -283,7 +283,7 @@ def evaluate_historical_retrieval():
 
             return entity_df_with_features.persist()
 
-        job = FileRetrievalJob(
+        job = DaskRetrievalJob(
            evaluation_function=evaluate_historical_retrieval,
            full_feature_names=full_feature_names,
            on_demand_feature_views=OnDemandFeatureView.get_requested_odfvs(
@@ -309,7 +309,7 @@ def pull_latest_from_table_or_query(
         start_date: datetime,
         end_date: datetime,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(data_source, FileSource)
 
         # Create lazy function that is only called from the RetrievalJob object
@@ -372,7 +372,7 @@ def evaluate_offline_job():
             return source_df[list(columns_to_extract)].persist()
 
         # When materializing a single feature view, we don't need full feature names. On demand transforms aren't materialized
-        return FileRetrievalJob(
+        return DaskRetrievalJob(
             evaluation_function=evaluate_offline_job,
             full_feature_names=False,
         )
@@ -387,10 +387,10 @@ def pull_all_from_table_or_query(
         start_date: datetime,
         end_date: datetime,
     ) -> RetrievalJob:
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(data_source, FileSource)
 
-        return FileOfflineStore.pull_latest_from_table_or_query(
+        return DaskOfflineStore.pull_latest_from_table_or_query(
             config=config,
             data_source=data_source,
             join_key_columns=join_key_columns
@@ -410,7 +410,7 @@ def write_logged_features(
         logging_config: LoggingConfig,
         registry: BaseRegistry,
     ):
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         destination = logging_config.destination
         assert isinstance(destination, FileLoggingDestination)
@@ -441,7 +441,7 @@ def offline_write_batch(
         table: pyarrow.Table,
         progress: Optional[Callable[[int], Any]],
     ):
-        assert isinstance(config.offline_store, FileOfflineStoreConfig)
+        assert isinstance(config.offline_store, DaskOfflineStoreConfig)
         assert isinstance(feature_view.batch_source, FileSource)
 
         pa_schema, column_names = get_pyarrow_schema_from_batch_source(
diff --git a/sdk/python/feast/infra/offline_stores/offline_utils.py b/sdk/python/feast/infra/offline_stores/offline_utils.py
index 2d4fa268e40..6036ba54729 100644
--- a/sdk/python/feast/infra/offline_stores/offline_utils.py
+++ b/sdk/python/feast/infra/offline_stores/offline_utils.py
@@ -186,7 +186,9 @@ def build_point_in_time_query(
     full_feature_names: bool = False,
 ) -> str:
     """Build point-in-time query between each feature view table and the entity dataframe for Bigquery and Redshift"""
-    template = Environment(loader=BaseLoader()).from_string(source=query_template)
+    template = Environment(autoescape=True, loader=BaseLoader()).from_string(
+        source=query_template
+    )
 
     final_output_feature_names = list(entity_df_columns)
     final_output_feature_names.extend(
diff --git a/sdk/python/feast/repo_config.py b/sdk/python/feast/repo_config.py
index 137023ef226..fc2792e3237 100644
--- a/sdk/python/feast/repo_config.py
+++ b/sdk/python/feast/repo_config.py
@@ -68,7 +68,8 @@
 }
 
 OFFLINE_STORE_CLASS_FOR_TYPE = {
-    "file": "feast.infra.offline_stores.file.FileOfflineStore",
+    "file": "feast.infra.offline_stores.dask.DaskOfflineStore",
+    "dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
     "bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
     "redshift": "feast.infra.offline_stores.redshift.RedshiftOfflineStore",
     "snowflake.offline": "feast.infra.offline_stores.snowflake.SnowflakeOfflineStore",
@@ -205,7 +206,7 @@ def __init__(self, **data: Any):
             self.registry_config = data["registry"]
 
         self._offline_store = None
-        self.offline_config = data.get("offline_store", "file")
+        self.offline_config = data.get("offline_store", "dask")
 
         self._online_store = None
         self.online_config = data.get("online_store", "sqlite")
@@ -348,7 +349,7 @@ def _validate_offline_store_config(cls, values: Any) -> Any:
 
         # Set the default type
         if "type" not in values["offline_store"]:
-            values["offline_store"]["type"] = "file"
+            values["offline_store"]["type"] = "dask"
 
         offline_store_type = values["offline_store"]["type"]
diff --git a/sdk/python/tests/integration/feature_repos/universal/data_sources/file.py b/sdk/python/tests/integration/feature_repos/universal/data_sources/file.py
index 4a4a7360d8c..5174e160465 100644
--- a/sdk/python/tests/integration/feature_repos/universal/data_sources/file.py
+++ b/sdk/python/tests/integration/feature_repos/universal/data_sources/file.py
@@ -20,8 +20,8 @@
 from feast.data_format import DeltaFormat, ParquetFormat
 from feast.data_source import DataSource
 from feast.feature_logging import LoggingDestination
+from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
 from feast.infra.offline_stores.duckdb import DuckDBOfflineStoreConfig
-from feast.infra.offline_stores.file import FileOfflineStoreConfig
 from feast.infra.offline_stores.file_source import (
     FileLoggingDestination,
     SavedDatasetFileStorage,
@@ -84,7 +84,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
         return f"{self.project_name}.{suffix}"
 
     def create_offline_store_config(self) -> FeastConfigBaseModel:
-        return FileOfflineStoreConfig()
+        return DaskOfflineStoreConfig()
 
     def create_logged_features_destination(self) -> LoggingDestination:
         d = tempfile.mkdtemp(prefix=self.project_name)
@@ -334,7 +334,7 @@ def get_prefixed_table_name(self, suffix: str) -> str:
         return f"{suffix}"
 
     def create_offline_store_config(self) -> FeastConfigBaseModel:
-        return FileOfflineStoreConfig()
+        return DaskOfflineStoreConfig()
 
     def teardown(self):
         self.minio.stop()
diff --git a/sdk/python/tests/unit/infra/offline_stores/test_offline_store.py b/sdk/python/tests/unit/infra/offline_stores/test_offline_store.py
index 3589c8a3fad..50f048928dc 100644
--- a/sdk/python/tests/unit/infra/offline_stores/test_offline_store.py
+++ b/sdk/python/tests/unit/infra/offline_stores/test_offline_store.py
@@ -23,7 +23,7 @@
 from feast.infra.offline_stores.contrib.trino_offline_store.trino import (
     TrinoRetrievalJob,
 )
-from feast.infra.offline_stores.file import FileRetrievalJob
+from feast.infra.offline_stores.dask import DaskRetrievalJob
 from feast.infra.offline_stores.offline_store import RetrievalJob, RetrievalMetadata
 from feast.infra.offline_stores.redshift import (
     RedshiftOfflineStoreConfig,
@@ -100,7 +100,7 @@ def metadata(self) -> Optional[RetrievalMetadata]:
 @pytest.fixture(
     params=[
         MockRetrievalJob,
-        FileRetrievalJob,
+        DaskRetrievalJob,
         RedshiftRetrievalJob,
         SnowflakeRetrievalJob,
         AthenaRetrievalJob,
@@ -112,8 +112,8 @@ def metadata(self) -> Optional[RetrievalMetadata]:
     ]
 )
 def retrieval_job(request, environment):
-    if request.param is FileRetrievalJob:
-        return FileRetrievalJob(lambda: 1, full_feature_names=False)
+    if request.param is DaskRetrievalJob:
+        return DaskRetrievalJob(lambda: 1, full_feature_names=False)
     elif request.param is RedshiftRetrievalJob:
         offline_store_config = RedshiftOfflineStoreConfig(
             cluster_id="feast-int-bucket",
diff --git a/sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py b/sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py
index 6045dbc6ce0..6ff7b3c3605 100644
--- a/sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py
+++ b/sdk/python/tests/unit/infra/online_store/test_dynamodb_online_store.py
@@ -5,7 +5,7 @@
 import pytest
 from moto import mock_dynamodb
 
-from feast.infra.offline_stores.file import FileOfflineStoreConfig
+from feast.infra.offline_stores.dask import DaskOfflineStoreConfig
 from feast.infra.online_stores.dynamodb import (
     DynamoDBOnlineStore,
     DynamoDBOnlineStoreConfig,
@@ -40,7 +40,7 @@ def repo_config():
         provider=PROVIDER,
         online_store=DynamoDBOnlineStoreConfig(region=REGION),
         # online_store={"type": "dynamodb", "region": REGION},
-        offline_store=FileOfflineStoreConfig(),
+        offline_store=DaskOfflineStoreConfig(),
         entity_key_serialization_version=2,
     )
diff --git a/setup.py b/setup.py
index 8ef202a7572..80a3eb53aa1 100644
--- a/setup.py
+++ b/setup.py
@@ -71,9 +71,9 @@
 GCP_REQUIRED = [
     "google-api-core>=1.23.0,<3",
     "googleapis-common-protos>=1.52.0,<2",
-    "google-cloud-bigquery[pandas]>=2,<3.14.0",
-    "google-cloud-bigquery-storage >= 2.0.0,<4",
-    "google-cloud-datastore>=2.1.0,<3",
+    "google-cloud-bigquery[pandas]>=2,<4.0",
+    "google-cloud-bigquery-storage >= 2.0.0,<3",
+    "google-cloud-datastore>=2.16.0,<3",
     "google-cloud-storage>=1.34.0,<3",
     "google-cloud-bigtable>=2.11.0,<3",
     "fsspec<=2024.1.0",
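Taken together, the `repo_config.py` and `dask.py` hunks make the rename backward compatible: the old `file` type and the new `dask` type both resolve to `DaskOfflineStore`, and `dask` becomes the default. A minimal sketch of that aliasing pattern is below — the dictionary entries mirror the ones in the diff (truncated), while the `resolve_offline_store_class` helper is illustrative and not part of the change:

```python
# Both the legacy "file" alias and the new "dask" name map to the same
# implementing class, so existing feature_store.yaml files keep working
# after the rename. Entries mirror OFFLINE_STORE_CLASS_FOR_TYPE in the
# diff above (truncated to three entries for brevity).
OFFLINE_STORE_CLASS_FOR_TYPE = {
    "file": "feast.infra.offline_stores.dask.DaskOfflineStore",
    "dask": "feast.infra.offline_stores.dask.DaskOfflineStore",
    "bigquery": "feast.infra.offline_stores.bigquery.BigQueryOfflineStore",
}


def resolve_offline_store_class(store_type: str = "dask") -> str:
    """Return the dotted path of the offline store class for a config `type`.

    The default of "dask" matches the new default set in RepoConfig;
    this helper itself is a hypothetical illustration of the lookup.
    """
    return OFFLINE_STORE_CLASS_FOR_TYPE[store_type]
```

The same idea appears on the Pydantic side as `type: Union[Literal["dask"], Literal["file"]] = "dask"` in `DaskOfflineStoreConfig`, so a `feature_store.yaml` that still says `type: file` validates cleanly and loads the Dask implementation.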