diff --git a/docs/README.md b/docs/README.md index d391069429..6652eaddc8 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,7 +2,16 @@ ## What is Feast? -Feast (**Fea**ture **St**ore) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to realtime models. +Feast (**Fea**ture **St**ore) is an [open-source](https://github.com/feast-dev/feast) feature store that helps teams +operate production ML systems at scale by allowing them to define, manage, validate, and serve features for production +AI/ML. + +Feast's feature store is composed of two foundational components: (1) an [offline store](getting-started/components/offline-store.md) +for historical feature extraction used in model training and (2) an [online store](getting-started/components/online-store.md) +for serving features at low latency in production systems and applications. + +Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning +features to real-time models. For more details, please review our [architecture](getting-started/architecture/overview.md). Feast allows ML platform teams to: @@ -20,38 +29,31 @@ Feast allows ML platform teams to: **Note:** Feast uses a push model for online serving. This means that the feature store pushes feature values to the online store, which reduces the latency of feature retrieval. This is more efficient than a pull model, where the model serving system must make a request to the feature store to retrieve feature values. See -[this document](getting-started/architecture-and-components/push-vs-pull-model.md) for a more detailed discussion. -{% endhint %} - -{% hint style="info" %} +[this document](getting-started/architecture/push-vs-pull-model.md) for a more detailed discussion. {% endhint %} ## Who is Feast for? -Feast helps ML platform teams with DevOps experience productionize real-time models. 
Feast can also help these teams build towards a feature platform that improves collaboration between engineers and data scientists. +Feast helps ML platform/MLOps teams with DevOps experience productionize real-time models. Feast also helps these teams +build a feature platform that improves collaboration between data engineers, software engineers, machine learning +engineers, and data scientists. Feast is likely **not** the right tool if you - * are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is -* rely primarily on unstructured data -* need very low latency feature retrieval (e.g. p99 feature retrieval << 10ms) -* have a small team to support a large number of use cases ## What Feast is not? ### Feast is not -* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system:** Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations. +* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system.** Feast is not a general purpose data pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations. Feast does support some [transformations](getting-started/architecture/feature-transformation.md). * **a data orchestration tool:** Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and integrations with tools like [Airflow](https://airflow.apache.org) to make features consistently available. * **a data warehouse:** Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. 
Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production. * **a database:** Feast is not a database, but helps manage data stored in other systems (e.g. BigQuery, Snowflake, DynamoDB, Redis) to make features consistently available at training / serving time ### Feast does not _fully_ solve - * **reproducible model training / model backtesting / experiment management**: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools like [DVC](https://dvc.org/), [MLflow](https://www.mlflow.org/), and [Kubeflow](https://www.kubeflow.org/) are better suited for this. -* **batch + streaming feature engineering**: Feast primarily processes already transformed feature values but is investing in supporting batch and streaming transformations. +* **batch feature engineering**: Feast supports on demand and streaming transformations. Feast is also investing in supporting batch transformations. * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines. -* **feature sharing**: Feast has experimental functionality to enable discovery and cataloguing of feature metadata with a [Feast web UI (alpha)](https://docs.feast.dev/reference/alpha-web-ui). Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py). * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. 
Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py). * **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions. @@ -74,7 +76,7 @@ Explore the following resources to get started with Feast: * [Quickstart](getting-started/quickstart.md) is the fastest way to get started with Feast * [Concepts](getting-started/concepts/) describes all important Feast API concepts -* [Architecture](getting-started/architecture-and-components/) describes Feast's overall architecture. +* [Architecture](getting-started/architecture/) describes Feast's overall architecture. * [Tutorials](tutorials/tutorials-overview/) shows full examples of using Feast in machine learning applications. * [Running Feast with Snowflake/GCP/AWS](how-to-guides/feast-snowflake-gcp-aws/) provides a more in-depth guide to using Feast. * [Reference](reference/feast-cli-commands.md) contains detailed API and design documents. 
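As a companion to the resources listed above, the offline/online split at the heart of Feast can be sketched in a few lines of plain Python. This is a toy illustration with hypothetical names, not Feast's actual APIs: the online store keeps only the latest feature value per entity so that serving reduces to a key lookup, while the offline store keeps full history for training.

```python
from collections import defaultdict

# Toy sketch only (hypothetical names, not Feast APIs): why an online store
# keeps just the latest value per entity while the offline store keeps history.
offline_store = defaultdict(list)  # entity_id -> [(event_timestamp, value), ...]
online_store = {}                  # entity_id -> (event_timestamp, value)

def write_feature(entity_id, event_timestamp, value):
    # Offline store: append-only history, used to build training datasets.
    offline_store[entity_id].append((event_timestamp, value))
    # Online store: only the freshest value survives (push-model materialization).
    current = online_store.get(entity_id)
    if current is None or event_timestamp >= current[0]:
        online_store[entity_id] = (event_timestamp, value)

def get_online_feature(entity_id):
    # Serving is a plain key lookup: no joins, no computation at request time.
    return online_store[entity_id][1]

write_feature("driver_1001", 1, 0.5)
write_feature("driver_1001", 2, 0.7)
print(get_online_feature("driver_1001"))  # -> 0.7 (latest value)
```

In real deployments the two dictionaries are backed by, e.g., BigQuery/Snowflake (offline) and Redis/DynamoDB (online), but the read path stays the same shape: a low-latency lookup of precomputed values.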
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 87c3626254..a6a40fc91d 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -17,15 +17,19 @@ * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md) * [Registry](getting-started/concepts/registry.md) * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md) -* [Architecture](getting-started/architecture-and-components/README.md) - * [Overview](getting-started/architecture-and-components/overview.md) - * [Language](getting-started/architecture-and-components/language.md) - * [Push vs Pull Model](getting-started/architecture-and-components/push-vs-pull-model.md) - * [Registry](getting-started/architecture-and-components/registry.md) - * [Offline store](getting-started/architecture-and-components/offline-store.md) - * [Online store](getting-started/architecture-and-components/online-store.md) - * [Batch Materialization Engine](getting-started/architecture-and-components/batch-materialization-engine.md) - * [Provider](getting-started/architecture-and-components/provider.md) +* [Architecture](getting-started/architecture/README.md) + * [Overview](getting-started/architecture/overview.md) + * [Language](getting-started/architecture/language.md) + * [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md) + * [Write Patterns](getting-started/architecture/write-patterns.md) + * [Feature Transformation](getting-started/architecture/feature-transformation.md) +* [Components](getting-started/components/README.md) + * [Overview](getting-started/components/overview.md) + * [Registry](getting-started/components/registry.md) + * [Offline store](getting-started/components/offline-store.md) + * [Online store](getting-started/components/online-store.md) + * [Batch Materialization Engine](getting-started/components/batch-materialization-engine.md) + * [Provider](getting-started/components/provider.md) * [Third party integrations](getting-started/third-party-integrations.md) * 
[FAQ](getting-started/faq.md) diff --git a/docs/getting-started/architecture/README.md b/docs/getting-started/architecture/README.md new file mode 100644 index 0000000000..a45f4ed6ec --- /dev/null +++ b/docs/getting-started/architecture/README.md @@ -0,0 +1,21 @@ +# Architecture + +{% content-ref url="overview.md" %} +[overview.md](overview.md) +{% endcontent-ref %} + +{% content-ref url="language.md" %} +[language.md](language.md) +{% endcontent-ref %} + +{% content-ref url="push-vs-pull-model.md" %} +[push-vs-pull-model.md](push-vs-pull-model.md) +{% endcontent-ref %} + +{% content-ref url="write-patterns.md" %} +[write-patterns.md](write-patterns.md) +{% endcontent-ref %} + +{% content-ref url="feature-transformation.md" %} +[feature-transformation.md](feature-transformation.md) +{% endcontent-ref %} diff --git a/docs/getting-started/architecture/feature-transformation.md b/docs/getting-started/architecture/feature-transformation.md new file mode 100644 index 0000000000..457e71d85e --- /dev/null +++ b/docs/getting-started/architecture/feature-transformation.md @@ -0,0 +1,20 @@ +# Feature Transformation + +A *feature transformation* is a function that takes some set of input data and +returns some set of output data. Feature transformations can happen on either raw data or derived data. + +Feature transformations can be executed by three types of "transformation engines": + +1. The Feast Feature Server +2. An Offline Store (e.g., Snowflake, BigQuery, DuckDB, Spark, etc.) +3. A Stream processor (e.g., Flink or Spark Streaming) + +The three transformation engines are coupled with the [communication pattern used for writes](write-patterns.md). + +Importantly, this implies that different feature transformation code may be +used under different transformation engines, so understanding the tradeoffs of +when to use which transformation engine/communication pattern is critical to +the success of your implementation. 
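To make the engine/timing distinction concrete, here is a minimal plain-Python sketch (hypothetical names, not Feast APIs): the same transformation function can run at write time, where a batch or stream engine precomputes the value, or at read time, where the feature server computes it on demand.

```python
# Illustrative sketch: the same transformation can run under different
# "transformation engines" at different times. Names are hypothetical.

def transform(raw):
    # Example feature transformation: miles -> kilometers.
    return raw["trip_miles"] * 1.60934

online_store = {}
raw_store = {}

def write_precomputed(entity_id, raw):
    # Engine = batch job / stream processor: transform BEFORE storing.
    online_store[entity_id] = transform(raw)

def read_precomputed(entity_id):
    # Serving a precomputed value is just a lookup.
    return online_store[entity_id]

def write_raw(entity_id, raw):
    # Raw data is stored as-is; no transformation at write time.
    raw_store[entity_id] = raw

def read_on_demand(entity_id):
    # Engine = feature server: transform AT request time.
    return transform(raw_store[entity_id])

write_precomputed("trip_1", {"trip_miles": 10.0})
write_raw("trip_1", {"trip_miles": 10.0})
assert read_precomputed("trip_1") == read_on_demand("trip_1")
```

Both paths yield the same feature value; what differs is where the compute cost lands (write path vs. read path), which is exactly the tradeoff the communication pattern decides.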
+ +In general, we recommend choosing transformation engines and network calls by aligning them with what is most +appropriate for the data producer, feature/model usage, and overall product. \ No newline at end of file diff --git a/docs/getting-started/architecture-and-components/language.md b/docs/getting-started/architecture/language.md similarity index 100% rename from docs/getting-started/architecture-and-components/language.md rename to docs/getting-started/architecture/language.md diff --git a/docs/getting-started/architecture/overview.md b/docs/getting-started/architecture/overview.md new file mode 100644 index 0000000000..7d1180bfd1 --- /dev/null +++ b/docs/getting-started/architecture/overview.md @@ -0,0 +1,18 @@ +# Overview + +![Feast Architecture Diagram](<../../assets/feast_marchitecture.png>) + +Feast's architecture is designed to be flexible and scalable. It is composed of several components that work together to provide a feature store that can be used to serve features for training and inference. + +* Feast uses a [Push Model](push-vs-pull-model.md) to ingest data from different sources and store feature values in the +online store. +This allows Feast to serve features in real-time with low latency. + +* Feast supports On Demand and Streaming Transformations for [feature computation](feature-transformation.md) and + will support Batch transformations in the future. For Streaming and Batch, Feast requires a separate Feature Transformation + Engine (in the batch case, this is typically your Offline Store). We are exploring adding a default streaming engine to Feast. + +* Domain expertise is recommended when integrating a data source with Feast to understand the [tradeoffs from different + write patterns](write-patterns.md) for your application. + +* We recommend [using Python](language.md) for your Feature Store microservice. As mentioned in the document, precomputing features is the recommended optimal path to ensure low latency performance. 
Reducing feature serving to a lightweight database lookup is the ideal pattern, which means the marginal overhead of Python should be tolerable. Because of this we believe the pros of Python outweigh the costs, as reimplementing feature logic is undesirable. Java and Go Clients are also available for online feature retrieval. diff --git a/docs/getting-started/architecture-and-components/push-vs-pull-model.md b/docs/getting-started/architecture/push-vs-pull-model.md similarity index 66% rename from docs/getting-started/architecture-and-components/push-vs-pull-model.md rename to docs/getting-started/architecture/push-vs-pull-model.md index a1f404221b..b205e97fc5 100644 --- a/docs/getting-started/architecture-and-components/push-vs-pull-model.md +++ b/docs/getting-started/architecture/push-vs-pull-model.md @@ -6,15 +6,23 @@ in the online store, to serve features in real-time. In a [Pull Model](https://en.wikipedia.org/wiki/Pull_technology), Feast would pull data from the data producers at request time and store the feature values in -the online store before serving them (storing them would actually be unneccessary). +the online store before serving them (storing them would actually be unnecessary). This approach would incur additional network latency as Feast would need to orchestrate a request to each data producer, which would mean the latency would be at least as long as your slowest call. So, in order to serve features as fast as possible, we push data to Feast and store the feature values in the online store. -The trade-off with the Push Model is that strong consistency is not gauranteed out -of the box. Instead, stong consistency has to be explicitly designed for in orchestrating +The trade-off with the Push Model is that strong consistency is not guaranteed out +of the box. Instead, strong consistency has to be explicitly designed for in orchestrating the updates to Feast and the client usage. 
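The latency argument above can be illustrated with a small sketch using hypothetical numbers: a pull model must fan out to every data producer at request time, so its latency is bounded below by the slowest producer, while a push model reduces serving to a single store lookup.

```python
# Illustrative sketch of the pull-vs-push latency argument (numbers are
# made up for illustration; this is not a benchmark of any real system).

producer_latencies_ms = {"payments": 15, "sessions": 40, "profile": 8}

def pull_model_latency():
    # Pull model: orchestrate a request to each data producer at serving
    # time, so latency is at least as long as the slowest call.
    return max(producer_latencies_ms.values())

ONLINE_STORE_LOOKUP_MS = 2

def push_model_latency():
    # Push model: values were pushed ahead of time; serving is one lookup.
    return ONLINE_STORE_LOOKUP_MS

print(pull_model_latency(), push_model_latency())  # -> 40 2
```

The gap widens as producers are added: the pull model's floor tracks the slowest producer, while the push model's read cost stays constant (at the price of the consistency tradeoff described above).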
The significant advantage with this approach is that Feast is read-optimized for low-latency -feature retrieval. \ No newline at end of file +feature retrieval. + +## How to Push + +Implicit in the Push model are decisions about _how_ and _when_ to push feature values to the online store. + +From a developer's perspective, there are three ways to push feature values to the online store with different tradeoffs. + +They are discussed further in the [Write Patterns](write-patterns.md) section. diff --git a/docs/getting-started/architecture/write-patterns.md b/docs/getting-started/architecture/write-patterns.md new file mode 100644 index 0000000000..4674b5504d --- /dev/null +++ b/docs/getting-started/architecture/write-patterns.md @@ -0,0 +1,67 @@ +# Writing Data to Feast + +Feast uses a [Push Model](push-vs-pull-model.md) to push features to the online store. + +This has two important consequences: (1) communication patterns between the Data Producer (i.e., the client) and Feast (i.e., the server) and (2) feature computation and +_feature value_ write patterns to Feast's online store. + +Data Producers (i.e., services that generate data) send data to Feast so that Feast can write feature values to the online store. That data can +be either raw data, where Feast computes and stores the feature values, or precomputed feature values. + +## Communication Patterns + +There are two ways a client (or Data Producer) can _send_ data to the online store: + +1. Synchronously + - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores)) +2. 
Asynchronously + - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores)) + - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](../components/batch-materialization-engine.md)) + +Note that in some contexts, developers may "batch" a group of entities together and write them to the online store in a +single API call. This is a common pattern when writing data to the online store to reduce write loads, but we would +not qualify this as a batch job. + +## Feature Value Write Patterns + +Writing feature values to the online store (i.e., the server) can be done in three ways: precomputing the transformations client-side, computing the transformations On Demand server-side, or a hybrid of the two. + +### Combining Approaches + +In some scenarios, a combination of Precomputed and On Demand transformations may be optimal. + +When selecting feature value write patterns, you must consider the specific requirements of your application, the acceptable correctness of the data, the latency tolerance, and the computational resources available. Making deliberate choices can help the performance and reliability of your service. + +There are three ways the client can write *feature values* to the online store: + +1. Precomputing transformations +2. Computing transformations On Demand +3. Hybrid (Precomputed + On Demand) + +### 1. Precomputing Transformations +Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `push` or `write_to_online_store` API. + +### 2. 
Computing Transformations On Demand +On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store. + +### 3. Hybrid (Precomputed + On Demand) +The hybrid approach allows precomputed transformations to happen inside or outside of Feast, while On Demand transformations happen at client request time. This is particularly convenient for "Time Since Last" types of features (e.g., time since purchase). + +## Tradeoffs + +When deciding between synchronous and asynchronous data writes, several tradeoffs should be considered: + +- **Data Consistency**: Asynchronous writes allow Data Producers to send data without waiting for the write operation to complete, which can lead to situations where the data in the online store is stale. This might be acceptable in scenarios where absolute freshness is not critical. However, for critical operations, such as calculating loan amounts in financial applications, stale data can lead to incorrect decisions, making synchronous writes essential. +- **Correctness**: The risk of data being out-of-date must be weighed against the operational requirements. For instance, in a lending application, having up-to-date feature data can be crucial for correctness (depending upon the features and raw data), thus favoring synchronous writes. In less sensitive contexts, the eventual consistency offered by asynchronous writes might be sufficient. +- **Service Coupling**: Synchronous writes result in tighter coupling between services. If a write operation fails, it can cause the dependent service operation to fail as well, which might be a significant drawback in systems requiring high reliability and independence between services. +- **Application Latency**: Asynchronous writes typically reduce the perceived latency from the client's perspective because the client does not wait for the write operation to complete. 
This can enhance the user experience and efficiency in environments where operations are not critically dependent on immediate data freshness. + +The table below can help guide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity. + +| Data Write Type | Feature Computation | Scenario | Recommended Approach | +|----------|-----------------|---------------------|----------------------| +| Asynchronous | On Demand | Data-intensive applications tolerant to staleness | Opt for asynchronous writes with on-demand computation to balance load and manage resource usage efficiently. | +| Asynchronous | Precomputed | High volume, non-critical data processing | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. | +| Synchronous | On Demand | High-stakes decision making | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. | +| Synchronous | Precomputed | User-facing applications requiring quick feedback | Use synchronous writes with precomputed features to reduce latency and improve user experience. | +| Synchronous | Hybrid (Precomputed + On Demand) | High-stakes decision making that needs to optimize for latency under constraints | Use synchronous writes with precomputed features where possible and a select set of on demand computations to reduce latency and improve user experience. 
| diff --git a/docs/getting-started/architecture-and-components/README.md b/docs/getting-started/components/README.md similarity index 63% rename from docs/getting-started/architecture-and-components/README.md rename to docs/getting-started/components/README.md index 050a430c97..d468714bd4 100644 --- a/docs/getting-started/architecture-and-components/README.md +++ b/docs/getting-started/components/README.md @@ -1,16 +1,4 @@ -# Architecture - -{% content-ref url="language.md" %} -[language.md](language.md) -{% endcontent-ref %} - -{% content-ref url="overview.md" %} -[overview.md](overview.md) -{% endcontent-ref %} - -{% content-ref url="push-vs-pull-model.md" %} -[push-vs-pull-model.md](push-vs-pull-model.md) -{% endcontent-ref %} +# Components {% content-ref url="registry.md" %} [registry.md](registry.md) diff --git a/docs/getting-started/architecture-and-components/batch-materialization-engine.md b/docs/getting-started/components/batch-materialization-engine.md similarity index 100% rename from docs/getting-started/architecture-and-components/batch-materialization-engine.md rename to docs/getting-started/components/batch-materialization-engine.md diff --git a/docs/getting-started/architecture-and-components/offline-store.md b/docs/getting-started/components/offline-store.md similarity index 100% rename from docs/getting-started/architecture-and-components/offline-store.md rename to docs/getting-started/components/offline-store.md diff --git a/docs/getting-started/architecture-and-components/online-store.md b/docs/getting-started/components/online-store.md similarity index 100% rename from docs/getting-started/architecture-and-components/online-store.md rename to docs/getting-started/components/online-store.md diff --git a/docs/getting-started/architecture-and-components/overview.md b/docs/getting-started/components/overview.md similarity index 85% rename from docs/getting-started/architecture-and-components/overview.md rename to 
docs/getting-started/components/overview.md index f4d543cd5a..393f436e5b 100644 --- a/docs/getting-started/architecture-and-components/overview.md +++ b/docs/getting-started/components/overview.md @@ -28,11 +28,3 @@ A complete Feast deployment contains the following components: * **Batch Materialization Engine:** The [Batch Materialization Engine](batch-materialization-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process. * **Online Store:** The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through [stream ingestion](../../reference/data-sources/push.md). * **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if Feast configures logging functionality of served features. - -{% hint style="info" %} -Java and Go Clients are also available for online feature retrieval. - -In general, we recommend [using Python](language.md) for your Feature Store microservice. - -As mentioned in the document, precomputing features is the recommended optimal path to ensure low latency performance. Reducing feature serving to a lightweight database lookup is the ideal pattern, which means the marginal overhead of Python should be tolerable. Because of this we believe the pros of Python outweigh the costs, as reimplementing feature logic is undesirable. 
-{% endhint %} diff --git a/docs/getting-started/architecture-and-components/provider.md b/docs/getting-started/components/provider.md similarity index 100% rename from docs/getting-started/architecture-and-components/provider.md rename to docs/getting-started/components/provider.md diff --git a/docs/getting-started/architecture-and-components/registry.md b/docs/getting-started/components/registry.md similarity index 100% rename from docs/getting-started/architecture-and-components/registry.md rename to docs/getting-started/components/registry.md diff --git a/docs/getting-started/architecture-and-components/stream-processor.md b/docs/getting-started/components/stream-processor.md similarity index 100% rename from docs/getting-started/architecture-and-components/stream-processor.md rename to docs/getting-started/components/stream-processor.md diff --git a/docs/getting-started/concepts/dataset.md b/docs/getting-started/concepts/dataset.md index d55adb4703..829ad4284e 100644 --- a/docs/getting-started/concepts/dataset.md +++ b/docs/getting-started/concepts/dataset.md @@ -2,7 +2,7 @@ Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept. -Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../architecture-and-components/offline-store.md). +Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md). 
Dataset can be created from: diff --git a/docs/getting-started/faq.md b/docs/getting-started/faq.md index d603e12ab6..6567ae181d 100644 --- a/docs/getting-started/faq.md +++ b/docs/getting-started/faq.md @@ -29,7 +29,7 @@ Feature views once they are used by a feature service are intended to be immutab ### What is the difference between data sources and the offline store? -The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see [data sources](concepts/data-ingestion.md) and [offline store](architecture-and-components/offline-store.md) for more details. +The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see [data sources](concepts/data-ingestion.md) and [offline store](components/offline-store.md) for more details. ### Is it possible to have offline and online stores from different providers? diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md index 01c039e9c5..ffc01c9d6e 100644 --- a/docs/getting-started/quickstart.md +++ b/docs/getting-started/quickstart.md @@ -623,6 +623,6 @@ show up in the upcoming concepts + architecture + tutorial pages as well. ## Next steps * Read the [Concepts](concepts/) page to understand the Feast data model. -* Read the [Architecture](architecture-and-components/) page. +* Read the [Architecture](architecture/) page. * Check out our [Tutorials](../tutorials/tutorials-overview/) section for more examples on how to use Feast. 
* Follow our [Running Feast with Snowflake/GCP/AWS](../how-to-guides/feast-snowflake-gcp-aws/) guide for a more in-depth tutorial on using Feast. diff --git a/docs/how-to-guides/scaling-feast.md b/docs/how-to-guides/scaling-feast.md index ce63f027c9..7e4f27b1dd 100644 --- a/docs/how-to-guides/scaling-feast.md +++ b/docs/how-to-guides/scaling-feast.md @@ -20,7 +20,7 @@ The recommended solution in this case is to use the [SQL based registry](../tuto The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it's executed on a single-process. -Feast supports pluggable [Materialization Engines](../getting-started/architecture-and-components/batch-materialization-engine.md), that allow the materialization process to be scaled up. +Feast supports pluggable [Materialization Engines](../getting-started/components/batch-materialization-engine.md), that allow the materialization process to be scaled up. Aside from the local process, Feast supports a [Lambda-based materialization engine](https://rtd.feast.dev/en/master/#alpha-lambda-based-engine), and a [Bytewax-based materialization engine](https://rtd.feast.dev/en/master/#bytewax-engine). Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations. \ No newline at end of file diff --git a/docs/reference/batch-materialization/README.md b/docs/reference/batch-materialization/README.md index 8511fd81d0..a05d6d75e5 100644 --- a/docs/reference/batch-materialization/README.md +++ b/docs/reference/batch-materialization/README.md @@ -1,6 +1,6 @@ # Batch materialization -Please see [Batch Materialization Engine](../../getting-started/architecture-and-components/batch-materialization-engine.md) for an explanation of batch materialization engines. 
+Please see [Batch Materialization Engine](../../getting-started/components/batch-materialization-engine.md) for an explanation of batch materialization engines. {% page-ref page="snowflake.md" %} diff --git a/docs/reference/codebase-structure.md b/docs/reference/codebase-structure.md index 8eb5572679..7077e48fef 100644 --- a/docs/reference/codebase-structure.md +++ b/docs/reference/codebase-structure.md @@ -34,7 +34,7 @@ There are also several important submodules: * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command. Of these submodules, `infra/` is the most important. -It contains the interfaces for the [provider](getting-started/architecture-and-components/provider.md), [offline store](getting-started/architecture-and-components/offline-store.md), [online store](getting-started/architecture-and-components/online-store.md), [batch materialization engine](getting-started/architecture-and-components/batch-materialization-engine.md), and [registry](getting-started/architecture-and-components/registry.md), as well as all of their individual implementations. +It contains the interfaces for the [provider](getting-started/components/provider.md), [offline store](getting-started/components/offline-store.md), [online store](getting-started/components/online-store.md), [batch materialization engine](getting-started/components/batch-materialization-engine.md), and [registry](getting-started/components/registry.md), as well as all of their individual implementations. 
 ```
 $ tree --dirsfirst -L 1 infra
diff --git a/docs/reference/feature-servers/python-feature-server.md b/docs/reference/feature-servers/python-feature-server.md
index 0d8a0aef75..33dfe77ae1 100644
--- a/docs/reference/feature-servers/python-feature-server.md
+++ b/docs/reference/feature-servers/python-feature-server.md
@@ -153,7 +153,7 @@ curl -X POST \
 
 ### Pushing features to the online and offline stores
 
-The Python feature server also exposes an endpoint for [push sources](../../data-sources/push.md). This endpoint allows you to push data to the online and/or offline store.
+The Python feature server also exposes an endpoint for [push sources](../data-sources/push.md). This endpoint allows you to push data to the online and/or offline store.
 
 The request definition for `PushMode` is a string parameter `to` where the options are: \[`"online"`, `"offline"`, `"online_and_offline"`].
diff --git a/docs/reference/offline-stores/README.md b/docs/reference/offline-stores/README.md
index 33eca6d426..87c92bfcf8 100644
--- a/docs/reference/offline-stores/README.md
+++ b/docs/reference/offline-stores/README.md
@@ -1,6 +1,6 @@
 # Offline stores
 
-Please see [Offline Store](../../getting-started/architecture-and-components/offline-store.md) for a conceptual explanation of offline stores.
+Please see [Offline Store](../../getting-started/components/offline-store.md) for a conceptual explanation of offline stores.
 
 {% content-ref url="overview.md" %}
 [overview.md](overview.md)
diff --git a/docs/reference/online-stores/README.md b/docs/reference/online-stores/README.md
index 0acf6701f9..bf5419b249 100644
--- a/docs/reference/online-stores/README.md
+++ b/docs/reference/online-stores/README.md
@@ -1,6 +1,6 @@
 # Online stores
 
-Please see [Online Store](../../getting-started/architecture-and-components/online-store.md) for an explanation of online stores.
+Please see [Online Store](../../getting-started/components/online-store.md) for an explanation of online stores.
 
 {% content-ref url="overview.md" %}
 [overview.md](overview.md)
diff --git a/docs/reference/providers/README.md b/docs/reference/providers/README.md
index 20686a1e14..925ae8ebc1 100644
--- a/docs/reference/providers/README.md
+++ b/docs/reference/providers/README.md
@@ -1,6 +1,6 @@
 # Providers
 
-Please see [Provider](../../getting-started/architecture-and-components/provider.md) for an explanation of providers.
+Please see [Provider](../../getting-started/components/provider.md) for an explanation of providers.
 
 {% page-ref page="local.md" %}
diff --git a/sdk/python/feast/templates/gcp/README.md b/sdk/python/feast/templates/gcp/README.md
index 7929dc2bdf..bc9e51769c 100644
--- a/sdk/python/feast/templates/gcp/README.md
+++ b/sdk/python/feast/templates/gcp/README.md
@@ -11,7 +11,7 @@ You can run the overall workflow with `python test_workflow.py`.
 ## To move from this into a more production ready workflow:
 1. `feature_store.yaml` points to a local file as a registry. You'll want to setup a remote file (e.g. in S3/GCS) or a SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details.
-2. This example uses an already setup BigQuery Feast data warehouse as the [offline store](https://docs.feast.dev/getting-started/architecture-and-components/offline-store)
+2. This example uses an already setup BigQuery Feast data warehouse as the [offline store](https://docs.feast.dev/getting-started/components/offline-store)
 to generate training data. You'll need to connect your own BigQuery instance to make this work.
 3. Setup CI/CD + dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions).
 4. (optional) Regularly scheduled materialization to power low latency feature retrieval (e.g. via Airflow).
 See [Batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion#batch-data-ingestion)
diff --git a/sdk/python/feast/templates/local/README.md b/sdk/python/feast/templates/local/README.md
index daf3a686fb..1e617cc442 100644
--- a/sdk/python/feast/templates/local/README.md
+++ b/sdk/python/feast/templates/local/README.md
@@ -18,7 +18,7 @@ You can run the overall workflow with `python test_workflow.py`.
    - You can see your options if you run `feast init --help`.
 2. `feature_store.yaml` points to a local file as a registry. You'll want to setup a remote file (e.g. in S3/GCS) or a SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details.
-3. This example uses a file [offline store](https://docs.feast.dev/getting-started/architecture-and-components/offline-store)
+3. This example uses a file [offline store](https://docs.feast.dev/getting-started/components/offline-store)
 to generate training data. It does not scale. We recommend instead using a data warehouse such as
 BigQuery, Snowflake, Redshift. There is experimental support for Spark as well.
 4. Setup CI/CD + dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions).
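Both template READMEs patched above advise replacing the local-file registry in `feature_store.yaml` with a remote file (e.g. S3/GCS) or a SQL registry. As a sketch only, with the bucket name and project as placeholder values not taken from this diff, a GCP-oriented `feature_store.yaml` might look like:

```yaml
project: my_project                    # placeholder project name
provider: gcp
registry: gs://my-bucket/registry.db   # remote file registry instead of a local path
online_store:
  type: datastore
offline_store:
  type: bigquery
```

See the linked [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for the authoritative set of options, including the SQL-based registry recommended in the scaling-feast.md hunk.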
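The python-feature-server hunk above documents a `/push` endpoint whose `PushMode` is a string parameter `to` with values `"online"`, `"offline"`, or `"online_and_offline"`. A minimal client-side sketch follows; the port (6566), the push source name, and the exact payload field names (`push_source_name`, `df`, `to`) are assumptions based on typical usage of the Python feature server, not taken from this diff:

```python
import json
from urllib import request


def build_push_payload(push_source_name, rows, to="online"):
    """Build a JSON body for the feature server's /push endpoint.

    `to` selects the PushMode: "online", "offline", or "online_and_offline".
    `rows` is a list of dicts; the server expects column-oriented data.
    """
    allowed = {"online", "offline", "online_and_offline"}
    if to not in allowed:
        raise ValueError(f"'to' must be one of {sorted(allowed)}")
    # Pivot row-oriented input into {"column": [v1, v2, ...], ...}
    df = {key: [row[key] for row in rows] for key in rows[0]}
    return {"push_source_name": push_source_name, "df": df, "to": to}


def push(url, payload):
    """POST the payload to a running feature server; returns the HTTP status."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_push_payload(
        "driver_stats_push_source",  # hypothetical push source name
        [{"driver_id": 1001, "conv_rate": 0.85}],
        to="online_and_offline",
    )
    print(json.dumps(payload))
    # Requires a running feature server, e.g. started with `feast serve`:
    # push("http://localhost:6566/push", payload)
```

The `curl -X POST` fragment visible in the hunk header suggests the upstream docs demonstrate the same call over plain HTTP, so any HTTP client works here.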