From 8204b8c2869e573fe595cf0334fd187aecd69b9c Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:15:44 -0400
Subject: [PATCH 01/21] merging changes

Signed-off-by: Francisco Javier Arceo
---
 .../push-vs-pull-model.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/getting-started/architecture-and-components/push-vs-pull-model.md b/docs/getting-started/architecture-and-components/push-vs-pull-model.md
index a1f404221b..7df77c67b7 100644
--- a/docs/getting-started/architecture-and-components/push-vs-pull-model.md
+++ b/docs/getting-started/architecture-and-components/push-vs-pull-model.md
@@ -6,15 +6,23 @@ in the online store, to serve features in real-time.
 
 In a [Pull Model](https://en.wikipedia.org/wiki/Pull_technology), Feast would
 pull data from the data producers at request time and store the feature values in
-the online store before serving them (storing them would actually be unneccessary).
+the online store before serving them (storing them would actually be unnecessary).
 
 This approach would incur additional network latency as Feast would need to orchestrate
 a request to each data producer, which would mean the latency would be at least as long as
 your slowest call. So, in order to serve features as fast as possible, we push data to
 Feast and store the feature values in the online store.
 
-The trade-off with the Push Model is that strong consistency is not gauranteed out
+The trade-off with the Push Model is that strong consistency is not guaranteed out
 of the box. Instead, stong consistency has to be explicitly designed for in orchestrating
 the updates to Feast and the client usage.
 
 The significant advantage with this approach is that Feast is read-optimized for low-latency
-feature retrieval.
\ No newline at end of file
+feature retrieval.
+
+# How to Push
+
+Implicit in the Push model are decisions about _how_ and _when_ to push feature values to the online store.
+
+From a developer's perspective, there are three ways to push feature values to the online store with different tradeoffs.
+
+They are discussed further in the [Write Patterns](getting-started/architecture-and-components/write-patterns.md) section.

From 114c8594c698eeb8fe97c6872fa5f78d37d1e71b Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:16:38 -0400
Subject: [PATCH 02/21] merging

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 docs/getting-started/architecture-and-components/write-patterns.md

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
new file mode 100644
index 0000000000..4ef5661256
--- /dev/null
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -0,0 +1,27 @@
+# Writing Data to Feast
+
+Feast uses a [Push Model](getting-started/architecture-and-components/push-vs-pull-model.md) to push features to the online store.
+
+This means Data Producers (i.e., services that generate data) have to push data to Feast.
+Said another way, users have to send Feast data to Feast so Feast can write it to the online store.
+
+## Communication Patterns
+
+There are two ways to *_send_* data to the online store:
+
+1. Synchronously
+   - Using an API call for a small number of entities or a single entity
+2. Asynchronously
+   - Using an API call for a small number of entities or a single entity
+   - Using a "batch job" for a large number of entities
+
+It is worth noting that, in some contexts, developers may "batch" a group of entities together and write them to the
+online store in a single API call. This is a common pattern when writing data to the online store to reduce write loads
+but this would not qualify as a batch job.
+
+## Feature Value Write Patterns
+There are two ways to write feature values to the online store:
+
+1. Precomputing the transformations
+2. Computing the transformations "On Demand"
+

From 0e267380b76bbf4efee9d99fba3810045b4079f0 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:17:27 -0400
Subject: [PATCH 03/21] updated

Signed-off-by: Francisco Javier Arceo
---
 .../architecture-and-components/push-vs-pull-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/getting-started/architecture-and-components/push-vs-pull-model.md b/docs/getting-started/architecture-and-components/push-vs-pull-model.md
index 7df77c67b7..9b20b622e3 100644
--- a/docs/getting-started/architecture-and-components/push-vs-pull-model.md
+++ b/docs/getting-started/architecture-and-components/push-vs-pull-model.md
@@ -13,7 +13,7 @@ your slowest call. So, in order to serve features as fast as possible, we push d
 Feast and store the feature values in the online store.
 
 The trade-off with the Push Model is that strong consistency is not guaranteed out
-of the box. Instead, stong consistency has to be explicitly designed for in orchestrating
+of the box. Instead, strong consistency has to be explicitly designed for in orchestrating
 the updates to Feast and the client usage.
 
 The significant advantage with this approach is that Feast is read-optimized for low-latency

From 2768ad6f0f408087254eff7b7e48aea1a51bfb89 Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Sun, 11 Aug 2024 12:15:28 -0400
Subject: [PATCH 04/21] Update write-patterns.md

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 24 ++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
index 4ef5661256..bc5bbfdc30 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -20,8 +20,30 @@ online store in a single API call. This is a common pattern when writing data to
 but this would not qualify as a batch job.
 
 ## Feature Value Write Patterns
-There are two ways to write feature values to the online store:
+There are two ways to write *feature values* to the online store:
 
 1. Precomputing the transformations
 2. Computing the transformations "On Demand"
 
+## Tradeoffs
+
+When deciding between synchronous and asynchronous data writes, several tradeoffs related to data consistency and operational impacts should be considered:
+
+- **Data Consistency**: Asynchronous writes allow data producers to send data without waiting for the write operation to complete, which can lead to situations where the data in the online store is stale. This might be acceptable in scenarios where absolute freshness is not critical. However, for critical operations, such as calculating loan amounts in financial applications, stale data can lead to incorrect decisions, making synchronous writes essential.
+- **Service Coupling**: Synchronous writes result in tighter coupling between services. If a write operation fails, it can cause the dependent service operation to fail as well, which might be a significant drawback in systems requiring high reliability and independence between services.
+- **Application Latency**: Asynchronous writes typically reduce the perceived latency from the client's perspective because the client does not wait for the write operation to complete. This can enhance the user experience and efficiency in environments where operations are not critically dependent on immediate data freshness.
+- **Correctness**: The risk of data being out-of-date must be weighed against the operational requirements. For instance, in a lending application, having up-to-date feature data can be crucial for correctness (depending upon the features and raw data), thus favoring synchronous writes. In less sensitive contexts, the eventual consistency offered by asynchronous writes might be sufficient.
+
+## Decision Matrix
+
+Given these considerations, the following matrix can help decide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity:
+
+| Scenario | Data Write Type | Feature Computation | Recommended Approach |
+|----------|-----------------|---------------------|----------------------|
+| Real-time, high-stakes decision making | Synchronous | On Demand | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. |
+| High volume, non-critical data processing | Asynchronous | Precomputed | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. |
+| User-facing applications requiring quick feedback | Synchronous | Precomputed | Use synchronous writes with precomputed features to reduce latency and improve user experience. |
+| Data-intensive applications tolerant to staleness | Asynchronous | On Demand | Opt for asynchronous writes with on-demand computation to balance load and manage resource usage efficiently. |
+
+Each scenario balances the tradeoffs differently, depending on the application's tolerance for staleness versus its need for immediacy and accuracy.
+

From 29d9df6337e990c529d3ac1781721cbdd5ffff8f Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Sun, 11 Aug 2024 23:52:16 -0400
Subject: [PATCH 05/21] Update write-patterns.md

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 41 +++++++++++--------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
index bc5bbfdc30..5b9f67a849 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -2,8 +2,11 @@ # Writing Data to Feast
 
 Feast uses a [Push Model](getting-started/architecture-and-components/push-vs-pull-model.md) to push features to the online store.
 
-This means Data Producers (i.e., services that generate data) have to push data to Feast.
-Said another way, users have to send Feast data to Feast so Feast can write it to the online store.
+This has two important consequences: (1) communication patterns between the data producer and Feast and (2) feature computation and
+_feature value_ write patterns to Feast's online store.
+
+Data Producers (i.e., services that generate data) send data to Feast so Feast can write it to the online store. That data can
+be either raw data where Feast computes and store stores the feature values or precomputed feature values.
 
 ## Communication Patterns
 
@@ -15,35 +18,37 @@ There are two ways to *_send_* data to the online store:
    - Using an API call for a small number of entities or a single entity
    - Using a "batch job" for a large number of entities
 
-It is worth noting that, in some contexts, developers may "batch" a group of entities together and write them to the
-online store in a single API call. This is a common pattern when writing data to the online store to reduce write loads
-but this would not qualify as a batch job.
+Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a
+single API call. This is a common pattern when writing data to the online store to reduce write loads but we would
+not qualify this as a batch job.
 
 ## Feature Value Write Patterns
 There are two ways to write *feature values* to the online store:
 
 1. Precomputing the transformations
-2. Computing the transformations "On Demand"
+2. Computing the transformations On Demand
+
+### 1. Precomputing the transformations
+Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `write-to-online-store` api.
+
+### 2. Computing the transformations "On Demand"
+On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store. In some cases, a blend of both may be optimal.
 
 ## Tradeoffs
 
 When deciding between synchronous and asynchronous data writes, several tradeoffs related to data consistency and operational impacts should be considered:
 
 - **Data Consistency**: Asynchronous writes allow data producers to send data without waiting for the write operation to complete, which can lead to situations where the data in the online store is stale. This might be acceptable in scenarios where absolute freshness is not critical. However, for critical operations, such as calculating loan amounts in financial applications, stale data can lead to incorrect decisions, making synchronous writes essential.
+- **Correctness**: The risk of data being out-of-date must be weighed against the operational requirements. For instance, in a lending application, having up-to-date feature data can be crucial for correctness (depending upon the features and raw data), thus favoring synchronous writes. In less sensitive contexts, the eventual consistency offered by asynchronous writes might be sufficient.
 - **Service Coupling**: Synchronous writes result in tighter coupling between services. If a write operation fails, it can cause the dependent service operation to fail as well, which might be a significant drawback in systems requiring high reliability and independence between services.
 - **Application Latency**: Asynchronous writes typically reduce the perceived latency from the client's perspective because the client does not wait for the write operation to complete. This can enhance the user experience and efficiency in environments where operations are not critically dependent on immediate data freshness.
-- **Correctness**: The risk of data being out-of-date must be weighed against the operational requirements. For instance, in a lending application, having up-to-date feature data can be crucial for correctness (depending upon the features and raw data), thus favoring synchronous writes. In less sensitive contexts, the eventual consistency offered by asynchronous writes might be sufficient.
-
-## Decision Matrix
 
-Given these considerations, the following matrix can help decide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity:
+Given these considerations, the table below can help guide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity.
 
-| Scenario | Data Write Type | Feature Computation | Recommended Approach |
+| Data Write Type | Feature Computation | Scenario | Recommended Approach |
 |----------|-----------------|---------------------|----------------------|
-| Real-time, high-stakes decision making | Synchronous | On Demand | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. |
-| High volume, non-critical data processing | Asynchronous | Precomputed | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. |
-| User-facing applications requiring quick feedback | Synchronous | Precomputed | Use synchronous writes with precomputed features to reduce latency and improve user experience. |
-| Data-intensive applications tolerant to staleness | Asynchronous | On Demand | Opt for asynchronous writes with on-demand computation to balance load and manage resource usage efficiently. |
-
-Each scenario balances the tradeoffs differently, depending on the application's tolerance for staleness versus its need for immediacy and accuracy.
-
+| Asynchronous | On Demand | Data-intensive applications tolerant to staleness | Opt for asynchronous writes with on-demand computation to balance load and manage resource usage efficiently. |
+| Asynchronous | Precomputed | High volume, non-critical data processing | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. |
+| Synchronous | On Demand | High-stakes decision making | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. |
+| Synchronous | Precomputed | User-facing applications requiring quick feedback | Use synchronous writes with precomputed features to reduce latency and improve user experience. |
+| Synchronous | Precomputed + On Demand | High-stakes decision making that wants to optimize for latency under constraints | Use synchronous writes with precomputed features where possible and a select set of on demand computations to reduce latency and improve user experience. |

From 578ad7eaa6efbc34ca5391d4f9a1d9c2164bfde3 Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Mon, 12 Aug 2024 00:09:10 -0400
Subject: [PATCH 06/21] Update write-patterns.md

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 18 ++++++++++++++----
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
index 5b9f67a849..e3e8488553 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -23,17 +23,31 @@ single API call. This is a common pattern when writing data to the online store
 not qualify this as a batch job.
 
 ## Feature Value Write Patterns
+
+Writing feature values to the online store can be done in two ways: Precomputing the transformations or Computing the transformations On Demand.
+
+### Combining Approaches
+
+In some advanced scenarios, a combination of precomputed and On Demand transformations might be optimal. For example, base feature calculations that do not change often could be precomputed and stored, while more dynamic calculations based on real-time data can be computed on demand. This hybrid approach can help balance the load on compute resources while ensuring feature freshness where it matters most.
+
+When selecting a feature value write pattern, consider the specific requirements of your application, such as the need for real-time data, the acceptable level of latency, and the computational resources available. Making an informed choice can significantly enhance the performance and reliability of your feature store operations.
+
 There are three ways to write *feature values* to the online store:
 
 1. Precomputing the transformations
 2. Computing the transformations On Demand
+3. Hybrid (Precomputed + On Demand)
 
 ### 1. Precomputing the transformations
 Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `write-to-online-store` api.
 
-### 2. Computing the transformations "On Demand"
-On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store. In some cases, a blend of both may be optimal.
+### 2. Computing the transformations On Demand
+On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store.
+
+### 3. Hybrid (Precomputed + On Demand)
+The hybrid approach allows for precomputed transformations to happen inside or outside of Feast and have the On Demand transformations happen at client request time. This is particularly convenient for "Time Since Last" types of features (e.g., time since last payment).
 
 ## Tradeoffs
 
@@ -51,4 +65,4 @@ Given these considerations, the table below can help guide the most appropriate
 | Asynchronous | Precomputed | High volume, non-critical data processing | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. |
 | Synchronous | On Demand | High-stakes decision making | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. |
 | Synchronous | Precomputed | User-facing applications requiring quick feedback | Use synchronous writes with precomputed features to reduce latency and improve user experience. |
-| Synchronous | Precomputed + On Demand | High-stakes decision making that wants to optimize for latency under constraints | Use synchronous writes with precomputed features where possible and a select set of on demand computations to reduce latency and improve user experience. |
+| Synchronous | Hybrid (Precomputed + On Demand) | High-stakes decision making that wants to optimize for latency under constraints | Use synchronous writes with precomputed features where possible and a select set of on demand computations to reduce latency and improve user experience. |

From cd819f2d51e130af584bd3bab8c7d9484ebbf38b Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Tue, 13 Aug 2024 06:06:45 -0400
Subject: [PATCH 07/21] Update write-patterns.md

Adding some clarity.

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 31 +++++++++----------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
index e3e8488553..4a5fdff553 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -2,21 +2,21 @@ # Writing Data to Feast
 
 Feast uses a [Push Model](getting-started/architecture-and-components/push-vs-pull-model.md) to push features to the online store.
 
-This has two important consequences: (1) communication patterns between the data producer and Feast and (2) feature computation and
+This has two important consequences: (1) communication patterns between the Data Producer (i.e., the client) and Feast (i.e., the server) and (2) feature computation and
 _feature value_ write patterns to Feast's online store.
 
-Data Producers (i.e., services that generate data) send data to Feast so Feast can write it to the online store. That data can
-be either raw data where Feast computes and store stores the feature values or precomputed feature values.
+Data Producers (i.e., services that generate data) send data to Feast so that Feast can write feature values to the online store. That data can
+be either raw data where Feast computes and stores the feature values or precomputed feature values.
 
 ## Communication Patterns
 
-There are two ways to *_send_* data to the online store:
+There are two ways a client (or Data Producer) can *_send_* data to the online store:
 
 1. Synchronously
    - Using an API call for a small number of entities or a single entity
 2. Asynchronously
-   - Using an API call for a small number of entities or a single entity
-   - Using a "batch job" for a large number of entities
+   - Using an API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](https://docs.feast.dev/reference/data-sources/push#pushing-data)) or the Feature Server's [`push` endpoint](https://docs.feast.dev/reference/feature-servers/python-feature-server#pushing-features-to-the-online-and-offline-stores))
+   - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine]([url](https://docs.feast.dev/getting-started/architecture-and-components/batch-materialization-engine)))
 
 Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a
 single API call. This is a common pattern when writing data to the online store to reduce write loads but we would
 not qualify this as a batch job.
 
 ## Feature Value Write Patterns
 
-Writing feature values to the online store can be done in two ways: Precomputing the transformations or Computing the transformations On Demand.
+Writing feature values to the online store (i.e., the server) can be done in two ways: Precomputing the transformations client-side or Computing the transformations On Demand server-side.
 
 ### Combining Approaches
 
-In some advanced scenarios, a combination of precomputed and On Demand transformations might be optimal. For example, base feature calculations that do not change often could be precomputed and stored, while more dynamic calculations based on real-time data can be computed on demand. This hybrid approach can help balance the load on compute resources while ensuring feature freshness where it matters most.
+In some scenarios, a combination of Precomputed and On Demand transformations may be optimal.
 
-When selecting a feature value write pattern, consider the specific requirements of your application, such as the need for real-time data, the acceptable level of latency, and the computational resources available. Making an informed choice can significantly enhance the performance and reliability of your feature store operations.
+When selecting feature value write patterns, one must consider the specific requirements of your application, the acceptable correctness of the data, the tolerance for latency, and the computational resources available. Making an informed choice can significantly help the performance and reliability of your feature store service.
 
-There are three ways to write *feature values* to the online store:
+There are three ways the client can write *feature values* to the online store:
 
 1. Precomputing the transformations
 2. Computing the transformations On Demand
 3. Hybrid (Precomputed + On Demand)
 
 ### 1. Precomputing the transformations
-Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `write-to-online-store` api.
+Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `push` or `write-to-online-store` api.
 
 ### 2. Computing the transformations On Demand
 On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store.
 
 ### 3. Hybrid (Precomputed + On Demand)
-The hybrid approach allows for precomputed transformations to happen inside or outside of Feast and have the On Demand transformations happen at client request time. This is particularly convenient for "Time Since Last" types of features (e.g., time since last payment).
+The hybrid approach allows for precomputed transformations to happen inside or outside of Feast and have the On Demand transformations happen at client request time. This is particularly convenient for "Time Since Last" types of features (e.g., time since purchase).
 
 ## Tradeoffs
 
-When deciding between synchronous and asynchronous data writes, several tradeoffs related to data consistency and operational impacts should be considered:
+When deciding between synchronous and asynchronous data writes, several tradeoffs should be considered:
 
-- **Data Consistency**: Asynchronous writes allow data producers to send data without waiting for the write operation to complete, which can lead to situations where the data in the online store is stale. This might be acceptable in scenarios where absolute freshness is not critical. However, for critical operations, such as calculating loan amounts in financial applications, stale data can lead to incorrect decisions, making synchronous writes essential.
+- **Data Consistency**: Asynchronous writes allow Data Producers to send data without waiting for the write operation to complete, which can lead to situations where the data in the online store is stale. This might be acceptable in scenarios where absolute freshness is not critical. However, for critical operations, such as calculating loan amounts in financial applications, stale data can lead to incorrect decisions, making synchronous writes essential.
 - **Correctness**: The risk of data being out-of-date must be weighed against the operational requirements. For instance, in a lending application, having up-to-date feature data can be crucial for correctness (depending upon the features and raw data), thus favoring synchronous writes. In less sensitive contexts, the eventual consistency offered by asynchronous writes might be sufficient.
 - **Service Coupling**: Synchronous writes result in tighter coupling between services. If a write operation fails, it can cause the dependent service operation to fail as well, which might be a significant drawback in systems requiring high reliability and independence between services.
 - **Application Latency**: Asynchronous writes typically reduce the perceived latency from the client's perspective because the client does not wait for the write operation to complete. This can enhance the user experience and efficiency in environments where operations are not critically dependent on immediate data freshness.
 
-Given these considerations, the table below can help guide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity.
+The table below can help guide the most appropriate data write and feature computation strategies based on specific application needs and data sensitivity.
 
 | Data Write Type | Feature Computation | Scenario | Recommended Approach |
 |----------|-----------------|---------------------|----------------------|
 | Asynchronous | On Demand | Data-intensive applications tolerant to staleness | Opt for asynchronous writes with on-demand computation to balance load and manage resource usage efficiently. |
 | Asynchronous | Precomputed | High volume, non-critical data processing | Use asynchronous batch jobs with precomputed transformations for efficiency and scalability. |
 | Synchronous | On Demand | High-stakes decision making | Use synchronous writes with on-demand feature computation to ensure data freshness and correctness. |
 | Synchronous | Precomputed | User-facing applications requiring quick feedback | Use synchronous writes with precomputed features to reduce latency and improve user experience. |
 | Synchronous | Hybrid (Precomputed + On Demand) | High-stakes decision making that wants to optimize for latency under constraints | Use synchronous writes with precomputed features where possible and a select set of on demand computations to reduce latency and improve user experience. |

From 7fd0aa2b92892cbdade0aa303c235e755a74be6b Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Tue, 13 Aug 2024 06:48:40 -0400
Subject: [PATCH 08/21] Update write-patterns.md

Signed-off-by: Francisco Javier Arceo
---
 .../write-patterns.md | 20 +++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture-and-components/write-patterns.md
index 4a5fdff553..cef09d2779 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture-and-components/write-patterns.md
@@ -13,10 +13,10 @@ There are two ways a client (or Data Producer) can *_send_* data to the online store:
 
 1. Synchronously
-   - Using an API call for a small number of entities or a single entity
+   - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](https://docs.feast.dev/reference/data-sources/push#pushing-data)) or the Feature Server's [`push` endpoint](https://docs.feast.dev/reference/feature-servers/python-feature-server#pushing-features-to-the-online-and-offline-stores))
 2. Asynchronously
-   - Using an API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](https://docs.feast.dev/reference/data-sources/push#pushing-data)) or the Feature Server's [`push` endpoint](https://docs.feast.dev/reference/feature-servers/python-feature-server#pushing-features-to-the-online-and-offline-stores))
-   - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine]([url](https://docs.feast.dev/getting-started/architecture-and-components/batch-materialization-engine)))
+   - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](https://docs.feast.dev/reference/data-sources/push#pushing-data)) or the Feature Server's [`push` endpoint](https://docs.feast.dev/reference/feature-servers/python-feature-server#pushing-features-to-the-online-and-offline-stores))
+   - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](https://docs.feast.dev/getting-started/architecture-and-components/batch-materialization-engine))
 
 Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a
 single API call. This is a common pattern when writing data to the online store to reduce write loads but we would
 not qualify this as a batch job.
 
 ## Feature Value Write Patterns
 
 Writing feature values to the online store (i.e., the server) can be done in two ways: Precomputing the transformations client-side or Computing the transformations On Demand server-side.
 
 ### Combining Approaches
 
 In some scenarios, a combination of Precomputed and On Demand transformations may be optimal.
 
-When selecting feature value write patterns, one must consider the specific requirements of your application, the acceptable correctness of the data, the tolerance for latency, and the computational resources available. Making an informed choice can significantly help the performance and reliability of your feature store service.
+When selecting feature value write patterns, one must consider the specific requirements of your application, the acceptable correctness of the data, the latency tolerance, and the computational resources available. Making deliberate choices can help the performance and reliability of your service.
 
 There are three ways the client can write *feature values* to the online store:
 
-1. Precomputing the transformations
-2. Computing the transformations On Demand
+1. Precomputing transformations
+2. Computing transformations On Demand
 3. Hybrid (Precomputed + On Demand)
 
-### 1. Precomputing the transformations
+### 1. Precomputing Transformations
 Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `push` or `write-to-online-store` api.
 
-### 2. Computing the transformations On Demand
+### 2. Computing Transformations On Demand
 On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store.

From cd4d108965af23ddebef9a8fca0ffdf7023fdb91 Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Tue, 13 Aug 2024 20:57:46 -0400
Subject: [PATCH 09/21] chore: Update feature-transformetion.md (#4405)

Signed-off-by: Francisco Javier Arceo
---
 .../feature-transformetion.md | 24 +++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 docs/getting-started/architecture-and-components/feature-transformetion.md

diff --git a/docs/getting-started/architecture-and-components/feature-transformetion.md b/docs/getting-started/architecture-and-components/feature-transformetion.md
new file mode 100644
index 0000000000..a0d533a33d
--- /dev/null
+++ b/docs/getting-started/architecture-and-components/feature-transformetion.md
@@ -0,0 +1,24 @@
+# Feature Transformation
+
+A feature transformation is a function that takes some set of input data and
+returns some set of output data.
+
+Feature transformations can happen on either raw data or derived data.
+
+Festure transformations can be executed by three types of "transformation
+engines":
+
+1. The Feast Feature Server
+2. An Offline Store (e.g., Snowflake or Spark)
+3. A Stream processor
+
+The three transformation engines are coupled with the communication pattern used
+for writes.
+
+Importantly, this implies that different feature transformation code may be
+used under different transformation engines, so understanding the tradeoffs of
+when to use which transformation engine/network call is extremely critical to
+the success of your implementation.
+
+In general, we recommend transformation engines and network calls to be chosen by
+aligning it with what's most appropriate for the data producer and feature usage.
\ No newline at end of file

From 8be0d1b342564e2cb9bdfea8e3ec522e6bf57322 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:28:48 -0400
Subject: [PATCH 10/21] updated readme

Signed-off-by: Francisco Javier Arceo
---
 docs/README.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index d391069429..f371807e84 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,7 +2,16 @@
 
 ## What is Feast?
 
-Feast (**Fea**ture **St**ore) is a customizable operational data system that re-uses existing infrastructure to manage and serve machine learning features to realtime models.
+Feast (**Fea**ture **St**ore) is an [open-source](https://github.com/feast-dev/feast) feature store that helps teams
+operate production ML systems at scale by allowing them to define, manage, validate, and serve features for production
+AI/ML.
+
+Feast's feature store is composed of two foundational components: (1) an offline store for historical feature
+extraction used in model training and an (2) online store for feature retrieval for low latency feature serving in
+production systems and applications.
+
+Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning
+features to realtime models. For more details please review our [architecture](getting-started/architecture-and-components/overview.md).
 Feast allows ML platform teams to:
 
@@ -41,7 +50,7 @@ Feast is likely **not** the right tool if you
 
 ### Feast is not
 
-* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system:** Feast is not (and does not plan to become) a general purpose data transformation or pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations.
+* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system.** Feast is not a general purpose data pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations. Feast does support some [transformations](getting-started/architecture-and-components/feature-transformetion.md).
 * **a data orchestration tool:** Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and integrations with tools like [Airflow](https://airflow.apache.org) to make features consistently available.
 * **a data warehouse:** Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production.
 * **a database:** Feast is not a database, but helps manage data stored in other systems (e.g. BigQuery, Snowflake, DynamoDB, Redis) to make features consistently available at training / serving time

From 30f96c4d189f9e3aa1a2895a483736678e0e096d Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:33:25 -0400
Subject: [PATCH 11/21] updated summary and readme

Signed-off-by: Francisco Javier Arceo
---
 docs/SUMMARY.md | 1 +
 .../architecture-and-components/README.md | 12 ++++++++----
 ...e-transformetion.md => feature-transformation.md} | 0
 3 files changed, 9 insertions(+), 4 deletions(-)
 rename docs/getting-started/architecture-and-components/{feature-transformetion.md => feature-transformation.md} (100%)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 87c3626254..886a1ced5d 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -21,6 +21,7 @@
   * [Overview](getting-started/architecture-and-components/overview.md)
   * [Language](getting-started/architecture-and-components/language.md)
   * [Push vs Pull Model](getting-started/architecture-and-components/push-vs-pull-model.md)
+  * [Feature Transformation](getting-started/architecture-and-components/feature-transformation.md)
   * [Registry](getting-started/architecture-and-components/registry.md)
   * [Offline store](getting-started/architecture-and-components/offline-store.md)
   * [Online store](getting-started/architecture-and-components/online-store.md)

diff --git a/docs/getting-started/architecture-and-components/README.md b/docs/getting-started/architecture-and-components/README.md
index 050a430c97..ffee5134bc 100644
--- a/docs/getting-started/architecture-and-components/README.md
+++ b/docs/getting-started/architecture-and-components/README.md
@@ -1,17 +1,21 @@
 # Architecture
 
-{% content-ref url="language.md" %}
-[language.md](language.md)
-{% endcontent-ref %}
-
 {% content-ref url="overview.md" %}
 [overview.md](overview.md)
 {% endcontent-ref %}
 
+{% content-ref url="language.md" %}
+[language.md](language.md)
+{% endcontent-ref %}
+
 {% content-ref url="push-vs-pull-model.md" %}
 [push-vs-pull-model.md](push-vs-pull-model.md)
 {% endcontent-ref %}
 
+{% content-ref url="feature-transformation.md" %}
+[feature-transformation.md](feature-transformation.md)
+{% endcontent-ref %}
+
 {% content-ref url="registry.md" %}
 [registry.md](registry.md)
 {% endcontent-ref %}

diff --git a/docs/getting-started/architecture-and-components/feature-transformetion.md b/docs/getting-started/architecture-and-components/feature-transformation.md
similarity index 100%
rename from docs/getting-started/architecture-and-components/feature-transformetion.md
rename to docs/getting-started/architecture-and-components/feature-transformation.md

From 67d11b014914db9e62d6be17f92c3e0242e2517c Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 08:51:56 -0400
Subject: [PATCH 12/21] updated docs

Signed-off-by: Francisco Javier Arceo
---
 docs/README.md | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index f371807e84..ddd5ca4fe6 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,9 +6,9 @@
 
-Feast's feature store is composed of two foundational components: (1) an offline store for historical feature
-extraction used in model training and an (2) online store for feature retrieval for low latency feature serving in
-production systems and applications.
+Feast's feature store is composed of two foundational components: (1) an [offline store](getting-started/architecture-and-components/offline-store.md)
+for historical feature extraction used in model training and an (2) [online store](getting-started/architecture-and-components/online-store.md)
+for feature retrieval for serving features in production systems and applications.
 
 Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning
 features to realtime models. For more details please review our [architecture](getting-started/architecture-and-components/overview.md).
 
@@ -32,19 +32,14 @@ serving system must make a request to the feature store to retrieve feature valu
 [this document](getting-started/architecture-and-components/push-vs-pull-model.md) for a more detailed discussion.
 {% endhint %}
 
-{% hint style="info" %}
-{% endhint %}
-
 ## Who is Feast for?
 
-Feast helps ML platform teams with DevOps experience productionize real-time models. Feast can also help these teams build towards a feature platform that improves collaboration between engineers and data scientists.
+Feast helps ML platform/MLOps teams with DevOps experience productionize real-time models. Feast also helps these teams
+build a feature platform that improves collaboration between data engineers, software engineers, machine learning
+engineers, and data scientists.
 
 Feast is likely **not** the right tool if you
-
 * are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is
-* rely primarily on unstructured data
-* need very low latency feature retrieval (e.g. p99 feature retrieval << 10ms)
-* have a small team to support a large number of use cases
 
 ## What Feast is not?
 
 ### Feast does not _fully_ solve
-
 * **reproducible model training / model backtesting / experiment management**: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools like [DVC](https://dvc.org/), [MLflow](https://www.mlflow.org/), and [Kubeflow](https://www.kubeflow.org/) are better suited for this.
-* **batch + streaming feature engineering**: Feast primarily processes already transformed feature values but is investing in supporting batch and streaming transformations.
+* **batch feature engineering**: Feast primarily processes already transformed feature values but is investing in supporting batch and streaming transformations.
 * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
-* **feature sharing**: Feast has experimental functionality to enable discovery and cataloguing of feature metadata with a [Feast web UI (alpha)](https://docs.feast.dev/reference/alpha-web-ui). Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
 * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
 * **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.

From 265791d245e327ae29538875ae12fdced3b8eeb5 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 09:14:52 -0400
Subject: [PATCH 13/21] Updated readme

Signed-off-by: Francisco Javier Arceo
---
 docs/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/README.md b/docs/README.md
index ddd5ca4fe6..bc859e7d7d 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -8,7 +8,7 @@ AI/ML.
 
 Feast's feature store is composed of two foundational components: (1) an [offline store](getting-started/architecture-and-components/offline-store.md)
 for historical feature extraction used in model training and an (2) [online store](getting-started/architecture-and-components/online-store.md)
-for feature retrieval for serving features in production systems and applications.
+for serving features at low-latency in production systems and applications.
 
 Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning
 features to realtime models. For more details please review our [architecture](getting-started/architecture-and-components/overview.md).
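The offline/online split that the README hunks above keep refining maps directly onto two SDK calls. The following is a minimal sketch, assuming a feature repository in the current directory with a `driver_hourly_stats` feature view and a `driver_id` entity (names borrowed from the Feast quickstart, not from this PR):

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Offline store: build point-in-time correct training data for model training.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [datetime.utcnow() - timedelta(days=1)] * 2,
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()

# Online store: low-latency lookup of the latest feature values for serving.
online_features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```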
From 54c98266b58dccb036260ef06b9e435b015d36f3 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 09:30:49 -0400
Subject: [PATCH 14/21] updated more

Signed-off-by: Francisco Javier Arceo
---
 docs/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/README.md b/docs/README.md
index bc859e7d7d..07dc7d44df 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -52,7 +52,7 @@ Feast is likely **not** the right tool if you
 
 ### Feast does not _fully_ solve
 * **reproducible model training / model backtesting / experiment management**: Feast captures feature and model metadata, but does not version-control datasets / labels or manage train / test splits. Other tools like [DVC](https://dvc.org/), [MLflow](https://www.mlflow.org/), and [Kubeflow](https://www.kubeflow.org/) are better suited for this.
-* **batch feature engineering**: Feast primarily processes already transformed feature values but is investing in supporting batch and streaming transformations.
+* **batch feature engineering**: Feast supports on demand and streaming transformations. Feast is also investing in supporting batch transformations.
 * **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
 * **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
 * **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
From bcce85f8cb7191319cec1aa26a6552412d69c577 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 09:35:51 -0400
Subject: [PATCH 15/21] updated transformation

Signed-off-by: Francisco Javier Arceo
---
 docs/SUMMARY.md | 1 +
 .../feature-transformation.md | 22 ++++++++----------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 886a1ced5d..fba70cbfe7 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -21,6 +21,7 @@
   * [Overview](getting-started/architecture-and-components/overview.md)
   * [Language](getting-started/architecture-and-components/language.md)
   * [Push vs Pull Model](getting-started/architecture-and-components/push-vs-pull-model.md)
+  * [Write Patterns](getting-started/architecture-and-components/write-patterns.md)
   * [Feature Transformation](getting-started/architecture-and-components/feature-transformation.md)
   * [Registry](getting-started/architecture-and-components/registry.md)
   * [Offline store](getting-started/architecture-and-components/offline-store.md)

diff --git a/docs/getting-started/architecture-and-components/feature-transformation.md b/docs/getting-started/architecture-and-components/feature-transformation.md
index a0d533a33d..8a0e6e2dcd 100644
--- a/docs/getting-started/architecture-and-components/feature-transformation.md
+++ b/docs/getting-started/architecture-and-components/feature-transformation.md
@@ -1,24 +1,20 @@
 # Feature Transformation
 
-A feature transformation is a function that takes some set of input data and
-returns some set of output data.
+A *feature transformation* is a function that takes some set of input data and
+returns some set of output data. Feature transformations can happen on either raw data or derived data.
 
-Feature transformations can happen on either raw data or derived data.
-
-Festure transformations can be executed by three types of "transformation
-engines":
+Feature transformations can be executed by three types of "transformation engines":
 
 1. The Feast Feature Server
-2. An Offline Store (e.g., Snowflake or Spark)
-3. A Stream processor
+2. An Offline Store (e.g., Snowflake, BigQuery, DuckDB, Spark, etc.)
+3. A Stream processor (e.g., Flink or Spark Streaming)
 
-The three transformation engines are coupled with the communication pattern used
-for writes.
+The three transformation engines are coupled with the [communication pattern used for writes](getting-started/architecture-and-components/write-patterns.md).
 
 Importantly, this implies that different feature transformation code may be
 used under different transformation engines, so understanding the tradeoffs of
-when to use which transformation engine/network call is extremely critical to
+when to use which transformation engine/communication pattern is extremely critical to
 the success of your implementation.
 
-In general, we recommend transformation engines and network calls to be chosen by
-aligning it with what's most appropriate for the data producer and feature usage.
\ No newline at end of file
+In general, we recommend transformation engines and network calls to be chosen by aligning it with what is most
+appropriate for the data producer, feature/model usage, and overall product.
\ No newline at end of file

From d9e8ab7163b785aa621be4fad6fb1e6249fff3af Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 09:38:34 -0400
Subject: [PATCH 16/21] updated urls

Signed-off-by: Francisco Javier Arceo
---
 docs/getting-started/architecture-and-components/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/getting-started/architecture-and-components/README.md b/docs/getting-started/architecture-and-components/README.md
index ffee5134bc..9ee743846a 100644
--- a/docs/getting-started/architecture-and-components/README.md
+++ b/docs/getting-started/architecture-and-components/README.md
@@ -12,6 +12,10 @@
 [push-vs-pull-model.md](push-vs-pull-model.md)
 {% endcontent-ref %}
 
+{% content-ref url="write-patterns.md" %}
+[write-patterns.md](write-patterns.md)
+{% endcontent-ref %}
+
 {% content-ref url="feature-transformation.md" %}
 [feature-transformation.md](feature-transformation.md)
 {% endcontent-ref %}

From 2ef2cad3f855df035e3db34a4feb2fd0e378b721 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 12:27:07 -0400
Subject: [PATCH 17/21] refactoring and renaming architecture-and-components to
 components and architecture

Signed-off-by: Francisco Javier Arceo
---
 docs/README.md | 12 +++++-----
 docs/SUMMARY.md | 23 ++++++++++---------
 .../architecture-and-components/overview.md | 8 -------
 .../feature-transformation.md | 2 +-
 .../language.md | 0
 docs/getting-started/architecture/overview.md | 18 +++++++++++++++
 .../push-vs-pull-model.md | 2 +-
 .../write-patterns.md | 8 +++----
 docs/getting-started/concepts/dataset.md | 2 +-
 docs/getting-started/faq.md | 2 +-
 docs/getting-started/quickstart.md | 2 +-
 docs/how-to-guides/scaling-feast.md | 2 +-
 .../reference/batch-materialization/README.md | 2 +-
 docs/reference/codebase-structure.md | 2 +-
 docs/reference/offline-stores/README.md | 2 +-
 docs/reference/online-stores/README.md | 2 +-
 docs/reference/providers/README.md | 2 +-
 17 files changed, 51 insertions(+), 40 deletions(-)
 rename docs/getting-started/{architecture-and-components => architecture}/feature-transformation.md (91%)
 rename docs/getting-started/{architecture-and-components => architecture}/language.md (100%)
 create mode 100644 docs/getting-started/architecture/overview.md
 rename docs/getting-started/{architecture-and-components => architecture}/push-vs-pull-model.md (96%)
 rename docs/getting-started/{architecture-and-components => architecture}/write-patterns.md (88%)

diff --git a/docs/README.md b/docs/README.md
index 07dc7d44df..6652eaddc8 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,12 +6,12 @@
 Feast (**Fea**ture **St**ore) is an [open-source](https://github.com/feast-dev/feast) feature store that helps teams
 operate production ML systems at scale by allowing them to define, manage, validate, and serve features for production
 AI/ML.
 
-Feast's feature store is composed of two foundational components: (1) an [offline store](getting-started/architecture-and-components/offline-store.md)
-for historical feature extraction used in model training and an (2) [online store](getting-started/architecture-and-components/online-store.md)
-for serving features at low-latency in production systems and applications.
+Feast's feature store is composed of two foundational components: (1) an [offline store](getting-started/components/offline-store.md)
+for historical feature extraction used in model training and an (2) [online store](getting-started/components/online-store.md)
+for serving features at low-latency in production systems and applications.
Feast is a configurable operational data system that re-uses existing infrastructure to manage and serve machine learning -features to realtime models. For more details please review our [architecture](getting-started/architecture-and-components/overview.md). +features to realtime models. For more details please review our [architecture](getting-started/architecture/overview.md). Feast allows ML platform teams to: @@ -29,7 +29,7 @@ Feast allows ML platform teams to: **Note:** Feast uses a push model for online serving. This means that the feature store pushes feature values to the online store, which reduces the latency of feature retrieval. This is more efficient than a pull model, where the model serving system must make a request to the feature store to retrieve feature values. See -[this document](getting-started/architecture-and-components/push-vs-pull-model.md) for a more detailed discussion. +[this document](getting-started/architecture/push-vs-pull-model.md) for a more detailed discussion. {% endhint %} ## Who is Feast for? @@ -45,7 +45,7 @@ Feast is likely **not** the right tool if you ### Feast is not -* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system.** Feast is not a general purpose data pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations. Feast does support some [transformations](getting-started/architecture-and-components/feature-transformetion.md). +* **an** [**ETL**](https://en.wikipedia.org/wiki/Extract,\_transform,\_load) / [**ELT**](https://en.wikipedia.org/wiki/Extract,\_load,\_transform) **system.** Feast is not a general purpose data pipelining system. Users often leverage tools like [dbt](https://www.getdbt.com) to manage upstream data transformations. Feast does support some [transformations](getting-started/architecture/feature-transformetion.md). * **a data orchestration tool:** Feast does not manage or orchestrate complex workflow DAGs. It relies on upstream data pipelines to produce feature values and integrations with tools like [Airflow](https://airflow.apache.org) to make features consistently available. * **a data warehouse:** Feast is not a replacement for your data warehouse or the source of truth for all transformed data in your organization. Rather, Feast is a light-weight downstream layer that can serve data from an existing data warehouse (or other data sources) to models in production. * **a database:** Feast is not a database, but helps manage data stored in other systems (e.g. BigQuery, Snowflake, DynamoDB, Redis) to make features consistently available at training / serving time @@ -76,7 +76,7 @@ Explore the following resources to get started with Feast: * [Quickstart](getting-started/quickstart.md) is the fastest way to get started with Feast * [Concepts](getting-started/concepts/) describes all important Feast API concepts -* [Architecture](getting-started/architecture-and-components/) describes Feast's overall architecture. +* [Architecture](getting-started/architecture/) describes Feast's overall architecture. * [Tutorials](tutorials/tutorials-overview/) shows full examples of using Feast in machine learning applications. * [Running Feast with Snowflake/GCP/AWS](how-to-guides/feast-snowflake-gcp-aws/) provides a more in-depth guide to using Feast. * [Reference](reference/feast-cli-commands.md) contains detailed API and design documents. 
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index fba70cbfe7..371621fe69 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -17,17 +17,18 @@ * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md) * [Registry](getting-started/concepts/registry.md) * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md) -* [Architecture](getting-started/architecture-and-components/README.md) - * [Overview](getting-started/architecture-and-components/overview.md) - * [Language](getting-started/architecture-and-components/language.md) - * [Push vs Pull Model](getting-started/architecture-and-components/push-vs-pull-model.md) - * [Write Patterns](getting-started/architecture-and-components/write-patterns.md) - * [Feature Transformation](getting-started/architecture-and-components/feature-transformation.md) - * [Registry](getting-started/architecture-and-components/registry.md) - * [Offline store](getting-started/architecture-and-components/offline-store.md) - * [Online store](getting-started/architecture-and-components/online-store.md) - * [Batch Materialization Engine](getting-started/architecture-and-components/batch-materialization-engine.md) - * [Provider](getting-started/architecture-and-components/provider.md) +* [Architecture](getting-started/components/README.md) + * [Language](getting-started/architecture/language.md) + * [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md) + * [Write Patterns](getting-started/architecture/write-patterns.md) + * [Feature Transformation](getting-started/architecture/feature-transformation.md) +* [Components](getting-started/components/README.md) + * [Overview](getting-started/components/overview.md) + * [Registry](getting-started/components/registry.md) + * [Offline store](getting-started/components/offline-store.md) + * [Online store](getting-started/components/online-store.md) + * [Batch Materialization Engine](getting-started/components/batch-materialization-engine.md) + * [Provider](getting-started/components/provider.md) * [Third party integrations](getting-started/third-party-integrations.md) * [FAQ](getting-started/faq.md) diff --git a/docs/getting-started/architecture-and-components/overview.md b/docs/getting-started/architecture-and-components/overview.md index f4d543cd5a..393f436e5b 100644 --- a/docs/getting-started/architecture-and-components/overview.md +++ b/docs/getting-started/architecture-and-components/overview.md @@ -28,11 +28,3 @@ A complete Feast deployment contains the following components: * **Batch Materialization Engine:** The [Batch Materialization Engine](batch-materialization-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process. * **Online Store:** The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through [stream ingestion](../../reference/data-sources/push.md). * **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if Feast configures logging functionality of served features. 
-
-{% hint style="info" %}
-Java and Go Clients are also available for online feature retrieval.
-
-In general, we recommend [using Python](language.md) for your Feature Store microservice.
-
-As mentioned in the document, precomputing features is the recommended optimal path to ensure low latency performance. Reducing feature serving to a lightweight database lookup is the ideal pattern, which means the marginal overhead of Python should be tolerable. Because of this we believe the pros of Python outweigh the costs, as reimplementing feature logic is undesirable.
-{% endhint %}
diff --git a/docs/getting-started/architecture-and-components/feature-transformation.md b/docs/getting-started/architecture/feature-transformation.md
similarity index 91%
rename from docs/getting-started/architecture-and-components/feature-transformation.md
rename to docs/getting-started/architecture/feature-transformation.md
index 8a0e6e2dcd..f674633dc5 100644
--- a/docs/getting-started/architecture-and-components/feature-transformation.md
+++ b/docs/getting-started/architecture/feature-transformation.md
@@ -9,7 +9,7 @@ Feature transformations can be executed by three types of "transformation engine
 2. An Offline Store (e.g., Snowflake, BigQuery, DuckDB, Spark, etc.)
 3. A Stream processor (e.g., Flink or Spark Streaming)
 
-The three transformation engines are coupled with the [communication pattern used for writes](getting-started/architecture-and-components/write-patterns.md).
+The three transformation engines are coupled with the [communication pattern used for writes](getting-started/architecture/write-patterns.md).
 
 Importantly, this implies that different feature transformation code may be used
 under different transformation engines, so understanding the tradeoffs of
diff --git a/docs/getting-started/architecture-and-components/language.md b/docs/getting-started/architecture/language.md
similarity index 100%
rename from docs/getting-started/architecture-and-components/language.md
rename to docs/getting-started/architecture/language.md
diff --git a/docs/getting-started/architecture/overview.md b/docs/getting-started/architecture/overview.md
new file mode 100644
index 0000000000..7d1180bfd1
--- /dev/null
+++ b/docs/getting-started/architecture/overview.md
@@ -0,0 +1,18 @@
+# Overview
+
+![Feast Architecture Diagram](<../../assets/feast_marchitecture.png>)
+
+Feast's architecture is designed to be flexible and scalable. It is composed of several components that work together to provide a feature store that can be used to serve features for training and inference.
+
+* Feast uses a [Push Model](push-vs-pull-model.md) to ingest data from different sources and store feature values in the
+online store.
+This allows Feast to serve features in real-time with low latency.
+
+* Feast supports On Demand and Streaming Transformations for [feature computation](feature-transformation.md) and
+  will support Batch transformations in the future. For Streaming and Batch, Feast requires a separate Feature Transformation
+  Engine (in the batch case, this is typically your Offline Store). We are exploring adding a default streaming engine to Feast.
+
+* Domain expertise is recommended when integrating a data source with Feast to understand the [tradeoffs from different
+  write patterns](write-patterns.md) to your application.
+
+* We recommend [using Python](language.md) for your Feature Store microservice. As mentioned in the document, precomputing features is the recommended optimal path to ensure low latency performance.
Reducing feature serving to a lightweight database lookup is the ideal pattern, which means the marginal overhead of Python should be tolerable. Because of this we believe the pros of Python outweigh the costs, as reimplementing feature logic is undesirable. Java and Go Clients are also available for online feature retrieval.
diff --git a/docs/getting-started/architecture-and-components/push-vs-pull-model.md b/docs/getting-started/architecture/push-vs-pull-model.md
similarity index 96%
rename from docs/getting-started/architecture-and-components/push-vs-pull-model.md
rename to docs/getting-started/architecture/push-vs-pull-model.md
index 9b20b622e3..b205e97fc5 100644
--- a/docs/getting-started/architecture-and-components/push-vs-pull-model.md
+++ b/docs/getting-started/architecture/push-vs-pull-model.md
@@ -25,4 +25,4 @@ Implicit in the Push model are decisions about _how_ and _when_ to push feature
 
 From a developer's perspective, there are three ways to push feature values to the online store with different tradeoffs.
 
-They are discussed further in the [Write Patterns](getting-started/architecture-and-components/write-patterns.md) section.
+They are discussed further in the [Write Patterns](getting-started/architecture/write-patterns.md) section.
diff --git a/docs/getting-started/architecture-and-components/write-patterns.md b/docs/getting-started/architecture/write-patterns.md
similarity index 88%
rename from docs/getting-started/architecture-and-components/write-patterns.md
rename to docs/getting-started/architecture/write-patterns.md
index cef09d2779..0682287a11 100644
--- a/docs/getting-started/architecture-and-components/write-patterns.md
+++ b/docs/getting-started/architecture/write-patterns.md
@@ -1,6 +1,6 @@
 # Writing Data to Feast
 
-Feast uses a [Push Model](getting-started/architecture-and-components/push-vs-pull-model.md) to push features to the online store.
+Feast uses a [Push Model](getting-started/architecture/push-vs-pull-model.md) to push features to the online store.
 
 This has two important consequences: (1) communication patterns between the Data Producer (i.e., the client) and Feast
 (i.e., the server) and (2) feature computation and _feature value_ write patterns to Feast's online store.
@@ -13,10 +13,10 @@ be either raw data where Feast computes and stores the feature values or precomp
 There are two ways a client (or Data Producer) can *_send_* data to the online store:
 
 1. Synchronously
    - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
2. 
Asynchronously
    - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
    - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](getting-started/components/batch-materialization-engine))

Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a single API call. This is a common pattern when writing data to the online store to reduce write loads but we would
diff --git a/docs/getting-started/concepts/dataset.md b/docs/getting-started/concepts/dataset.md
index d55adb4703..829ad4284e 100644
--- a/docs/getting-started/concepts/dataset.md
+++ b/docs/getting-started/concepts/dataset.md
@@ -2,7 +2,7 @@
 Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
 
-Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../architecture-and-components/offline-store.md).
+Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).
 
 Dataset can be created from:
 
diff --git a/docs/getting-started/faq.md b/docs/getting-started/faq.md
index d603e12ab6..6567ae181d 100644
--- a/docs/getting-started/faq.md
+++ b/docs/getting-started/faq.md
@@ -29,7 +29,7 @@ Feature views once they are used by a feature service are intended to be immutab
 
 ### What is the difference between data sources and the offline store?
 
-The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). Please see [data sources](concepts/data-ingestion.md) and [offline store](architecture-and-components/offline-store.md) for more details.
+The data source itself defines the underlying data warehouse table in which the features are stored. The offline store interface defines the APIs required to make an arbitrary compute layer work for Feast (e.g. pulling features given a set of feature views from their sources, exporting the data set results to different formats). 
Please see [data sources](concepts/data-ingestion.md) and [offline store](components/offline-store.md) for more details. ### Is it possible to have offline and online stores from different providers? diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md index 01c039e9c5..ffc01c9d6e 100644 --- a/docs/getting-started/quickstart.md +++ b/docs/getting-started/quickstart.md @@ -623,6 +623,6 @@ show up in the upcoming concepts + architecture + tutorial pages as well. ## Next steps * Read the [Concepts](concepts/) page to understand the Feast data model. -* Read the [Architecture](architecture-and-components/) page. +* Read the [Architecture](architecture/) page. * Check out our [Tutorials](../tutorials/tutorials-overview/) section for more examples on how to use Feast. * Follow our [Running Feast with Snowflake/GCP/AWS](../how-to-guides/feast-snowflake-gcp-aws/) guide for a more in-depth tutorial on using Feast. diff --git a/docs/how-to-guides/scaling-feast.md b/docs/how-to-guides/scaling-feast.md index ce63f027c9..7e4f27b1dd 100644 --- a/docs/how-to-guides/scaling-feast.md +++ b/docs/how-to-guides/scaling-feast.md @@ -20,7 +20,7 @@ The recommended solution in this case is to use the [SQL based registry](../tuto The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store. However, this process does not scale for large data sets, since it's executed on a single-process. -Feast supports pluggable [Materialization Engines](../getting-started/architecture-and-components/batch-materialization-engine.md), that allow the materialization process to be scaled up. +Feast supports pluggable [Materialization Engines](../getting-started/components/batch-materialization-engine.md), that allow the materialization process to be scaled up. Aside from the local process, Feast supports a [Lambda-based materialization engine](https://rtd.feast.dev/en/master/#alpha-lambda-based-engine), and a [Bytewax-based materialization engine](https://rtd.feast.dev/en/master/#bytewax-engine). Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations. \ No newline at end of file diff --git a/docs/reference/batch-materialization/README.md b/docs/reference/batch-materialization/README.md index 8511fd81d0..a05d6d75e5 100644 --- a/docs/reference/batch-materialization/README.md +++ b/docs/reference/batch-materialization/README.md @@ -1,6 +1,6 @@ # Batch materialization -Please see [Batch Materialization Engine](../../getting-started/architecture-and-components/batch-materialization-engine.md) for an explanation of batch materialization engines. +Please see [Batch Materialization Engine](../../getting-started/components/batch-materialization-engine.md) for an explanation of batch materialization engines. {% page-ref page="snowflake.md" %} diff --git a/docs/reference/codebase-structure.md b/docs/reference/codebase-structure.md index 8eb5572679..7077e48fef 100644 --- a/docs/reference/codebase-structure.md +++ b/docs/reference/codebase-structure.md @@ -34,7 +34,7 @@ There are also several important submodules: * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command. Of these submodules, `infra/` is the most important. 
-It contains the interfaces for the [provider](getting-started/architecture-and-components/provider.md), [offline store](getting-started/architecture-and-components/offline-store.md), [online store](getting-started/architecture-and-components/online-store.md), [batch materialization engine](getting-started/architecture-and-components/batch-materialization-engine.md), and [registry](getting-started/architecture-and-components/registry.md), as well as all of their individual implementations. +It contains the interfaces for the [provider](getting-started/components/provider.md), [offline store](getting-started/components/offline-store.md), [online store](getting-started/components/online-store.md), [batch materialization engine](getting-started/components/batch-materialization-engine.md), and [registry](getting-started/components/registry.md), as well as all of their individual implementations. ``` $ tree --dirsfirst -L 1 infra diff --git a/docs/reference/offline-stores/README.md b/docs/reference/offline-stores/README.md index 33eca6d426..87c92bfcf8 100644 --- a/docs/reference/offline-stores/README.md +++ b/docs/reference/offline-stores/README.md @@ -1,6 +1,6 @@ # Offline stores -Please see [Offline Store](../../getting-started/architecture-and-components/offline-store.md) for a conceptual explanation of offline stores. +Please see [Offline Store](../../getting-started/components/offline-store.md) for a conceptual explanation of offline stores. {% content-ref url="overview.md" %} [overview.md](overview.md) diff --git a/docs/reference/online-stores/README.md b/docs/reference/online-stores/README.md index 0acf6701f9..bf5419b249 100644 --- a/docs/reference/online-stores/README.md +++ b/docs/reference/online-stores/README.md @@ -1,6 +1,6 @@ # Online stores -Please see [Online Store](../../getting-started/architecture-and-components/online-store.md) for an explanation of online stores. +Please see [Online Store](../../getting-started/components/online-store.md) for an explanation of online stores. {% content-ref url="overview.md" %} [overview.md](overview.md) diff --git a/docs/reference/providers/README.md b/docs/reference/providers/README.md index 20686a1e14..925ae8ebc1 100644 --- a/docs/reference/providers/README.md +++ b/docs/reference/providers/README.md @@ -1,6 +1,6 @@ # Providers -Please see [Provider](../../getting-started/architecture-and-components/provider.md) for an explanation of providers. +Please see [Provider](../../getting-started/components/provider.md) for an explanation of providers. {% page-ref page="local.md" %} From a5cc13d21b0457c88bc0eb67a5bc02fb8868d641 Mon Sep 17 00:00:00 2001 From: Francisco Javier Arceo Date: Wed, 14 Aug 2024 12:27:27 -0400 Subject: [PATCH 18/21] updated urls Signed-off-by: Francisco Javier Arceo --- sdk/python/feast/templates/gcp/README.md | 2 +- sdk/python/feast/templates/local/README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sdk/python/feast/templates/gcp/README.md b/sdk/python/feast/templates/gcp/README.md index 7929dc2bdf..bc9e51769c 100644 --- a/sdk/python/feast/templates/gcp/README.md +++ b/sdk/python/feast/templates/gcp/README.md @@ -11,7 +11,7 @@ You can run the overall workflow with `python test_workflow.py`. ## To move from this into a more production ready workflow: 1. `feature_store.yaml` points to a local file as a registry. You'll want to setup a remote file (e.g. in S3/GCS) or a SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details. -2. 
This example uses an already setup BigQuery Feast data warehouse as the [offline store](https://docs.feast.dev/getting-started/architecture-and-components/offline-store) +2. This example uses an already setup BigQuery Feast data warehouse as the [offline store](https://docs.feast.dev/getting-started/components/offline-store) to generate training data. You'll need to connect your own BigQuery instance to make this work. 3. Setup CI/CD + dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions). 4. (optional) Regularly scheduled materialization to power low latency feature retrieval (e.g. via Airflow). See [Batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion#batch-data-ingestion) diff --git a/sdk/python/feast/templates/local/README.md b/sdk/python/feast/templates/local/README.md index daf3a686fb..1e617cc442 100644 --- a/sdk/python/feast/templates/local/README.md +++ b/sdk/python/feast/templates/local/README.md @@ -18,7 +18,7 @@ You can run the overall workflow with `python test_workflow.py`. - You can see your options if you run `feast init --help`. 2. `feature_store.yaml` points to a local file as a registry. You'll want to setup a remote file (e.g. in S3/GCS) or a SQL registry. See [registry docs](https://docs.feast.dev/getting-started/concepts/registry) for more details. -3. This example uses a file [offline store](https://docs.feast.dev/getting-started/architecture-and-components/offline-store) +3. This example uses a file [offline store](https://docs.feast.dev/getting-started/components/offline-store) to generate training data. It does not scale. We recommend instead using a data warehouse such as BigQuery, Snowflake, Redshift. There is experimental support for Spark as well. 4. Setup CI/CD + dev vs staging vs prod environments to automatically update the registry as you change Feast feature definitions. See [docs](https://docs.feast.dev/how-to-guides/running-feast-in-production#1.-automatically-deploying-changes-to-your-feature-definitions). 
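The "regularly scheduled materialization" step that both template READMEs point to can be as small as a scheduled call to `materialize_incremental`. A minimal sketch, assuming a configured feature repo in the working directory (the scheduler itself, e.g. an Airflow task, is left out):

```python
from datetime import datetime

from feast import FeatureStore

# Run this on a schedule to load feature values that landed in the offline
# store since the last run into the online store.
store = FeatureStore(repo_path=".")
store.materialize_incremental(end_date=datetime.utcnow())
```

`materialize_incremental` tracks the most recently materialized interval per feature view, so repeated runs only move the new window of data rather than re-writing everything.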
From 466f07a34b1aed85cfa69285a8ad8ae0739c75e8 Mon Sep 17 00:00:00 2001 From: Francisco Javier Arceo Date: Wed, 14 Aug 2024 12:28:12 -0400 Subject: [PATCH 19/21] moving files from architecture-and-components to components/ Signed-off-by: Francisco Javier Arceo --- .../{architecture-and-components => copmonents}/README.md | 0 .../batch-materialization-engine.md | 0 .../{architecture-and-components => copmonents}/offline-store.md | 0 .../{architecture-and-components => copmonents}/online-store.md | 0 .../{architecture-and-components => copmonents}/overview.md | 0 .../{architecture-and-components => copmonents}/provider.md | 0 .../{architecture-and-components => copmonents}/registry.md | 0 .../stream-processor.md | 0 8 files changed, 0 insertions(+), 0 deletions(-) rename docs/getting-started/{architecture-and-components => copmonents}/README.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/batch-materialization-engine.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/offline-store.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/online-store.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/overview.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/provider.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/registry.md (100%) rename docs/getting-started/{architecture-and-components => copmonents}/stream-processor.md (100%) diff --git a/docs/getting-started/architecture-and-components/README.md b/docs/getting-started/copmonents/README.md similarity index 100% rename from docs/getting-started/architecture-and-components/README.md rename to docs/getting-started/copmonents/README.md diff --git a/docs/getting-started/architecture-and-components/batch-materialization-engine.md b/docs/getting-started/copmonents/batch-materialization-engine.md similarity index 100% rename from docs/getting-started/architecture-and-components/batch-materialization-engine.md rename to docs/getting-started/copmonents/batch-materialization-engine.md diff --git a/docs/getting-started/architecture-and-components/offline-store.md b/docs/getting-started/copmonents/offline-store.md similarity index 100% rename from docs/getting-started/architecture-and-components/offline-store.md rename to docs/getting-started/copmonents/offline-store.md diff --git a/docs/getting-started/architecture-and-components/online-store.md b/docs/getting-started/copmonents/online-store.md similarity index 100% rename from docs/getting-started/architecture-and-components/online-store.md rename to docs/getting-started/copmonents/online-store.md diff --git a/docs/getting-started/architecture-and-components/overview.md b/docs/getting-started/copmonents/overview.md similarity index 100% rename from docs/getting-started/architecture-and-components/overview.md rename to docs/getting-started/copmonents/overview.md diff --git a/docs/getting-started/architecture-and-components/provider.md b/docs/getting-started/copmonents/provider.md similarity index 100% rename from docs/getting-started/architecture-and-components/provider.md rename to docs/getting-started/copmonents/provider.md diff --git a/docs/getting-started/architecture-and-components/registry.md b/docs/getting-started/copmonents/registry.md similarity index 100% rename from docs/getting-started/architecture-and-components/registry.md rename to docs/getting-started/copmonents/registry.md diff --git 
a/docs/getting-started/architecture-and-components/stream-processor.md b/docs/getting-started/copmonents/stream-processor.md similarity index 100% rename from docs/getting-started/architecture-and-components/stream-processor.md rename to docs/getting-started/copmonents/stream-processor.md From 9f16302b9be6562e8397d7385a6e45d621da0258 Mon Sep 17 00:00:00 2001 From: Francisco Javier Arceo Date: Wed, 14 Aug 2024 12:30:01 -0400 Subject: [PATCH 20/21] had a typo in components Signed-off-by: Francisco Javier Arceo --- docs/getting-started/{copmonents => components}/README.md | 0 .../{copmonents => components}/batch-materialization-engine.md | 0 docs/getting-started/{copmonents => components}/offline-store.md | 0 docs/getting-started/{copmonents => components}/online-store.md | 0 docs/getting-started/{copmonents => components}/overview.md | 0 docs/getting-started/{copmonents => components}/provider.md | 0 docs/getting-started/{copmonents => components}/registry.md | 0 .../{copmonents => components}/stream-processor.md | 0 8 files changed, 0 insertions(+), 0 deletions(-) rename docs/getting-started/{copmonents => components}/README.md (100%) rename docs/getting-started/{copmonents => components}/batch-materialization-engine.md (100%) rename docs/getting-started/{copmonents => components}/offline-store.md (100%) rename docs/getting-started/{copmonents => components}/online-store.md (100%) rename docs/getting-started/{copmonents => components}/overview.md (100%) rename docs/getting-started/{copmonents => components}/provider.md (100%) rename docs/getting-started/{copmonents => components}/registry.md (100%) rename docs/getting-started/{copmonents => components}/stream-processor.md (100%) diff --git a/docs/getting-started/copmonents/README.md b/docs/getting-started/components/README.md similarity index 100% rename from docs/getting-started/copmonents/README.md rename to docs/getting-started/components/README.md diff --git a/docs/getting-started/copmonents/batch-materialization-engine.md b/docs/getting-started/components/batch-materialization-engine.md similarity index 100% rename from docs/getting-started/copmonents/batch-materialization-engine.md rename to docs/getting-started/components/batch-materialization-engine.md diff --git a/docs/getting-started/copmonents/offline-store.md b/docs/getting-started/components/offline-store.md similarity index 100% rename from docs/getting-started/copmonents/offline-store.md rename to docs/getting-started/components/offline-store.md diff --git a/docs/getting-started/copmonents/online-store.md b/docs/getting-started/components/online-store.md similarity index 100% rename from docs/getting-started/copmonents/online-store.md rename to docs/getting-started/components/online-store.md diff --git a/docs/getting-started/copmonents/overview.md b/docs/getting-started/components/overview.md similarity index 100% rename from docs/getting-started/copmonents/overview.md rename to docs/getting-started/components/overview.md diff --git a/docs/getting-started/copmonents/provider.md b/docs/getting-started/components/provider.md similarity index 100% rename from docs/getting-started/copmonents/provider.md rename to docs/getting-started/components/provider.md diff --git a/docs/getting-started/copmonents/registry.md b/docs/getting-started/components/registry.md similarity index 100% rename from docs/getting-started/copmonents/registry.md rename to docs/getting-started/components/registry.md diff --git a/docs/getting-started/copmonents/stream-processor.md 
b/docs/getting-started/components/stream-processor.md
similarity index 100%
rename from docs/getting-started/copmonents/stream-processor.md
rename to docs/getting-started/components/stream-processor.md

From 408c906716b273053e6da276bd7fec7d65bb89d Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Wed, 14 Aug 2024 12:36:44 -0400
Subject: [PATCH 21/21] Cleaned everything up

Signed-off-by: Francisco Javier Arceo
---
 docs/SUMMARY.md                               |  3 ++-
 docs/getting-started/architecture/README.md   | 21 ++++++++++++++++++
 .../architecture/feature-transformation.md    |  2 +-
 .../architecture/write-patterns.md            |  6 ++---
 docs/getting-started/components/README.md     | 22 +------------------
 .../feature-servers/python-feature-server.md  |  2 +-
 6 files changed, 29 insertions(+), 27 deletions(-)
 create mode 100644 docs/getting-started/architecture/README.md

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 371621fe69..a6a40fc91d 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -17,7 +17,8 @@
 * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
 * [Registry](getting-started/concepts/registry.md)
 * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
-* [Architecture](getting-started/components/README.md)
+* [Architecture](getting-started/architecture/README.md)
+  * [Overview](getting-started/architecture/overview.md)
   * [Language](getting-started/architecture/language.md)
   * [Push vs Pull Model](getting-started/architecture/push-vs-pull-model.md)
   * [Write Patterns](getting-started/architecture/write-patterns.md)
diff --git a/docs/getting-started/architecture/README.md b/docs/getting-started/architecture/README.md
new file mode 100644
index 0000000000..a45f4ed6ec
--- /dev/null
+++ b/docs/getting-started/architecture/README.md
@@ -0,0 +1,21 @@
+# Architecture
+
+{% content-ref url="overview.md" %}
+[overview.md](overview.md)
+{% endcontent-ref %}
+
+{% content-ref url="language.md" %}
+[language.md](language.md)
+{% endcontent-ref %}
+
+{% content-ref url="push-vs-pull-model.md" %}
+[push-vs-pull-model.md](push-vs-pull-model.md)
+{% endcontent-ref %}
+
+{% content-ref url="write-patterns.md" %}
+[write-patterns.md](write-patterns.md)
+{% endcontent-ref %}
+
+{% content-ref url="feature-transformation.md" %}
+[feature-transformation.md](feature-transformation.md)
+{% endcontent-ref %}
diff --git a/docs/getting-started/architecture/feature-transformation.md b/docs/getting-started/architecture/feature-transformation.md
index f674633dc5..457e71d85e 100644
--- a/docs/getting-started/architecture/feature-transformation.md
+++ b/docs/getting-started/architecture/feature-transformation.md
@@ -9,7 +9,7 @@ Feature transformations can be executed by three types of "transformation engine
 2. An Offline Store (e.g., Snowflake, BigQuery, DuckDB, Spark, etc.)
 3. A Stream processor (e.g., Flink or Spark Streaming)
 
-The three transformation engines are coupled with the [communication pattern used for writes](getting-started/architecture/write-patterns.md).
+The three transformation engines are coupled with the [communication pattern used for writes](write-patterns.md).
Importantly, this implies that different feature transformation code may be used
 under different transformation engines, so understanding the tradeoffs of
diff --git a/docs/getting-started/architecture/write-patterns.md b/docs/getting-started/architecture/write-patterns.md
index 0682287a11..4674b5504d 100644
--- a/docs/getting-started/architecture/write-patterns.md
+++ b/docs/getting-started/architecture/write-patterns.md
@@ -13,10 +13,10 @@ There are two ways a client (or Data Producer) can *_send_* data to the online store:
 1. Synchronously
-    - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
+    - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
 2. Asynchronously
-    - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
-    - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](getting-started/components/batch-materialization-engine))
+    - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
+    - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](../components/batch-materialization-engine.md))
 
 Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a single API call.
This is a common pattern when writing data to the online store to reduce write loads but we would diff --git a/docs/getting-started/components/README.md b/docs/getting-started/components/README.md index 9ee743846a..d468714bd4 100644 --- a/docs/getting-started/components/README.md +++ b/docs/getting-started/components/README.md @@ -1,24 +1,4 @@ -# Architecture - -{% content-ref url="overview.md" %} -[overview.md](overview.md) -{% endcontent-ref %} - -{% content-ref url="language.md" %} -[language.md](language.md) -{% endcontent-ref %} - -{% content-ref url="push-vs-pull-model.md" %} -[push-vs-pull-model.md](push-vs-pull-model.md) -{% endcontent-ref %} - -{% content-ref url="write-patterns.md" %} -[write-patterns.md](write-patterns.md) -{% endcontent-ref %} - -{% content-ref url="feature-transformation-model.md" %} -[feature-transformation.md](feature-transformation.md) -{% endcontent-ref %} +# Components {% content-ref url="registry.md" %} [registry.md](registry.md) diff --git a/docs/reference/feature-servers/python-feature-server.md b/docs/reference/feature-servers/python-feature-server.md index 0d8a0aef75..33dfe77ae1 100644 --- a/docs/reference/feature-servers/python-feature-server.md +++ b/docs/reference/feature-servers/python-feature-server.md @@ -153,7 +153,7 @@ curl -X POST \ ### Pushing features to the online and offline stores -The Python feature server also exposes an endpoint for [push sources](../../data-sources/push.md). This endpoint allows you to push data to the online and/or offline store. +The Python feature server also exposes an endpoint for [push sources](../data-sources/push.md). This endpoint allows you to push data to the online and/or offline store. The request definition for `PushMode` is a string parameter `to` where the options are: \[`"online"`, `"offline"`, `"online_and_offline"`].
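To illustrate the `to` parameter described above, the following is a hedged sketch of calling a running feature server's `/push` endpoint from Python. The `driver_stats_push_source` name and the feature columns are illustrative, and `feast serve` is assumed to be listening on its default port (6566):

```python
import requests

payload = {
    # Hypothetical push source name; it must match a PushSource in your repo.
    "push_source_name": "driver_stats_push_source",
    # Column-oriented payload: each key maps to a list of values.
    "df": {
        "driver_id": [1001],
        "event_timestamp": ["2024-08-14 12:00:00"],
        "conv_rate": [0.85],
    },
    # Selects the PushMode: "online", "offline", or "online_and_offline".
    "to": "online_and_offline",
}

response = requests.post("http://localhost:6566/push", json=payload)
response.raise_for_status()
```

Pushing with `"to": "online_and_offline"` keeps the low-latency store and the historical store consistent for this row, at the cost of a heavier write than an online-only push.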