-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Redundant feature materialization and premature incremental materialization timestamp updates #3789
Conversation
great, I have been working on the similar PR, but you did it first and way better 💯 |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
@james-crabtree-sp Thanks again for putting this PR together. I think in order to merge, you'll need to sign your commits: https://docs.feast.dev/project/development-guide#signing-off-commits |
sdk/python/feast/infra/materialization/contrib/bytewax/bytewax_materialization_engine.py
Show resolved
Hide resolved
I think they are all signed off already. I see that the DCO task that checks this completed successfully https://github.com/feast-dev/feast/pull/3789/checks?check_run_id=17541354092 |
Sorry for the confusion, I was referring to the requirements to merge, and linked to the wrong documentation. This is the requirement that I was referring to: Merging is blocked |
hey @james-crabtree-sp this is a great PR, you need any hands to make it merged? I see there's a lot of commits in the past unverified if you don't mind I can create another PR to mimic this one |
0631425
to
baf108c
Compare
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <[email protected]>
* SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
baf108c
to
fabd417
Compare
@whoahbot @sudohainguyen Ahh, I understand now. Commits are now verified |
exactly 🤣 looks good now but you need to run code format though |
fabd417
to
c35b0c1
Compare
Noticed that too. Weird rebase issue. Should be fixed now |
…error Signed-off-by: James Crabtree <[email protected]>
Signed-off-by: James Crabtree <[email protected]>
9920df7
to
938b903
Compare
@james-crabtree-sp I've been adopting your changes on my side so far do we need to wait the job to be done, like forever without timeout? |
Well, this PR adds the configurable active_deadline_seconds setting, which can be used to impose a timeout on the kubernetes jobs. Does that answer your question? |
cool clear |
# [0.35.0](v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([#3812](#3812)) ([9583ed6](9583ed6)) * Adopt connection pooling for HBase ([#3793](#3793)) ([b3852bf](b3852bf)) * Bytewax engine create configmap from object ([#3821](#3821)) ([25e9775](25e9775)) * Fix warnings from deprecated paths and update default log level ([#3757](#3757)) ([68a8737](68a8737)) * improve parsing bytewax job status ([5983f40](5983f40)) * make bytewax settings unexposed ([ae1bb8b](ae1bb8b)) * Make generated temp table name escaped ([#3797](#3797)) ([175d796](175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([#3789](#3789)) ([417b16b](417b16b)), closes [#6](#6) [#7](#7) * Resolve hbase hotspot issue when materializing ([#3790](#3790)) ([7376db8](7376db8)) * Set keepalives_idle None by default ([#3756](#3756)) ([8717e9b](8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](2151c39)) * UI project cannot handle fallback routes ([#3766](#3766)) ([96ece0f](96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](5dc0b24)) * Update jackson and remove unnecessary logging ([#3809](#3809)) ([018d0ea](018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](052182b)) ### Features * Add get online feature rpc to gprc server ([#3815](#3815)) ([01db8cc](01db8cc)) * Add materialize and materialize-incremental rest endpoints ([#3761](#3761)) ([fa600fe](fa600fe)), closes [#3760](#3760) * add redis sentinel support ([3387a15](3387a15)) * add redis sentinel support ([4337c89](4337c89)) * add redis sentinel support format lint ([aad8718](aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([#3762](#3762)) ([6a728fe](6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([#3754](#3754)) ([2192e65](2192e65)) * Apply cache to load proto registry for performance ([#3702](#3702)) ([709c709](709c709)) * Make bytewax job write as mini-batches ([#3777](#3777)) ([9b0e5ce](9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](9cf9d96)) * Support GCS filesystem for bytewax engine ([#3774](#3774)) ([fb6b807](fb6b807))
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: tokoko <[email protected]>
…rialization timestamp updates (feast-dev#3789) * SAASMLOPS-767 wait for jobs to complete Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-805 Stopgap change to fix duplicate materialization of data Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-805 save BYTEWAX_REPLICAS=1 Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-809 fix bytewax workers so they only process a single file (feast-dev#6) * SAASMLOPS-809 fix bytewax workers so they only process a single file * SAASMLOPS-809 fix newlines Signed-off-by: James Crabtree <[email protected]> * SAASMLOPS-833 add configurable job timeout (feast-dev#7) * SAASMLOPS-833 add configurable job timeout * SAASMLOPS-833 fix whitespace Signed-off-by: James Crabtree <[email protected]> * develop Run large materializations in batches of pods Signed-off-by: James Crabtree <[email protected]> * master Set job_batch_size at least equal to max_parallelism Signed-off-by: James Crabtree <[email protected]> * master clarity max_parallelism description Signed-off-by: James Crabtree <[email protected]> * master resolve bug that causes materialization to continue after job error Signed-off-by: James Crabtree <[email protected]> * master resolve bug causing pod logs to not be printed Signed-off-by: James Crabtree <[email protected]> --------- Signed-off-by: James Crabtree <[email protected]> Signed-off-by: Attila Toth <[email protected]>
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13) ### Bug Fixes * Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6)) * Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf)) * Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775)) * Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737)) * improve parsing bytewax job status ([5983f40](feast-dev@5983f40)) * make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b)) * Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796)) * Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33)) * Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [feast-dev#6](feast-dev#6) [feast-dev#7](feast-dev#7) * Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8)) * Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b)) * Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39)) * UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f)) * update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24)) * Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea)) * upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b)) ### Features * Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc)) * Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760) * add redis sentinel support ([3387a15](feast-dev@3387a15)) * add redis sentinel support ([4337c89](feast-dev@4337c89)) * add redis sentinel support format lint ([aad8718](feast-dev@aad8718)) * Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe)) * Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65)) * Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709)) * Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce)) * Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96)) * Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807)) Signed-off-by: Attila Toth <[email protected]>
What this PR does / why we need it:
This PR fixes 2 issues encountered while attempting to run very large materializations (Several billion rows as part of an initial run of online materialization).
Which issue(s) this PR fixes:
Fixes #3786
Fixes #3787
Fixes #3788