Remove pyarrow as a direct dependency #2228
Signed-off-by: Thomas J. Fan <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##           master    #2228       +/-   ##
===========================================
+ Coverage   76.16%   91.32%   +15.15%
===========================================
  Files         243      144       -99
  Lines       21282     6651    -14631
  Branches     3915        0     -3915
===========================================
- Hits        16210     6074    -10136
+ Misses       4427      577     -3850
+ Partials      645        0      -645
I am putting this as a draft. Removing I need to make sure
@pingsutw This PR removes one of the biggest dependencies. On Linux amd64, The downside is that
I just tested it and the error message is clear, so I think it's fine.

import pandas as pd
from flytekit import task, workflow, StructuredDataset, ImageSpec

df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [1, 22]})

# Install flytekit from the PR branch, pinned to a specific commit.
new_flytekit = "git+https://github.com/thomasjpfan/flytekit.git@603d3996bc4ed6907ba6c0296ae395e80f8e1dfc"
image_spec = ImageSpec(
    base_image="python:3.10-slim-bookworm",
    registry="pingsutw",
    packages=[new_flytekit, "pandas"],
    apt_packages=["git"],
)


# t1 runs in an image that does not have pyarrow installed.
@task(container_image=image_spec)
def t1(sd: StructuredDataset) -> StructuredDataset:
    print(sd.open(pd.DataFrame).all())
    return sd


# t2 runs in the same image with pyarrow added.
@task(container_image=image_spec.with_packages("pyarrow"))
def t2(sd: StructuredDataset) -> StructuredDataset:
    print(sd.open(pd.DataFrame).all())
    return sd


@workflow
def wf():
    t1(sd=StructuredDataset(df))
    t2(sd=StructuredDataset(df))
Tracking issue
Continues flyteorg/flyte#4418
Why are the changes needed?
From flyteorg/flyte#4418 (comment), `pyarrow` is the largest dependency. This PR removes the dependency and lazy loads it.

What changes were proposed in this pull request?
With this PR, `pyarrow` is now lazy loaded. The lazy loading mechanism is the same as the one used for `pandas`.

How was this patch tested?
In two of the test environments, `pyarrow` is removed to make sure `flytekit` works without `pyarrow` installed.