Upgrade azureml-pipeline (and others) pyarrow dependency to at least >2.0.0 #1379

Closed
brunocous opened this issue Mar 8, 2021 · 9 comments
Labels: ADO (Issue is documented on MSFT ADO for internal tracking), Data Prep Services, Data4ML, Pipelines, product-issue

Comments

brunocous commented Mar 8, 2021

Currently, the azureml-pipeline pip package (version 1.23.0) and others require pyarrow >=0.17.0,<2.0.0.

My application requires pyarrow features that are only available in versions after 2.0.0.

Running pip install --upgrade pyarrow after installing azureml-pipeline results in a failure to resolve the versions:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
azureml-pipeline 1.23.0 requires pyarrow<2.0.0,>=0.17.0, but you have pyarrow 3.0.0 which is incompatible.

The fix is to update the dependency list of the affected azureml packages.
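To make the conflict concrete, here is a minimal sketch of how a version is checked against a specifier such as the one in the error above. It uses only the standard library and handles plain dotted release numbers; the helper names (parse_version, satisfies) are hypothetical, and this is not how pip itself is implemented. In real projects, packaging.specifiers.SpecifierSet covers the full specifier syntax.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_version(text):
    """Turn '3.0.0' into (3, 0, 0). Plain dotted release numbers only."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so that 3.0 compares equal to 3.0.0."""
    width = max(len(left), len(right))
    return (
        left + (0,) * (width - len(left)),
        right + (0,) * (width - len(right)),
    )


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = _pad(installed, parse_version(bound))
        if not _OPS[op](left, right):
            return False
    return True


# The constraint reported for azureml-pipeline 1.23.0:
print(satisfies("3.0.0", "<2.0.0,>=0.17.0"))  # False
print(satisfies("1.0.1", "<2.0.0,>=0.17.0"))  # True
```

Any pyarrow release at or above 2.0.0 fails the upper bound, which is why only a change to the declared dependency resolves the conflict cleanly.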

yikei commented Mar 11, 2021

Hi @brunocous - Thank you for your feedback. We will look into loosening the upper bound on pyarrow after thoroughly validating whether newer versions introduce breaking changes for the AzureML SDK.

In the meantime, even though pip shows an error about the incompatibility, the pip install --upgrade pyarrow should still succeed with Successfully installed pyarrow-3.0.0. Does this unblock your scenario, or do you run into other issues after upgrading pyarrow?
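Since pip reports the incompatibility but still installs the package, it can help to list which installed distributions declare a constraint on pyarrow before relying on a forced upgrade. The following is a minimal sketch using importlib.metadata from the standard library (Python 3.8+); the helper names are hypothetical, and the requirement parsing is deliberately simplified (it ignores environment markers).

```python
import re
from importlib import metadata

# Matches "name", optional "[extras]", then the version specifier,
# with or without surrounding parentheses. Markers after ";" are ignored.
_REQ = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*\(?([^;)]*)\)?"
)


def _norm(name):
    """Normalize a project name so 'Py_Arrow' and 'py-arrow' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def constraints_on(target, requirements_by_dist):
    """Map each distribution to the specifier it declares for `target`."""
    found = {}
    for dist_name, reqs in requirements_by_dist.items():
        for req in reqs or []:
            match = _REQ.match(req)
            if match and _norm(match.group(1)) == _norm(target):
                found[dist_name] = match.group(2).strip() or "(any version)"
    return found


def installed_requirements():
    """Collect the declared requirements of every installed distribution."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            result[name] = dist.requires
    return result


if __name__ == "__main__":
    pins = constraints_on("pyarrow", installed_requirements())
    for name, spec in sorted(pins.items()):
        print(f"{name} requires pyarrow {spec}")
```

Running pip check after the upgrade reports the same kind of conflict; the sketch only shows where the constraints come from.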

brunocous changed the title from "Upgrading azureml-pipeline (and others) pyarrow dependency to at least >2.0.0" to "Upgrade azureml-pipeline (and others) pyarrow dependency to at least >2.0.0" on Mar 11, 2021

brunocous commented Mar 11, 2021

There are indeed workarounds. Forcing the install of a recent pyarrow works in some setups (notebooks, etc.).

Our use case consists of defining an Azure ML Pipeline with multiple PythonScriptSteps. To specify the dependencies, we create a RunConfiguration object with a CondaDependencies object that points to a conda_env.yaml file. Azure ML then builds a container image with whatever Python dependencies are listed in that file.
The problem is that conda is very strict in the way it resolves dependency conflicts. In our case, the build process freezes after a while, resulting in a failed build.
A solution could be to build our own image, but then we would need to manage our own containers, losing the "managed" experience of Azure ML.
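For readers unfamiliar with this setup, it can be sketched roughly as follows with the v1 azureml-core and azureml-pipeline-steps packages. This is an illustration under assumptions, not the exact code from our project: the step name, the script name train.py, and the src directory are hypothetical, and the compute target is supplied by the caller. The SDK imports are placed inside the functions so the module can be loaded without the SDK installed.

```python
def build_run_config(conda_file="conda_env.yaml"):
    """Return a RunConfiguration whose environment is built from `conda_file`."""
    # Lazy imports: the Azure ML SDK is only needed when this is called.
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.runconfig import RunConfiguration

    run_config = RunConfiguration()
    # Let Azure ML build the environment instead of using a prebuilt one.
    run_config.environment.python.user_managed_dependencies = False
    # Azure ML resolves this file with conda when it builds the image,
    # so any pyarrow conflict in the file surfaces at image build time.
    run_config.environment.python.conda_dependencies = CondaDependencies(
        conda_dependencies_file_path=conda_file
    )
    return run_config


def build_step(run_config, compute_target):
    """Create one pipeline step that runs with the given configuration."""
    from azureml.pipeline.steps import PythonScriptStep

    return PythonScriptStep(
        name="train",               # hypothetical step name
        script_name="train.py",     # hypothetical script
        source_directory="src",     # hypothetical directory
        compute_target=compute_target,
        runconfig=run_config,
    )
```

If conda_env.yaml lists both an azureml package with the pyarrow<2.0.0 bound and a newer pyarrow, the image build is where the conflict appears, rather than on the local machine.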

yikei commented Mar 11, 2021

Hi @brunocous, thanks for the additional information. It totally makes sense that resolving conda dependencies from an env file would run into issues. We will look into updating our dependencies safely and reply here once we have more updates.

v-strudm-msft added the ADO (Issue is documented on MSFT ADO for internal tracking) label on Apr 27, 2021

yikei commented May 25, 2021

Hi @brunocous, in our recent release we raised the upper bound of the pyarrow dependency to allow pyarrow < 4.0.0. Thanks!

brunocous commented

Yes, I noticed it! Thank you @yikei and the Azure Dev team :)

poojithag554 commented

I was trying to import transformers in an AzureML designer pipeline. Importing transformers and datasets requires pyarrow >=3.0.0, but after I upgrade pyarrow to 3.0.0 and import transformers, the pyarrow version is reset to the original 0.16.0. I am attaching a few error samples; please have a look.

Got exception when invoking script: 'RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback): To use datasets, the module pyarrow>=3.0.0 is required, and the current version of pyarrow doesn't match this condition. If you are running this in a Google Colab, you should probably just restart the runtime to use the right version of pyarrow.'
azureml-designer-core 0.0.68 requires pyarrow==0.16.0, but you'll have pyarrow 3.0.0 which is incompatible.

yikei commented Feb 28, 2022

Hi @poojithag554, I recommend opening a new issue to get attention from the right people. This appears to be a dependency specified by azureml-designer-core, which is different from the dependency originally raised in this issue.

sushmit86 commented

@yikei I am having similar conflicts between snowflake[pandas] and azureml-dataset-runtime==1.40.0. What would be the right place to raise the issue?

yikei commented Apr 14, 2022

Hi @sushmit86, it looks like there is an open issue for this: #1698.
Unfortunately, it hasn't gotten attention. I'll try to direct someone to look at that issue.
