Add friendlier errors #789

Open
1 of 10 tasks
remyleone opened this issue Nov 3, 2023 · 3 comments

@remyleone

🐛 Bug

Module (check all that apply):

  • torchx.spec
  • torchx.component
  • torchx.apps
  • torchx.runtime
  • torchx.cli
  • torchx.schedulers
  • torchx.pipelines
  • torchx.aws
  • torchx.examples
  • other

To Reproduce

$ torchx configure       
Traceback (most recent call last):
  File "/Users/rleone/.local/bin/torchx", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/cli/main.py", line 116, in main
    run_main(get_sub_cmds(), argv)
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/cli/main.py", line 112, in run_main
    args.func(args)
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/cli/cmd_configure.py", line 52, in run
    dump(f=f, schedulers=schedulers, required_only=required_only)
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/runner/config.py", line 247, in dump
    sched = _get_scheduler(sched_name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/runner/config.py", line 203, in _get_scheduler
    sched = schedulers[name](session_name="_")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/schedulers/__init__.py", line 37, in run
    module = importlib.import_module(path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/rleone/.local/lib/python3.11/site-packages/torchx/schedulers/ray_scheduler.py", line 446, in <module>
    session_name: str, ray_client: Optional[JobSubmissionClient] = None, **kwargs: Any
                                            ^^^^^^^^^^^^^^^^^^^
NameError: name 'JobSubmissionClient' is not defined

Expected behavior

I would expect an error message with a bit of documentation about what to do to configure my deployments.
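
To illustrate the kind of message meant here, below is a minimal sketch, not torchx's actual loader. The traceback above shows the failure surfacing as a NameError while the scheduler module is being imported, so the sketch catches both ImportError and NameError around `importlib.import_module` and re-raises with an install hint. The names `load_scheduler_module` and `MissingSchedulerDependencyError`, and the mapping from a scheduler to a pip extra, are hypothetical and only for illustration.

```python
import importlib
from types import ModuleType


class MissingSchedulerDependencyError(RuntimeError):
    """Hypothetical error type for this sketch; not part of torchx."""


def load_scheduler_module(path: str, extra: str) -> ModuleType:
    """Import a scheduler module, turning import-time failures into a hint.

    `path` is the dotted module path; `extra` is the pip extra assumed to
    provide the module's optional dependencies (an assumption of this sketch).
    """
    try:
        return importlib.import_module(path)
    except (ImportError, NameError) as e:
        # NameError is included because a module that swallows a failed
        # optional import can still fail later on an undefined name.
        raise MissingSchedulerDependencyError(
            f"Could not load scheduler module '{path}': {e}. "
            f"Its optional dependencies may be missing; "
            f"try: pip install \"torchx[{extra}]\""
        ) from e
```

With something like this, a failed load would end in a single line naming the suggested extra, while `from e` keeps the original exception chained for debugging.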

Environment

$ torchx --version
torchx-0.6.0

$ python3 --version                               
Python 3.11.5

OS: macOS 14.1
pip installation
@kiukchung
Collaborator

Thanks for reporting! How did you install torchx?

We use extra reqs to install additional scheduler dependencies based on the user's choice of scheduler backend. See: https://github.com/pytorch/torchx#stable.

For ray, I'm wondering if you installed it via:

pip install torchx[ray]

as suggested in the docs.
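
As a quick way to check whether the ray dependency is present in the current environment, here is a small sketch that uses only the standard library. The helper name `missing_modules` is hypothetical, and the assumption that the ray extra provides a top-level module named `ray` comes from the traceback's `ray_scheduler.py`, not from the torchx docs.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "ray" is assumed to be the module installed by the ray extra.
    missing = missing_modules(["ray"])
    if missing:
        print(f"Missing: {', '.join(missing)}; try: pip install 'torchx[ray]'")
    else:
        print("ray is importable")
```

`find_spec` only looks the module up without importing it, so the check itself should not trigger the NameError shown in the report.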

@remyleone
Author

remyleone commented Nov 8, 2023

I installed it using only pip install torchx. I also tried pip install torchx[dev], but it took many hours on my setup and did not succeed in the end. I think there was an issue with the dependencies. https://pytorch.org/torchx/main/quickstart.html

Is it normal that it looks for so many different versions of the same dependency?

Collecting kubernetes<26,>=8.0.0 (from kfp==1.8.22->torchx[dev])
  Using cached kubernetes-24.2.0-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-23.6.0-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-23.3.0-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-22.6.0-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-21.7.0-py2.py3-none-any.whl (1.8 MB)
  Using cached kubernetes-20.13.0-py2.py3-none-any.whl (1.8 MB)
  Using cached kubernetes-19.15.0-py2.py3-none-any.whl (1.7 MB)
  Using cached kubernetes-18.20.0-py2.py3-none-any.whl (1.6 MB)
  Using cached kubernetes-17.17.0-py3-none-any.whl (1.8 MB)
  Using cached kubernetes-12.0.1-py2.py3-none-any.whl (1.7 MB)
  Using cached kubernetes-12.0.0-py3-none-any.whl (1.7 MB)
  Using cached kubernetes-11.0.0-py3-none-any.whl (1.5 MB)
  Using cached kubernetes-10.1.0-py3-none-any.whl (1.5 MB)
INFO: pip is looking at multiple versions of kubernetes to determine which version is compatible with other requirements. This could take a while.
  Using cached kubernetes-10.0.1-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-10.0.0-py2.py3-none-any.whl (1.5 MB)
  Using cached kubernetes-9.0.1-py2.py3-none-any.whl (1.4 MB)
  Using cached kubernetes-9.0.0-py2.py3-none-any.whl (1.4 MB)
  Using cached kubernetes-8.0.2-py2.py3-none-any.whl (1.3 MB)

@kiukchung
Collaborator

Ah, this is due to the changes in pip's dependency resolution logic for python-3.11. Pip now backtracks deps and, unfortunately, needs to actually download wheels of direct deps to figure out transitive deps. This is why you're seeing so many versions of kubernetes being downloaded. You can fall back to the previous logic with:

pip install --use-deprecated=legacy-resolver -e .[dev]

This is what we do for CI (see workflow).

I created a bug report to track and fix this: #790.
