RADICAL-Pilot Integration with Parsl #2923

AymenFJA · 2023-10-19T14:28:25Z

Description

Summary: This PR integrates the RADICAL-Pilot workload management and runtime system with Parsl as an executor.

The executor offers the following capabilities:

- MPI execution of heterogeneous tasks:
  - Single/multi-node MPI functions that require a private MPI communicator on the runtime.
  - Single/multi-node (non-)MPI executables.
- Heterogeneous task execution of all the above on GPUs and CPUs.
- An improved submission mechanism that offers bulk_mode submission alongside stream submission of tasks from Parsl to the RADICAL-Pilot runtime system.

Fixes: None

Type of change: Additional features.

Can you please guide me on where to put examples that show how the executor can be used?

Thanks all.


AymenFJA · 2023-11-11T15:02:59Z

I had a brief mess around with the timeout test on my laptop, including changing some of the timings in that test for larger values, and I couldn't see anything immediately wrong on the parsl side.

I don't know enough about the radical pilot logs to see if the executable for that test really takes more than 0.2 seconds to run.

I will dig deeper into the bash_app timeout. The test (test_apptimeout) fails because we do not wrap and execute bash_apps the way Parsl does; we execute them as black-box executables. I am investigating an approach like this:

- Mark bash_apps as rp.PROC, so that we can use Parsl's bash_app execution approach to run single-line or multi-line bash_apps.
- Introduce another decorator only for executables, i.e. for running something like python example.py or ./c_app. This keeps RP's ability to execute multi-node MPI executables available to users who do not want to use a python_app or bash_app.

benclifford · 2023-11-11T15:05:28Z

From the way this PR sets a timeout on the task description based on the parsl app kwargs, I was imagining that that would cause radical pilot to enforce a timeout though... what's happening there?

AymenFJA · 2023-11-11T15:12:19Z

> From the way this PR sets a timeout on the task description based on the parsl app kwargs, I was imagining that that would cause radical pilot to enforce a timeout though... what's happening there?

If I understand your question correctly: yes, at the executable level RP sets a timeout, and I quote, "Any timeout larger than 0 will result in the task process being killed after the specified number of seconds. The task will then end up in CANCELED state."

But it does not raise an exception such as AppTimeout, which is what the Parsl tests expect.
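
One way to bridge that gap, sketched here under stated assumptions: a state-change callback on the Parsl side could translate a CANCELED task that carried a timeout into an exception on the future. The names `on_task_state` and `TaskTimeoutError`, and the plain string states, are hypothetical stand-ins; a real executor would work with RADICAL-Pilot task objects and Parsl's AppTimeout.

```python
from concurrent.futures import Future


class TaskTimeoutError(Exception):
    """Hypothetical stand-in for Parsl's AppTimeout."""


def on_task_state(future: Future, state: str, timeout: float) -> None:
    """Hypothetical callback: map a runtime task state onto a future."""
    if state == "DONE":
        future.set_result(0)
    elif state == "CANCELED" and timeout > 0:
        # Assumption: a cancelled task that had a timeout was killed by it.
        future.set_exception(TaskTimeoutError(f"task exceeded {timeout}s"))
    elif state == "CANCELED":
        # No timeout was set, so treat it as an ordinary cancellation.
        future.cancel()


timed_out = Future()
on_task_state(timed_out, "CANCELED", timeout=0.2)
print(type(timed_out.exception()).__name__)
```

The assumption that every cancelled task with a timeout was killed by that timeout is a simplification; a task cancelled for another reason would be misreported under this sketch.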

benclifford · 2023-11-11T15:18:03Z

ah ok. So what state does a cancelled task end up back in the parsl side of things?

AymenFJA · 2023-11-12T07:41:54Z

> ah ok. So what state does a cancelled task end up back in the parsl side of things?

We call future.cancel() on it; see this line:

https://github.com/AymenFJA/parsl/blob/65ff8f4ac2c1afeeaf24f74dc40a2ac8df8e56db/parsl/executors/radical/executor.py#L184
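
For reference, this is how a cancelled `concurrent.futures.Future` behaves in the standard library. It is a standalone illustration, not code from the executor, and it assumes the executor's futures follow standard-library semantics: the caller sees `CancelledError` rather than a timeout-specific exception.

```python
from concurrent.futures import CancelledError, Future

future = Future()            # still pending, so cancel() can succeed
cancelled = future.cancel()  # returns True for a pending future

try:
    future.result(timeout=0)
    outcome = "result"
except CancelledError:
    # A cancelled future raises CancelledError from result().
    outcome = "CancelledError"

print(cancelled, outcome)
```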

We can move forward with this PR, and I can open another feature PR with a timeout workaround. What do you think?

benclifford · 2023-11-13T09:18:09Z

yeah I think this is fine to merge now - I'm going to pull out the CI changes I made, because they have the potential to affect other things, so I would like that to be a separate PR; once that is merged, I'll merge this #2923.

This is motivated by upcoming PR #2923 which adds a RADICAL-Pilot executor: that executor is able to re-use a submit side virtualenv, but not able to re-use `--user` level installs. This should not change any other behaviour of the CI: packages will be installed at different paths, but anything that was reliant on those paths having a specific value is suspicious.

benclifford · 2023-11-13T09:37:52Z

parsl/executors/radical/executor.py

+BASH = 'bash'
+PYTHON = 'python'
+
+os.environ["RADICAL_REPORT"] = "False"


I'm a bit uncomfortable with screwing with the environment just because someone did "import parsl" and then didn't use radical pilot at all: this is something that affects non-radical parsl users.

That should have been addressed by us. A radical.utils PR now changes the default behavior.
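
A minimal sketch of the alternative implied by this review, assuming the setting is still wanted at all: apply it when the executor is started rather than when the module is imported. `ExecutorSketch` and its `start` method are hypothetical stand-ins, not the executor's actual code; only the `RADICAL_REPORT` variable comes from the diff above.

```python
import os


class ExecutorSketch:
    """Hypothetical stand-in for an executor class."""

    def start(self) -> None:
        # Runs only when the executor is actually used; setdefault keeps
        # any value the user has already exported.
        os.environ.setdefault("RADICAL_REPORT", "False")


os.environ.pop("RADICAL_REPORT", None)   # clean slate for the demo
executor = ExecutorSketch()              # constructing changes nothing
before = "RADICAL_REPORT" in os.environ
executor.start()
after = os.environ["RADICAL_REPORT"]
print(before, after)
```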


benclifford · 2023-11-13T09:44:26Z

parsl/executors/radical/executor.py

+
+        return True
+
+    @property


Since development of this executor started, the scaling API has changed. I think it's safe to remove all three of scaling_enabled, scale_in and scale_out. scaling_enabled was removed in PR #2545 because scaling behaviour now comes from subclassing BlockProviderExecutor, and scale_in and scale_out are now only needed for BlockProviderExecutor subclasses. That work was intended to remove the need for all these stubs on executors that manage their own workers (such as the radical pilot executor).

benclifford · 2023-11-13T09:52:29Z

The most recent change I made to how radical is installed (which only installs radical.pilot quite late in the CI, right before radical tests) suggests that there's a problem for non-radical users if radical is not installed - I think this will break things for all non-RADICAL parsl users.

This was masked by installing radical.pilot right at the start of the test run (and that's part of the argument for installing things later rather than earlier)

pytest parsl/tests/ -k "not cleannet" --config parsl/tests/configs/local_threads.py --random-order --durations 10
============================= test session starts ==============================
platform linux -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
Using --random-order-bucket=module
Using --random-order-seed=818270

rootdir: /home/runner/work/parsl/parsl/parsl/tests
configfile: pytest.ini
plugins: typeguard-2.13.3, cov-4.1.0, random-order-1.1.0
collected 239 items / 1 error / 1 skipped

==================================== ERRORS ====================================
_______________ ERROR collecting test_radical/test_mpi_funcs.py ________________
ImportError while importing test module '/home/runner/work/parsl/parsl/parsl/tests/test_radical/test_mpi_funcs.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
parsl/tests/test_radical/test_mpi_funcs.py:4: in <module>
    from parsl.tests.configs.local_radical_mpi import fresh_config as local_config
parsl/tests/configs/local_radical_mpi.py:4: in <module>
    from parsl.executors.radical import RadicalPilotExecutor
parsl/executors/radical/__init__.py:1: in <module>
    from parsl.executors.radical.executor import RadicalPilotExecutor
parsl/executors/radical/executor.py:14: in <module>
    import radical.pilot as rp
E   ModuleNotFoundError: No module named 'radical'
=========================== short test summary info ============================
SKIPPED [1] parsl/tests/configs/ec2_single_node.py:30: 'public_ip' not configured in user_opts.py
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 1 error in 0.50s ==========================
make: *** [Makefile:52: local_thread_test] Error 1


AymenFJA · 2023-11-13T20:49:58Z

I have the same behavior on my machine now. How should we approach this? I honestly have no clue how to fix this. I assumed that isolating RP from the Parsl import (explicit import) would prevent this issue, but apparently not on the test level, at least for tests that are only for the RADICAL executor.

benclifford · 2023-11-13T21:47:17Z

@AymenFJA I'll have a look at what's happening with imports.

AymenFJA · 2023-11-13T23:18:30Z

> @AymenFJA I'll have a look at what's happening with imports.

Sorry for the noise. I tried this: it does allow the other executors to pass, but it actually ignores the test:

@pytest.mark.local
def test_radical_mpi(n=7):
    from parsl.tests.configs.local_radical_mpi import fresh_config as local_config
    # rank size should be > 1 for the
    # radical runtime system to run this function in MPI env
    for i in range(2, n):
        t = test_mpi_func(msg='mpi.func.%06d' % i, sleep=1, ranks=i, comm=None)
        apps.append(t)
    assert [len(app.result()) for app in apps] == list(range(2, n))


benclifford · 2023-11-14T16:41:42Z

@AymenFJA this is passing in CI again now, and the documentation is building and I think that parsl is again importable when radical.pilot is not installed.

The last few commits I added to this branch are a very awkward mechanism we've used elsewhere in parsl to let classes be importable enough to discover their docstrings (which is needed for documentation generation) - I really don't like how that works but I don't have a better solution at the moment.
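
The general pattern for keeping a package importable when an optional dependency is missing can be sketched as follows. This is a hedged illustration and not necessarily the mechanism used in those commits: `load_optional`, `OptionalDependencyError`, and `PilotExecutorSketch` are hypothetical names. The import is attempted once at module level, a failure is remembered instead of propagated, and an error is raised only when the class is instantiated.

```python
import importlib


class OptionalDependencyError(ImportError):
    """Hypothetical error raised when an optional module is needed but absent."""


def load_optional(name: str):
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Attempted at import time, but a failure here does not propagate.
rp = load_optional("radical.pilot")


class PilotExecutorSketch:
    """Docstring stays discoverable even without radical.pilot installed."""

    def __init__(self) -> None:
        if rp is None:
            raise OptionalDependencyError(
                "radical.pilot is required to use this executor")
```

With this shape, documentation tooling can import the class and read its docstring, while users without the dependency only see an error if they try to construct the executor.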

If you're happy with those changes that I made, I'll merge this PR.

AymenFJA · 2023-11-14T16:48:47Z

@benclifford this seems reasonable to me, at least for the moment. Yes, please feel free to merge if nothing major is blocking. I will soon prepare another PR to fix the timeout for the bash_app test. Thanks for your efforts, I appreciate it.

AymenFJA and others added 30 commits May 17, 2023 14:50

radical_executor

60ca3bf

init

645640a

Merge branch 'Parsl:master' into master

34f13a3

Merge branch 'Parsl:master' into master

a28da44

Merge branch 'Parsl:master' into master

b6f8fb7

Merge branch 'Parsl:master' into master

648594c

Merge branch 'Parsl:master' into master

f3fb0da

Merge branch 'Parsl:master' into master

f476b64

Merge branch 'Parsl:master' into master

0d8366e

Merge branch 'Parsl:master' into master

8e380c5

Merge branch 'Parsl:master' into master

a3397d0

adapt the new RAPTOR changes and cleanup

051db89

Merge branch 'Parsl:master' into master

0427428

Merge branch 'Parsl:master' into master

94162ba

addressing comments and refine

7ddea2f

Merge branch 'Parsl:master' into master

9f1d59f

flake8

a8ed4bc

rpex name

e242c34

refine and cleanup

9d7a7a9

Merge branch 'Parsl:master' into master

79227c0

Merge branch 'Parsl:master' into feature/refine

0069488

Merge branch 'Parsl:master' into master

6197677

Merge branch 'Parsl:master' into feature/refine

1a8d6b7

Merge pull request #1 from AymenFJA/feature/refine

ecc0559

addressing comments and refine

Merge branch 'Parsl:master' into master

bccb46e

refine and adapt Parsl Base Executor

f697cfe

Merge branch 'Parsl:master' into master

d65b735

Merge branch 'Parsl:master' into master

a286b0b

radical tests

a1bb899

Merge branch 'Parsl:master' into master

739bb75

fix env name and do not include the agent sandbox in the runinfo folder

65ff8f4

benclifford mentioned this pull request Nov 13, 2023

Install into virtualenv inside CI #2952

Merged

benclifford added 2 commits November 13, 2023 09:32

Remove spurious whitespace change

5306eea

Add RadicalPilotExecutor to reference guide

62b55f9

benclifford reviewed Nov 13, 2023


Install radical.pilot Python dependency the same as other optional dependencies

738e316

benclifford reviewed Nov 13, 2023


andre-merzky mentioned this pull request Nov 13, 2023

REPORTER should be disabled by default radical-cybertools/radical.utils#394

Closed

benclifford and others added 2 commits November 13, 2023 13:28

Merge branch 'master' into master

d606825

address Ben's comments

faaaefa

AymenFJA and others added 5 commits November 13, 2023 18:32

a try to isolate RPEX specific test

790efd5

fix radical executor ref.

b0c48af

Fix a couple of documentation build failures - at least, docs now build on my laptop

931e58b

Rearrange import time dependencies to not need radical.pilot installed

426c2b4

Fix type annotations: only one type annotation for _setup_paths

7e891ba

benclifford merged commit 3bcbc5d into Parsl:master Nov 14, 2023
5 checks passed
