[fine-tuning] Integrate Ray benchmarking as an alternative fine-tuning job #580

Merged: kpouget merged 10 commits into openshift-psap:main from the ray branch on Nov 7, 2024

kpouget (Contributor) commented Nov 5, 2024

No description provided.

openshift-ci bot commented Nov 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from kpouget. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1596

🔴 Test of 'rhoai test test_ci' failed after 00 hours 03 minutes 09 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,

[...]

[Test ran on the internal Perflab CI]
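
The traceback above ends inside `subprocess.run`, which raises `CalledProcessError` when it is called with `check=True` and the command exits with a non-zero status. The following is a minimal sketch of that behaviour, not TOPSAIL's actual `run` helper; the `exit_status_of` name and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def exit_status_of(argv):
    """Run a command and return its exit status (hypothetical helper).

    With check=True, subprocess.run raises CalledProcessError for any
    non-zero status; the exception carries the status in `returncode`.
    """
    try:
        subprocess.run(argv, check=True)
    except subprocess.CalledProcessError as exc:
        return exc.returncode
    return 0


# Stand-in for a toolbox command that fails with status 2.
failing_command = [sys.executable, "-c", "import sys; sys.exit(2)"]
# Stand-in for a command that succeeds.
passing_command = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print("failing:", exit_status_of(failing_command))
    print("passing:", exit_status_of(passing_command))
```

Catching the exception at the call site is one way to attach the exit status to a `FAILURE` marker before re-raising; whether the test harness does this is not shown in the log.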

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1597

🔴 Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 04 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning



[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1598

🔴 Test of 'rhoai test test_ci' failed after 00 hours 04 minutes 35 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1599

🔴 Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 04 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning



[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1600

🔴 Test of 'rhoai test test_ci' failed after 00 hours 04 minutes 22 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1601

🔴 Test of 'rhoai test test_ci' failed after 00 hours 05 minutes 35 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 5, 2024

Jenkins Job #1602

🔴 Test of 'rhoai test test_ci' failed after 00 hours 25 minutes 02 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 6, 2024

Jenkins Job #1605

🔴 Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 05 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 262, in _run_test_and_visualize
    raise RuntimeError(msg)
RuntimeError: RHOAI not installed, cluster not prepared for fine-tuning



[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 6, 2024

Jenkins Job #1606

🔴 Test of 'rhoai test test_ci' failed after 00 hours 10 minutes 01 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/001_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/001_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/001_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 6, 2024

Jenkins Job #1607

🔴 Test of 'rhoai test test_ci' failed after 00 hours 09 minutes 39 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/002__test_fine_tuning/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1} --> 2
/logs/artifacts/000_test_ci/002__test_fine_tuning/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/002__test_fine_tuning" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'model_name': 'bigscience/bloom-560m@hf', 'dataset_name': 'twitter_complaints_small.json', 'gpu': 1, 'dataset_replication': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 132, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 6, 2024

Jenkins Job #1608

🔴 Test of 'rhoai test test_ci' failed after 00 hours 00 minutes 06 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/FAILURE | Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 368, in test
    failed = _run_test_and_visualize()
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 288, in _run_test_and_visualize
    failed = _run_test(test_artifact_dir_p, test_override_values)
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 87, in _run_test
    dataset_source = sources[test_settings["dataset_name"]]
KeyError: None



[Test ran on the internal Perflab CI]
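
`KeyError: None` in the traceback above means `test_settings["dataset_name"]` was `None` when it was used as a key into `sources`, which is consistent with the `ray_bench` preset not declaring a dataset. One possible guard is sketched below; it is an illustration only, not necessarily the fix adopted in this PR, and the `lookup_dataset_source` name and the shape of `sources` are assumptions.

```python
from typing import Optional


def lookup_dataset_source(sources: dict, test_settings: dict) -> Optional[dict]:
    """Return the source entry for the configured dataset, or None.

    Hypothetical helper: a job that needs no dataset leaves
    `dataset_name` unset, so the lookup is skipped instead of failing.
    """
    dataset_name = test_settings.get("dataset_name")
    if dataset_name is None:
        return None  # nothing to look up for this job
    if dataset_name not in sources:
        # Fail with a message that names the missing entry.
        raise KeyError(f"dataset {dataset_name!r} is not declared in the sources")
    return sources[dataset_name]


# Illustrative data only; the real `sources` structure is not shown in the log.
example_sources = {"twitter_complaints_small.json": {"kind": "dataset"}}
```

With this guard, a settings dictionary such as `{'name': 'fine-tuning', 'gpu': 1}` yields `None` rather than an exception.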

kpouget (Contributor, Author) commented Nov 6, 2024

/test rhoai-light fine_tuning ray_bench

openshift-ci bot commented Nov 6, 2024

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rhoai-light | 47f946e | link | true | /test rhoai-light |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

topsail-bot bot commented Nov 6, 2024

Jenkins Job #1610

🔴 Test of 'rhoai test prepare_ci' failed after 00 hours 05 minutes 18 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test prepare_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_prepare_ci/001__prepare2/000__prepare_namespace/FAILURE | UnboundLocalError: local variable 'model' referenced before assignment
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 224, in prepare_namespace
    download_data_sources(test_settings)
  File "/opt/topsail/src/projects/fine_tuning/testing/prepare_finetuning.py", line 113, in download_data_sources
    elif isinstance(model, list):
UnboundLocalError: local variable 'model' referenced before assignment



[Test ran on the internal Perflab CI]
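
The `UnboundLocalError` above is what Python raises when a function reads a local name on a code path where it was never assigned. The sketch below reproduces that class of bug and shows one way to avoid it by binding the name up front. Both functions are hypothetical and do not reproduce the actual body of `download_data_sources`, which is not shown in the log.

```python
def models_to_download_buggy(test_settings: dict) -> list:
    """Hypothetical reproduction: `model` is only bound on one branch."""
    if "model_name" in test_settings:
        model = test_settings["model_name"]
    # When the key is absent, `model` is unbound and the next line raises.
    if isinstance(model, str):
        return [model]
    elif isinstance(model, list):
        return list(model)
    return []


def models_to_download(test_settings: dict) -> list:
    """Hypothetical fix: bind `model` unconditionally before testing it."""
    model = test_settings.get("model_name")  # always bound, possibly None
    if model is None:
        return []  # job without a model, nothing to download
    if isinstance(model, str):
        return [model]
    if isinstance(model, list):
        return list(model)
    raise TypeError(f"unsupported model_name type: {type(model).__name__}")
```

Called with settings that carry no `model_name`, the first function raises `UnboundLocalError` while the second returns an empty list.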

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1615

🔴 Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 47 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1616

🔴 Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 16 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1617

🔴 Test of 'rhoai test test_ci' failed after 00 hours 07 minutes 27 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1618

🔴 Test of 'rhoai test test_ci' failed after 00 hours 02 minutes 17 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

Failure indicator:

/logs/artifacts/000_test_ci/001__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}} --> 2
/logs/artifacts/000_test_ci/001__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/000_test_ci/001__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'fine-tuning', 'gpu': 1, 'hyper_parameters': {'num_samples': 10}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 139, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 49, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 105, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1620

🟢 Test of 'rhoai test test_ci' succeeded after 00 hours 06 minutes 47 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ray_bench
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci
PR_POSITIONAL_ARG_1: ray_bench

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

topsail-bot bot commented Nov 7, 2024

Jenkins Job #1621

🟢 Test of 'rhoai test test_ci' succeeded after 00 hours 07 minutes 32 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

# RHOAI: run fine_tuning test test_ci
PR_POSITIONAL_ARGS: ''
PR_POSITIONAL_ARG_0: fine_tuning-perf-ci

• Link to the Rebuild page.

[Test ran on the internal Perflab CI]

kpouget (Contributor, Author) commented Nov 7, 2024

Tests passed ❤️, merging.

@kpouget kpouget merged commit a268e0b into openshift-psap:main Nov 7, 2024
6 checks passed
@kpouget kpouget deleted the ray branch November 7, 2024 16:24