
Add in-tree special_models test suite using reworked iree-tooling. #17883

Merged
merged 16 commits into iree-org:main on Jul 12, 2024

Conversation

@saienduri saienduri (Collaborator) commented Jul 12, 2024

With this, we move away from the specialized JSON config files and complex workflows.
Instead, we use Python scripts, which let us set custom flags, tolerances, and configurations based on the backend/model.
Related PR in TestSuite: nod-ai/SHARK-TestSuite#271

This PR also removes all dependencies on SHARK-TestSuite tooling. The tools here were reworked so that downloading, caching, testing, and benchmarking for iree_special_models all happen with tools solely from this repo. Whenever we add test files here, the goal is for an IREE user to be able to clone the repo and run the tests knowing nothing about the SHARK-TestSuite.

I also hadn't realized that ireers already has a stamping mechanism to check whether a file has already been produced. I think we have to remove it: it will skip regeneration even when a newer version of the file is available, and it isn't a valuable signal when downloading to a cache, because once a file is in the cache it is never removed.

(Third time's the charm. I had to close the last two versions of this PR because I couldn't get past a pre-commit check, which led me to rebase and pick up a bunch of commits that weren't mine 🤦)

ci-exactly: build_all, test_amd_mi300, build_packages, regression_test

@saienduri saienduri marked this pull request as draft July 12, 2024 04:48
@ScottTodd ScottTodd added the infrastructure (Relating to build systems, CI, or testing) and infrastructure/benchmark (Relating to benchmarking infrastructure) labels Jul 12, 2024
.github/workflows/pkgci_regression_test.yml (resolved)
.github/workflows/pkgci_regression_test.yml (outdated, resolved)
experimental/benchmarks/sdxl/conftest.py (resolved)
Comment on lines +4 to +10
def pytest_addoption(parser):
parser.addoption(
"--goldentime-rocm-e2e-ms",
action="store",
type=float,
help="Golden time to test benchmark",
)
ScottTodd (Member) commented:
(This can be deferred to a follow-up refactoring, to keep this PR more incremental)

We might not need flags for these options now that the files are living in the same repo.

I'd prefer for the workflow files to be relatively frozen, with expected results stored in source files in the tests (or very close to the tests). We could pull from a source file like this: https://github.com/iree-org/iree/blob/main/build_tools/benchmarks/common/benchmark_thresholds.py

This puts default values in a source file but then none of the defaults actually matter, since CI jobs always override them with CLI flags. When a developer runs the test they shouldn't also need to match exactly the flags used in the workflow file.
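
For illustration, a minimal sketch of what keeping expected values next to the tests could look like; the file name, dictionary, and values below are hypothetical, not something this PR adds:

# benchmark_goldens.py (hypothetical file living next to the tests)
# Expected benchmark times in milliseconds, keyed by (model, backend).
GOLDEN_TIMES_MS = {
    ("sdxl_e2e", "rocm"): 1000.0,
    ("sdxl_unet", "rocm"): 400.0,
}

def golden_time_ms(model: str, backend: str) -> float:
    # Raises KeyError if no golden value has been recorded for this configuration.
    return GOLDEN_TIMES_MS[(model, backend)]

A benchmark test could then compare its measured time against golden_time_ms(...), so local runs and CI use the same numbers without needing to pass matching CLI flags.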

experimental/regression_suite/ireers_tools/artifacts.py Outdated Show resolved Hide resolved
Comment on lines +1 to +4
class VmfbManager:
sdxl_clip_cpu_vmfb = None
sdxl_vae_cpu_vmfb = None
sdxl_unet_cpu_vmfb = None
ScottTodd (Member) commented:
Why does this file exist? Can't these just be inlined into the files that use them?

saienduri (Collaborator, Author) commented Jul 12, 2024:

We need this file to share data between tests. I couldn't find a cleaner way to inline this and still use the result from one test in another test.

ScottTodd (Member) commented Jul 12, 2024:

Which tests share data? I couldn't tell from a high-level glance through. A few of the cases I looked at seemed to only use the vmfb from within the same file, so wouldn't a global in those files serve the same purpose as this separate file?

I want to avoid having a file like this needing to know about every configuration of every test.

saienduri (Collaborator, Author) commented Jul 12, 2024:

Yeah, I tried the global-variable approach. The way it works, the independent tests can only read a global; we can't set the vmfb in one test and then have the run_module test consume the vmfb that was generated. This was the cleanest way I was able to find.

(By tests sharing data I mean that every iree_run_module test uses the vmfb generated by its corresponding iree_compile test.)
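
To make the pattern concrete, here is a minimal sketch of what is being described, using the ireers-style helpers referenced elsewhere in this PR; the import path, device, entry point, and flags are assumptions for illustration:

from conftest import VmfbManager  # shared holder for compiled artifacts (import path is illustrative)
import pytest

def test_compile_clip_cpu(sd3_clip_mlir):
    # Store the compiled module on the shared class so a later test can consume it.
    VmfbManager.sd3_clip_cpu_vmfb = iree_compile(sd3_clip_mlir, "cpu", CPU_COMPILE_FLAGS)

@pytest.mark.depends(on=["test_compile_clip_cpu"])
def test_run_clip_cpu(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    # Skipped automatically by pytest-depends if the compile test above did not pass.
    iree_run_module(
        VmfbManager.sd3_clip_cpu_vmfb,
        device="local-task",            # assumed CPU device name
        function="forward",             # hypothetical entry point
        args=SD3_CLIP_COMMON_RUN_FLAGS,
    )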

Comment on lines +18 to +26
sd3_clip_inference_input_0 = fetch_source_fixture(
"https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.0.bin",
group="sd3_clip",
)

sd3_clip_inference_input_1 = fetch_source_fixture(
"https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.1.bin",
group="sd3_clip",
)
ScottTodd (Member) commented:
Fine for now, but we should zip these (nod-ai/SHARK-TestSuite#285) or use an array somehow here. This is a lot of boilerplate to download and use individual loose files.
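
For illustration only, a sketch of the array idea using a hypothetical fetch_source_fixtures helper that accepts several URLs at once; nothing below exists in the PR as written:

SD3_CLIP_BASE_URL = "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder"

# Hypothetical helper: fetch every loose input file for a group in one call
# and return them as a list, instead of one fixture per file.
sd3_clip_inference_inputs = fetch_source_fixtures(
    [f"{SD3_CLIP_BASE_URL}/inference_input.{i}.bin" for i in range(2)],
    group="sd3_clip",
)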

Comment on lines +159 to +160
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
ScottTodd (Member) commented:
Should add pytest marks for cpu, rocm, etc.

saienduri (Collaborator, Author) commented Jul 12, 2024:

Currently, we are using the -k command-line arg (similar to SHARK-TestSuite/iree_tests) to match the backend pattern, but yeah, we should add marks here too.

ScottTodd (Member) commented:
Yeah, fine to do what is most similar to the current setup in this PR, but marks will be more flexible. We couldn't as easily use marks before since the test cases were generated.
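
As a sketch of what the marks could look like (the mark names below are assumptions and would need to be registered in the pytest config):

@pytest.mark.rocm  # hypothetical backend mark
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    ...

Tests could then be selected with pytest -m rocm rather than by substring matching with -k rocm.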

ScottTodd (Member) commented:
Nice, this test style is a big improvement over the cross-repo style with json files.

Comment on lines +124 to +127
def test_compile_clip_cpu(sd3_clip_mlir):
VmfbManager.sd3_clip_cpu_vmfb = iree_compile(
sd3_clip_mlir, "cpu", CPU_COMPILE_FLAGS
)
ScottTodd (Member) commented:
How does this differ from the fixture approach taken here?

@pytest.fixture
def llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb(llama2_7b_f16qi4_stripped_source):
return iree_compile(
llama2_7b_f16qi4_stripped_source,
"rdna3_rocm",
flags=COMMON_FLAGS
+ [
"--iree-hal-target-backends=rocm",
"--iree-rocm-target-chip=gfx1100",
],
)
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_rdna3_rocm
def test_step_rdna3_rocm_stripped(llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb):
iree_benchmark_module(
llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
device="rocm",
function="first_vicuna_forward",
args=[
"--input=1x1xi64",
],
)
iree_benchmark_module(
llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
device="rocm",
function="second_vicuna_forward",
args=[
"--input=1x1xi64",
]
+ (["--input=1x32x1x128xf16"] * 64),
)

saienduri (Collaborator, Author) commented Jul 12, 2024:

In the previous approach, only the iree_run_module step was a test. Now the iree_compile step is a test as well. If we made compilation a fixture, it wouldn't show up as a test in the log, and I think having it there is pretty valuable for anyone running these tests.

ScottTodd (Member) commented:

I think we should aim to land this as an incremental move and then continue to iterate on the design after getting past the cross-repository work hurdles. Minimally, we should:

  • Ensure that compilation tests are working as expected: changes to the compiler should re-run .vmfb generation
  • Ensure that the right tests are running on each CI machine/job: check that tags and filters are all properly assigned

Both of those seem like small but nontrivial tasks, so I think we should still revert the test suite PR and get the other updates (including the torch-mlir bump) running in parallel.

Beyond that, I think focusing on the development experience with these tests will guide answers to the other review comments. The tests should be runnable locally using filters appropriate for whatever system they are run on. If a test fails, it should be obvious how to mark it as xfail in that configuration. If a benchmark falls outside the expected values, it should be obvious how to update the values for that configuration (I'm not sure yet whether CLI flags or source files are best for that).
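
For the xfail point, a minimal sketch of per-configuration expected-failure marking using standard pytest; the test name and reason below are hypothetical:

@pytest.mark.xfail(
    reason="known compile failure on this backend, tracked in a follow-up issue",
    strict=True,  # flag the test if it unexpectedly starts passing
)
def test_compile_vae_rocm(sd3_vae_mlir):
    ...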

@saienduri saienduri changed the title Update PckgCI regression test to use new special_models test suite. Add in-tree special_models test suite using reworked iree-tooling. Jul 12, 2024
@saienduri saienduri requested a review from ScottTodd July 12, 2024 20:29
@saienduri saienduri marked this pull request as ready for review July 12, 2024 20:36
ScottTodd (Member) left a review:

Okay, good enough to land with a few nits. It's a bit rough around the edges, so starting in experimental/ is reasonable. Let's aim to iterate on the code and get it out of experimental/ in O(1-2 weeks).

Comment on lines +39 to +42
if return_code == 0:
return 0, proc.stdout
logging.getLogger().info(f"Command failed with error: {proc.stderr}")
return 1, proc.stdout
ScottTodd (Member) commented:
This has the same bug as nod-ai/SHARK-TestSuite#286. Once this lands, we can iterate on a fix in this repo.
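
Without knowing the exact issue tracked in nod-ai/SHARK-TestSuite#286, one guess at a cleanup (an assumption, not the tracked fix) is to propagate the real return code and log stderr on failure rather than collapsing everything to 1:

if return_code == 0:
    return 0, proc.stdout
# Propagate the real return code and log stderr at error level so callers
# can see the actual failure instead of a generic 1.
logging.getLogger().error(f"Command failed (code {return_code}): {proc.stderr}")
return return_code, proc.stdout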

Signed-off-by: saienduri <[email protected]>
Signed-off-by: saienduri <[email protected]>
@saienduri saienduri merged commit 44808e1 into iree-org:main Jul 12, 2024
47 checks passed
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
Labels
infrastructure (Relating to build systems, CI, or testing), infrastructure/benchmark (Relating to benchmarking infrastructure)
2 participants