Add Fuzzlyn to CI #60344
Tagging subscribers to this area: @JulieLeeMSFT
Branch updated from 5d010b9 to 33945c4.
Add support for Fuzzlyn in the exploratory pipeline files.
* We use the pipeline name to determine which tool to use, since that seems to be the easiest way to have this available during template expansion.
* All the .yml files are shared, and the setup script is also shared (renamed to fuzzer_setup.py). However, the summarize and run scripts are not shared.
* The summarize scripts now use the AZDO feature that allows outputting a markdown file that shows up rendered under the pipeline results. These can be seen on the "Extensions" tab of AZDO.
* For Fuzzlyn, we automatically reduce silent bad codegen examples found and include these in the summary (but we do not reduce examples if we are over time). Assertion errors are not reduced, but the documentation in exploratory.md contains some information on how to reduce these manually. This should just be a temporary measure until we can more efficiently reduce these.
* The issue zips are now part of the issues artifact (and I removed the "summary" part of the name) since the Fuzzlyn summarize script reads the reduced examples from the zip, and I feel it's simpler to have all the info in one artifact.
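The summarize flow described in the last two bullets can be sketched roughly as follows. This is a minimal illustration, not the PR's actual summarize script: it assumes reduced examples are stored as `*.cs` entries inside the issues zip (a hypothetical layout), writes one markdown section per example, and then prints the Azure DevOps `##vso[task.uploadsummary]` logging command, which is how a pipeline step asks AZDO to attach a markdown file to the run summary.

```python
import sys
import zipfile
from pathlib import Path

FENCE = "`" * 3  # markdown code fence, built here to keep this listing readable

# Hypothetical layout: reduced examples are stored as *.cs entries in the issues zip.
REDUCED_SUFFIX = ".cs"


def collect_reduced_examples(zip_path):
    """Return {entry_name: source_text} for every reduced example in the zip."""
    examples = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(REDUCED_SUFFIX):
                examples[name] = zf.read(name).decode("utf-8")
    return examples


def write_markdown_summary(examples, summary_path):
    """Write one markdown section, containing a C# code block, per reduced example."""
    lines = ["# Reduced examples", ""]
    if not examples:
        lines.append("No silent bad codegen examples were reduced.")
    for name, source in sorted(examples.items()):
        lines += ["## " + name, "", FENCE + "csharp", source.rstrip(), FENCE, ""]
    Path(summary_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def publish_summary(summary_path, out=sys.stdout):
    """Print the Azure DevOps logging command that attaches the markdown file."""
    out.write("##vso[task.uploadsummary]{}\n".format(Path(summary_path).resolve()))
```

Keeping the reduced examples inside the same zip means the summarize step needs only one artifact as input, which matches the reasoning in the last bullet.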
Branch updated from 33945c4 to 55749f0.
/azp run Antigen, Fuzzlyn
Azure Pipelines successfully started running 2 pipeline(s).
The arm/arm64 Antigen failures in run 1423235 are strange. The Helix step failed to download some artifacts, and the logs seem to be truncated in exactly the same place: Partition0, Partition1
/azp run Antigen, Fuzzlyn
Azure Pipelines successfully started running 2 pipeline(s).
Similar failure in the Fuzzlyn run on linux arm partition 2. The results seem to indicate that it did run the subprocess (there is a Fuzzlyn log file here), but the console output and the output from the script seem to be truncated: see here. @kunalspathak Any ideas what could be going on and why we don't see output from the "run" scripts? EDIT: The Fuzzlyn log file that is attached to the results is also truncated, strangely.
Yes, I have seen those failures, and my understanding was that they happen because of long-running Python scripts, but the one you are running is only an hour long. FYI @MattGal
Good progress. Added a few questions/comments.
@@ -38,11 +38,11 @@
<!-- For Scheduled= 3 hours. For PRs= 1 hour -->
<PropertyGroup Condition=" '$(RunReason)' == 'Scheduled' ">
-<WorkItemTimeout>3:15</WorkItemTimeout>
+<WorkItemTimeout>3:30</WorkItemTimeout>
It is not a big deal, but I'm curious: why was this increased by 15 minutes?
The reason is that I added background reduction of silent bad codegen examples for Fuzzlyn. The Fuzzlyn run will not start reducing examples after the 1 hour is up, but it might start reducing an example at 00:59:59. Reducing a silent bad codegen example typically doesn't take more than a few minutes, but for very large programs it can take more than 15 minutes, especially on the platforms where we have weaker hardware in CI.
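To make the timeout arithmetic concrete, here is a small sketch of the rule described above. It assumes elapsed time is measured from the start of fuzzing, and the function names are hypothetical. A reduction is only started while time remains, but once started it runs to completion, so the work item can overrun the fuzzing budget by up to one full reduction.

```python
from datetime import timedelta


def may_start_reduction(elapsed: timedelta, budget: timedelta) -> bool:
    """New reductions are only started while the fuzzing budget has not run out."""
    return elapsed < budget


def worst_case_finish(budget: timedelta, longest_reduction: timedelta) -> timedelta:
    """A reduction started just before the budget expires still runs to completion,
    so the run can outlive the budget by up to one full reduction."""
    return budget + longest_reduction


budget = timedelta(hours=3)  # scheduled runs, per "For Scheduled= 3 hours"
print(may_start_reduction(timedelta(hours=2, minutes=59, seconds=59), budget))  # True
print(may_start_reduction(budget, budget))  # False
print(worst_case_finish(budget, timedelta(minutes=15)))  # 3:15:00
```

With the 3-hour scheduled budget, a single 15-minute reduction started just before the deadline already reaches the old 3:15 limit before counting any setup or upload time, which is consistent with adding another 15 minutes of headroom.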
displayName: ${{ format('Print unique issues ({0})', parameters.osGroup) }}
continueOnError: true
- script: $(PythonScript) $(Build.SourcesDirectory)/src/coreclr/scripts/$(SummarizeScript) -issues_directory $(IssuesLocation) -arch $(archType) -platform $(osGroup)$(osSubgroup) -build_config $(buildConfig)
displayName: ${{ format('Summarize ({0})', parameters.osGroup) }}
Could you also include `parameters.archType` in the `displayName`?
Will do.
@@ -426,7 +426,7 @@ def download_progress_hook(count, block_size, total_size):
block_size (int) : size of a block
total_size (int) : total size of a payload
"""
-sys.stdout.write("\rDownloading {0:.1f}/{1:.1f} MB...".format(count * block_size / 1024 / 1024, total_size / 1024 / 1024))
+sys.stdout.write("\rDownloading {0:.1f}/{1:.1f} MB...".format(min(count * block_size, total_size) / 1024 / 1024, total_size / 1024 / 1024))
Is this a related change?
No, it's unrelated, but it's so small I didn't want to do separate PR/CI runs for it.
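For context, `download_progress_hook` has the shape of the `reporthook` callback accepted by Python's `urllib.request.urlretrieve`, which passes a block count, a block size, and the total size. The last block is usually only partly filled, so `count * block_size` can overshoot the real size, which is what the `min(...)` fix addresses. The sketch below isolates the formatting so the clamp can be checked directly; the `progress_text` helper and the handling of an unknown size (`urlretrieve` passes -1 when it does not know the total) are additions for illustration, not part of this PR.

```python
import sys

MB = 1024 * 1024


def progress_text(count, block_size, total_size):
    """Format download progress from urlretrieve-style reporthook arguments."""
    downloaded = count * block_size
    if total_size > 0:
        # Clamp so a partially filled last block never shows e.g. 5.2/5.0 MB.
        downloaded = min(downloaded, total_size)
        return "\rDownloading {0:.1f}/{1:.1f} MB...".format(downloaded / MB, total_size / MB)
    # Total size unknown: show only how much has been received so far.
    return "\rDownloading {0:.1f} MB...".format(downloaded / MB)


def download_progress_hook(count, block_size, total_size):
    sys.stdout.write(progress_text(count, block_size, total_size))
    sys.stdout.flush()
```

Usage would be along the lines of `urllib.request.urlretrieve(url, filename, reporthook=download_progress_hook)`.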
"Fuzzlyn": "https://github.com/jakobbotsch/Fuzzlyn.git",
}
repo_url = repo_urls[coreclr_args.tool_name]
Could you add a check that `repo_url` is not `None`?
Will do.
Actually, this one is already verified above in `setup_args`.
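One way to guarantee that the lookup cannot miss is to restrict the accepted tool names when the arguments are parsed. The sketch below uses plain `argparse` with `choices=`, which is an assumption: the real `setup_args` may use a different validation helper. Only the Fuzzlyn URL appears in the snippet above, so the Antigen URL here is a hypothetical placeholder.

```python
import argparse

repo_urls = {
    "Antigen": "https://example.invalid/Antigen.git",  # hypothetical placeholder
    "Fuzzlyn": "https://github.com/jakobbotsch/Fuzzlyn.git",
}


def setup_args(argv):
    parser = argparse.ArgumentParser()
    # choices= makes argparse reject unknown tool names up front (it exits with an
    # error), so the later repo_urls[tool_name] lookup cannot fail.
    parser.add_argument("-tool_name", required=True, choices=sorted(repo_urls))
    return parser.parse_args(argv)


args = setup_args(["-tool_name", "Fuzzlyn"])
repo_url = repo_urls[args.tool_name]
```

Validating once at parse time keeps the error close to the user's input instead of surfacing later as a `KeyError` or a `None` URL.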
src/coreclr/scripts/exploratory.md
The basics of both tools are the same: they generate random programs using Roslyn and execute them with `corerun.exe` in a baseline and a test mode.
Typically, baseline uses the JIT with minimum optimizations enabled while the test mode has optimizations enabled.
Antigen also sets various `COMPlus_*` variables in its test mode to turn off different stress modes.
-Antigen also sets various `COMPlus_*` variables in its test mode to turn off different stress modes.
+Antigen also sets various `COMPlus_*` variables in its test mode to turn on different stress modes or turn on/off different optimizations.
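The baseline/test comparison can be sketched as a small differential-testing helper. The program runner is injected as a callable so the logic can be exercised without `corerun.exe`. The environment dictionaries are assumptions: in particular, `COMPlus_JITMinOpts` is used only to illustrate "minimum optimizations" and is not taken from either tool. The categories mirror the two kinds of issues discussed in this PR: assertion failures, and silent bad codegen, where both runs finish but the observable output differs.

```python
from typing import Callable, Dict, NamedTuple


class RunResult(NamedTuple):
    exit_code: int
    output: str  # combined stdout/stderr of the generated program


# Assumed environments for illustration only; the real tools pick their own settings.
BASELINE_ENV: Dict[str, str] = {"COMPlus_JITMinOpts": "1"}  # minimum optimizations
TEST_ENV: Dict[str, str] = {}  # default JIT optimizations (stress modes would go here)


def classify(run: Callable[[Dict[str, str]], RunResult]) -> str:
    """Run one generated program in baseline and test mode and compare the results."""
    baseline = run(BASELINE_ENV)
    test = run(TEST_ENV)
    if "Assertion failed" in test.output:
        return "assertion"
    if baseline.exit_code == 0 and test.exit_code == 0:
        # Both runs completed; differing output is treated as bad codegen in test mode.
        return "ok" if baseline.output == test.output else "silent bad codegen"
    return "other failure"
```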
src/coreclr/scripts/exploratory.md
## Getting test examples from Antigen runs
For Antigen runs the summary will show the assertion errors that were hit.
-For Antigen runs the summary will show the assertion errors that were hit.
+For Antigen runs, the summary will show the assertion errors that were hit.
# Turned off since the output does not seem particularly useful
# if len(remaining_issues) > 0:
# f.write("# {} uncategorized issues found\n", len(remaining_issues))
I would still print the `uncategorized issues found` line so we can come back and investigate.
Will do.
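Note that the commented-out call passes the count to `f.write()` as a second argument, which would raise a `TypeError` if it were re-enabled as written, because `write()` accepts a single string. A minimal sketch of the line with the count formatted into the string; the function name is hypothetical.

```python
def write_uncategorized(f, remaining_issues):
    """Write the 'uncategorized issues found' line to the summary file, if any."""
    if len(remaining_issues) > 0:
        # Format the count into the string: file.write() takes exactly one argument.
        f.write("# {} uncategorized issues found\n".format(len(remaining_issues)))
```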
It's hard to say what's actually going on here. We don't try to stream the output from the Docker container continuously because this caused issues in prototyping, so whatever had made it into the stdout buffer is all the output that got copied. Some thoughts about this problem:
I think the best thing to do is to get a matching device, run the exact payload directly from an interactive bash session, and see where it actually hangs. When you get to this stage, you can ping @ilyas1974 for some help with it.
While you are here, can you also modify the following to log the output? runtime/src/coreclr/scripts/azdo_pipelines_util.py Lines 49 to 50 in 862a90f
if output:
    print(output.strip().decode("utf-8") + "\n")
    of.write(output.strip().decode("utf-8") + "\n")
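A sketch of the kind of logging requested here: run the command, then echo each line of output to the console and append it to a log file as it arrives, flushing after every line. This is an illustration with a hypothetical `run_and_log` helper, not the code in `azdo_pipelines_util.py`. Writing incrementally means a log file should keep whatever was flushed before a failure, rather than depending on output collected once at the end.

```python
import subprocess


def run_and_log(command, log_path):
    """Run command, echoing each output line to the console and appending it to log_path.

    Each line is flushed as soon as it is written, so if the run is cut short the
    log should still contain what the process printed up to that point.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        with subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as proc:
            for raw in proc.stdout:
                line = raw.decode("utf-8", errors="replace").rstrip()
                print(line, flush=True)
                log.write(line + "\n")
                log.flush()
            return proc.wait()
```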
/azp run Fuzzlyn
Azure Pipelines successfully started running 1 pipeline(s).
LGTM!
Looks like the latest CI run found a silent bad codegen example. I opened #60597 for it.
The output does seem to indicate that the Python script started to execute. The results here have a Fuzzlyn-linux-arm-Partition2.log file, which is created by the Python script running on the partition. Note that this file is truncated too, which seems strange. Anyway, I will merge this PR for now and then see if I can find some time to investigate the failure further.
@jakobbotsch That is very interesting, because for any result file like this, we directly mount the outer volume into the Helix Docker container, so no matter how badly things go, that file should represent how far execution got before "the bad thing" happened. I'd definitely suggest investigating this from that perspective, as it'd be harder for a file to be accidentally partially written in this configuration than to lose some stdout.
I have also included a small fix for the SuperPMI download progress display so that we never display a downloaded size larger than the file's total size.