Add Fuzzlyn to CI #60344

jakobbotsch · 2021-10-13T15:18:43Z

Add support for Fuzzlyn in the exploratory pipeline files.

* We use the pipeline name to determine which tool to use, since that seems to be the easiest way to have this available during template expansion.
* All the .yml files are shared, and the setup script is also shared (renamed to fuzzer_setup.py). However, the summarize and run scripts are not shared.
* The summarize scripts now use the AZDO feature that allows outputting a markdown file that is rendered under the pipeline results. These can be seen on the "Extensions" tab of AZDO.
* For Fuzzlyn, we automatically reduce silent bad codegen examples found and include these in the summary (but we do not reduce examples if we are over time). Assertion errors are not reduced, but the documentation in exploratory.md contains some information on how to reduce these manually. This should just be a temporary measure until we can reduce these more efficiently.
* The issue zips are now part of the issues artifact (and I removed the "summary" part of the name), since the Fuzzlyn summarize script reads the reduced examples from the zip and I feel it's simpler to have all the info in one artifact.
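As a rough illustration of the summary mechanism, here is a minimal Python sketch. It assumes the AZDO feature in question is the `##vso[task.uploadsummary]` logging command (the actual summarize scripts may use a different logging command); `write_summary` and the markdown layout are hypothetical.

```python
import os
from typing import Iterable


def write_summary(summary_path: str, title: str, issues: Iterable[str]) -> None:
    """Write a markdown summary and ask Azure Pipelines to attach it (hypothetical layout)."""
    with open(summary_path, "w", encoding="utf-8") as f:
        f.write("# {}\n\n".format(title))
        for issue in issues:
            f.write("* {}\n".format(issue))
    # Logging command: the agent attaches this markdown file to the run summary,
    # where it is rendered on the "Extensions" tab.
    print("##vso[task.uploadsummary]{}".format(os.path.abspath(summary_path)))
```

The logging command is just a specially formatted stdout line, so the script needs no AZDO SDK; the agent picks it up from the step's console output.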

I have also included a small fix for the superpmi download progress display so that we do not display a larger size than the size of the downloaded file.

ghost · 2021-10-13T15:18:49Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details


Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-


jakobbotsch · 2021-10-15T13:24:42Z

/azp run Antigen, Fuzzlyn

azure-pipelines · 2021-10-15T13:25:10Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2021-10-18T12:12:11Z

The arm/arm64 Antigen failures in run 1423235 are strange. The Helix step failed to download some artifacts, and the logs seem to be truncated in exactly the same place in both Partition0 and Partition1.

jakobbotsch · 2021-10-18T12:23:17Z

/azp run Antigen, Fuzzlyn

azure-pipelines · 2021-10-18T12:23:46Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2021-10-18T15:26:03Z

Similar failure in the Fuzzlyn run on Linux arm, partition 2. The results seem to indicate that it did run the subprocess (there is a Fuzzlyn log file here), but the console output and the output from the script seem to be truncated (see here).

@kunalspathak Any ideas what could be going on and why we don't see output from the "run" scripts?

EDIT: The Fuzzlyn log file that is attached to the results is also truncated, strangely.

kunalspathak · 2021-10-18T15:30:11Z

> @kunalspathak Any ideas what could be going on and why we don't see output from the "run" scripts?

Yes, I have seen those failures. My understanding was that they happen because of long-running Python scripts, but the one you are running is only an hour long. FYI @MattGal

kunalspathak

Good progress. Added a few questions/comments.

kunalspathak · 2021-10-18T15:20:33Z

src/coreclr/scripts/exploratory.proj

```diff
@@ -38,11 +38,11 @@
  <!-- For Scheduled= 3 hours. For PRs= 1 hour -->
  <PropertyGroup Condition=" '$(RunReason)' == 'Scheduled' ">
-      <WorkItemTimeout>3:15</WorkItemTimeout>
+      <WorkItemTimeout>3:30</WorkItemTimeout>
```


It is not a big deal, but I'm curious: why was this increased by 15 minutes?

The reason is that I added background reduction of silent bad codegen examples for Fuzzlyn. The Fuzzlyn run will not start reducing examples after the 1 hour is up, but it might start reducing an example just before that (say, at 00:59:59). Reducing a silent bad codegen example typically doesn't take more than a few minutes, but for very large programs it might take more than 15 minutes, especially on some of the platforms where we have weaker hardware in CI.
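The timing argument can be sketched as follows. This is a hypothetical illustration, not the code in the Fuzzlyn run script: `should_start_reduction` and `required_timeout_s` are invented names. New reductions are gated on the time budget, but a reduction that starts just before the budget runs out is allowed to finish, so the work-item timeout has to cover the budget plus the longest expected reduction.

```python
import time
from typing import Optional


def should_start_reduction(run_start: float, budget_s: float, now: Optional[float] = None) -> bool:
    """Start a new reduction only while still inside the time budget.

    run_start and now are expected to come from time.monotonic().
    """
    if now is None:
        now = time.monotonic()
    return (now - run_start) < budget_s


def required_timeout_s(budget_s: float, max_reduction_s: float) -> float:
    """A reduction started just before the budget expires runs to completion,
    so the timeout must cover the budget plus the longest reduction."""
    return budget_s + max_reduction_s
```

For example, with a 1-hour budget and a reduction that takes 20 minutes, `required_timeout_s(3600, 1200)` is 4800 seconds, i.e. 1:20, which is why a fixed 15-minute margin can be too tight on slow hardware.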

kunalspathak · 2021-10-18T15:32:08Z

eng/pipelines/coreclr/templates/jit-run-exploratory-job.yml

```diff
-      displayName: ${{ format('Print unique issues ({0})', parameters.osGroup) }}
-      continueOnError: true
+    - script: $(PythonScript) $(Build.SourcesDirectory)/src/coreclr/scripts/$(SummarizeScript) -issues_directory $(IssuesLocation) -arch $(archType) -platform $(osGroup)$(osSubgroup) -build_config $(buildConfig)
+      displayName: ${{ format('Summarize ({0})', parameters.osGroup) }}
```


Could you also include parameters.archType in the displayName?

kunalspathak · 2021-10-18T15:32:43Z

src/coreclr/scripts/superpmi.py

```diff
@@ -426,7 +426,7 @@ def download_progress_hook(count, block_size, total_size):
        block_size (int)          : size of a block
        total_size (int)          : total size of a payload
    """
-    sys.stdout.write("\rDownloading {0:.1f}/{1:.1f} MB...".format(count * block_size / 1024 / 1024, total_size / 1024 / 1024))
+    sys.stdout.write("\rDownloading {0:.1f}/{1:.1f} MB...".format(min(count * block_size, total_size) / 1024 / 1024, total_size / 1024 / 1024))
```


Is this a related change?

No, it's unrelated, but it's so small I didn't want to do separate PR/CI runs for it.
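The effect of the `min()` clamp can be seen in isolation with the small sketch below. `format_progress` is a hypothetical helper split out for testing; the hook signature `(count, block_size, total_size)` matches the `reporthook` callback of `urllib.request.urlretrieve`. Because the last block is usually only partially filled, `count * block_size` can exceed the file size. The `total_size <= 0` branch is an addition not in the PR: `urlretrieve` reports -1 when the server does not send a size, and clamping to -1 would print a negative amount.

```python
import sys


def format_progress(count: int, block_size: int, total_size: int) -> str:
    downloaded = count * block_size
    if total_size > 0:
        # The last block is usually partial, so cap at the real file size.
        downloaded = min(downloaded, total_size)
        return "\rDownloading {0:.1f}/{1:.1f} MB...".format(
            downloaded / 1024 / 1024, total_size / 1024 / 1024)
    # Size unknown (total_size == -1): only report what has been read so far.
    return "\rDownloading {0:.1f} MB...".format(downloaded / 1024 / 1024)


def download_progress_hook(count, block_size, total_size):
    sys.stdout.write(format_progress(count, block_size, total_size))
    sys.stdout.flush()
```

It would be passed as `urllib.request.urlretrieve(url, path, reporthook=download_progress_hook)`.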

kunalspathak · 2021-10-18T15:34:52Z

src/coreclr/scripts/fuzzer_setup.py

```diff
+        "Fuzzlyn": "https://github.com/jakobbotsch/Fuzzlyn.git",
+    }
+
+    repo_url = repo_urls[coreclr_args.tool_name]
```


Could you add a check that repo_url is not None?

Actually, this one is already verified above in setup_args.
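Note that indexing a dict with a missing key raises `KeyError` rather than returning `None`, so an `is None` check after `repo_urls[...]` would never fire for an unknown tool; validating the name up front, as the reply says `setup_args` does, gives a clearer error. A hypothetical sketch of such a check (only the Fuzzlyn entry is shown, since the other entries are not part of the excerpt; the function names are invented):

```python
REPO_URLS = {
    "Fuzzlyn": "https://github.com/jakobbotsch/Fuzzlyn.git",
    # ... other tools elided; see fuzzer_setup.py
}


def validate_tool_name(tool_name: str) -> str:
    if tool_name not in REPO_URLS:
        raise ValueError("Unknown tool '{}'; expected one of: {}".format(
            tool_name, ", ".join(sorted(REPO_URLS))))
    return tool_name


def get_repo_url(tool_name: str) -> str:
    # After validation the lookup cannot raise KeyError.
    return REPO_URLS[validate_tool_name(tool_name)]
```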

kunalspathak · 2021-10-18T15:35:49Z

src/coreclr/scripts/exploratory.md

```diff
+
+The basics of both tools are the same: they generate random programs using Roslyn and execute them with `corerun.exe` in a baseline and a test mode.
+Typically, baseline uses the JIT with minimum optimizations enabled while the test mode has optimizations enabled.
+Antigen also sets various `COMPlus_*` variables in its test mode to turn off different stress modes.
```


Suggested change:

```diff
-Antigen also sets various `COMPlus_*` variables in its test mode to turn off different stress modes.
+Antigen also sets various `COMPlus_*` variables in its test mode to turn on different stress modes or turn on/off different optimizations.
```
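The baseline-vs-test scheme described in exploratory.md amounts to differential testing. A minimal sketch, assuming `COMPlus_JITMinOpts=1` is the knob used for the baseline and comparing only stdout (the real tools also look at crashes and assertion failures); `make_corerun_runner` and `outputs_differ` are hypothetical names. Injecting the runner lets the comparison be exercised without a `corerun` binary.

```python
import os
import subprocess
from typing import Callable, Mapping

# A runner executes the generated program with extra environment variables
# and returns its stdout.
Runner = Callable[[Mapping[str, str]], str]

# Assumption: baseline forces minimum optimizations via this JIT knob.
BASELINE_ENV: Mapping[str, str] = {"COMPlus_JITMinOpts": "1"}
# Test mode: default JIT settings, i.e. optimizations enabled. Antigen would
# add further COMPlus_* stress/optimization knobs here.
TEST_ENV: Mapping[str, str] = {}


def make_corerun_runner(corerun_path: str, program_path: str) -> Runner:
    def run(extra_env: Mapping[str, str]) -> str:
        env = dict(os.environ)
        env.update(extra_env)
        result = subprocess.run([corerun_path, program_path],
                                env=env, capture_output=True, text=True)
        return result.stdout
    return run


def outputs_differ(run: Runner) -> bool:
    """True if baseline and test mode disagree: a silent bad codegen candidate."""
    return run(BASELINE_ENV) != run(TEST_ENV)
```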

kunalspathak · 2021-10-18T15:36:56Z

src/coreclr/scripts/exploratory.md

```diff
+
+## Getting test examples from Antigen runs
+
+For Antigen runs the summary will show the assertion errors that were hit.
```


Suggested change:

```diff
-For Antigen runs the summary will show the assertion errors that were hit.
+For Antigen runs, the summary will show the assertion errors that were hit.
```

kunalspathak · 2021-10-18T15:42:24Z

src/coreclr/scripts/antigen_summarize.py

```diff
+
+# Turned off since the output does not seem particularly useful
+#        if len(remaining_issues) > 0:
+#            f.write("# {} uncategorized issues found\n", len(remaining_issues))
```


I would still print the "uncategorized issues found" line so we can come back and investigate them.
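If the line is brought back, note that the commented-out call passes the count as a second argument to `f.write()`, which accepts a single string, so it would raise `TypeError` when executed. A minimal corrected sketch; the function name and the bullet list of issues are hypothetical additions.

```python
from typing import Sequence, TextIO


def write_uncategorized_issues(f: TextIO, remaining_issues: Sequence[str]) -> None:
    if len(remaining_issues) > 0:
        # write() takes exactly one string, so format the count into it first.
        f.write("# {} uncategorized issues found\n".format(len(remaining_issues)))
        # Hypothetical addition: list them so they can be investigated later.
        for issue in remaining_issues:
            f.write("* {}\n".format(issue))
```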


MattGal · 2021-10-18T15:58:10Z

> > @kunalspathak Any ideas what could be going on and why we don't see output from the "run" scripts?
>
> Yes, I have seen those failures. My understanding was that they happen because of long-running Python scripts, but the one you are running is only an hour long. FYI @MattGal

It's hard to say what is actually going on here. We don't stream the output from the Docker container continuously because doing so caused issues during prototyping, so the log only contains whatever had made it into the stdout buffer by the time it was copied out.

Some thoughts about this problem:

* While older, the image is still heavily used every day (something like 526411 successful work items used it in the past month), so the problem is not likely specific to the image, but rather to what's running inside it.
* At least for this instance of the problem, we don't actually know whether (or how far) it gets past the "install dependencies on this container so I can use Helix functionality" stage. It probably does, because it didn't time out, and pip would log some errors if it had exited with code 1 (unfortunately, 1 is many executables' favorite generic exit code, including XUnit's). Writing any old file to $HELIX_WORKITEM_UPLOAD_ROOT as the first thing you do should answer that question.
* You can get rid of, or at least shorten, this stage of execution by updating the image to the latest version; every time someone revs a dependency in the Helix scripts, the Docker images' preinstalled dependencies get slightly out of sync with what's needed. In this case, you'd update to mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-20211018132704-c537e64
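Writing a file to `$HELIX_WORKITEM_UPLOAD_ROOT` first thing could look like the hypothetical sketch below (the marker name and function are made up). Since result files like this live on a volume mounted from outside the container, the marker's presence shows the script got past the dependency-install stage; `os.fsync` is used so the content is not left sitting in OS buffers.

```python
import datetime
import os
from typing import Optional


def write_startup_marker(name: str = "startup_marker.txt") -> Optional[str]:
    upload_root = os.environ.get("HELIX_WORKITEM_UPLOAD_ROOT")
    if not upload_root:
        return None  # Not running under Helix; nothing to do.
    path = os.path.join(upload_root, name)
    with open(path, "w", encoding="utf-8") as f:
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        f.write("script started at {}\n".format(stamp))
        f.flush()
        os.fsync(f.fileno())  # Push the data to disk, not just to the OS cache.
    return path
```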

I think the best thing to do is to get a matching device, run the exact payload directly from an interactive bash session, and see where it actually hangs. When you get to this stage, you can ping @ilyas1974 for some help with it.

kunalspathak · 2021-10-18T16:01:37Z

While you are here, can you also modify the following to log the output?

runtime/src/coreclr/scripts/azdo_pipelines_util.py

Lines 49 to 50 in 862a90f

```python
if output:
    of.write(output.strip().decode("utf-8") + "\n")
```

Suggested change:

```diff
 if output:
+    print(output.strip().decode("utf-8") + "\n")
     of.write(output.strip().decode("utf-8") + "\n")
```
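A hedged sketch of a runner that streams a child process's output to both the console and a log file, flushing after every line so the log is complete up to the last line written if the process is killed. `run_and_tee` is a hypothetical name, not the function in azdo_pipelines_util.py. Note that `print` already appends a newline, so the `+ "\n"` in the suggestion would print an extra blank line after each output.

```python
import subprocess
from typing import Sequence


def run_and_tee(command: Sequence[str], log_path: str) -> int:
    """Run command, echoing each output line to stdout and to log_path."""
    with open(log_path, "w", encoding="utf-8") as of, \
            subprocess.Popen(command, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT) as proc:
        for raw_line in proc.stdout:
            # Decode defensively and drop the trailing line terminator.
            line = raw_line.decode("utf-8", errors="replace").rstrip("\r\n")
            print(line, flush=True)  # print adds its own newline
            of.write(line + "\n")
            of.flush()
        return proc.wait()
```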

jakobbotsch · 2021-10-18T17:59:01Z

/azp run Fuzzlyn

azure-pipelines · 2021-10-18T17:59:16Z

Azure Pipelines successfully started running 1 pipeline(s).

kunalspathak

LGTM!

jakobbotsch · 2021-10-18T23:30:22Z

Looks like the latest CI run found a silent bad codegen example. I opened #60597 for it.

@MattGal

> At least for this instance of the problem, we don't actually know whether (or how far) it gets past the "install dependencies on this container so I can use Helix functionality" stage. It probably does, because it didn't time out, and pip would log some errors if it had exited with code 1 (unfortunately, 1 is many executables' favorite generic exit code, including XUnit's). Writing any old file to $HELIX_WORKITEM_UPLOAD_ROOT as the first thing you do should answer that question.

The output does seem to indicate that the Python script started to execute. The results here have a Fuzzlyn-linux-arm-Partition2.log file, which is created by the Python script running on the partition. Note that this file is truncated too, which seems strange.

Anyway, I will merge this PR for now and then see if I can find some time to investigate the failure further.

MattGal · 2021-10-18T23:32:44Z

> Note that this file is truncated too, which seems strange.

@jakobbotsch that is very interesting, because for any result file like this we directly mount the outer volume into the Helix Docker container, so no matter how badly things go, that file should reflect how far execution got before "the bad thing" happened. I'd definitely suggest investigating this from that perspective, as it is harder for a file to be accidentally left partially written in this configuration than for some stdout to be lost.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Oct 13, 2021

kunalspathak reviewed Oct 13, 2021


eng/pipelines/coreclr/templates/jit-run-exploratory-job.yml (outdated, resolved)

jakobbotsch force-pushed the fuzzlyn-in-ci branch 2 times, most recently from 5d010b9 to 33945c4 October 15, 2021 12:20

jakobbotsch added 3 commits October 15, 2021 14:21

Cap downloaded size to total file size when reporting progress

c696379

Rename src/coreclr/scripts/{antigen.md -> exploratory.md}

74785d4

jakobbotsch force-pushed the fuzzlyn-in-ci branch from 33945c4 to 55749f0 October 15, 2021 12:21

runfoapp bot mentioned this pull request Oct 15, 2021

System.IO.Tests.File_ReadWriteAllBytes.ReadAllBytes_NonSeekableFileStream_InWindows failed #60427 (open)

Let summarize script fail pipeline

b0f36cb

jakobbotsch marked this pull request as ready for review October 18, 2021 14:34

jakobbotsch requested review from kunalspathak and BruceForstall October 18, 2021 14:34

kunalspathak reviewed Oct 18, 2021


Address feedback

68b2613

kunalspathak approved these changes Oct 18, 2021


jakobbotsch merged commit 730d1f4 into dotnet:main Oct 18, 2021

jakobbotsch deleted the fuzzlyn-in-ci branch October 18, 2021 23:31

ghost locked as resolved and limited conversation to collaborators Nov 18, 2021
