crossgen-comparison Linux arm checked fails with timeouts #1282
Is this a regression, or has it always failed in
I have seen it green a few times, for example here: https://dev.azure.com/dnceng/public/_build/results?buildId=456520&view=logs&j=e14ed261-6507-56f9-47b6-1ad25f52112f.
Generally, this failure happens because we are running at capacity on the Helix queue. Given the number of machines that were added last month, most likely something has changed in which jobs are submitted, and from where, to cause this; alternatively, a lot of machines have gone offline.
This has not always failed, although the number of jobs submitted has probably increased this month compared with last month.
Thanks, @jashook. We can make a quick surgical fix and stop running everything twice (under both the runtime and runtime-coreclr pipelines) by deleting the duplicates from https://github.com/dotnet/runtime/blob/master/eng/pipelines/runtime-official.yml and https://github.com/dotnet/runtime/blame/master/eng/pipelines/coreclr/pr.yml, or we can wait a few weeks until @safern replaces the existing pipelines.
The wait times in the queue do not seem too high; if I remember correctly, these jobs ran close to the timeout. I would suggest increasing the timeout by 15 to 30 minutes.
I should be removing the duplication this week. I hope to have a PR before Wednesday.
Disable crossgen comparison runs that systematically fail in CoreCLR outerloop runs. Tracking issue: #1282 Thanks Tomas
I just merged #1473, which removes the pipeline duplication, so the job now runs only once per PR and once in CI. I will leave this open to monitor the outcome of this change and react if needed.
This is currently impacting about 9% of our builds.
@jashook You previously thought this was failing because of overloaded Linux arm hardware, which led to the timeouts. Do you still believe that?
Yes, I think there is more pressure on the hardware now that mono is also running on arm.
Removing the blocking label now that mitigations are in place.
Is this issue still actionable now that the mitigation is in place? Should we open another issue, or link to an existing one that tracks the ARM hardware capacity issue?
@echesakovMSFT I noticed that the crossgen-comparison job is still commented out in ci.yml, so it is not running in the outerloop job. Should we fix that?
@BruceForstall I think it is okay. We still have the crossgen-comparison job triggered in PRs (if they affect code under src/coreclr). One of the latest runs is green: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-48830-merge-9b43a99dd41241aaa5/WorkItem/console.42781747.log?sv=2019-07-07&se=2021-03-23T08%3A59%3A24Z&sr=c&sp=rl&sig=Eu7gjhAYzP2ozwKcOMC%2FsAzV9caUqAe2qehEdfwvEjM%3D In addition, @davidwrighton added a crossgen2-outerloop pipeline that I believe runs similar comparison jobs, but with crossgen2. In theory, changes that break JIT cross-bitness, cross-architecture, or cross-OS compatibility should be caught by these jobs. In reality, these jobs have been red for a while (e.g. https://dev.azure.com/dnceng/public/_build?definitionId=701). I opened #49077 to track the issue.
@echesakovMSFT I intend to get around to fixing some of those crossgen2 comparison issues soon.
@davidwrighton Thank you, David!
It happens in all runs; the timeout is 60 minutes. We have not seen this in coreclr.
cc @echesakovMSFT @BruceForstall
Log example:
https://dev.azure.com/dnceng/public/_build/results?buildId=471248