
fix(Jenkinsfile): Fix Jenkins pipeline for CI #67

Merged: 1 commit merged into master from test/jenkinsfile0 on May 18, 2020

Conversation

MrKevinWeiss
Collaborator

Contribution Description

It appears that the current stash and unstash have some issues with asynchronous behaviour. For example, an unstash occurs on a node in a working directory that is different from what is expected when running a test. It also appears that some directories are not being cleaned.

The following PR makes a number of changes to fix that:

  • Make the Jenkinsfile declarative, as this is better supported (see the sketch after this list)
  • Run all tests on a node before releasing it, to fix any shared-workspace problems
  • Clean up function names and steps to make the pipeline more readable
  • Add timeouts: 1 hour for the overall process and 45 minutes per node
  • Handle errors so that if an unstash fails, only that node is stopped
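
For illustration, a minimal sketch of the declarative shape described above, assuming hypothetical stage names and agent labels and using the timeout values from the list (this is not the exact contents of the PR's Jenkinsfile; runParallel and nodes stand in for the scripted helper discussed further down):

pipeline {
    agent none
    options {
        // global limit for the whole run
        timeout(time: 1, unit: 'HOURS')
    }
    stages {
        stage('setup') {
            agent { label 'master' }   // hypothetical label
            steps {
                // keep the checked-out sources available for the test nodes
                stash name: 'sources'
            }
        }
        stage('tests') {
            steps {
                script {
                    // scripted helper that runs every test on a node before
                    // releasing it, each node guarded by a 45 minute timeout
                    runParallel(items: nodes)
                }
            }
        }
    }
}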

Testing Procedure

Check the CI. I don't know how the params will affect everything, but it is better than the current state, and we can always fix it later if an issue occurs.

Related Issues

Checks some boxes on #66

@MrKevinWeiss added the enhancement label on May 15, 2020
@MrKevinWeiss self-assigned this on May 15, 2020
@MrKevinWeiss
Collaborator Author

I think I need to add the params and get rid of the RIOT submodule change.

Jenkinsfile Outdated
stash name: 'sources'
script {
for (i = 0; i < nodes.size(); i++) {
echo "${nodes[i]}"
Member

Is this script block a leftover from debugging, or do you want to keep it for informational purposes?

Collaborator Author

Debugging, thanks!

@MrKevinWeiss
Collaborator Author

Also it seems like I am not getting the notifications... I don't know why... yet!

@MrKevinWeiss
Collaborator Author

I am wondering if the overall timeout is a good idea. It may cause some problems if there are many jobs in the queue, since the nodes can be blocked for a long time while the master's timer keeps ticking...

@cgundogan
Member

I am wondering if the overall timeout is a good idea

I guess having that global timeout throughout the job lifetime is good. It's very unlikely, but if we observe hangs in the setup or notification phase (or future stages), then the global timeout seems to be our only rescue? Of course, we could also wrap each of those stages in a local timeout... but the global one is more convenient.
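
To make the trade-off concrete, a rough sketch of the local-timeout alternative mentioned here; the stage name and the five-minute value are illustrative assumptions, not taken from this PR:

stage('notify') {
    steps {
        // local safeguard for just this stage, instead of relying only on
        // the global timeout in the pipeline options
        timeout(time: 5, unit: 'MINUTES') {
            echo 'sending notifications'
        }
    }
}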

@MrKevinWeiss
Collaborator Author

I guess having that global timeout throughout the job lifetime is good.

Good, the problem is the timing. If I start 10 jobs at once, the last one would have to wait for the nodes to finish the other jobs, meaning my timeout would need to be some function of the number of running jobs or something (it would take at least 3 hours to run through 10 jobs).

Anyway, currently it is set to 1 hour. I think that is fine if we don't have to wait for other jobs to finish with the node, but that is currently not the case. What would be a good balance?

@MrKevinWeiss
Collaborator Author

Darn, it also seems like the catching of the errors prevents timeouts and aborts. Maybe for the time being I will increase everything to something that should work, and we can tune it later once I figure out how to capture error types (i.e. whether a timeout occurred or a stop message occurred).

@MrKevinWeiss
Collaborator Author

Oh man... the timeout actually seems not too nice...

@MrKevinWeiss
Collaborator Author

Maybe it is ready. It could still use some work, but there was at least one case where the timeouts and exiting worked out well. It would be nice to get this in by the end of the day.

Jenkinsfile Outdated
])
def runParallel(args) {
parallel args.items.collectEntries { name -> [ "${name}": {
// We want to timeout of a node doesn't respond in 15 mins
Member

s/of/if
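
For context, one plausible completion of the runParallel pattern quoted above, purely as a sketch: the per-node steps and the 15-minute value follow the comment in the excerpt and are not the final Jenkinsfile.

def runParallel(args) {
    parallel args.items.collectEntries { name -> [ "${name}": {
        // we want to time out if a node doesn't respond in 15 mins
        timeout(time: 15, unit: 'MINUTES') {
            node(name) {
                unstash 'sources'
                // flash/test/archive steps for this node would go here
            }
        }
    }] }
}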

Jenkinsfile Outdated
stepFlash(tests[i])
stepTest(tests[i])
stepArchiveTestResults(tests[i])
} catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException e) {
Member

why this particular exception?

Collaborator Author

It is the timeout or abort exception. Without it, a timeout will only cancel one test on a node.
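
A rough illustration of the pattern being discussed: FlowInterruptedException is what Jenkins raises on a timeout or manual abort, so catching it separately lets the loop stop all remaining tests on this node instead of just the current one. The helper names come from the excerpt above; the surrounding loop is an assumption.

def caughtException = null
for (int i = 0; i < tests.size(); i++) {
    try {
        stepFlash(tests[i])
        stepTest(tests[i])
        stepArchiveTestResults(tests[i])
    } catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException e) {
        // timeout or abort: remember it and stop running further tests
        caughtException = e
        break
    } catch (Exception e) {
        // any other failure only affects this single test
        echo "Test ${tests[i]} failed: ${e.message}"
    }
}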

Jenkinsfile Outdated
}
if (caughtException) {
// This should exit out of the node that failed
error caughtException.message
Member

Why don't we move this line into the catch statement? The surrounding if seems to be a bit verbose... (below, too)

Collaborator Author

Then it gets caught by the catchError, which sets the build status and stage status. This is what is required to exit out.

I am willing to say there is a better way than using the catchError call, though. I haven't tested it.
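
For readers following along, a hedged sketch of how these pieces can fit together: catchError marks the build and stage as failed without aborting the surrounding code, so the recorded exception is re-raised with error only after leaving that context. runTestsOnNode is a hypothetical helper standing in for the flash/test/archive steps above.

def caughtException = null
catchError(buildResult: 'FAILURE', stageResult: 'FAILURE') {
    try {
        runTestsOnNode()   // hypothetical helper for the per-node test steps
    } catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException e) {
        caughtException = e   // remember the timeout/abort
    }
}
if (caughtException) {
    // outside the catchError context, this actually exits the node
    error caughtException.message
}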

@cgundogan
Member

Is there a test run that I can look at? Over at Jenkins I couldn't find any.

@MrKevinWeiss
Collaborator Author

Looks like there is still some work to do. Darn. I will take care of it tomorrow.

@MrKevinWeiss
Collaborator Author

I tried to simplify the catchError command since I need to use try/catch anyway. The problem is that now I cannot see failures in the stages. catchError allowed me to set buildResult and stageResult, but it appears I don't have that control with the currentBuild global variable. I guess I am really struggling with the documentation on what I have access to.

Should I just call it quits and use a catchError with a try/catch that allows me to throw the caught error outside the catchError context? Or can we accept that things look like they are passing when they are not (we still get correct test results)? Or should I continue to search for a way to try/catch and fail only that stage?

For some reason, the robot-test fail case seems to function properly, as the unstable setting is showing up.

It appears that the current stash and unstash have some issues with asynchronous behaviour.
For example, an unstash occurs on a node in a working directory that is different from what is expected when running a test.
It also appears that some directories are not being cleaned.

The following commit makes a number of changes to fix that:

- Make the Jenkinsfile declarative, as this is better supported
- Run all tests on a node before releasing it, to fix any shared workspace problems
- Clean up function names and steps to make the pipeline more readable
- Add timeouts to the overall process, and a per-node timeout that starts after the node is acquired
- Handle errors so that if an unstash fails, only that node is stopped
- Allow a timeout/stop to exit the whole set of tests
@MrKevinWeiss
Collaborator Author

I confirmed the node timeout only starts ticking after the node is acquired. I set it to 1 hour and the whole process to 3 hours.

There are still some strange things happening when we try to stop while it is changing states, but it just requires an additional stop and then it seems fine. I think we can leave it for now, as we have not had many lockup problems yet.
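
A small sketch of the layout this implies, using the values from this comment: because the timeout step only starts counting when it runs, putting it inside the node block means the one-hour clock starts after the executor is acquired, while the three-hour limit on the whole pipeline also covers queueing time.

node(name) {
    // clock starts here, after the node has been acquired
    timeout(time: 1, unit: 'HOURS') {
        // per-node flash/test/archive steps would run here
    }
}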

Member

@cgundogan left a comment

This rework greatly improves the pipeline design and reduces the overall build time. We can address the remaining minor irks and quirks in follow-up PRs to keep the diff minimal. ACK!

@cgundogan merged commit 563ec26 into master on May 18, 2020
@MrKevinWeiss
Collaborator Author

Thanks for all the help!

@MrKevinWeiss deleted the test/jenkinsfile0 branch on May 18, 2020 at 19:41