Release rockpi-4b-rk3399 to production with only automated tests #129

Open
jellyfish-bot opened this issue May 31, 2022 · 43 comments

@jellyfish-bot

[klutchell] undefined

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@acostach sorry for all the pings, but this week I'm trying to go through each of the devices and see what's blocking them.

I've just tried to run tests on the rockpi testbot, and the DUT supposedly isn't powering on:

Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: Booting DUT with the balenaOS flasher image
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: 
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]: 
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]: 
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]: 
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:33 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on

This is either because:

  • the DUT isn't actually powering on, e.g. power from the testbot isn't plugged in. I can confirm the testbot is providing 5V from the voltage regulator, so it could just be a loose connection somewhere
  • The DUT is powering on - but the testbot isn't detecting that via the ethernet signal - is ethernet plugged into testbot via the USB adapter?
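
The loop visible in the log above can be pictured roughly as follows. This is only a sketch in Python, not the testbot SDK's actual code: `wait_for_dut_on`, `is_dut_on`, and the timeout value are hypothetical names chosen for illustration, and the 5-second interval is simply read off the log timestamps. Either cause in the list above leaves the probe returning False, so the loop alone cannot tell them apart.

```python
import time
from typing import Callable


def wait_for_dut_on(
    is_dut_on: Callable[[], bool],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a power-state probe until the DUT reports on, or give up.

    `is_dut_on` is a hypothetical probe (for example, one that checks the
    Ethernet link state). Returns True if the DUT came on before the timeout.
    """
    waited = 0.0
    while waited < timeout_s:
        print("waiting for DUT to be on")
        if is_dut_on():
            return True
        print("DUT is currently Off")
        sleep(interval_s)  # injectable so the loop can be exercised without waiting
        waited += interval_s
    return False
```

Bounding the wait with a timeout means a job fails with a clear result instead of printing "DUT is currently Off" indefinitely.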

@jellyfish-bot
Author

[acostach] @rcooke-warwick checking now. The device has the green light on, which means it's powered, but the ethernet LEDs are off, so it's not booting. It could be due to the mux/sd-card. Can I remove it from the testbot and try to boot it with the already flashed sd-card in the mux?

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@rcooke-warwick looks like it might be because of the voltage, I will make a PR to increase it from 5V to 12V. On the Radxa website they say that it works with 5V but may have stability issues once the load rises, and this is what I see locally now: it powers off during flashing with 5V but not with 12V.

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

nice find @acostach sounds like a good idea to increase it

@jellyfish-bot
Author

[acostach] done, I will merge balena-io-hardware/testbot-sdk-sw#48 once checks pass and then update leviathan worker, after that we can test again.

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@rcooke-warwick I updated the testbotsdk to increase the voltage and also leviathan-worker, where I merged balena-os/leviathan-worker#26

@jellyfish-bot
Author

[acostach] 1) Is answered, the rig app updated already

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@acostach nice one, the rig did update a while back and I retried the rockpi job. It's flashing at the moment, will report back here with the result

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@acostach update, unfortunately it hasn't worked. 12V is coming out of the testbot but we're still getting:

Sep 13 11:03:40 2a42142 eda46eb94756[1604]: DUT is currently Off
Sep 13 11:03:45 2a42142 eda46eb94756[1604]: waiting for DUT to be on

@jellyfish-bot
Author

[rcooke-warwick] now I'm wondering if there's something going wrong with the detection of if the DUT is on/off ...

@jellyfish-bot
Author

[rcooke-warwick] does the rockpi have ethernet?

@jellyfish-bot
Author

[rcooke-warwick] (I remember the rockpi flashing has worked before, but maybe I'm remembering wrong)

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@rcooke-warwick I plugged and unplugged the cable, it should flash the device once again

@jellyfish-bot
Author

[acostach] I recall we did run tests on this DT before and they were running, provisioning worked

@jellyfish-bot
Author

[acostach] it's not powering off, is the test running normally?

@jellyfish-bot
Author

[rcooke-warwick] which cable did you unplug/replug?

@jellyfish-bot
Author

[acostach] ethernet and usbc

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

usb-c is the power cable

@jellyfish-bot
Author

[rcooke-warwick] hmm it's just staying "on" now

@balena-os balena-os deleted a comment from jellyfish-bot Sep 13, 2022
@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@acostach yep, the device names cross wires - I did realize and eventually created and linked a new ticket. Sorry for the noise.

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

want me to plug and unplug the power cable @rcooke-warwick ? That would trigger the re-flashing IF the sd-card is switched to DUT

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

I already turned the DUT off then on again to try to achieve that @acostach

@jellyfish-bot
Author

[rcooke-warwick] the DUT stayed on forever, so for some reason the internal flashing of the DUT isn't happening

@jellyfish-bot
Author

[rcooke-warwick] the device is currently in this state, the test job is still running

@jellyfish-bot
Author

[rcooke-warwick] retrying with fresh slate

@jellyfish-bot
Author

[acostach] ok. if it still doesn't work let me know and I'll hook up the serial cable from my PC to the device and kick the suite, see where it hangs

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

@acostach rockpi seems to be flashing now: https://jenkins.product-os.io/job/leviathan-v2-template/4673/console - I've flashed it 3 times from a local test job in a row, and now this jenkins one is running - maybe there was just something loose that got fixed when you unplugged and replugged

@jellyfish-bot
Author

jellyfish-bot commented Sep 13, 2022

very possible @rcooke-warwick , good thing it's working now, thanks for letting me know as I was just going to connect the serial and restart it

@balena-os balena-os deleted a comment from jellyfish-bot Sep 13, 2022
@balena-os balena-os deleted a comment from jellyfish-bot Sep 13, 2022
@balena-os balena-os deleted a comment from jellyfish-bot Sep 13, 2022
@jellyfish-bot
Author

[rcooke-warwick] @acostach it has been consistently flashing every time last night and this morning. Now we move on to the problem of tests failing. First roadblock is the test here: https://github.com/balena-os/meta-balena/blob/master/tests/suites/os/tests/chrony/index.js#L157

That test has failed both times I've tried it. I'm not that familiar with it, but here's what I get from it:

  • in this test, NTP is blocked by disabling the dnsmasq service. Chronyd is restarted.
  • Once chronyd comes back, date --set="-2min" is used to skew the time on the device. The test then parses the logs for "NTP time lost synchronization - restarting chronyd"; however, this never appears in the logs, so the test fails

just running it again now to get the journal logs to see why that might be happening.
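
The log check described in the bullets above boils down to a substring search over journal lines. A minimal sketch in Python, not the test's real implementation (that lives in the linked index.js): the function name `saw_resync` is hypothetical, and the message string is the one quoted above.

```python
from typing import Iterable

# Message the test is described as looking for after the clock is skewed.
RESYNC_MESSAGE = "NTP time lost synchronization - restarting chronyd"


def saw_resync(journal_lines: Iterable[str], message: str = RESYNC_MESSAGE) -> bool:
    """Return True if any journal line contains the expected resync message."""
    return any(message in line for line in journal_lines)
```

If the message never shows up, as in the failing runs, `saw_resync` returns False for the whole journal and the test would fail at that step.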

@jellyfish-bot
Author

[rcooke-warwick] frustratingly, that test has now passed...

I think it is linked to this issue: balena-os/meta-balena#2758

From what I've seen, in the case of failure, chronyd is started with some sort of wrong permissions:

Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Wrong permissions on /run/chrony
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Disabled command socket /run/chrony/chronyd.sock
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Running with root privileges
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Sep 14 08:08:44 b0105db healthdog[5668]: [chrony-healthcheck][INFO] No online NTP sources - forcing poll
Sep 14 08:08:44 b0105db healthdog[5668]: [chrony-healthcheck][ERROR] Failed to trigger NTP sync

In the case of the test passing, I never see that message about a disabled command socket
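
To compare passing and failing runs, the journal can be scanned for the two suspicious lines from the excerpt above. This is a rough Python sketch; `FAILURE_MARKERS` and `find_failure_markers` are hypothetical names, and the marker strings are copied from the log in this comment.

```python
from typing import Iterable, List

# Strings copied from the failing run's journal excerpt.
FAILURE_MARKERS = (
    "Wrong permissions on /run/chrony",
    "Disabled command socket /run/chrony/chronyd.sock",
)


def find_failure_markers(journal_lines: Iterable[str]) -> List[str]:
    """Return the markers present in the journal, in FAILURE_MARKERS order."""
    lines = list(journal_lines)  # materialize so each marker can rescan the lines
    return [m for m in FAILURE_MARKERS if any(m in line for line in lines)]
```

An empty result would match what is seen in passing runs, while a non-empty one would point at the permissions problem on /run/chrony.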

cc @alexgg @jakogut

@jellyfish-bot
Author

[rcooke-warwick] on a side note, does anyone know if the rockpi led flashing works? I saw this issue: #10 and also I checked supervisor.conf on the rockpi and it has LED_FILE=/dev/null#
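
The supervisor.conf check mentioned above could be automated along these lines. This Python sketch assumes the file is plain KEY=VALUE text, which is an assumption rather than something confirmed in this thread, and `led_file_from_conf` and `has_usable_led` are hypothetical helper names.

```python
from typing import Optional


def led_file_from_conf(conf_text: str) -> Optional[str]:
    """Return the LED_FILE value from KEY=VALUE text, or None if it is absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LED_FILE":
            return value.strip().strip('"')
    return None


def has_usable_led(conf_text: str) -> bool:
    """Treat a missing LED_FILE, or one set to /dev/null, as 'no LED to blink'."""
    led_file = led_file_from_conf(conf_text)
    return led_file is not None and led_file != "/dev/null"
```

For a config containing LED_FILE=/dev/null, `has_usable_led` returns False, which would be consistent with LED flashing not working on this device.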

@acostach
Contributor

@rcooke-warwick the LED is not implemented for the radxa-zero nor rockpi4b so this test can be skipped

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

^ @acostach @floion added this finding to this issue ^ #10 -- this is currently causing the rockpi4b to fail the OS test suite. Does this device have an LED? The contract says it does

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

@rcooke-warwick do we have a mechanism to mark a test as not mandatory on a per DT basis?

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

@acostach the test runs because in the contract for the rockpi, it is set to LED: true - should I make the PR for the contract to set this to false?
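
The gating described here, where a test runs only when the device contract declares the feature, might look like the Python sketch below. The dictionary layout and the `data` / `led` key names are assumptions for illustration and may not match the real contract schema; `should_run_led_test` is a hypothetical name.

```python
def should_run_led_test(contract: dict) -> bool:
    """Run the LED test only when the contract explicitly declares an LED.

    The "data" and "led" keys are illustrative assumptions, not a confirmed
    description of the real contract format.
    """
    return bool(contract.get("data", {}).get("led", False))
```

Under this scheme, flipping the flag to false in the contract is enough to skip the test for that device type, with no per-device special case in the suite itself.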

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

I pushed the PR and it should merge soon @rcooke-warwick balena-io/contracts#326

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

done, it's merged in the contracts @rcooke-warwick

@jellyfish-bot
Author

jellyfish-bot commented Sep 14, 2022

thanks @acostach I'll now bump contracts in leviathan, which will autobump in meta-balena. Eventually it will reach the rockpi repo ;P

@jellyfish-bot
Author

jellyfish-bot commented Sep 21, 2022

@acostach @alexgg looks like rockpi4b can now pass the entire test suite: https://jenkins.product-os.io/job/leviathan-v2-template/5140/

It looks like the tests won't run on balena-radxa PRs, though. I'll fix that, and then technically we can use Alex's workflow to autodeploy for rockpi if tests pass

@jellyfish-bot
Author

[rcooke-warwick] I can see you've added the rockpro64 into the rig - does this device successfully flash with the testbot?

@jellyfish-bot
Author

[rcooke-warwick] was balena-radxa called balena-rockpi until recently?

@acostach
Contributor

@rcooke-warwick yes, looks like it was renamed from balena-rockpi to balena-radxa
Regarding the rockpro64, it was added to the rig but it didn't get to the flashing step yet, the leviathan job stops during initialization https://jenkins.product-os.io/job/leviathan-v2-template/5148/console

Some more ethernet switches and cables have been ordered and are on their way here. Currently the RockPro64 in the rig is not connected via ethernet.
