Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After enabling a channel, it get's disabled after a few seconds #662

Closed
echo-devnull opened this issue Jul 26, 2023 · 4 comments
Closed

Comments

@echo-devnull
Copy link

echo-devnull commented Jul 26, 2023

Description

After enabling a channel, a few seconds later the channel get's "disabled" without really giving me any hint as to why.

The logs are also not very clear to me. (see attached)

Impact

This means flatcar nodes do not get their upgrade fix ;-)

Environment and steps to reproduce

This is nebraska 2.7.0 installed in Kubernetes using the helm chart.

Expected behavior

That the channel stay's enabled and hands out upgrades

@pothos
Copy link
Member

pothos commented Jul 26, 2023

Without looking at the logs I think that sounds like "safe mode" being enabled. See this comment: #427 (comment)

@echo-devnull
Copy link
Author

echo-devnull commented Jul 27, 2023

Ah yes, safe mode was indeed stopping further deployments.. Probably because all nodes are staying in "on hold" mode... So that's my next challenge. Figuring out why they are in on hold mode even though the machine itself say:
7
(One of the machines that are "on hold" according to nebraska. but not giving me a reason for the hold)

I0727 14:50:44.829499 4087288 update_engine_client.cc:241] Querying Update Engine status...
LAST_CHECKED_TIME=1690360753
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_IDLE
NEW_VERSION=0.0.0
NEW_SIZE=0

And the update-engine log is reporting omaha error:

Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.880442   942 libcurl_http_fetcher.cc:263] HTTP response code: 200
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881340   942 libcurl_http_fetcher.cc:320] Transfer completed (200), 231 bytes downloaded
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881354   942 omaha_request_action.cc:619] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: E0727 14:38:48.881364   942 omaha_request_action.cc:626] HTTP reported success but Omaha reports an error.
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881368   942 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881371   942 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881373   942 update_attempter.cc:302] Processing Done.
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881377   942 update_attempter.cc:306] Error event sent.
Jul 27 14:38:48 lgs-lnx7974.<snipped> update_engine[942]: I0727 14:38:48.881383   942 update_check_scheduler.cc:74] Next update check in 43m52s

So this part:
HTTP reported success but Omaha reports an error. is probably the server returning an error?

Server logs:

<nil> DBG Omaha trace XML="<request protocol=\"3.0\" installsource=\"scheduler\" ismachine=\"1\" version=\"update_engine-0.4.10\" updaterversion=\"update_engine-0.4.10\">\n <os platform=\"CoreOS\" version=\"Chateau\" sp=\"3510.2.1_x86_64\"></os>\n <app appid=\"{e96281a6-d1af-4bde-9a0a-97b76e56dc57}\" lang=\"en-US\" version=\"3510.2.1\" board=\"amd64-usr\" track=\"production\" alephversion=\"3374.2.3\" bootid=\"{6c37b0a0-5017-47f9-9291-37de22e6ceda}\" machinealias=\"<snipped>-az2-lgs-lnx7404\" machineid=\"8a2a1702625c4d58995fb7ef26601cd3\" oem=\"vmware\" oemversion=\"12.1.0\">\n  <ping active=\"1\"></ping>\n  <updatecheck></updatecheck>\n  <event eventtype=\"3\" eventresult=\"2\"></event>\n </app>\n</request>" context=omaha
<nil> INF processEvent eventError 0 appID=e96281a6-d1af-4bde-9a0a-97b76e56dc57 context=omaha event="update complete.success reboot" group=20ef7bc0-1be2-4f9a-9251-af83afe5bd16 machineId=8a2a1702625c4d58995fb7ef26601cd3 previousVersion=
<nil> DBG processEvent error nebraska: no update in progress context=omaha machineId=8a2a1702625c4d58995fb7ef26601cd3
<nil> DBG Omaha trace XML="<response protocol=\"3.0\" server=\"nebraska\">\n <daystart elapsed_seconds=\"0\"></daystart>\n <app appid=\"{e96281a6-d1af-4bde-9a0a-97b76e56dc57}\" status=\"error-updatesDisabled\">\n  <ping status=\"ok\"></ping>\n  <updatecheck status=\"error-internal\">\n   <urls></urls>\n  </updatecheck>\n  <event status=\"ok\"></event>\n </app>\n</response>" context=omaha
{"time":"2023-07-27T13:50:19.426204293Z","id":"92be13fa1b76f6de1aeb70ab247440f4","remote_ip":"10.31.115.62","host":"nebraska-az2.<snipped>","method":"POST","uri":"/v1/update","user_agent":"","status":200,"error":"","latency":25536264,"latency_human":"25.536264ms","bytes_in":781,"bytes_out":339}
<nil> DBG Omaha trace XML="<request protocol=\"3.0\" installsource=\"scheduler\" ismachine=\"1\" version=\"update_engine-0.4.10\" updaterversion=\"update_engine-0.4.10\">\n <os platform=\"CoreOS\" version=\"Chateau\" sp=\"3510.2.1_x86_64\"></os>\n <app appid=\"{e96281a6-d1af-4bde-9a0a-97b76e56dc57}\" lang=\"en-US\" version=\"3510.2.1\" board=\"amd64-usr\" track=\"production\" alephversion=\"3374.2.3\" bootid=\"{6c37b0a0-5017-47f9-9291-37de22e6ceda}\" machinealias=\"<snipped>-az2-lgs-lnx7404\" machineid=\"8a2a1702625c4d58995fb7ef26601cd3\" oem=\"vmware\" oemversion=\"12.1.0\">\n  <event eventtype=\"3\" eventresult=\"0\" errorcode=\"268435490\"></event>\n </app>\n</request>" context=omaha
<nil> INF processEvent eventError 268435490 appID=e96281a6-d1af-4bde-9a0a-97b76e56dc57 context=omaha event="update complete.error" group=20ef7bc0-1be2-4f9a-9251-af83afe5bd16 machineId=8a2a1702625c4d58995fb7ef26601cd3 previousVersion=
<nil> DBG processEvent error nebraska: no update in progress context=omaha machineId=8a2a1702625c4d58995fb7ef26601cd3
{"time":"2023-07-27T13:50:20.401133283Z","id":"c3f8ad44258cc1bce7142932f3e5c56a","remote_ip":"10.31.115.62","host":"nebraska-az2.<snipped>","method":"POST","uri":"/v1/update","user_agent":"","status":200,"error":"","latency":4511502,"latency_human":"4.511502ms","bytes_in":715,"bytes_out":231}
<nil> DBG Omaha trace XML="<response protocol=\"3.0\" server=\"nebraska\">\n <daystart elapsed_seconds=\"0\"></daystart>\n <app appid=\"{e96281a6-d1af-4bde-9a0a-97b76e56dc57}\" status=\"ok\">\n  <event status=\"ok\"></event>\n </app>\n</response>" context=omaha
^C

I can't really see a clear reason for the errors...

@echo-devnull
Copy link
Author

Well, this morning everything started upgrading again. I did nothing to make it that way.. soooo #magic ;-)

@pothos
Copy link
Member

pothos commented Jul 28, 2023

In the instance view of Nebraska you should see the events that led to this but it requires finding the right instance.
With safe mode disabled it should continue, otherwise after a day (or the configured interval?) of waiting it should also work again with safe mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants