NTP zone failed to start due to U2NotFound error #5502

Closed
askfongjojo opened this issue Apr 10, 2024 · 1 comment
askfongjojo commented Apr 10, 2024

This happened to a sled that was added to the cluster on rack2.

Steps taken before hitting the error:

  1. Mupdated sled 7 (BRM27230045) to the same SP/RoT/OS version as the cluster's currently running software.
  2. From switch zone, executed omdb --destructive nexus sleds add BRM27230045 913-0000019.
  3. Confirmed that, after step 2, sled 7 was no longer listed in oxide system hardware sled list-uninitialized and instead appeared in oxide system hardware sled list.
  4. Regenerated the blueprint and set the new one as the target (see the bottom of the description for the detailed output).
  5. Checked the sled-agent log file on sled 7 and saw that the ntp zone was unable to start due to a U2NotFound error:
00:46:40.868Z WARN SledAgent (ServiceManager): Zone failed to start
    file = sled-agent/src/services.rs:3001
    zone = oxz_ntp_30bfad3a-fab5-4589-a11d-63f1ea076cdb
00:46:40.868Z INFO SledAgent (dropshot (SledAgent)): request completed
    error_message_external = Internal Server Error
    error_message_internal = Failed to initialize zones: [("oxz_ntp_30bfad3a-fab5-4589-a11d-63f1ea076cdb", U2NotFound)]
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/29ae98d/dropshot/src/server.rs:837
    latency_us = 19705
    local_addr = [fd00:1122:3344:123::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:102::3]:44382
    req_id = 085ac5d0-dddf-4833-8356-76c68d20027a
    response_code = 500
    uri = /omicron-zones
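
When several zones fail in one request, it can help to pull the (zone, error) pairs out of the `error_message_internal` field. The following is a minimal sketch, not part of sled-agent or omdb: the helper name `failed_zones` is hypothetical, and the pattern assumes each entry has the `("zone name", ErrorVariant)` shape seen in the log above.

```python
import re

# Hypothetical helper (not an omicron API): matches entries shaped like
# ("oxz_ntp_<uuid>", U2NotFound), as seen in the log line above.
ZONE_ERROR = re.compile(r'\("(?P<zone>[^"]+)",\s*(?P<error>\w+)\)')


def failed_zones(line: str) -> list[tuple[str, str]]:
    """Return (zone_name, error_variant) pairs found in a log line."""
    return [(m.group("zone"), m.group("error")) for m in ZONE_ERROR.finditer(line)]


line = (
    "error_message_internal = Failed to initialize zones: "
    '[("oxz_ntp_30bfad3a-fab5-4589-a11d-63f1ea076cdb", U2NotFound)]'
)

for zone, error in failed_zones(line):
    print(f"{zone}: {error}")
```

An error variant that carries extra data would not fit this simple pattern; the sketch only covers the bare-identifier form shown in this report.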

Blueprint commands run and their output:

root@oxz_switch1:~# omdb nexus blueprints list
T ENA ID                                   PARENT TIME_CREATED             
* no  95c3f06b-4dbf-4614-ae7c-507c1193bde9 <none> 2024-03-22T18:37:17.291Z 
root@oxz_switch1:~# omdb -w nexus blueprints regenerate
generated new blueprint 0ff40c05-188e-4690-ab15-a63d737d550f
root@oxz_switch1:~# omdb nexus blueprints list
T ENA ID                                   PARENT                               TIME_CREATED             
      0ff40c05-188e-4690-ab15-a63d737d550f 95c3f06b-4dbf-4614-ae7c-507c1193bde9 2024-04-10T21:08:55.829Z 
* no  95c3f06b-4dbf-4614-ae7c-507c1193bde9 <none>                               2024-03-22T18:37:17.291Z 
root@oxz_switch1:~# omdb nexus blueprints diff 95c3f06b-4dbf-4614-ae7c-507c1193bde9 0ff40c05-188e-4690-ab15-a63d737d550f
from: blueprint 95c3f06b-4dbf-4614-ae7c-507c1193bde9
to:   blueprint 0ff40c05-188e-4690-ab15-a63d737d550f

  ---------------------------------------------------------------------------------------------------------
     zone type         zone ID                                disposition   underlay IP             status 
  ---------------------------------------------------------------------------------------------------------
                                                                                                           
  UNCHANGED SLEDS:                                                                                         
                                                                                                           
   sled 0c7011f7-a4bf-4daf-90cc-1c2410103300: zones at generation 2                                        
     boundary_ntp      c3ec3d1a-3172-4d36-bfd3-f54a04d5ba55   in service    fd00:1122:3344:104::e          
     crucible          167cf6a2-ec51-4de2-bc6c-7785bbc0e436   in service    fd00:1122:3344:104::c          
     crucible          23e1cf01-70ab-422f-997b-6216158965c3   in service    fd00:1122:3344:104::8          
     crucible          50209816-89fb-48ed-9595-16899d114844   in service    fd00:1122:3344:104::6          
     crucible          650f5da7-86a0-4ade-af0f-bc96e021ded0   in service    fd00:1122:3344:104::5          
     crucible          7ce9a2c5-2d37-4188-b7b5-a9db819396c3   in service    fd00:1122:3344:104::d          
     crucible          8bc0f29e-0c20-437e-b8ca-7b9844acda22   in service    fd00:1122:3344:104::7          
     crucible          8d202759-ca06-4383-b50f-7f3ec4062bf7   in service    fd00:1122:3344:104::4          
     crucible          a76b3357-b690-43b8-8352-3300568ffc2b   in service    fd00:1122:3344:104::a          
     crucible          c6fde82d-8dae-4ef0-b557-6c3d094d9454   in service    fd00:1122:3344:104::9          
     crucible          fcdda266-fc6a-4518-89db-aec007a4b682   in service    fd00:1122:3344:104::b          
     internal_dns      51c9ad09-7814-4643-8ad4-689ccbe53fbd   in service    fd00:1122:3344:1::1            
     nexus             20b100d0-84c3-4119-aa9b-0c632b0b6a3a   in service    fd00:1122:3344:104::3   
...
   sled f15774c1-b8e5-434f-a493-ec43f96cba06: zones at generation 2                                        
     cockroach_db      4c3ef132-ec83-4b1b-9574-7c7d3035f9e9   in service    fd00:1122:3344:105::3          
     crucible          23dca27d-c79b-4930-a817-392e8aeaa4c1   in service    fd00:1122:3344:105::e          
     crucible          3d420dff-c616-4c7d-bab1-0f9c2b5396bf   in service    fd00:1122:3344:105::a          
     crucible          912346a2-d7e6-427e-b373-e8dcbe4fcea9   in service    fd00:1122:3344:105::5          
     crucible          92d3e4e9-0768-4772-83c1-23cce52190e9   in service    fd00:1122:3344:105::6          
     crucible          9470ea7d-1920-4b4b-8fca-e7659a1ef733   in service    fd00:1122:3344:105::c          
     crucible          9c5d88c9-8ff1-4f23-9438-7b81322eaf68   in service    fd00:1122:3344:105::b          
     crucible          b3e9fee2-24d2-44e7-8539-a6918e85cf2b   in service    fd00:1122:3344:105::d          
     crucible          ce8563f3-4a93-45ff-b727-cbfbee6aa413   in service    fd00:1122:3344:105::9          
     crucible          f9940969-b0e8-4e8c-86c7-4bc49cd15a5f   in service    fd00:1122:3344:105::7          
     crucible          f9c1deca-1898-429e-8c93-254c7aa7bae6   in service    fd00:1122:3344:105::8          
     crucible_pantry   375296e5-0a23-466c-b605-4204080f8103   in service    fd00:1122:3344:105::4          
     internal_ntp      76b79b96-eaa2-4341-9aba-e77cfc92e0a9   in service    fd00:1122:3344:105::f          
                                                                                                           
  ADDED SLEDS:                                                                                             
                                                                                                           
+  sled bd96ef7c-4941-4729-b6f7-5f47feecbc4b: zones at generation 2                                        
+    internal_ntp      30bfad3a-fab5-4589-a11d-63f1ea076cdb   in service    fd00:1122:3344:123::21  added  

  METADATA:
    internal DNS version:  1 (unchanged) 
    external DNS version:  25 (unchanged)
root@oxz_switch1:~# omdb -w nexus blueprints target set 0ff40c05-188e-4690-ab15-a63d737d550f enabled
set target blueprint to 0ff40c05-188e-4690-ab15-a63d737d550f
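
The zone added in the diff (30bfad3a-…) is the same one named in the sled-agent error. A quick way to cross-check that from saved output is sketched below. The helper `added_zones` is hypothetical, and the row layout (`+`, zone type, zone ID) is assumed from the diff output above rather than from any documented omdb format.

```python
import re

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
# Added zone rows start with "+", then the zone type, then the zone ID.
# The "+  sled <id>:" header row is skipped because its ID is followed by ":".
ADDED_ZONE = re.compile(rf"^\+\s+(?P<kind>\w+)\s+(?P<id>{UUID})\s")


def added_zones(diff_text: str) -> list[tuple[str, str]]:
    """Return (zone_type, zone_id) for every added zone row in a blueprint diff."""
    zones = []
    for row in diff_text.splitlines():
        m = ADDED_ZONE.match(row)
        if m:
            zones.append((m.group("kind"), m.group("id")))
    return zones


diff = """\
+  sled bd96ef7c-4941-4729-b6f7-5f47feecbc4b: zones at generation 2
+    internal_ntp      30bfad3a-fab5-4589-a11d-63f1ea076cdb   in service    fd00:1122:3344:123::21  added
"""
failing_zone = "oxz_ntp_30bfad3a-fab5-4589-a11d-63f1ea076cdb"

for kind, zone_id in added_zones(diff):
    # True when the failing zone's name ends with the newly added zone's ID.
    print(kind, zone_id, failing_zone.endswith(zone_id))
```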
askfongjojo added this to the 8 milestone Apr 10, 2024
smklein added a commit that referenced this issue Apr 14, 2024
… via blueprints (#5506)

Automatically adopt disks and deploy them to sleds using blueprints

- Queries for physical disk info during reconfigurator planning phase
- Adds "physical disks" to blueprint, in-memory as well as the database
schema
- Blueprint planning now ensures that in-service physical disks appear
in the blueprint
- Blueprint execution sends a request to sled agents via
`omicron-physical-disks PUT`
- A background task has been added to automatically adopt new physical
disks as control plane objects, and to insert them into the database
- "Physical disk upsert" has largely been changed to "Physical disk
insert", to avoid potential overwriting issues. "Zpool upsert" has also
been updated to "Zpool insert".
- The physical disk "vendor/serial/model" uniqueness constraint has been
removed for decommissioned disks. This will provide a pathway to
eventually re-provisioning deleted disks, if an operator asks for it.

Fixes #5503, #5502
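
To illustrate the planning bullet in the commit message ("in-service physical disks appear in the blueprint"), here is a rough sketch of that idea. It is not the omicron implementation: `PhysicalDisk`, `ensure_disks_in_blueprint`, the sled names, and the disk values are all hypothetical, and the real planner works on database-backed types rather than plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalDisk:
    """Hypothetical stand-in for a disk record, keyed by vendor/serial/model."""
    vendor: str
    serial: str
    model: str


def ensure_disks_in_blueprint(blueprint_disks, in_service_disks):
    """Return a new sled -> set-of-disks map containing every in-service disk.

    Disks already recorded in the parent blueprint are kept; in-service disks
    that are missing (for example on a newly added sled) are added.
    """
    planned = {sled: set(disks) for sled, disks in blueprint_disks.items()}
    for sled, disks in in_service_disks.items():
        planned.setdefault(sled, set()).update(disks)
    return planned


# Made-up example data: "sled-new" has an in-service disk the parent blueprint lacks.
parent = {"sled-old": {PhysicalDisk("vendor", "serial-1", "model")}}
in_service = {
    "sled-old": {PhysicalDisk("vendor", "serial-1", "model")},
    "sled-new": {PhysicalDisk("vendor", "serial-2", "model")},
}

planned = ensure_disks_in_blueprint(parent, in_service)
print(sorted(planned))
```

The function builds a new map instead of mutating the parent, mirroring how each blueprint in the listing above is a separate object with a PARENT reference.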
@davepacheco
Collaborator

I believe based on the descriptions in #5503 and #5506 that this issue has been fixed by #5506.
