networking background tasks emit lots of warnings in simulated environments (including test suite) #6076

davepacheco · 2024-07-12T23:17:10Z

It's easiest to see this in omicron-dev run-all:

$ cargo run --bin=omicron-dev -- run-all
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.50s
     Running `target/debug/omicron-dev run-all`
omicron-dev: setting up all services ... 
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.0.log"
DB URL: postgresql://root@[::1]:43799/omicron?sslmode=disable
DB address: [::1]:43799
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.2.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.2.log"
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.3.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.9781.3.log"
omicron-dev: services are running.
omicron-dev: nexus external API:    127.0.0.1:12220
omicron-dev: nexus internal API:    [::1]:12221
omicron-dev: cockroachdb pid:       9862
omicron-dev: cockroachdb URL:       postgresql://root@[::1]:43799/omicron?sslmode=disable
omicron-dev: cockroachdb directory: /dangerzone/omicron_tmp/.tmphbrhfC
omicron-dev: internal DNS HTTP:     http://[::1]:38018
omicron-dev: internal DNS:          [::1]:36531
omicron-dev: external DNS name:     oxide-dev.test
omicron-dev: external DNS HTTP:     http://[::1]:60088
omicron-dev: external DNS:          [::1]:50282
omicron-dev:   e.g. `dig @::1 -p 50282 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway:    http://[::1]:57986 (switch0)
omicron-dev: management gateway:    http://[::1]:58720 (switch1)
omicron-dev: silo name:             test-suite-silo
omicron-dev: privileged user name:  test-privileged

If you look at the log file, it's emitting lots of warnings. It's easiest to see them by filtering for warning-level messages:

23:06:56.891Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): failed to identify switch slot for dendrite, will retry in 2 seconds
    background_task = bfd_manager
    reason = Communication Error: error sending request for url (http://[::1]:12225/local/switch-id): error trying to connect: tcp connect error: Connection refused (os error 146)
    zone_address = ::1
23:06:58.007Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): failed to identify switch slot for dendrite, will retry in 2 seconds
    background_task = nat_v4_garbage_collector
    reason = Communication Error: error sending request for url (http://[::1]:12225/local/switch-id): error trying to connect: tcp connect error: Connection refused (os error 146)
    zone_address = ::1
23:06:59.285Z WARN e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (ServerContext): failed to identify switch slot for dendrite, will retry in 2 seconds
    background_task = switch_port_config_manager
    rack_id = c19a698f-c6f9-4a17-ae30-20d711b8f7dc
    reason = Communication Error: error sending request for url (http://[::1]:12225/local/switch-id): error trying to connect: tcp connect error: Connection refused (os error 146)
    zone_address = ::1

This also happens if you run Nexus by hand and I expect it happens in the test suite, too.

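These warnings are coming from at least three different background tasks (bfd_manager, nat_v4_garbage_collector, and switch_port_config_manager in the output above).

The message itself is emitted by map_switch_zone_addrs():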

omicron/nexus/src/app/mod.rs

Lines 1036 to 1041 in e4bcfee

    
           warn!(
               log,
               "failed to identify switch slot for dendrite, will retry in 2 seconds";
               "zone_address" => #?addr,
               "reason" => #?e
           );

I noticed that if you're running Nexus by hand, you run into the same warning and it blocks Nexus startup. The workaround seems to be to set mgd in the Nexus config file to point directly at the mgd instances. That's what the test suite does:

omicron/nexus/test-utils/src/lib.rs

Line 548 in e4bcfee

self.config.pkg.mgd.insert(switch_location, config);

and it works because it sets up clients directly:

omicron/nexus/src/app/mod.rs

Lines 282 to 288 in e4bcfee

    
           for (location, config) in &config.pkg.mgd {
               let mg_client = mg_admin_client::Client::new(
                   &format!("http://{}", config.address),
                   log.clone(),
               );
               mg_clients.insert(*location, Arc::new(mg_client));
           }

and bypasses the loop that emits this warning:

omicron/nexus/src/app/mod.rs

Line 322 in e4bcfee

if config.pkg.mgd.is_empty() {
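
To make the two paths concrete, here is a minimal, self-contained sketch of the pattern described above: addresses from the config short-circuit discovery, and only the discovery path retries and produces warnings. This is not the Nexus code. `SwitchLocation`, `Resolution`, `resolve_addrs`, the `discover` closure, and the address string are all hypothetical stand-ins, and the sketch collects warning strings instead of logging and sleeping between retries.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the switch location key used in the config.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

/// Result of resolving addresses, plus any warnings that the
/// discovery path would have logged along the way.
struct Resolution {
    addrs: HashMap<SwitchLocation, String>,
    warnings: Vec<String>,
}

/// Resolve per-switch addresses.
///
/// If `configured` is non-empty, it is used directly and `discover`
/// is never called (the bypass). Otherwise `discover` is tried up to
/// `max_attempts` times, recording one warning per failure. A real
/// implementation would log and wait between attempts; this sketch
/// does neither.
fn resolve_addrs<F>(
    configured: &HashMap<SwitchLocation, String>,
    mut discover: F,
    max_attempts: usize,
) -> Resolution
where
    F: FnMut() -> Result<HashMap<SwitchLocation, String>, String>,
{
    let mut warnings = Vec::new();

    // Config override present: skip discovery entirely.
    if !configured.is_empty() {
        return Resolution { addrs: configured.clone(), warnings };
    }

    // No override: fall back to discovery, warning on each failure.
    for _ in 0..max_attempts {
        match discover() {
            Ok(addrs) => return Resolution { addrs, warnings },
            Err(reason) => warnings.push(format!(
                "failed to identify switch slot, will retry; reason = {reason}"
            )),
        }
    }

    // Gave up after max_attempts failures.
    Resolution { addrs: HashMap::new(), warnings }
}
```

With a non-empty `configured` map, the function returns immediately with no warnings. With an empty map and a `discover` closure that always fails (as with "Connection refused" in a simulated environment), every attempt adds a warning. An override only helps the code paths that actually consult it.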

This might be a dup of #5201? I was confused that setting these values in the config fixed one part of Nexus (the startup path) but not the other (the background tasks). I guess the difference is that the startup path was getting stuck on mgd, while the background tasks are getting stuck on dendrite, and I only overrode mgd in my config?


davepacheco mentioned this issue Jul 12, 2024

could provide DNS records for mgd #6077

Open
