Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary name resolution on module with network host #6374

Closed
amunozh opened this issue May 20, 2022 · 10 comments
Closed

Temporary name resolution on module with network host #6374

amunozh opened this issue May 20, 2022 · 10 comments

Comments

@amunozh
Copy link

amunozh commented May 20, 2022

Expected Behavior

I'm developing a module that needs to access to the bluetooth stack. Therefore, I configured it with network host. I expected that having the edgeHub ports exposed the module connects without any problem.

Current Behavior

When the module tries to connect to the edge hub environment it crashes because a failure on name resolution. Thus, I think it does not find the ip of the edge hub.

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Configure a module with network host configuration
  2. Deploy the module
  3. The module wont start

Output of iotedge check

Click here
sudo iotedge check
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
√ latest security daemon - OK
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: certificates - Warning
    The Edge device is using self-signed automatically-generated development certificates.
    They will expire in 43 days (at 2022-07-03 14:46:49 UTC) causing module-to-module and downstream device communication to fail on an active deployment.
    After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs.
    Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
√ production readiness: logs policy - OK
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /home/pi/covid_gw/tmp_hub/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK

18 check(s) succeeded.
4 check(s) raised warnings. Re-run with --verbose for more details.

Device Information

  • Host OS [e.g. Ubuntu 18.04, Windows Server IoT 2019]: Linux raspberrypi 4.19.66-v7+ remove env var (#1247) #1253 SMP Thu Aug 15 11:49:46 BST 2019 armv7l GNU/Linux
  • Architecture [e.g. amd64, arm32, arm64]: arm32v7
  • Container OS [e.g. Linux containers, Windows containers]: Linuz Containers

Runtime Versions

  • aziot-edged [run iotedge version]: iotedge 1.1.13
  • Edge Agent [image tag (e.g. 1.0.0)]: :1.0.10-linux-arm32v7
  • Edge Hub [image tag (e.g. 1.0.0)]: :1.0.10-linux-arm32v7
  • Docker/Moby [run docker version]: 3.0.13+azure

Logs

aziot-edged logs

<Paste here between the triple backticks>

edge-agent logs
<6> 2022-05-20 16:28:26.207 +00:00 [INF] - Module 'DangerousZoneHandler' scheduled to restart after 05m:00s (02m:02s left).
My module logs

INFO 2022/05/20 16:20:22 PM transport.connect raised error
INFO 2022/05/20 16:20:22 PM Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 391, in connect
    host=self._hostname, port=443, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')

ERROR 2022/05/20 16:20:22 PM Subscribe for input failed.  Not enabling feature
INFO 2022/05/20 16:20:22 PM Callback completed with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
INFO 2022/05/20 16:20:22 PM ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"]
WARNING 2022/05/20 16:20:22 PM Unexpected error while creating client ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
ERROR 2022/05/20 16:20:22 PM Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0x75a202b0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
ERROR 2022/05/20 16:20:23 PM Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x75a208b0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>

Deployment Template

deployment.template.json
{
  "$schema-template": "2.0.0",
  "modulesContent": {
    "$edgeAgent": {
      "properties.desired": {
        "schemaVersion": "1.0",
        "runtime": {
          "type": "docker",
          "settings": {
            "minDockerVersion": "v1.25",
            "loggingOptions": "",
            "registryCredentials": {
              "testbench": {
                "username": "$CONTAINER_REGISTRY_USERNAME_testbench",
                "password": "$CONTAINER_REGISTRY_PASSWORD_testbench",
                "address": "covid19reply.azurecr.io"
              }
            }
          }
        },
        "systemModules": {
          "edgeAgent": {
            "type": "docker",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-agent:1.0.10-linux-arm32v7",
              "createOptions": {
                "ExposedPorts": {
                  "9600/tcp": {}
                },
                "HostConfig": {
                  "PortBindings": {
                    "9600/tcp": [
                      {
                        "HostPort": "9600"
                      }
                    ]
                  }
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              }
            }
          },
          "edgeHub": {
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.0.10-linux-arm32v7",
              "createOptions": {
                "ExposedPorts": {
                  "9600/tcp": {},
                  "5671/tcp": {},
                  "8883/tcp": {},
                  "443/tcp": {}
                },
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/home/pi/covid_gw/tmp_hub:/tmp/edgeHub"
                  ],
                  "PortBindings": {
                    "5671/tcp": [
                      {
                        "HostPort": "5671"
                      }
                    ],
                    "8883/tcp": [
                      {
                        "HostPort": "8883"
                      }
                    ],
                    "443/tcp": [
                      {
                        "HostPort": "443"
                      }
                    ],
                    "9600/tcp": [
                      {
                        "HostPort": "9603"
                      }
                    ]
                  }
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              },
              "storageFolder": {
                "value": "/home/pi/covid_gw/tmp_hub"
              }
            }
          }
        },
        "modules": {
          "UARTHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.UARTHandler}",
              "createOptions": {
                "HostConfig": {
                  "Privileged": true
                },
                "ExposedPorts": {
                  "8080/tcp": {}
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              }
            }
          },
          "ContactManager": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.ContactManager}",
              "createOptions": {
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/sys/class/net/eth0:/app/mac_address"
                  ]
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              }
            }
          },
          "FOTAHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.FOTAHandler}",
              "createOptions": {
                "HostConfig": {
                  "PortBindings": {
                    "8080/tcp": [
                      {
                        "HostPort": "8080"
                      }
                    ],
                    "1883/tcp": [
                      {
                        "HostPort": "1883"
                      }
                    ],
                    "443/tcp": [
                      {
                        "HostPort": "443"
                      }
                    ]
                  }
                }
              }
            }
          },
          "BadgeTwinHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.BadgeTwinHandler}",
              "createOptions": {
                "ExposedPorts": {
                  "8080/tcp": {}
                },
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/home/pi/covid_gw/db:/app/db",
                    "/sys/class/net/eth0:/app/mac_address",
                    "/home/pi/covid_gw/certs:/app/certs:ro"
                  ]
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              },
              "Customer": {
                "value": "PreProd"
              }
            }
          },
          "DiagnosticsModule": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.DiagnosticsModule}",
              "createOptions": {
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/proc/sys/kernel/sysrq:/app/system/enable_reboot",
                    "/proc/sysrq-trigger:/app/system/reboot_now",
                    "/etc/timezone:/app/timezone"
                  ]
                }
              }
            }
          },
          "DFUModule": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.DFUModule}",
              "createOptions": {
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/dev:/dev",
                    "/home/pi/covid_gw/certs:/app/certs:ro"
                  ]
                }
              }
            },
            "env": {
              "TestingRelease": {
                "value": "True"
              }
            }
          },
          "GatewayTwinHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.GatewayTwinHandler}",
              "createOptions": {
                "ExposedPorts": {
                  "8080/tcp": {}
                },
                "HostConfig": {
                  "Privileged": true,
                  "Binds": [
                    "/sys/class/net/eth0:/app/mac_address",
                    "/home/pi/covid_gw/certs:/app/certs:ro",
                    "/home/pi/covid_gw/persistent_files:/app/persistent_files"
                  ]
                }
              }
            },
            "env": {
              "UpstreamProtocol": {
                "value": "AmqpWs"
              },
              "Customer": {
                "value": "PreProd"
              }
            }
          },
          "DangerousZoneHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.DangerousZoneHandler}",
              "createOptions": {
                  "NetworkingConfig": {
                    "EndpointsConfig": {
                      "host": {}
                    }
                  },
                "HostConfig": {
                  "NerworkMode":"host",
                  "Privileged": true,
                  "Binds": [
                    "/var/run/dbus/:/var/run/dbus/"
                  ],
                  "Devices":
                  [{"CgroupPermissions":"ro",
                    "PathInContainer":"/dev/snd/",
                    "PathOnHost":"/dev/snd/"}],
                  "GroupAdd":
                  ["audio"]
                }
              }
            }
            ,
            "env": {
              "ALSA_CARD": {
                "value": "0"
              },
              "UpstreamProtocol": {
                "value": "AmqpWs"
              }
            }

          }
        }
      }
    },
    "$edgeHub": {
      "properties.desired": {
        "schemaVersion": "1.0",
        "routes": {
          "UARTHandlerToContactManager": "FROM /messages/modules/UARTHandler/outputs/contactsPayload INTO BrokeredEndpoint(\"/modules/ContactManager/inputs/input1\")",
          "ContactManagerToIoTHub": "FROM /messages/modules/ContactManager/outputs/* INTO $upstream",
          "FOTARequestFromBadgeTwinHandler": "FROM /messages/modules/BadgeTwinHandler/outputs/FOTARequest INTO BrokeredEndpoint(\"/modules/FOTAHandler/inputs/FOTARequest\")",
          "SOFToBadgeTwinHandler": "FROM /messages/modules/UARTHandler/outputs/SOF INTO BrokeredEndpoint(\"/modules/BadgeTwinHandler/inputs/SOF\")",
          "InitToGatewayTwinHandler": "FROM /messages/modules/UARTHandler/outputs/Initialization INTO BrokeredEndpoint(\"/modules/GatewayTwinHandler/inputs/Initialization\")",
          "FOTAHandlerToDFUModule": "FROM /messages/modules/FOTAHandler/outputs/DeviceToUpdate INTO BrokeredEndpoint(\"/modules/DFUModule/inputs/DeviceToUpdate\")",
          "DFUModuleToFOTAHandler": "FROM /messages/modules/DFUModule/outputs/UpdateResult INTO BrokeredEndpoint(\"/modules/FOTAHandler/inputs/UpdateResult\")",
          "AdvSetForFOTAHandler": "FROM /messages/modules/UARTHandler/outputs/AdvSet INTO BrokeredEndpoint(\"/modules/FOTAHandler/inputs/AdvSet\")",
          "EndTxForBadgeTwinHandler": "FROM /messages/modules/UARTHandler/outputs/EndTx INTO BrokeredEndpoint(\"/modules/BadgeTwinHandler/inputs/EndTx\")",
          "FOTAResultForBadgeTwinHandler": "FROM /messages/modules/FOTAHandler/outputs/FOTAResult INTO BrokeredEndpoint(\"/modules/BadgeTwinHandler/inputs/FOTAResult\")",
          "FOTARequestFromGatewayTwinHandler": "FROM /messages/modules/GatewayTwinHandler/outputs/FOTARequest INTO BrokeredEndpoint(\"/modules/FOTAHandler/inputs/FOTARequest\")",
          "ReportGatewayFromUARTHandler": "FROM /messages/modules/UARTHandler/outputs/GatewayReport INTO BrokeredEndpoint(\"/modules/GatewayTwinHandler/inputs/GatewayReport\")",
          "ReportGatewayFromBadgeTwinHandler": "FROM /messages/modules/BadgeTwinHandler/outputs/GatewayReport INTO BrokeredEndpoint(\"/modules/GatewayTwinHandler/inputs/GatewayReport\")",
          "FirealarmToBadgeTwinHandler": "FROM /messages/modules/GatewayTwinHandler/outputs/FireAlarm INTO BrokeredEndpoint(\"/modules/BadgeTwinHandler/inputs/FireAlarm\")",
          "DangerousZoneHandlerFromGatewayTwinHandler": "FROM /messages/modules/GatewayTwinHandler/outputs/DangerousZone INTO BrokeredEndpoint(\"/modules/DangerousZoneHandler/inputs/DangerousZone\")"
        },
        "storeAndForwardConfiguration": {
          "timeToLiveSecs": 172800
        }
      }
    },
    "GatewayTwinHandler": {
      "properties.desired": {
        "new_config": {}
      }
    },
    "BadgeTwinHandler": {
      "properties.desired": {
        "default_config": {
          "id": "",
          "activate": false,
          "customer": "",
          "firmware": {
            "version": "0.0.0",
            "update": {
              "status": "Up to date",
              "success": true,
              "message": ""
            }
          },
          "properties": {
            "distance": 200,
            "alarm_time": 15,
            "max_duration": 600,
            "silent_mode": false
          },
          "timers": {
            "silent_button": 2,
            "sleep_button": 16,
            "power_off_button": 20,
            "silent_duration": 300
          },
          "group": 0,
          "family": "[0,0,0,0]",
          "battery_alarm": true,
          "sleep_state": false,
          "last_seen": 0,
          "type": "badge"
        },
        "new_config": {
          "device_list": "[]",
          "properties": {}
        }
      }
    },
    "DiagnosticsModule": {
      "properties.desired": {
        "new_config": {
          "scheduled_time": ""
        }
      }
    },
    "FOTAHandler": {
      "properties.desired": {
        "massive_fota": ""
      }
    }
  }
}
@amunozh
Copy link
Author

amunozh commented May 24, 2022

Hello, Anyone could help me please? I'm blocked rigth now.

@gauravIoTEdge
Copy link
Contributor

@amunozh - 'Temporary failure in name resolution' is a DNS issue, not specific to Edge Hub.

For one, try restarting docker.

But more importantly, you are on very old versions of EdgeAgent and EdgeHub. Your iotedge daemon is 1.1.13, but EA and EH are 1.0.0 -- please update them to 1.1.13 and try again.

Please do not continue to use 1.0.0

@vadim-kovalyov
Copy link
Contributor

@amunozh it is not clear from the logs what name you use to connect to edgeHub. But just in case, you can't use edgeHub as a name, it will not be resolved from host network. Since edgeHub ports are bound to 0.0.0.0, you should be able to connect your local module to 0.0.0.0:8883

@vadim-kovalyov
Copy link
Contributor

Please reopen if you have further questions

@amunozh
Copy link
Author

amunozh commented May 26, 2022

Hello, I'm reopening because I did not solve the problem. I just updated all the runtime environment:

  • iotedge runtime: 1.2.9
  • edgeAgent: 1.2.9
  • edgeHub: 1.2.9

I'm not directly connecting to the edgeHub with any addess because I'm using the Python SDK, therefore to create the client connection with the edgeHub I'm using the following code:

try:
        #Create Edge Client
        client = IoTHubModuleClient.create_from_edge_environment(websockets=True) 
    except Exception as ex:
        log.warning("Unexpected error while creating client: %s", ex)
        raise

When creating from edge environment it crashes.

@vadim-kovalyov
Copy link
Contributor

vadim-kovalyov commented May 26, 2022

@amunozh the code you provided should work as is. There is something going on with the networking on you system. Check docker inspect <yourcontainer> and see the value of IOTEDGE_GATEWAYHOSTNAME env var - this is the address that IoTHubModuleClient.create_from_edge_environment() uses to connect to edgeHub. Please make sure that this name resolves from the host and that you can connect from the host network (try nslookup <name> and openssl s_client -connect <name>:443 -showcerts)

UPD: is there are reason you use websockets=True for module2module connection? I would suggest to use MQTT

@amunozh
Copy link
Author

amunozh commented May 30, 2022

Hello @vadim-kovalyov. Thanks for your replies. I don't know why that module is not taking into consideration the environment variable IOTEDGE_GATEWAYHOSTNAME that I'm setting to it. Here in below some logs of the module before it crashes and the environment variables it has.

Logs of the module

Click here
DEBUG 2022/05/30 14:17:09 PM Using selector: EpollSelector
DEBUG 2022/05/30 14:17:09 PM http://localhost:None "GET /trust-bundle?api-version=2021-12-07 HTTP/1.1" 200 1126
DEBUG 2022/05/30 14:17:09 PM http://localhost:None "POST /modules/DangerousZoneHandler/genid/637884722201263451/sign?api-version=2021-12-07 HTTP/1.1" 200 57
DEBUG 2022/05/30 14:17:09 PM Starting run_op in pipeline thread
DEBUG 2022/05/30 14:17:09 PM Creating pipeline executor
DEBUG 2022/05/30 14:17:09 PM Gateway Hostname Present. Setting Hostname to: raspberrypi
DEBUG 2022/05/30 14:17:09 PM HTTPTransportStage(InitializePipelineOperation): got connection args
DEBUG 2022/05/30 14:17:09 PM creating a SSL context
DEBUG 2022/05/30 14:17:09 PM InitializePipelineOperation: completing without error
DEBUG 2022/05/30 14:17:09 PM Starting <azure.iot.device.common.evented_callback.EventedCallback object at 0x75ae4230> in callback thread
DEBUG 2022/05/30 14:17:09 PM Creating callback executor
DEBUG 2022/05/30 14:17:09 PM Callback completed with result None
DEBUG 2022/05/30 14:17:09 PM Starting run_op in pipeline thread
DEBUG 2022/05/30 14:17:09 PM Scheduling SAS Token renewal at epoch time: 1653923709
DEBUG 2022/05/30 14:17:09 PM Gateway Hostname Present. Setting Hostname to: raspberrypi
DEBUG 2022/05/30 14:17:09 PM MQTTTransportStage(InitializePipelineOperation): got connection args
DEBUG 2022/05/30 14:17:09 PM creating mqtt client
INFO 2022/05/30 14:17:09 PM Creating client for connecting using MQTT over websockets
DEBUG 2022/05/30 14:17:09 PM creating a SSL context
DEBUG 2022/05/30 14:17:09 PM configuring SSL context with custom server verification cert
DEBUG 2022/05/30 14:17:09 PM Created MQTT protocol client, assigned callbacks
DEBUG 2022/05/30 14:17:09 PM InitializePipelineOperation: completing without error
DEBUG 2022/05/30 14:17:09 PM Starting <azure.iot.device.common.evented_callback.EventedCallback object at 0x75ae4470> in callback thread
DEBUG 2022/05/30 14:17:09 PM Callback completed with result None
DEBUG 2022/05/30 14:17:09 PM Creating new event loop - CLIENT_INTERNAL_LOOP
DEBUG 2022/05/30 14:17:09 PM Using selector: EpollSelector
DEBUG 2022/05/30 14:17:09 PM Creating new handler runner for handler: _on_message_received
DEBUG 2022/05/30 14:17:09 PM Creating new event loop - CLIENT_HANDLER_RUNNER_LOOP
DEBUG 2022/05/30 14:17:09 PM Using selector: EpollSelector
INFO 2022/05/30 14:17:09 PM Enabling feature:input...
DEBUG 2022/05/30 14:17:09 PM HANDLER RUNNER (_on_message_received): Starting runner
DEBUG 2022/05/30 14:17:09 PM enable_feature input called
DEBUG 2022/05/30 14:17:09 PM Starting run_op in pipeline thread
DEBUG 2022/05/30 14:17:09 PM EnableFeatureOperation: creating worker op of type MQTTSubscribeOperation
DEBUG 2022/05/30 14:17:09 PM AutoConnectStage(MQTTSubscribeOperation): Op needs connection.  Queueing this op and starting a ConnectionOperation
DEBUG 2022/05/30 14:17:09 PM AutoConnectStage(MQTTSubscribeOperation): calling down with Connect operation
INFO 2022/05/30 14:17:09 PM ReconnectStage(ConnectOperation): State changes LOGICALLY_DISCONNECTED->LOGICALLY_CONNECTED.  Adding to wait list and sending new connect op down
DEBUG 2022/05/30 14:17:09 PM ReconnectStage: sending new connect op down
DEBUG 2022/05/30 14:17:09 PM ConnectionLockStage(ConnectOperation): blocking
DEBUG 2022/05/30 14:17:09 PM MQTTTransportStage(ConnectOperation): connecting
DEBUG 2022/05/30 14:17:09 PM MQTTTransportStage(ConnectOperation): Starting watchdog
DEBUG 2022/05/30 14:17:09 PM connecting to mqtt broker
INFO 2022/05/30 14:17:09 PM Connect using port 443 (websockets)
INFO 2022/05/30 14:17:09 PM Forcing paho disconnect to prevent it from automatically reconnecting
DEBUG 2022/05/30 14:17:09 PM Done forcing paho disconnect
INFO 2022/05/30 14:17:09 PM transport.connect raised error
INFO 2022/05/30 14:17:09 PM Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 391, in connect
    host=self._hostname, port=443, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')

DEBUG 2022/05/30 14:17:09 PM MQTTTransportStage(ConnectOperation): cancelling watchdog
DEBUG 2022/05/30 14:17:09 PM ConnectOperation: completing with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM ConnectionLockStage(ConnectOperation): op failed.  Unblocking queue with error: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM ConnectionLockStage(ConnectOperation): unblocking and releasing queued ops.
DEBUG 2022/05/30 14:17:09 PM ConnectionLockStage(ConnectOperation): processing 0 items in queue for error=ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM ReconnectStage(ConnectOperation): on_connect_complete error=ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution') state=LOGICALLY_CONNECTED never_connected=True connected=False
DEBUG 2022/05/30 14:17:09 PM ReconnectStage: completing waiting ops with error=ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM ConnectOperation: completing with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM AutoConnectStage(MQTTSubscribeOperation): Connection failed.  Completing with failure because of connection failure: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM MQTTSubscribeOperation: completing with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM EnableFeatureOperation: Worker op (MQTTSubscribeOperation) has been completed
DEBUG 2022/05/30 14:17:09 PM EnableFeatureOperation: completing with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
DEBUG 2022/05/30 14:17:09 PM Starting on_complete in callback thread
ERROR 2022/05/30 14:17:09 PM Subscribe for input failed.  Not enabling feature
INFO 2022/05/30 14:17:09 PM Callback completed with error ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')
INFO 2022/05/30 14:17:09 PM ["azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-3, 'Temporary failure in name resolution')\n"]
WARNING 2022/05/30 14:17:09 PM Unexpected error while creating client ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
ERROR 2022/05/30 14:17:09 PM Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0x75aaf5d0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
ERROR 2022/05/30 14:17:09 PM Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x75aaf5f0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>

Environmental variables

Click here
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=f4c4feced1a8
ALSA_CARD=0
IOTEDGE_APIVERSION=2021-12-07
IOTEDGE_AUTHSCHEME=sasToken
IOTEDGE_DEVICEID=TestCovidGW
IOTEDGE_GATEWAYHOSTNAME=raspberrypi
IOTEDGE_IOTHUBHOSTNAME=Covid19IoTHub.azure-devices.net
IOTEDGE_MODULEGENERATIONID=637884722201263451
IOTEDGE_MODULEID=DangerousZoneHandler
IOTEDGE_WORKLOADURI=unix:///var/run/iotedge/workload.sock
RuntimeLogLevel=Information
UpstreamProtocol=AmqpWs
LANG=C.UTF-8
GPG_KEY=0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
PYTHON_VERSION=3.7.13
PYTHON_PIP_VERSION=22.0.4
PYTHON_SETUPTOOLS_VERSION=57.5.0
PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/a312303dbd516f6a692f2fee59852701bd828dd8/public/get-pip.py
PYTHON_GET_PIP_SHA256=8dd03e99645c19f49bbb629ce65c46b665ee92a1d94d246418bad6afade89f8d
HOME=/root

Configuration of the module in the deployment

"DangerousZoneHandler": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "${MODULES.DangerousZoneHandler}",
              "createOptions": {
                  "NetworkingConfig": {
                    "EndpointsConfig": {
                      "host": {}
                    }
                  },
                "HostConfig": {
                  "NerworkMode":"host",
                  "Privileged": true,
                  "Binds": [
                    "/var/run/dbus/:/var/run/dbus/"
                  ],
                  "Devices":
                  [{"CgroupPermissions":"ro",
                    "PathInContainer":"/dev/snd/",
                    "PathOnHost":"/dev/snd/"}],
                  "GroupAdd":
                  ["audio"]
                }
              }
            }
            ,
            "env": {
              "ALSA_CARD": {
                "value": "0"
              },
              "UpstreamProtocol": {
                "value": "AmqpWs"
              },
              "IOTEDGE_GATEWAYHOSTNAME": {
                "value": "0.0.0.0"
              }
            }

          }

PD: I set websockets=True just for testing if it may change something. But now I'll change it. (In which scenarios it would be convient to use websockets among modules?)

@amunozh
Copy link
Author

amunozh commented May 30, 2022

UPDATE:
I changed the hostname to verify if it was the problem, but anything changed.
It is weird that the edgeAgent and edgeHub created the certificates for the new hostname (proximitytool) but the modules remained with the old one. How does it work? How the hostname is set and distributed along the modules?

I tried the commands you said before @vadim-kovalyov :

nslookup raspberrypi

Server:         10.17.64.10
Address:        10.17.64.10#53

Name:   raspberrypi.replynet.prv
Address: 10.38.98.63

nslookup proximitytool

;; Got SERVFAIL reply from 10.17.64.10, trying next server
Server:         10.17.64.11
Address:        10.17.64.11#53

** server can't find proximitytool: SERVFAIL

openssl s_client -connect proximitytool:443 -showcerts

CONNECTED(00000003)
Can't use SSL_get_servername
depth=0 CN = raspberrypi
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = raspberrypi
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 CN = raspberrypi
verify return:1
---
Certificate chain
 0 s:CN = raspberrypi
   i:CN = iotedged workload ca proximitytool
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
---
Server certificate
subject=CN = raspberrypi

issuer=CN = iotedged workload ca proximitytool

---
No client certificate CA names sent
Client Certificate Types: RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:ECDSA+SHA1:RSA+SHA224:RSA+SHA1:DSA+SHA224:DSA+SHA1:DSA+SHA256:DSA+SHA384:DSA+SHA512
Shared Requested Signature Algorithms: ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:Ed25519:Ed448:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA-PSS+SHA256:RSA-PSS+SHA384:RSA-PSS+SHA512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:DSA+SHA256:DSA+SHA384:DSA+SHA512
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1450 bytes and written 388 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: B0C298F12AD9C5B8519E929C99876E0DAB68D5030484712F83E21C6E3D95C7F2
    Session-ID-ctx:
    Master-Key: DB37849367C8B233DEB52B91D275ADFAB5C473C4EFC74D66FFBCC58BB6212B7B47CD0E2542DBF2C825D8C64049D3D5F9
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    ...

    Start Time: 1653925477
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: yes
---

HTTP/1.1 400 Bad Request
Connection: close
Date: Mon, 30 May 2022 15:44:42 GMT
Server: Kestrel
Content-Length: 0

openssl s_client -connect raspberrypi:443 -showcerts

1995423808:error:0200206F:system library:connect:Connection refused:../crypto/bio/b_sock2.c:110:
1995423808:error:2008A067:BIO routines:BIO_connect:connect error:../crypto/bio/b_sock2.c:111:
connect:errno=111

@vadim-kovalyov
Copy link
Contributor

About websocket - the main use case is for isolated environments, where firewall rules don't allow MQTT or AMQP ports, but only HTTPS (433), and/or if you use HTTP proxy to connect to the outside world. Unless you have a good reason, I would not use WebSocket for module2module communication, it adds overhead with no apparent benefits. Just use MQTT.

You must not set IOTEDGE_GATEWAYHOSTNAME evn var yourself. I asked you just to verify the value and to see if the hostname is reachable from the host. The IOTEDGE_GATEWAYHOSTNAME is applied by iotedge service to every new container (module) that is being created by the iotedge runtime. The value is taken from config.toml.

Also, make sure that sudo iotedge check --verbose does not show any errors. The hostname of the OS must match the hostname value in the config.toml file. If you made changes to any of them, you need to restart the docker+iotedge, or just the machine.

@gauravIoTEdge
Copy link
Contributor

Since we haven't heard back, presumably, the enquiry has been addressed. Please reopen if there are further questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants