Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manual Network Try 3. #340

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

lipere123
Copy link
Contributor

A rework of the manual network.
I hope it will be a bit better.

…Edge Node class, which is part of the Exo project. The Edge Node class seems to encapsulate information about a device or node in a distributed system, including its capabilities and configuration.

Here's a breakdown of what I think each section does:

1. **Class Definition**: The code defines a class named `EdgeNode` with several attributes (e.g., `_discovery_config`, `_config_devices`) and methods (e.g., `__init__`, `get_capabilities`). This suggests that the Edge Node is responsible for managing its own configuration and capabilities.

2. **Initialization Method (`__init__`)**: The `__init__` method initializes an instance of the EdgeNode class with various attributes, including a discovery config, a list of configured devices, whoami information (e.g., hostname), node ID, host address, port number, model, chip type, memory size, and floating-point capabilities (fp32, fp16, int8). This method seems to set up the Edge Node's basic properties.

3. **Property Accessors**: The code defines several property accessors for various attributes of the EdgeNode class. These allow other parts of the program to access these attributes as if they were instance variables. For example, `@property def model(self): return self._model` allows you to get the value of the `_model` attribute using `edge_node.model`.

4. **Discovery Config and Config Devices**: The code defines two properties (`discovery_config` and `config_devices`) that seem to be related to how this Edge Node is configured or discovered within a system.

5. **Whoami Information**: The `whoami` property returns information about the host where this Edge Node is running, including its hostname.

6. **Node ID, Host Address, Port Number**: These properties (`node_id`, `node_host`, and `node_port`) seem to provide identifying information about the Edge Node itself.

7. **Model, Chip Type, Memory Size**: The `model`, `chip`, and `memory` properties appear to describe the capabilities of this Edge Node in terms of its hardware configuration.

8. **Floating-Point Capabilities**: The `fp32`, `fp16`, and `int8` properties seem to indicate the floating-point arithmetic capabilities of this Edge Node, with different types (float32, float16, int8) having different performance characteristics.

9. **Test Scripts**: The code includes two test scripts (`test_topology.sh` and `validate_topology.py`) that appear to be used for testing purposes. These scripts seem to validate the topology configuration by loading a YAML file containing Edge Node configurations and printing out their details.

Overall, this code seems to be part of a larger system designed to manage distributed computing resources (Edge Nodes) with varying capabilities. The EdgeNode class encapsulates information about each device, allowing for easy access and management of its properties.
@lipere123
Copy link
Contributor Author

@AlexCheema this new manual discovery is bit better I think.
But I have trouble with GRPC that is not connected.
When I check the port, they are open.
Thanks in avance.
Best REgards.
Benjamin.

edgenode2-exo-run.log
edgenode3-exo-run.log

exo/main.py Outdated
Comment on lines 66 to 67
if args.discovery_config is None:
args.discovery_module == "udp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Can you please throw an error if args.discovery_module == "manual" and args.discovery_config is None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

import socket
import yaml

class ReadManualConfig():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a class? Just make a single function that reads the device capabilities from the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I think it is more flexible.
For exemple, for now, I set args.wait_for_peers = 1 in main.py.
But after we finish debug, I want the number of peer to wait to comme from ReadManualConfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand what you mean

exo/main.py Outdated
if args.discovery_config is None:
args.discovery_module == "udp"
else:
list_device = ReadManualConfig(discovery_config=args.discovery_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadManualConfig should be a function not a class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I think it is more flexible.
For exemple, for now, I set args.wait_for_peers = 1 in main.py.
But after we finish debug, I want the number of peer to wait to comme from ReadManualConfig.

exo/main.py Outdated
args.discovery_module == "udp"
else:
list_device = ReadManualConfig(discovery_config=args.discovery_config)
list_device.device_capabilities((str((list_device.whoami))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im confused why there's another function call here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is 2 call fo ReadManualConfig.
One in the main for the current instance so the main.py is instancied with his own value.
The second one for the discovery of the other devices.

exo/main.py Outdated
args.node_port = list_device.node_port
args.wait_for_peers = 1

if args.discovery_module != "manual":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if shouldn't be here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changed that, a bit better ?

for device in self.list_device.config_devices:
if f"{self.list_device.whoami}" != f"{device['server']}":
print(f"Getting Id {device['id']} == Adresse: {device['address']} {device['port']}")
peer_id = (str(f"{device['id']}"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's with the str(f" stuff? just peer_id = device['id'] is fine...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


for device in self.list_device.config_devices:
if f"{self.list_device.whoami}" != f"{device['server']}":
print(f"Getting Id {device['id']} == Adresse: {device['address']} {device['port']}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put if DEBUG >= 2: for all these prints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.discovery_timeout = discovery_timeout
self.update_interval = update_interval
self.device_capabilities = device_capabilities
self.known_peers: Dict[str, Tuple[PeerHandle, float, float]] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be Dict[str, PeerHandle] now? I don't think we need the times any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about that, for now I simplified the discovery with one function that input all the other devices on startup.
But I remove disconnection, ... etc for debug purpose.

Frankly it is your call, same for the GRPC problem.
Do you have some discord time to help me on :
is_connected = await new_peer_handle.is_connected()
health_ok = await new_peer_handle.health_check()
that are not working, please ?

Thanks in advance.
Best Regards.
Benjamin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure I can help. it's quite simple tho - only healthy peers should be in the topology

lipere123 and others added 5 commits October 14, 2024 09:30
…lements a manual discovery mechanism for devices in a network. The changes made seem to be related to debugging and logging.

Here are the key changes:

1. **Debugging statements**: In both files, `if DEBUG >= 2: print(...)` statements have been added to include more detailed debug information.
2. **Type conversions**: In the first file, type conversions from string to int or float have been removed (e.g., `(str(f"{device['server']}"))` becomes just `f"{device['server']}"`).
3. **Variable assignments**: In the second file, variable assignments have been simplified (e.g., `self._node_id = (str(f"{device['id']}"))` becomes `self._node_id = str(f"{device['id']}")`).

These changes suggest that the code is being refactored to improve debugging and logging capabilities. The removal of type conversions and simplification of variable assignments may indicate a shift towards using Python's dynamic typing system.

To write this code from scratch, you would need to:

1. Import necessary libraries (e.g., `socket`, `yaml`).
2. Define classes and functions for manual discovery and reading configuration files.
3. Implement the logic for discovering devices and reading their configurations.
4. Add debugging statements as needed to include more detailed information.

Here's a simplified example of how you might implement the first file:
```python
import socket

class ManualDiscovery:
    def __init__(self):
        self.known_peers = {}

    async def discover_peers(self, device):
        # Implement logic for discovering peers here
        pass

    async def create_peer_handle(self, peer_id, host, capabilities):
        # Implement logic for creating a peer handle here
        pass

# Usage example:
manual_discovery = ManualDiscovery()
device = {'id': '123', 'address': '192.168.1.100', 'port': 8080}
await manual_discovery.discover_peers(device)
```
And here's a simplified example of how you might implement the second file:
```python
import yaml

class ReadManualConfig:
    def __init__(self):
        self._config_devices = []

    async def read_config(self, config_file):
        # Implement logic for reading configuration files here
        pass

# Usage example:
read_manual_config = ReadManualConfig()
await read_manual_config.read_config('config.yaml')
```
Note that these examples are highly simplified and do not include the actual implementation details. You would need to consult the original code and documentation to understand the specific requirements and logic involved.
…ionary assignments to improve code readability
if args.discovery_module == "manual":
if args.discovery_config is None:
raise ValueError("--discovery-config is necessary when using --discovery-module manual")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant else

args.node_host = list_device.node_host
args.node_port = list_device.node_port
args.wait_for_peers = 1
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why else? can't the code here also run for manual discovery?

self.discovery_timeout = discovery_timeout
self.update_interval = update_interval
self.device_capabilities = device_capabilities
self.known_peers: Dict[str, Tuple[PeerHandle, float, float]] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure I can help. it's quite simple tho - only healthy peers should be in the topology

import socket
import yaml

class ReadManualConfig():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand what you mean

if peer_id not in self.known_peers or self.known_peers[peer_id][0].addr() != f"{peer_host}:{peer_port}":
new_peer_handle = self.create_peer_handle(peer_id, f"{peer_host}:{peer_port}", self.list_device.device_capabilities((str((f"{device['server']}")))))
try:
is_connected = await new_peer_handle.is_connected()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check is_connected. just health check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants