Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative mesh routing protocol options #57

Open
X-Ryl669 opened this issue Feb 27, 2020 · 56 comments
Open

Alternative mesh routing protocol options #57

X-Ryl669 opened this issue Feb 27, 2020 · 56 comments

Comments

@X-Ryl669
Copy link

X-Ryl669 commented Feb 27, 2020

If I understand your wiki completely, you're only dealing with 2 level of abstraction for the routing table.
As I understand it, it almost impossible to reach a node in a large network (let's say on the 120-th hop), because:

  1. The origin node will never be aware of it
  2. The limited routing table size can not grow exponentially.
  3. The metric are not static, it can fluctuate depending on the moon or the position of the node or a car passing by and so on. Thus, sometimes packet will flow into the network correctly, sometimes they'll just vanish.

Typically, let's say you have this topology:

  [A] => [B] => [C] => [D] => [E] => [F]
                    \=> [E]=> [F]

If node A wants to talk to node F, then it must know about it (this is impossible to solve by storing all possible nodes's MAC, BTW).
Let's say that it has learnt about it previously, and now wants to talk to it.
It sends a packet with DEST=F and in its routing table, it needs to go via NEXTHOP = B.

Similarly, B needs to have it in its routing table also (how could that possible scale if the number of node is very high ?).
Let's say B is Rainman and remembers about all possible nodes.

When B receives it, it'll have to make a choice between C and E for its nexthop.
The metric for C might be better because C is closer to B, even if there is an additional node D in the chain. So it chooses C. The conditions are bad, and the packet struggles to follow the chain that goes via D to reach E (or F) immediately.

After some time, A would resend the packet to F, and by the time, the metric in for F would increase so that B select E.
D finally wakes up and send the packet to E and finally F.

Then F can/will receive 2 packets (one coming from D=>E, another from E).
How can it replies ?

Now imagine that F moves closer to B, so that B could speak to it directly. Who is going to remove the old routing table value in C and D ? What it a packet is being sent to C for F and D is currently removing its entry for F in its routing table before C removed its own version ? In that case, the packet will be lost at D (when C sends it to D for F) and no feedback is sent to A about this packet loss.
This is going very bad on the global network consumption because, to deal with this kind of issues, A must resend the packet to ensure it'll arrive at its destination (wasting the global network resources) or any intermediary layer before F must answer to A somehow to tell it they've failed delivery (which could even be worse since a single packet could create a spread of packets back to the origin, if D tells A it failed, then C tells A it failed, and so on).

The issue described above is not a theory, it's the main routing issue we observe on internet. I'm not even speaking of bad route or packet loss here.

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption (image the above example if all nodes could see F except A).

  1. We need some kind of path detection algorithm to figure out the best route to a destination.
  2. We need some kind of logarithm abstraction somewhere to break the exponential grow (something like a tree or a forest algorithm) required by the current algorithm.
  3. We need a way to avoid dual path sending (because it'll be almost impossible to keep the duty cycle to 1% if all paths starts sending the same packets twice)
@Head2Wind
Copy link

Head2Wind commented Feb 27, 2020 via email

@randomodbuild
Copy link

Sorry I don't have much experience in this regard, but what about IPv6 on this? Maybe Yggdrasil (like cjdns). This maybe totally unrelated and not even possible, just throwing ideas around.

@Juul
Copy link
Member

Juul commented Feb 28, 2020

@Head2Wind @randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.

@X-Ryl669

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption

Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh.

@X-Ryl669
Copy link
Author

Internet ? (with all its protocols, ICMP / Spanning Tree Protocol / BGP).
The way it works is because it's organized logarithmically (typically, you only need to store the base path to one one and when you need to send a packet to a node, you'll send it to this node, it's up to them to route the packet correctly)
For example, if Bob (123.1.2.3) need to send a packet to Alice (65.6.4.2), one just need to know that 65.255.255.255 is handled by Eve, and Eve only need to know that 65.6.255.255 is handled by John and so on.

@mwarning
Copy link

mwarning commented Feb 28, 2020

@X-Ryl669 the limitations you describe is the same basic limitation that babel, batman-adv, olsr and other mobile ad-hoc mesh routing protocols have. But it is much more severe as LoRa allows only very small amounts of data to be transmitted. DNA seems to transmit complete routing tables (up to 30 routing entries in one transmission) and data packets also serve a hello packets.

There are approaches that do not need to transfer all MAC addresses, but they have other drawbacks or have not left research (yet).

Yggdrasil uses an approach using a spanning tree (a name-dependent routing layer) and a DHT on top to map MAC addresses to locations on the spanning tree (aka name-independent routing).
This has great scalability possibilities on the cost of only using paths on the spanning tree, but not all paths on a real mesh structure.

There are other approaches that have not been tested in the real world...

@samuk
Copy link
Collaborator

samuk commented Feb 28, 2020

If we can assume that something like 50% of nodes has access to GPS, either on-board or via the Bluetooth app, could we do something with that?

The nodes just worry about accurate routing within +/- 0.5 degrees lat/long of their location, and aside from that they just send the packet closer to its grid location, towards nodes that will have a more accurate idea of where it is.

@mwarning
Copy link

@samuk what you propose is geographic routing. The general problem with this is that you will have local minima where packets get stuck (like in a one way street). The usual approach to "fix" this problem is to use a fallback-algorithm to get unstuck (e.g. face routing). I think this is not reliable and does not always avoid loops.

Anyway, you do not need GPS. Some papers suggest randomly selected "light tower" nodes to form a virtual coordinate system. But imho, you can create virtual coordinates even without (e.g. based on Vivaldi Network Coordinates).

@samuk
Copy link
Collaborator

samuk commented Feb 29, 2020

Thanks, I'd probably heard the idea somewhere but didn't realise geographic routing was a well-established concept.
Even if it has a routing void problem, maybe it's useful to explore geographic routing further as it does seem to solve the scaling problem? Perhaps there's something useful in this geographic/ fuzzy logic approach?

I also spotted this 2015 paper which suggests Cost to the Progress Ratio Routing might be worth a look.

@X-Ryl669 Our current understanding is that there is a 10% duty cycle in the EU, is this not correct?

@emunicio
Copy link

emunicio commented Mar 2, 2020

@Head2Wind @randomodbuild The maximum LoRa packet size is not large enough for even the smallest IPv6 packet. Currently babel uses IPv6. I'm not convinced that babel is appropriate for disaster.radio but it might be an interesting experiment if someone modified it to work with the smaller packet size.

@X-Ryl669

As I understand it, any approach based on storing a dictionary with all the possible participant will fail since it grows exponentially in memory and resources consumption

Do you have an example of a mesh protocol that does not store an entry for each participant? I'm certain the current protocol can be greatly improved and I'm interested in any previous work in this area. My only experience is with WiFi mesh protocols such as babel, batman-adv and OLSR and they all store entries for every node in the mesh.

Hi, What about using 6LoWPAN Header Compression to reduce the packet size?
And similarly, the RPL protocol for 6LoWPAN networks is virtual coordinate routing protocol (concretely a gradient-based routing protocol). I think it could solve the problem of having a next hop for every node in the network and could scale better.

@Head2Wind
Copy link

Head2Wind commented Mar 2, 2020 via email

@mwarning
Copy link

mwarning commented Mar 2, 2020

From what I read about 6LoWPAN, it looks like it does not support meshes with loops. It forms a tree and the root is one edge router/gateway.
Maybe someone has a good source to research 6LoWPAN...

@X-Ryl669
Copy link
Author

X-Ryl669 commented Mar 2, 2020

Pros of 6LoWPAN:

  • IPV6 Header compression down to 2 bytes in a single neighbor node
  • Easy to use for connecting to internet (so we need to deal up to the first gateway that's connected to internet or any router)
  • Quite old and stable

Pros of RPL protocol:

  • Field tested
  • Seems reliable

Pros of LRP protocol for mesh routing:

  • Minimal traffic
  • Possible to search for a node (no need to store the complete network topology in the nodes)
  • No overload in any packet (works with existing data packets)
  • No routing loops
  • Low memory footprint

Cons of 6LoWPAN:

  • Implementation is a bit large and complex (but ESP32 is able to deal with it easily)
  • Only one of the layer of the network stack, still requires DNS/HTTP... (again available on ESP32 easily).
  • Typically made for simple UDP protocol on top.

Cons of RPL protocol:

  • Very very complex (159 pages + 4 RFC)
  • Requires large packets for multi layer mesh network.
  • Illogically though, information is always missing so need a lot of caching/memory in nodes
  • High network usage for routing packet exchange

Cons of LRP protocol:

  • Could create functional but non optimal routes

If you read french, this is a very good document comparing few mesh protocols. For english speakers, see here for presentation of protocol and a reference source code for ContikiOS

@paidforby paidforby changed the title Not suitable for large network Alternative mesh routing protocol options Mar 14, 2020
@cyclomies
Copy link

cyclomies commented Apr 16, 2020

The team has done a nice job building the existing LL2 protocol.

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

The header could look like this, name (byte):
ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

The header is always added, and updated while forwarding, to all packets sent. The nodes would then update their neighbors and routing tables, according to the information they receive from the headers they "hear" (not only from packets they are the recipient or intended forwarder).

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

Reducing airtime by:

  • No control messages nor discovery packages
  • No beacons, alive, or hello messages
  • No routing tables sent
  • Only messages and ACK's sent
  • Only implicit ACK's on forwarding ("listenig")
  • Route info carried within packet headers

If all control packets are discarded, from the protocol, then how do we begin messaging if no routes exist? I do suggest flooding as a failower method:

--> Flood (broadcast) if no routes
--> ACK when destination node is reached (stops flooding/routing)
--> Send ACK to the sender
--> Learn routes: update neighbors and 2-hop routing tables by listening
--> Start routing per routing tables (unicast by hop&metric values)
--> Flood if no route found (failower)

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

Some flood optimizations to consider:

  • ACK immediately when msg rcw'd (stop reflooding)
  • Reflood weak signals "immediately" (wait for a possibe ACK)
  • Reflood intermediate signals if no weak's reflooded?
  • Reflood not strong signals (assume unnecessary)?
  • RSSI overlap on reflooding to minimize gaps
  • Transfer to routing if a fresh route exists within 2 hop range?

Some routing optimizations to consider:

  • ACK if message RCW'd (stop forwarding)
  • Failback to flood if no route
  • Implicit ACK's of forwards
  • Update routes from all heard packets
  • Reforward packet, if implicitely heard no ACK after a forward of a neighbor node (reinforcing TX)

Any comments?

@paidforby
Copy link
Contributor

The team has done a nice job building the existing LL2 protocol.

Thanks!

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

I've wanted to roll a portion of routing table maintenance into normal operation and I agree that doing so makes sense.

The header could look like this, name (byte):
ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

To make sure I'm understanding this correctly, if you have the simple network,
A -> B -> C -> D -> F

Then a example header for a message sent A to F at the C to D hop would read,

ttl, address of A, address of F, address of C, address of B, address of D, seq_number

This isn't so different from the existing packet/datagram combined header, it just adds two addresses, the source node and the previous node. However, whose sequence number is in the packet? How does the receiving node calculate a metric for three hops with only one sequence number?

Another issue is that, like MAC headers, I would want packet headers to only be used node-to-node. After one hop the packet header is stripped off and rewritten for the next hop. Only the datagram stays the same over the course of the route. So I would rather expand the datagram, like so,

ttl (1), sender_node (4), next_node (4), seq_num (1), destination_node (4), source_node (4), source_seq_num (1)

Notice that I added the sequence number of the source node. This would allow the receiving node to calculate the metric for the source node as well as the sender node. I'm still not sure this would work unless each source-destination pair has their own sequence number, though this adds another layer of complication where you are actually calculating the metrics of routes instead of destinations.

The only way you could reasonably include the prev_node is if you also provided a sequence number from that node. Otherwise you know nothing more than that you have received some packets from that intermediate node.

In this design, every node would need to calculate metrics for every route on it's own, rather than nodes sharing metrics with one another. You also don't have a good way of knowing how many hops you are from the source node without adding a hop counter byte to the datagram.

My main question is, how does a node make routing decisions? If a node has multiple routes to the same destination, how does the node decide now which route to use? If it's only based on metrics. How do you calculate accurate metrics for all destinations with limited knowledge of the route to get to them?

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

This gives me the idea that we could provide the ability to choose how large you want your address to be? Might be a cool option. So if you want a small, high-efficiency network, you could choose 2 byte address, but then you would have to manually set node addresses because there are only 254 usable addresses. Or if you are deploying a larger, more dynamic network, you could use 4 byte addresses.

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

I'm hesitant about adding ACKs because of the limited duty cycle and introducing too much noise to networks.

  • No beacons, alive, or hello messages

I think we would still want occasional "i'm alive" messages sent when the network is quiet so that routes don't get stale if a node is not used for a while.

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

The idea of routing by zones is interesting to me. Could be a good line of thought to investigate further. I have a hard time squaring it with the dynamic, ad-hoc design goal of LL2, but maybe I just need to think about it more deeply.

I like some of your ideas. It almost sounds like a source specific protocol. It may be worth simulating this to see how a larger network would deal with this alternative method of routing table management.

Any comments?

See above :)

@cyclomies
Copy link

cyclomies commented Apr 16, 2020

@paidforby Interesting notes!

What about not sending/shearing routing tables at all, and use packets header information instead? All nodes would be able to update the routing tables by reading the headers of all "heard" packets. This would keep the routing tables updated, even if changes in the mesh were fast (compared to periodic table updates).

I've wanted to roll a portion of routing table maintenance into normal operation and I agree that doing so makes sense.

The header could look like this, name (byte):
ttl (1), source_node (4), destination_node (4), sender_node (4), prev_node (4), next_node (4), seq_num (1)

To make sure I'm understanding this correctly, if you have the simple network,
A -> B -> C -> D -> F

Then a example header for a message sent A to F at the C to D hop would read,

ttl, address of A, address of F, address of C, address of B, address of D, seq_number

Correct.

This isn't so different from the existing packet/datagram combined header, it just adds two addresses, the source node and the previous node. However, whose sequence number is in the packet? How does the receiving node calculate a metric for three hops with only one sequence number?

My mistake - my seq_number refers to total hops from source_node, to current node. In your example, it would be 2 hops (B and C). Therefore all nodes hearing the packet, can count hops to neighbors, previous sender, and to the source. Whereas you refer, with sequence number, to a increasing packet resend number (identifying packet loss)?

We shall change, to avoid further mistakes: seq_num meaning times sent, and hop_count meaning total hop count from source?

Another issue is that, like MAC headers, I would want packet headers to only be used node-to-node. After one hop the packet header is stripped off and rewritten for the next hop. Only the datagram stays the same over the course of the route. So I would rather expand the datagram, like so,

The headers should always be recreated or updated, as you have to update the sender_node, prev_node, next_node, and hop counter from source. Where:

  • ttl = max hops (or something else)
  • source_node = packet orgin
  • destination_node = end target
  • sender node = current forwarder (or source)
  • prev_node = node before sender (or source)
  • next_node = intended forwarder (or destination)
  • hop_count = total hop count from source
  • seq_num = how many times sent (packet loss)

The hop count could be counted from the ttl, if the ttl was fixed. This would limit the options, and thereby I suggest we keep the ttl and hop count separate.

ttl (1), sender_node (4), next_node (4), seq_num (1), destination_node (4), source_node (4), source_seq_num (1)

Notice that I added the sequence number of the source node. This would allow the receiving node to calculate the metric for the source node as well as the sender node. I'm still not sure this would work unless each source-destination pair has their own sequence number, though this adds another layer of complication where you are actually calculating the metrics of routes instead of destinations.

I believe this would work for updating headers:

  • ttl (update by X-1)
  • source_node (no update)
  • destination_node (no update)
  • sender_node (update to source or current forwarder)
  • prev_node (update to source or the node before sender)
  • next_node (update: destination or the next forwarder),
  • hop_count (update by +1, to total hops from source)
  • seq_num (source updates: total retry number from source)

The only way you could reasonably include the prev_node is if you also provided a sequence number from that node. Otherwise you know nothing more than that you have received some packets from that intermediate node.

In this design, every node would need to calculate metrics for every route on it's own, rather than nodes sharing metrics with one another. You also don't have a good way of knowing how many hops you are from the source node without adding a hop counter byte to the datagram.

See above (better description of hop counter vs. seq number).

My main question is, how does a node make routing decisions? If a node has multiple routes to the same destination, how does the node decide now which route to use? If it's only based on metrics. How do you calculate accurate metrics for all destinations with limited knowledge of the route to get to them?

The node makes decisions by:

  • am I the designed forwarder/destination
  • did I hear a ACK from the destination
  • will ttl be >0
  • if I have route to destination, update header and forward to next forwarder/destination
  • if I can't hear the intended next forwarder to forward (implicit ack by listening), then flood (or use another route to destination, if many routes are maintained)
  • if no route, flood until destination send ACK or ttl is 0 (reset the ttl, if starting to flood)
  • if I am not the designated forwarder, but I can't hear the designated forwarder forward, although I can hear the forward request, and I have a fresh route to the source, then I do the forwarding (before the previous sender floods the packet)?

In this model, the route is defined by min hop count.

The routes are built upon information gathered from the headers: neighbours, 2 hop neighbours and hop count to different destinations. You get, as a forwarder/listener, routes, not only, for the sources (by parsing the header of the messages), and for the destinations (by parsing the header of the ACKs), but also routes to your neighbors and neighbors 2-hops away (to both directions). Therefore the routing tables are established quickly as messaging starts (boosted during flooding as the node can parse all headers it "hears").

As described, the ACK (from destination_node), keeps the user happy, and helps building routes. I think sequence metrics are not as important. This is due to the fact, that by this method, the source gets the ACK if message passes, and "listeners" can update their routes. Therefore the routes are kept fresh. Some randomness, as the delivery with the same hop count can be done in many ways, for routes can be actually good, to not to overload the only route?

If used 4 byte names, the header would be in total of 22 bytes, for 3 byte names 18 bytes, and for 2 byte names 12 bytes in total. It is a increase for the header, compared to the current LL2 header, but there would be no need for forwarding costly routing tables.

This gives me the idea that we could provide the ability to choose how large you want your address to be? Might be a cool option. So if you want a small, high-efficiency network, you could choose 2 byte address, but then you would have to manually set node addresses because there are only 254 usable addresses. Or if you are deploying a larger, more dynamic network, you could use 4 byte addresses.

This is a good idea. In my opinion, all the settings/values regarding the protocol should be changeable by the user. This way the protocol could be used within many different applications. Off course, "best" default values should be in place.

As a packet is received, the node sends an ACK to the source_node. As the messages and ACK's are passed, the nodes forwarding, or "listenig", can update the total hop counts to other nodes, and thereby count the metrics.

I'm hesitant about adding ACKs because of the limited duty cycle and introducing too much noise to networks.

In this scenario, ACKs deliver great value, as they're carrying information of a) routing information to nodes (hops to destination, neighbours and 2-hop information), and b) users can confirm the message has been delivered.

There could be a setting to enable or disable automatic ACK from LL2?

  • No beacons, alive, or hello messages

I think we would still want occasional "i'm alive" messages sent when the network is quiet so that routes don't get stale if a node is not used for a while.

All messages are beacons. Therefore, a GPS update packet would keep the routes alive. If routes get stale, the message is flooded (to learn the routes again).

Could this route_delete_delay also be changeable by the user? As I tend to think there might be quite static networks, and even users wanting to have some of their nodes in a passive (listen only) mode, for saving battery.

There could also be a beacon_interval setting for time (and change of position)?

Small mesh 1-2 hop networks should shift to routing within first packets sent on the network. Larger networks might sift to routing by "zones", as 1-2 hop nodes are quickly learned, and the routes to the nodes on edges might take somewhat longer to learn.

The idea of routing by zones is interesting to me. Could be a good line of thought to investigate further. I have a hard time squaring it with the dynamic, ad-hoc design goal of LL2, but maybe I just need to think about it more deeply.

By zones, I tried to describe the differences formed within a real world wireless mesh: nearby nodes tend to hear lots of neighboring traffic (nodes within their range), and thereby build more accurate routing tables of their neighbors and their neighbors, whereas two nodes many hops away, tend to build good routing tables slower (as no messages might not been sent from one end of the mesh to another one). This causes a situation, especially in the beginning or during fast changes of the mesh, where nearby packets are routed, whereas packets to nodes far away are flooded (as no routes have been established yet). This behavior is ok and expected.

@paidforby
Copy link
Contributor

Very interesting. Thanks for clarifying many of your points. It actually sounds very similar to a dynamic source routing protocol, which was also suggested by @geeksville here sudomesh/LoRaLayer2#10 (comment) and here #52 (comment). I'm going to think more about this DSR idea, i think it may be a good one. Also, this would give me a reason to revisit the simulator and sync it up with all the changes to LL2. I'll keep y'all posted if I make any progress toward a DSR fork.

@cyclomies
Copy link

cyclomies commented Apr 17, 2020

Very interesting. Thanks for clarifying many of your points. It actually sounds very similar to a dynamic source routing protocol, which was also suggested by @geeksville here sudomesh/LoRaLayer2#10 (comment) and here #52 (comment). I'm going to think more about this DSR idea, i think it may be a good one. Also, this would give me a reason to revisit the simulator and sync it up with all the changes to LL2. I'll keep y'all posted if I make any progress toward a DSR fork.

You might find these findings interesting (broadcasting, routing, zero-control-packet, and reactive & pro-active protocols in mesh networking):

Something similar could be done within LL2?

  • broadcast-->learn-->route-->broadcast (if no route)-->learn more..

Why only links refering to GoTenna? Because there have not so far been any other well known options..

@paidforby
Copy link
Contributor

Cool stuff, too bad their code isn't open source.

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

My thinking, and what I believe @cyclomies is suggesting, is to compress the routing table into the LL2 packet header.

The new packet struct looks like this,

struct Packet {
    uint8_t ttl;
    uint8_t totalLength;
    uint8_t sender[ADDR_LENGTH];
    uint8_t receiver[ADDR_LENGTH];
    uint8_t sequence; // message count of packets tx'd by sender
    uint8_t source[ADDR_LENGTH]; // original sender of datagram
    uint8_t hopCount; // start 0, incremented with each retransmit
    uint8_t metric; // of source-sender link
    Datagram datagram;
};

The datagram struct remains the same, but is now defined in the LL2 code,

struct Datagram {
    uint8_t destination[ADDR_LENGTH];
    uint8_t type;
    uint8_t message[MESSAGE_LENGTH];
};

To initialize the routing process, nodes must begin by transmitting packets with the broadcast address as the destination and receiver.

Upon receiving a broadcast packet, neighbors inspect the header to see who was the sender, they then add the sender address to their routing table with a hop distance of 1 and a metric based on the sequence number.

Once nodes have neighbors, they can begin transmitting packets with a neighbor address as both destination and receiver.

Neighbor nodes then listen to all packets that are transmitted. Upon receiving they inspect the packet for three items,

  • sender address + sequence number - to update neighbor table and neighbor route metric
  • receiver address - to add as a possible route, two hops away via the sender address with unknown metric
  • source address + hopCount + metric - add as known route via the sender address, distance or hopCount+1 away, with a metric that is the weighted average of the metric for the source-sender link (included in packet) and the metric for sender-receiver link (calculated previously from sequence number).

Then after sending packets to all neighbors, a node can begin attempting to sending packets with a two hop destination via the neighbor. This will then allow multi-hop routes two start developing metrics based on the metrics of each hop.

The internal routing table management is almost exactly the same. which means that it is still very compatible with the original table sharing protocol. Really the only difference is that instead of sending routing tables in a single, large packet, routing table entries are sent one at a time inside of the header of every packet.

This should work well. Though, I will still provide the option to enable control messages that share the entire routing table, such that a network could converge faster at the expense of airtime and increased noise.

My next step is to port LL2 back to the simulator to test the validity of this protocol. And to write up full documentation of this modification on the wiki.

@cyclomies
Copy link

cyclomies commented Apr 22, 2020

Cool stuff, too bad their code isn't open source.

Totally agree, and empathizes the value of this project!

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

My thinking, and what I believe @cyclomies is suggesting, is to compress the routing table into the LL2 packet header.

Correct. Sorry for my unclear text.

The new packet struct looks like this,

struct Packet {
    uint8_t ttl;
    uint8_t totalLength;
    uint8_t sender[ADDR_LENGTH];
    uint8_t receiver[ADDR_LENGTH];
    uint8_t sequence; // message count of packets tx'd by sender
    uint8_t source[ADDR_LENGTH]; // original sender of datagram
    uint8_t hopCount; // start 0, incremented with each retransmit
    uint8_t metric; // of source-sender link
    Datagram datagram;
};

I would suggest to add previous_sender (one sender before the last sender). This way a node could, not only get the last sender, but also the sender before the last sender. As such, the information of the header would let the node build a routing table (and hop counts..) of its neighbors, its neighbors neighbors, and routes to sources.

The datagram struct remains the same, but is now defined in the LL2 code,

struct Datagram {
    uint8_t destination[ADDR_LENGTH];
    uint8_t type;
    uint8_t message[MESSAGE_LENGTH];
};

To initialize the routing process, nodes must begin by transmitting packets with the broadcast address as the destination and receiver.

Destination address could be the node we are looking for?

Upon receiving a broadcast packet, neighbors inspect the header to see who was the sender, they then add the sender address to their routing table with a hop distance of 1 and a metric based on the sequence number.

..and if we had previous_sender (sender before the last sender), we could add 2 hop destinations to the table.

Once nodes have neighbors, they can begin transmitting packets with a neighbor address as both destination and receiver.

..and to nodes two hops away, if previous_sender would be added in the header, and node databases would be updated corresponding.

Neighbor nodes then listen to all packets that are transmitted. Upon receiving they inspect the packet for three items,

I suggest the nodes would use all packets they overhear for building the node database, but process only messages they are the intended sender or destination.

  • sender address + sequence number - to update neighbor table and neighbor route metric
  • receiver address - to add as a possible route, two hops away via the sender address with unknown metric
  • source address + hopCount + metric - add as known route via the sender address, distance or hopCount+1 away, with a metric that is the weighted average of the metric for the source-sender link (included in packet) and the metric for sender-receiver link (calculated previously from sequence number).

Correct. And if adding the previous_sender, we would learn the routes within our neighborhood (1-2 hop away) fast, as we overhear nodes transmitting around us.

Then after sending packets to all neighbors, a node can begin attempting to sending packets with a two hop destination via the neighbor. This will then allow multi-hop routes two start developing metrics based on the metrics of each hop.

And allow the node to route packets to nodes, that it has been heard as sources (as it knows the starting path and the hop count). If, sorry for repeating, we had previous_sender information received, we could route already to nodes three hops away. This would expand the the coverage for routing a lot, without a too large header.

The internal routing table management is almost exactly the same. which means that it is still very compatible with the original table sharing protocol. Really the only difference is that instead of sending routing tables in a single, large packet, routing table entries are sent one at a time inside of the header of every packet.

This should work well. Though, I will still provide the option to enable control messages that share the entire routing table, such that a network could converge faster at the expense of airtime and increased noise.

This would be a good option, especially used on static nodes and when the network has been quiet (otherwise, we might start to be unsure of our old routes?).

I do believe this could be left unused, if the intended destination nodes would send ACKs of recieved packets. This would, not only let the sender know the packet was received, but also update node databases for packets coming from the direction of the last destination (=learn routes with fewer hops to the destinations).

Furthermore, previous_node information could be used to determine shorter routes to nodes: if we overhear a packet sent from previous_node and hear the packet of the sender, we could correct our route for the source node to go straight through previous_node.

By adding this, we could always, when overhearing a packet, update our routing table for source, sender, and previous_sender. If ACKs where integrated, we would also learn the routes towards the destinations (as we would soon hear the ACK).

In a small 6-10 node max 4 hop network, we could learn the whole mesh maybe with 1-2 broadcast messages and 1-3 unicast messages (including the ACKs), as all packets reveals routes to three nodes.

My next step is to port LL2 back to the simulator to test the validity of this protocol. And to write up full documentation of this modification on the wiki.

Good luck!

@paidforby
Copy link
Contributor

I think my issue with adding a previous_sender address is that it is another address of uncertain quality (the receiver address also being uncertain until you actually hear from it) and does not contain useful information until you have a five node (four hop) wide network. In a four node (three hop) wide network, the previous_sender would always be either empty or the source. Could be a useful option still. I will look into testing it in the simulator.

I suggest the nodes would use all packets they overhear for building the node database, but process only messages they are the intended sender or destination.

This is currently how I've written the protocol. It always parses the header for routes, but it only retransmits the packet if the receiver address matches the nodes local address and it only passes the packet to Layer3 if it if the destination matches your nodes local address or the broadcast address. See the receive() function through which all incoming packets are processed.

By adding this, we could always, when overhearing a packet, update our routing table for source, sender, and previous_sender. If ACKs where integrated, we would also learn the routes towards the destinations (as we would soon hear the ACK).

Good point, I understand what you are suggesting. Right now we are working under the idea of manual ACKing by the user of a node. Maybe could make automatic ACKs with an optional feature. I definitely see the value of ACKing with this style of protocol.

I will play around with these ideas in the simulator soon.

@paidforby
Copy link
Contributor

Also if anyone is interested in playing with the this fork of the protocol, I've synced up the 1.0.0-rc.2 branch with the LL2 DSR branch.

@cyclomies
Copy link

I think my issue with adding a previous_sender address is that it is another address of uncertain quality (the receiver address also being uncertain until you actually hear from it) and does not contain useful information until you have a five node (four hop) wide network. In a four node (three hop) wide network, the previous_sender would always be either empty or the source. Could be a useful option still. I will look into testing it in the simulator.

You're right on point.

Previous_sender should help with larger mesh networks. There is, according to my understanding, one more benefit of that feature: if, and when, node A overhears several nodes (B-D) routing packets to B from source Z, A can recognize a shorter route to E trough D (previous_sender for packet forwarded by D), even if E was not the source. previous_sender should not left empty, but marked as source (hop counting logic for finding shortest route should be easier to do?).

This could not only shorten hop count of one node to another one (A-->E), but also optimize the whole "zone of overhead transmits".

I will play around with these ideas in the simulator soon.

I'm interested to hear your verdict of those simulations on both prev_sender and ACKs. Option for manual and automated ACKs sounds great (automated for one to one messaging, and reducing traffic by manual ACK for one to all room messages?).

A video/animation of the simulations could be fun to watch.

@samuk
Copy link
Collaborator

samuk commented Apr 25, 2020

Cool stuff, too bad their code isn't open source.

I started working on a more DSR-like fork of the LL2 protocol here, https://github.com/sudomesh/LoRaLayer2/tree/DSR

@geeksville did you have a chance to check out the DSR fork? Might it meet your needs?

@cyclomies
Copy link

cyclomies commented Apr 28, 2020

@paidforby, did you find time for those simulations?

@paidforby
Copy link
Contributor

Not yet, I became incredibly distracted with updating the the simulator to make it more usable (namely introducing another websocket server at Layer 3 so I can send messages using the real web app). I should be able to test in another day or two.

@geeksville
Copy link

@samuk Alas I have not, I'll definitely check it out in the next couple of weeks. I still might end up doing my own DSR, based on cribbing from radiohead's version and RFC 4728 (because it is super mature and well studied). The radiohead guy seems to have just taken that RFC and only implemented the fourish minimum requirements. I'd love to share an implementation with ya'll but I also kinda love protocol buffers for this sort of stuff. If curious my current work queue is here.

@samuk
Copy link
Collaborator

samuk commented May 14, 2020

@jacquesalbert "LOCALLY, but all addresses were hierarchical enough to have another layer of routing that is roughly geographic? Spitballing: using some kind of Maidenhead-style address (I was thinking instead of alphanumeric, it'd just be a sort of lat/long based limited quadtree or something"

Sounds interesting, I think I understand this concept, let me restate to see.

So we add eight numbers, and two +/- symbols to give latitude/ longitude to one decimal place. This gives us 6.9mile squares covering the earth. We also add one (b)order flag to each packet. So this would cost 11 characters per packet?

When I add my first node in a network I either manually enter the lat/long, or the GPS does it. For this theoretical network, it is 35.1, -121.5

Each additional node brought into the network listens for 100 packets and uses the most frequent Lat/long it hears as it's 'main' network (eg 35.1, -121.5) If it hears additional lat/long then it also acts as a 'border node'.

Routing within the 35.1, -121.5 square happens in the reactive way outlined by Grant above.

If a 'normal node' receives a message originating outside of its main lat/long square (eg 35.1,-121.4) it just ignores/drops it.

If a Border node receives a message originating in 35.1,-121.4 that has an unset (b)order flag it edits the packet to 35.1,-121.5 and sets the (b) flag before forwarding.

Each packet should therefore only cross a geographic border once? Writing this I'm still not sure I understand it. Is that kind of what you meant?

@paidforby
Copy link
Contributor

Can you specify more clearly, how the nodes start to operate when cold booting (no routing tables) the mesh? From the wiki: "WHen a node joins a LoRa mesh network it can begin sending messages to it's neighbors immeadiately, but it has to wait to hear messages sent to nodes more than 1 hop away before it can begin routing outside of it's immediate neighbors."

If a node does not know a route to a specific destination, shouldn't the node broadcast the message? The broadcast is changed to unicast (routing), when the broadcast reaches a node with sufficient routing information? Those nodes that "hear" the change, stop broadcasting?

@cyclomies to clarify on the cold boot scenario. Since this is reactive routing there is no automated process to build routes (though I suppose one could be implemented). A user of a node must begin by broadcasting a message. Neighboring nodes hear the broadcast, add the sender as a route and then could then begin sending unicast messages to that sender. Once the original sender hears a broadcast or unicast message and adds routes, it can then also begin transmitting unicast messages.

By saying "sending messages to it's neighbors immeadiately" I mean broadcasting messages. Broadcast messages are defined as messages addressed to all neighbors (1 hop destinations).

Is there a need for sending the "via node" (neighbor) information? Could the table be sent only with hops, destination and metric?

You are correct, there is no need to send the "via node" info in the route table packet, in fact I was already doing it in the most efficient way possible, see https://github.com/sudomesh/disaster-radio/wiki/Protocol#routing-table-packets. Thanks for pointing that out, I would have wasted a lot of time, trying to rethink something that was already as minimal as possible.

In the wiki, should "from" be replaced by "to", for indicating routes to destinations? Just a minor thought.

I suppose this is a matter of perspective.

@jacquesalbert
Copy link

@samuk Sounds interesting, I think I understand this concept, let me restate to see.

You've got the right idea, but let me clarify:

So we add eight numbers, and two +/- symbols to give latitude/ longitude to one decimal place. This gives us 6.9mile squares covering the earth. We also add one (b)order flag to each packet. So this would cost 11 characters per packet?

I think a more efficient system might be a bitwise geocode. For example, alternating E/W-N/S bits for increasing precision. Maybe the first bit is basically 0 for the west half of the globe, and 1 for the east half. Then the next bit is 0 for the North half and 1 for the South half. Then the next bits are 0 /1 for the West/East half of the west half of the globe, and the same for the N/S half of the North half of the globe, and so on. Using this method my napkin math tells me we should be able to get a precision of less than half a mile with 16 bits each, so a single int. Additionally, we don't WANT or NEED that level of precision, so either we can use fewer bits and pack them in, or always use an int and mask or shift it. Regarding the border node tag, see below.

When I add my first node in a network I either manually enter the lat/long, or the GPS does it. For this theoretical network, it is 35.1, -121.5

Each additional node brought into the network listens for 100 packets and uses the most frequent Lat/long it hears as it's 'main' network (eg 35.1, -121.5) If it hears additional lat/long then it also acts as a 'border node'.

Routing within the 35.1, -121.5 square happens in the reactive way outlined by Grant above.

I think this (barring the caveat above about the format of the address) is a great idea, for new nodes to potentially "learn" where they are based on surrounding nodes if they have not already been given a location. Other than that, yeah. Any node that handles a packet with a matching geocode just routes it using the mesh routing tables as discussed above.

If a 'normal node' receives a message originating outside of its main lat/long square (eg 35.1,-121.4) it just ignores/drops it.

If a Border node receives a message originating in 35.1,-121.4 that has an unset (b)order flag it edits the packet to 35.1,-121.5 and sets the (b) flag before forwarding.

Each packet should therefore only cross a geographic border once? Writing this I'm still not sure I understand it. Is that kind of what you meant?

In the system I was imagining, packets might have to cross a geographic border any number of times, and therefore I think some additional logic is necessary, but I see where you are coming from with regard to the changing of the border flag. I'm thinking no border flag is necessary on the packet because we have the geocode already in the address. Here's an example:
GeoCodeHybridRouting

Nodes 1, 2, 3, and 4 are all in the same submesh (0000) so they all route between each other like normal. We need to find a way for the routing table to include nodes outside of the submesh, but ONLY if they are directly linked to nodes inside the submesh. Then, if the destination is in the submesh, find the next best submesh hop like normal, but if it's outside the submesh, find the next best submesh and THEN find the next best hop to the "intermediate" node in the table that is in that submesh.

I think choosing the right balance of submesh size (geocode precision) to routing table size would at least put a sort of limit on the total table size to the number of submesh nodes PLUS the number of border nodes in other submeshes. For low power mesh networks, this should be pretty low. I would think having a submesh size of something like a few miles might mean that most local comms within a town would be routed normally and georouting would be rare. I also suggested a Maidenhead style system of alternating lat/long with increasing precision because I have also wondered if there is a way to use a dynamic mask to allow submeshes to be "resized" in areas where that might be beneficial, since each node is handling this on a packet-by-packet basis instead of maintaining a full topology map outside of its submesh. After all, if there are only 5 nodes in a city, georouting is probably more of a detriment than a benefit, but if there are 5000, it might be a necessity. Alternately, if the submesh size is too small you might be able to detect somehow that you are running into the "dead end" problem and increase the size until you minimize that.

Regarding routing:
Okay, now imagine a packet is coming from node 1 (full geocoded address is 00001) and it is intended for node 9 (full address 10019). Node 1 knows that 10019 is not in its submesh (1001 != 0000) so instead of looking for 10019 in its routing table, it looks for any nodes that are NOT in 0000 and compares all of them geographically to pick the next best hop. So in this case node 1 might have node 5 in its table also. The geo distance between 0010 (the geocode portion of node 5's full address) and the destination of 1001 is less than the distance between 0000 and 1001 so this is a good candidate for the "intermediate destination" of node 5. Obviously, Node 1 does not see node 5 directly, so it picks the next hop for this packet using normal mesh routing as you guys have discussed previously, choosing node 3.

Now, node 3 does the same calculation. It has a packet destined for 1001, it is in 0000 so it looks for the best off-mesh link and finds 5. Normal mesh routing causes it to pick 2 or 4, let's say 2 has a better link quality so it chooses 2. Node 2 does the same thing and it has a direct link to 5 so it sends it to 5.

Node 5 has the same problem as node 3: it has been handed a packet with a destination not in its own submesh, so it follows the same process and picks either Node 7 or C. Both have the same geo distance, so it probably picks by the next hop link quality for each of those possibilities. Let's say C. C hands it to A, and node A sees that this packet is destined for the submesh that matches its own, so it just looks it up in the table like normal (ignoring any geo addressing and skipping those calculations).

@cyclomies
Copy link

cyclomies commented May 16, 2020

@cyclomies to clarify on the cold boot scenario. Since this is reactive routing there is no automated process to build routes (though I suppose one could be implemented). A user of a node must begin by broadcasting a message. Neighboring nodes hear the broadcast, add the sender as a route and then could then begin sending unicast messages to that sender. Once the original sender hears a broadcast or unicast message and adds routes, it can then also begin transmitting unicast messages.

By saying "sending messages to it's neighbors immeadiately" I mean broadcasting messages. Broadcast messages are defined as messages addressed to all neighbors (1 hop destinations).

@paidforby Understood, perfect. And if I understood correctly, broadcast messages will be re-broadcasted according to the TTL value the sender specify?

I suppose this is a matter of perspective.

Well put! ;)

To avoid collisions, there could be specific random delayed time slots for TX:

  • slot 1: 200-400ms ACK's
  • slot 2: 600-2000ms re-broadcasts
  • slot 3: 4000-8000ms routed packets and broadcasts

Slot 1 for ACK's of the intended receiver, as they will also end other nodes to continue broadcasting or routing to the intended recipient.

Slot 2 to prioritize re-broadcasted packets, as they are extremely handy for node discovery and updating node databases. Therefore, they will work for optimize routing.

Slot 3 for all routed packets and broadcasted packets from the originating node.

In short: messages (unicast or broadcast), and packets intended for routing, would be sent with a 4-8s delay. Any packets intended for re-broadcast will be sent before them, as they will most likely help in discovering new nodes and optimize the routes. ACK's are always sent "immediately", as they will end unnecessary broadcasting and unnecessary routing, and therefore free up airtime. ACK's would only be sent in slot 1 for 1-to-1 messaging only. Other ACK's would be sent in slot 3.

The ms used in slots are only examples, as I did not count the correct packet airtime or physical TX/RX delays. Those ms delays for slots should actually be specified with % (of max packet air time for 1 minute?), not hard coded, to work with different network settings, duty cycles and so on?

@cyclomies
Copy link

About those TX slots I mentioned: At first, I incorrectly thought, that the packets (with a maximum character length) should be possible to send, before the next TX slot. This is off course not the case. Sorry for the inconvenience!

Therefore, the TX timeslots could be as follows:

  • slot 1: 200-400ms (ACK's)
  • slot 2: 420-620ms (re-broadcasts)
  • slot 3: 640-840ms (all other packets)

As such, a routed packet would be sent in slot 3, if there are no other traffic on the air at that time. Moreover, if a broadcast is sent, and four nodes (including the intended recipient) hear the packet, the intended recipient node could send the ACK packet (slot 1) before any other node would have the time to re-broadcast (slot 2) the message. And as the ACK is heard by other nodes, they update their nodeDB and do not re-broadcast the message.

@samuk
Copy link
Collaborator

samuk commented May 21, 2020

I think a more efficient system might be a bitwise geocode. For example, alternating E/W-N/S bits for increasing precision. Maybe the first bit is basically 0 for the west half of the globe, and 1 for the east half.

@paidforby are you interested in this supplemental geo approach suggested by @jacquesalbert ? Realise it's solving problems we don't yet have, but a globally scaleable network does appeal.

@paidforby
Copy link
Contributor

I am interested in trying to implement it as optional feature. I need to think more deeply about how to actually integrate it into the LL2 library, not sure how soon i'll get to it. but I'll add it to my todo list.

@jacquesalbert
Copy link

jacquesalbert commented May 21, 2020

I am interested in trying to implement it as optional feature. I need to think more deeply about how to actually integrate it into the LL2 library, not sure how soon i'll get to it. but I'll add it to my todo list.

@paidforby I have been playing around with this idea a little bit with the help of the simulator and while I think that something like what I described above could be a very effective "best of both worlds" system, I have struggled a bit to fit all the pieces of the idea together in a clean way.

However, I think that the most effective way to get some mileage out of it DOES actually solve a current problem with the mesh: convergence time, especially when the mesh is less than active. By implementing geohashed addresses (almost for free, data-wise, since you already need a fairly long unique address) you can implement simple directional georouting as a fallback for when the routing table does not yet contain the destination, or when the routing table has reached some maximum size.

Basically, if a node needs to send a packet to something it doesn't have a route to, it just routes it to the node in its table that is the closest geographically to the destination.

@tgies
Copy link

tgies commented Jun 11, 2020

The S2 coordinate system might be useful for this geographic routing. Uses 64-bit coordinates which are the distance along a Hilbert curve superimposed on the globe, precision can be arbitrarily reduced, checking for containment of a point within a cell or adjacency of points are literally bitwise operations, etc. https://s2geometry.io/

@cyclomies
Copy link

@paidforby, Any major updates regarding LL2?

@paidforby
Copy link
Contributor

Nothing much. I started a new j.o.b. so I've had less time to work on this. Most recent updates involved supporting dual lora modules (which will be on the new disaster radio dev boards). Next, I plan on updating/improving the simulator, but who knows when that will actually get done. I've mostly stopped revising the routing protocol because it requires too much time/energy. I encourage others to try implementing the geographic routing, which I still find very desirable.

@samuk
Copy link
Collaborator

samuk commented Aug 16, 2021

I noticed that Reticulum has had a first release: https://unsigned.io/projects/reticulum/

There's a discussion of porting to C: markqvist/Reticulum#2 but no code yet. Does anyone have the skills/interest to help with this?

@X-Ryl669
Copy link
Author

If I understand Reticulum correctly, it'll not fit LoRA radio (unlike what they claims), mainly because it requires a MTU of 500 bytes (which can not be done reliably on LoRAWAN since it implies multiple packet other a short time, if you want to respect the law that's a no-no). Also, I don't understand how they deal with exponential routing table issue (I think they don't deal with it in any way), so it'll require a lot of memory on the client to store the routes & Link.

@mwarning
Copy link

Recticulum seems to send announcements as floods. This does not scale very well.

@markqvist
Copy link

Reticulum is perfectly well suited to LoRa, but probably not LoRaWAN (which is a bit purposeless any way). It is correct that Reticulum has a fixed MTU of 500 bytes, which LoRa can accomodate just fine, and without any reliability impact. LoRaWAN probably can't - I haven't actually made any attempts at this, so that is just my guess. In no jurisdiction is "multiple packets over a short time" viewed as a "no-no". That is not what duty cycle means ;)

Announcements are not floods in Reticulum. They are prioritised and forwarded in a way that scales quite well. Some more info: https://markqvist.github.io/Reticulum/manual/understanding.html#the-announce-mechanism-in-detail

@samuk
Copy link
Collaborator

samuk commented Aug 17, 2021

Thanks @markqvist what happens when the link table is full?

"Any node that forwards the link request will store a link id in it’s link table , along with the amount of hops the packet had taken when received. " https://markqvist.github.io/Reticulum/manual/understanding.html#link-establishment-in-detail

@markqvist
Copy link

Ideally, that should not happen. Just one megabyte of memory can accommodate more than 22.000 active links, so in most cases, this will not be a problem. Inactive links are of course culled from the table automatically.

@X-Ryl669
Copy link
Author

From that table:
image
There is not SF where LoRa can send a 500 byte packet. So Reticulum will have to split a "packet" on multiple LoRa packet.
In reality, in my area, only 125kHz uplink channel are allowed, and a maximum duty cycle of 10%, so it limits to SF between 7 to 9, and a theorical bitrate between ~1700 to ~5000bps. Yet, if you want to respect the duty cycle, you can't emit at 5kbps, but only 500bps on average (surely, you can peak to 5kbps, but only if your average emission is only <= 500bps).

Even then a 500 bytes packet must be split to at least 3 packets (since LoRa support a maximum of 242 bytes packet), and more generally, if you want to reach some long range, you'll limit yourself to a SF 9, so with 53 bytes per packet, meaning that a Reticulum packet will use 10 LoRa packet for a single own packet. Ten packets on a 1700bps link mean it's using 2.35s to send a Reticulum packet, and you'll have to wait ~21s before sending another one.

By watching my LoRa's gateway receiving time graph, a 2.3s transmission will saturate and prevent many other services to talk.
I'm not even speaking about error code correction (since a 2.3s transmission is more likely to get disturbed in ISM band) that'll also require some additional bandwidth to include.

All in all, I don't think Reticulum fits LoRa network, or only on a lab experiment where only the lowest SF are used.

@markqvist
Copy link

markqvist commented Aug 19, 2021

You seem to be talking about LoRaWAN, which that table also seem to be describing. As I said before, I believe it's quite pointless to run Reticulum over LoRaWAN.

LoRa != LoRaWAN

LoRa is a PHY modulation scheme. LoRaWAN is a networking stack running on top of the LoRa PHY. Reticulum is another networking stack that can run on top of LoRa, amongst many other things. Maybe you are confusing the terms.

Really, no offense intended here, but I don't really care much whether you "don't think Reticulum fits LoRa network". I have hundreds of LoRa devices deployed in the real world running Reticulum, and they seem to be doing fine, in spite of your thoughts on the matter.

A lot of your statements in the above are false, in relation to LoRa. They are maybe true in regards to LoRaWAN.

Edit: Either way, I think it is pointless for this thread to get filled with discussion on Reticulum, since it would not be of any help to the disaster.radio project before a C++ implementation is ready anyways, and that is still a while in the future. As of writing this, the only functional Reticulum release is the Python reference implementation.

@samuk
Copy link
Collaborator

samuk commented Aug 19, 2021

Ideally, that should not happen. Just one megabyte of memory can accommodate more than 22.000 active links,

So ~4400 as an upper limit on 192Kb devices.

@markqvist
Copy link

So ~4400 as an upper limit on 192Kb devices.

~4400 if you can allocate 192KB to the link table, since other things will also need memory ;) So my guess is a 192KB device could probably handle around 1500 active links. But I think the link throughput and CPU cycles needed to actually pass traffic for 1500 active connections would be exhausted before the RAM runs out.

@X-Ryl669
Copy link
Author

X-Ryl669 commented Aug 19, 2021

You seem to be talking about LoRaWAN, which that table also seem to be describing

No, it comes from here: https://lora-developers.semtech.com/library/tech-papers-and-guides/lora-and-lorawan/ and the table is about LoRa and not LoRaWAN.

You can probably use your node the way you want (with low SF), not giving a thought about respecting the standards and rules (duty cycle for one) but doing so does not give confidence in your network, and it's bothering other users around (which might be using LoRaWAN, or not). As long as you're not spreading on all channels...

Sure, you can have hundred of nodes that are close to each other and it'll work.

I never had hundred of nodes but only few that are very far and the duty cycle constraint bothers me a lot more than the bandwidth constraint since sending data takes forever (and I'm talking about ~160 bytes here, not 500) and the node must stay awake when they transmit.

All in all, I've followed the Reticulum protocol when it was first described and I think it's a good protocol by itself, yet, I feel it's too large to work on LoRa and you'll maybe prove me wrong (and it would be great!). One of the missing thing in the protocol, IMHO, is the possibility to pre-provision symmetric keys for group communication, in order to avoid transmitting all the session creation packets. After all, if you master the nodes on the network, you can start communicating with encryption (and a known, secret, symmetric key) without having to deal with setting up DH and that would save a lot of bandwidth.

Another improvement I haven't seen in all the network mesh protocols, is to prepare for next packet wake up time to save battery. Many nodes have a very precise clock (GPS, oscillator, etc...) they could use to synchronize communication time, including transmission delay (thus allowing a better usage of the duty cycle limit) and compute the varying session's encryption keys from the elapsed time/wallclock time. That's some information that can be saved to transmit.

@markqvist
Copy link

@X-Ryl669, that table is specifically talking about modulation characteristics in the context of LoRaWAN networks. Maybe you should read the document again.

Sure, you can have hundred of nodes that are close to each other and it'll work.

Who said I had a hundred nodes close to each other? Quit with the straw manning already dude ;)

not giving a thought about respecting the standards and rules

Again with the straw man. That's a completely unfounded accusation, which I find just a wee bit offending ;) I sincerely hope you will take that back. I care a great deal about proper spectrum use, and I've designed and built commercial networks in both licensed and unlicensed spectrum in more or less every band from HF to microwave. To me it seems like you are not that knowledgeable about spectrum use regulation, outside of a rather specific area, which you then seem to believe you can apply to everything. Spectrum access regulation is pretty complex, and while that Semtech page you linked to is probably a fine reference for what you can and cannot do within the LoRaWAN standard, you can extract very little about actual regulation, ie. law, from that.

Just so you know, there is much more spectrum available (both licensed and unlicensed) where LoRa can be used legally than just the standard channels specified by the LoRa Alliance.

As I said before, I completely agree that using Reticulum on top of, or within the confines of the LoRaWAN standard is pointless at best. There are plenty of other cases where Reticulum over LoRa PHY works wonderfully though. Reticulum and LoRaWAN are very different protocols, that serve very different purposes. They just both happen to be able to use the LoRa PHY.

I think you have a good point about how pre-provisioning of group keys are cumbersome at the moment. I can be done, but there is no easy-to-use API for it. I will look into that in the future.

And by the way, the LoRa MTU is 255 bytes, not 242 as you wrote ;)

@markqvist
Copy link

I still don't know if Reticulum is actually a good fit for disaster.radio, since my knowledge of disaster.radio is way too limited. But for what it's worth, Reticulum can now, since commit cd8de6420155dc14f1b4743fc99d8c5bf68589cb, work with MTUs all the way down to 211 bytes. Bandwidth efficiency is negatively impacted by such a low MTU, but it does work. This is not "officially" supported, and should probably be considered rather experimental.

The official default MTU is still 500 bytes, and running any other MTU than 500 will break intercommunication with networks that run the standard MTU.

So while I would not recommend using a different MTU, it is now possible if you want to :)

The change will be in beta 0.2.4, which I am going to release in a few days.

@samuk
Copy link
Collaborator

samuk commented Sep 2, 2021

@markqvist thanks for the update, we'd still need C I think before even exploring it. markqvist/Reticulum#2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests