Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from using String to using struct for messages between server and clients #48

Closed
paidforby opened this issue Jan 27, 2020 · 24 comments · Fixed by #68
Closed

Switch from using String to using struct for messages between server and clients #48

paidforby opened this issue Jan 27, 2020 · 24 comments · Fixed by #68

Comments

@paidforby
Copy link
Contributor

Related to recent PR #44

internal server-client communication currently uses a String, this is unsafe and problematic. Instead, it should use a struct that looks something like wsMessage from the pre-refactor code,

struct Message {
    uint8_t id[2];  // equivalent to uint16_t, running count of messages sent by server
    uint8_t type; // ASCII char corresponding to content/use of message
    uint8_t delimiter; // always '|' (pipe)
    uint8_t data[240]; // content of message
};
@tlrobinson
Copy link
Contributor

tlrobinson commented Jan 27, 2020

  1. Any reason not to use uint16_t for id vs two uint8_ts? The endianess should be specified in the protocol and properly parsed/serialized (i.e. memcpy(&msg_id, buf, 2) will only work on machines that match the protocol's endianess), but it's easier for the "clients" to work with a uint16_t that's already been parsed.

  2. Similarly, since it's an ASCII character why not use char?

  3. If this is a binary protocol (the first two bytes definitely aren't ASCII) what's the purpose of the delimiter? It seems like it should be either all ASCII with delimiters between message ID, type, and data (1|c|hello world), or leave out the delimiter completely.

  4. If data is a null-terminated can we make this char data[240]? Or does the format of data depend on the type (it's null-terminated for 'c' I assume?)

@Pablo2048
Copy link

Pablo2048 commented Jan 27, 2020

1. Any reason not to use `uint16_t` for `id` vs two `uint8_t`s? The endianess should be specified in the protocol and properly parsed/serialized (i.e. `memcpy(&msg_id, buf, 2)` will only work on machines that match the protocol's endianess), but it's easier for the "clients" to work with a `uint16_t` that's already been parsed.

+1 - I suggest to use HTONS & NTOHS in this case as we already do in TCP... Also maybe we do need packed structure here...

@tlrobinson
Copy link
Contributor

Also, I assume this is part of the same packet structured described here https://github.com/sudomesh/disaster-radio/wiki/Protocol#packet-structure but it doesn't quite match (only 1 byte sequence number, no delimiter)

@paidforby
Copy link
Contributor Author

Any reason not to use uint16_t for id vs two uint8_ts?

Not that I can think of, I think just used uint8_t because it felt more explicit that it was two bytes. And it kept everything in the same type.

Similarly, since it's an ASCII character why not use char?

I would keep this as uint8_t because, while it is typically a char, it does not, and should not have to always be a char.

If this is a binary protocol (the first two bytes definitely aren't ASCII) what's the purpose of the delimiter? It seems like it should be either all ASCII with delimiters between message ID, type, and data (1|c|hello world), or leave out the delimiter completely.

The delimiter is a hold-over from the initial design of the chat app. I believe it makes the messages easier to parse in javascript, maybe? I'm in favor of dropping the delimiter and making the messages entirely binary, since I believe that would be the most efficient use of the limited message size.

If data is a null-terminated can we make this char data[240]? Or does the format of data depend on the type (it's null-terminated for 'c' I assume?)

The data is not necessarily null terminated. I would say that data format definitely depends on the type. This means we would need a buffer+length, if I am not mistaken.

Ultimately, I'm not really tied to the original message structure, it is just what I inherited from the chat app design. If I wanted to re-imagine this in the context of the refactored code, I would suggest something like this,

struct Message {
    uint16_t id;  // 2 bytes global count of messages exchanged on server
    uint8_t data[238]; // content of message, possibly binary data, not necessarily null-terminated
    uint8_t type; //  1 byte (typically a char) corresponding to intended client/service 
    int data_length; // appended to message by client before transmitting to server
}

In this new struct, both type and data_length are appended by the transmitting client and should be removed by LoRaLayer2 before transmitting over LoRa because they are redundant as the protocol already has a spot for both of them (if you look at the protocol documentation linked below).

I also correctly calculated the length of the message now. It should be a total of 240 bytes (2 byte ID + 238 bytes data)

Also, I assume this is part of the same packet structured described here https://github.com/sudomesh/disaster-radio/wiki/Protocol#packet-structure but it doesn't quite match (only 1 byte sequence number, no delimiter)

No, it is not, that is the LoRaLayer2 packet structure, the message structure we are discussing here is for the "application layer" and is contained entirely within the data in that table (byte 16 to 256).

The sequence number there is not equivalent to the msg_ID. The sequence number is meant solely to calculate the metric and is global only to LoRaLayer2, not necessarily to the node, see the paragraph about "packet success" here https://github.com/sudomesh/disaster-radio/wiki/Protocol#metrics

The purpose of the msg_ID is (obviously) to identify messages, so if a broken or rouge client keeps transmitting the same message it could be identified and prevented from flooding the node with duplicate message.

I'm not sure of the best way of handling the msg_ID in the refactored code. Where would one properly store a global variable that needs to be accessed and updated by all the clients?

@paidforby
Copy link
Contributor Author

discussion related to the bluetooth development reminded me not to forget about routed packets. Whatsmore, both of these issues got me thinking that I need to document how data encapsulation is done on disaster radio. I wrote up some ideas related to that on the wiki, https://github.com/sudomesh/disaster-radio/wiki/Layered-Model

Some of the major divergences in this documentation from the current protocol include,

  • Treating the packet header more like a MAC-header, in that it is scraped and re-built every hop with a new nextHop address

  • The final destination address is now moved out of the packet header and into the Layer 3 header, this means that Layer 4 applications that know the routing table can select a route.

  • Rename the generic "Message" to "Datagram" to make it sound more like UDP, which is what I would like to emulate.

  • Get rid of the redundant "type" bytes, replace type in packet header with a single bit flag that tells whether the packet contains routing table information. type in datagram header will let clients know if they should "listen" to a datagram.

  • Get rid of msg_id? I'm on the fence about this. I'm not sure the value of keeping this msg_id byte, Would it really detect errors or prevent rouge nodes? Might be better to use a byte or two on something like a checksum.

@beegee-tokyo
Copy link
Contributor

beegee-tokyo commented Feb 16, 2020

@paidforby
I made a few tests. I converted all clients and servers to use char[] instead of String and it works. Not sure if a packet structure is required at this level, but could be done as well.
Then I investigated the routing table messages and found some problems.
In LoRaClient::loop() the data to be transferred was always parsing out the message ID, but a routing table has no message ID. This lead to a destroyed routing table message. The work around for now is to check the packet.type. and if it is not 'r' the message ID is parsed out, otherwise the full message is transmitted. With that I got the routing table working with my Android app.
The changed LoRaClient::loop

void LoRaClient::loop()
{
    LL2.daemon();
    struct Packet packet = LL2.popFromWSBuffer();
    size_t len = packet.totalLength - HEADER_LENGTH;
    if (packet.totalLength > HEADER_LENGTH && len > 0)
    {
        uint16_t msg_id;
        char msg[len - 2 + 1] = {'\0'};

		if (packet.type != 0x72) {
			// parse out message and message id
			memcpy(&msg_id, packet.data, 2);
			memcpy(msg, packet.data + 2, len - 2);
			server->transmit(this, msg, len - 2);
		}
		else
		{
			// routing package, has no id
			memcpy(msg, packet.data, len);
			server->transmit(this, msg, len);
		}
	}
}

But on the Websocket client it still didn't work. So I debugged the Javascript and found that the code expected the routing table to be formatted the same way as a chat message but with an 'r' instead of a 'c'.
Changing the Websocket client to send the routing table in the expected format made this working as well.
image
The changed WebSocketClient::receive code

void WebSocketClient::receive(char *message, size_t len)
{
	// TODO: msg id? defaulting to 0 for now
	uint16_t msg_id = 0;

	unsigned char buf[2 + len + 2] = {'\0'};
	memcpy(buf, &msg_id, 2);
	if (message[1] != '|')
	{
		buf[2] = 'r';
		buf[3] = '|';
		memcpy(&buf[4], message, len);
		client->binary(buf, len+4);
	}
	else
	{
		memcpy(&buf[2], message, len);
		client->binary(buf, len+2);
	}
}

The workarounds are a little bit flacky, but just to let you know where the problems were created.

@paidforby
Copy link
Contributor Author

Thanks for that input, I have a merge in the works that will address most of this. I was gonna use a struct of unsigned chars (uint8_t), which is not too different from your char* solution.

Getting the routing table to the web app was always a bit hacky. We need to come up with a "better", that is, more consistent way for clients to access the routing table.

@paidforby
Copy link
Contributor Author

@beegee-tokyo and @tlrobinson I've made some progress toward resolving this issue. I'm working on it in the 1.0.0-rc.1 branch. The updates I've made break backwards compatibility hence the major version increment. This is mostly due to changes in LoRaLayer2, specifically that the nextHop address has been moved into the LL2 packet header and now serves the dual purpose of either routing to a neighbor or telling LL2 that a packet contains routing table information.

I've more or less implemented what I outlined in my above comment and the Layered Model wiki post.

I've tentatively settled on the following struct for datagrams,

struct Datagram {
    uint8_t destination[ADDR_LENGTH];
    uint8_t type;
    uint8_t message[233];
};

All client receive functions are now required to be,

void receive(struct Datagram datagram, size_t len);

While the server's transmit function takes the following,

void transmit(DisasterClient *client, struct Datagram datagram, size_t len);

len is the length of the entire datagram, not just the message portion.

While it is possible to write a client that generates an entire datagram, right now I've maintained legacy support for the web app's confusing <msg_id><msg_type>|<msg> format inside of the datagram message. I hope to update the web app to use datagrams instead.

So far, I only have the LoRaClient and WebSocketClient working. The rest of the clients are in various states of completion. Bluetooth and OLED are very close to working, while History and Console still have a lot of work to be done.

I'd appreciate any help or input y'all might have on this.

@beegee-tokyo
Copy link
Contributor

@paidforby
I can make a fork and work on some stuff. Just tell me what clients I should take care of.

@paidforby
Copy link
Contributor Author

Here are the current issues listed by decreasing priority,

  • The BleUartClient has not been tested. You could test that and fix any bugs.
  • The OLEDClient is not parsing datagrams into Strings correctly. Maybe you could find a solution or a better way of writing it.
  • All History related clients and features are currently commented out. I believe the history buffer needs to be updated to take datagrams instead of Strings, wasn't sure where to start with that.
  • The Console middleware is completely commented out, we need to rewrite using chars and buffers instead of Strings.
  • The TCPClient has not been tested, I'm not sure of it's status.

Feel free to work on any of these. I have very little time to work on this in the next week.

@beegee-tokyo
Copy link
Contributor

beegee-tokyo commented Feb 24, 2020

#56 fixes BleUartClient, OLEDClient and Console. Smaller changes in main.ino as well (wrong message sizes in WelcomeMessage)

Will look into History tomorrow.

TCPClient is a problem, because I have no idea how to test as well.

@beegee-tokyo
Copy link
Contributor

Added a lot more changes to #56 . Lot's of changes. You can cherry-pick whatever you like.

Updated

  • Some more fixes in main.ino
  • BleUartClient to handle username saving
    • Username can be set/deleted from Android app and will be saved/deleted on the ESP32 (see Settings.cpp)
    • Requires Android app V1.4 to work properly.
  • LoRaClient to clear datagram.message before using
  • TCPClient to clear datagram.message before using
  • WebSocketClient to handle username saving
    • Without changing the JavaScript the handling of the saved username is incomplete. Once username is set in WS, ESP32 saves it.
    • Parsing incoming message for !~ which switches between WiFi and BLE connection
    • Parsing incoming message for !! which deletes the saved username on the ESP32
  • Console added command /switch to switch interface between WiFi and BLE.

New

  • Rewrite of all History functions. Still using String to save the history in memory. Using JSON to save the history on SD card. JSON is created from the datagram content. On replay the datagram can be recreated from the JSON content. Tried to use datagram directly, but that took too much heap when saving in memory.
  • Added option to switch between WiFi and BLE (default is WiFi) without recompilation.
  • Added settings to save username and interface selection (WiFi or BLE) permanent on the device. Default after fresh programming is WiFi.

@beegee-tokyo
Copy link
Contributor

TelnetClient (TCPCLient) is working as well.
I tested by connecting to the ESP32 with JuiceSSH on port 23.
It is basically the same as Console. Same commands work the same way as on Console. Only command /lora does not output anything, but that is because it is printing directly to Serial instead of sending Lora map as a datagram. Getting /lora to work requires some changes in LoRaLayer2 class. Either change LL2.printRoutingTable() to write to a char buffer or a stream. Or give public access to the routing table entries.

I made a workaround, just for testing, and it worked well under both Console and TelnetClient.
In LoRaLayer2.h

RoutingTableEntry *getRoutingTable(int index);

In LoRaLayer2.cpp

RoutingTableEntry *LL2Class::getRoutingTableEntry(int index)
{
    return &_routeTable[index];
}

And in Console.cpp

...
    else if ((strncmp(&args[0][1], "lora", 4) == 0))
    {
      RoutingTableEntry *tblPtr;
      int routeTableSize = LL2.getRouteEntry();
      Serial.printf("RoutingTableSize = %d", routeTableSize);
      Datagram loraInfo;
      memset((char *)loraInfo.message, 0, 233);
      memcpy(loraInfo.destination, LL2.broadcastAddr(), ADDR_LENGTH);
      loraInfo.type = 'c';
      size_t msgLen = 0;

      strcat((char *)loraInfo.message, "Address: ");
      msgLen += 9;
      uint8_t *localAdr = LL2.localAddress();
      char tempBuf[64];
      for (int i = 0; i < ADDR_LENGTH; i++)
      {
        msgLen += sprintf(tempBuf, "%02X", localAdr[i]);
        strcat((char *)loraInfo.message, tempBuf);
      }
      strcat((char *)loraInfo.message, "\n");
      msgLen++;
      msgLen += sprintf(tempBuf, "Routing Table: total routes %d\r\n", LL2.getRouteEntry());
      strcat((char *)loraInfo.message, tempBuf);
      for (int i = 0; i < routeTableSize; i++)
      {
        tblPtr = LL2.getRoutingTableEntry(i);
        msgLen += sprintf(tempBuf, "%d hops from ", tblPtr->distance);
        strcat((char *)loraInfo.message, tempBuf);
        for (int j = 0; j < ADDR_LENGTH; j++)
        {
          msgLen += sprintf(tempBuf, "%02X", tblPtr->destination[j]);
          strcat((char *)loraInfo.message, tempBuf);
        }
        strcat((char *)loraInfo.message, " via ");
        msgLen += 5;
        for (int j = 0; j < ADDR_LENGTH; j++)
        {
          msgLen += sprintf(tempBuf, "%02X", tblPtr->nextHop[j]);
          strcat((char *)loraInfo.message, tempBuf);
        }
        msgLen += sprintf(tempBuf, " metric %3d ", tblPtr->metric);
        strcat((char *)loraInfo.message, tempBuf);
        strcat((char *)loraInfo.message, "\n");
        msgLen++;
      }

      client->receive(loraInfo, msgLen + DATAGRAM_HEADER);

      // TODO: print to client instead of serial
      // Serial.printf("Address: ");
      // LL2.printAddress(LL2.localAddress());
      // LL2.printRoutingTable();
    }
...

@paidforby
Copy link
Contributor Author

@beegee-tokyo Thanks for all of this, lots of great additions. I finally got sometime to look through most of it.

As I mentioned in the PR, the OLED wasn't working for me. It appears that the datagram message was not being converted to a string successfully? I fixed this issue in recent commit to 1.0.0-rc.1 branch, f1851f0.

I am also seeing issues with the Serial Console and I have yet to test the Telnet client. I will continue debugging and post updates here.

@elimisteve
Copy link

Hey all, I recently found https://www.pjon.org/ and it looks like they've come up with a pretty brilliant, super efficient binary format that they use for all mediums: UDP over the internet, data over serial, data over LoRa, etc.

It is released in a single portable implementation that can be easily cross-compiled on many systems like ATtiny, ATmega, ESP8266, Teensy, Raspberry Pi, Windows X86, Apple and Android. It is a valid tool to quickly build a network of devices.

...

With PJON® makers can communicate with each other and with their devices securely, anonymously and for free. This is possible thanks to the flexibility of the PJON protocol, able to operate on a wide range of media, such as wires, unlicensed radio frequencies or light pulses.


Perhaps I am misunderstanding and you are talking about app-specific data structures that are independent of the underlying medium, but I just discovered PJON and am excited about the possibilities of being able to standardize communication using an open protocol and efficient data format suitable for any kind of network, no matter how decentralized or heterogeneous.

@elimisteve
Copy link

Its goal is the interoperability of diverse systems on a wide range of media with the use of a new set of Open Standards.

@paidforby
Copy link
Contributor Author

Thanks for pointing this out! Looks like a really awesome project and similar to our routing library, https://github.com/sudomesh/LoRaLayer2/.

However, it looks like the microcontroller we are using, the Espressif ESP32, is not listed as supported for PJON Through LoRa. Who knows if they've just haven't tested it or if there is an actual hardware limitation (memory is pretty limited on the ESP32).

And you are somewhat misunderstanding this thread. In this thread we are talking about the structure of messages sent between "clients" that are internal to the firmware (more Layer3/4 than Layer2). Not sure if PJON could be used to do this also, but the system we have now is pretty good once we clean up this String-struct conversion issue.

You maybe interested in another discussion #57 where folks are talking about possible alternatives to LoRaLayer2. It's not clear to me how PJON would handle mesh routing. I'll keep my eye on PJON and see if they get support for the ESP32, then I would definitely consider it as an alternative Layer 2/Layer 3 protocol.

@paidforby
Copy link
Contributor Author

Looking at their github, https://github.com/gioblu/PJON, I see ESP32 listed as supported. But maybe that's just for general PJON and not PJON Through LoRa? Not sure, more research is needed.

@elimisteve
Copy link

@paidforby
Copy link
Contributor Author

Yup, I just found that also. Looks like Through LoRa is not supported for the ESP32, wonder why?

@elimisteve
Copy link

Hmm you're right. DualUDP and GlobalUDP are supported! Which is so cool, but... unless the ESP32 uses a gateway (ESP32 =(UDP)=> gateway =(whatever)=> LoRa), looks like this won't work right now. Bummer!

@elimisteve
Copy link

elimisteve commented Mar 29, 2020

Yup, I just found that also. Looks like Through LoRa is not supported for the ESP32, wonder why?

Good question; I just asked about this to find out more: gioblu/PJON#342 .

@paidforby
Copy link
Contributor Author

Back to the task at hand.

In the recent commit e227e86, I heavily refactored the Console middleware, which appeared to be broken to me. I reintroduced the banner from the pre-client/server console and added a "prompt" showing the current username which mimics the output of the WebSocket and OLED display, though I'm interested in having a more bash-like username@node_addr: prompt.

Next, I'm going to take a look at the TCP client and refactor where necessary.

After that, I need to look at the History, which does not appear to be working, but I can't say I've completely tested it.

paidforby pushed a commit that referenced this issue Apr 1, 2020
@paidforby
Copy link
Contributor Author

So I think I finally have almost everything working. Telnet required a little refactoring, but mostly I just had to make some improvements to the Console middleware and make up for inconsistencies between the Stream client and the Telnet client.

@beegee-tokyo I resolved the /lora print out for telnet by changing the LoRaLayer2 printRoutingTable function to a getRoutingTable function that writes to a char array what was previously printed to serial. It's a little bit of a workaround, eventually I want to change the way Layer 3 applications access the routing table by allowing them to retrieve it as formatted data (maybe JSON?).

History appears to be working, but I'm having trouble getting a really good test. There are some changes/improvements I want to make to the History feature, but they are mostly unrelated to this issue.

I'm going to close this issue, since I believe it has served its purpose and all the necessary changes have been made on the 1.0.0-rc.1 branch. I should be merging the 1.0.0-rc.1 branch into master soon, at which point this issue will be truly resolved.

@paidforby paidforby mentioned this issue Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants