-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoRaWAN] Rework bands, official Rx windows, support ADR, confirm frames, improve EEPROM handling, support clock drift #867
Conversation
Known problem: terribly bad at receiving downlinks Mask-list bands (e.g. US915) untested, likely a few bugs
Hey @StevenCellist - thanks a lot for the awesome PR, This is the current test code
However, this does not compile successfully:
Maybe you have an idea? Thanks a lot, |
Hi Nico, I had to hotfix that one, it's on the last commit now. Hopefully that'll solve it :) |
Hey Steven,
|
Thank you for checking it out so fast! |
Thanks Steven :) |
I could get it to compile if I change LoRaWAN.cpp by removing the LoRaWANBands.cpp include and adding the getDownlinkDataRate there (removing it from LoRaWANBands.cpp). The resulting code successfully joins the LoRaWAN Gateway, however, if I remove node.wipe() and the OTAA functions and try to do a node.restore(); after a successful OTAA - I still get an
So that error seems to be still present, as in #864
|
Ah, yeah, I had to add that at some point for my local branch. Didn't throw errors for me, but it's fixed now. The node.wipe() is officially a user error. The problem is that when you go to upload the code that does not wipe, at some point during the build&upload, your device is triggered to reset and the code that's still on there runs for a second or two. So if wipe() is the first thing you do, that'll still run before the new code is uploaded. Best practice is to include a 10 second delay before the wipe & join, such that on the next upload, that won't get triggered before the new code is on there. |
This looks better, however, you still need to remove the The node.wipe(); explanation makes sense and I tried to do the run, e.g.
and as before, this worked well -> OTAA works out, RP2040/SX1262 starts to uplink messages. Then I did following
But this did fail:
magic id: 0 is probably all wrong. Could it be that there is an issue with recognizing this as an RP2040 board or - which is probably the root cause - could it be that the "save" logic for the OTAA data does not work for RP2040s? Do we have any way to test that? Because I think something is off here - and if its able to join successfully on activation each time but not on restore - it could be an issue in that direction. In addition to the potential "save OTAA" issue I also saw another user with an ESP32 also having issues restoring an OTAA session with their SX1262 ( #858 ) - maybe that could also point something in the SX1262 handling(?) Thanks :) |
Also as additional info, a complete power-on, wipe and OTAA with the first sent uplink as debug output, maybe that shows something?
|
I don't have any of that hardware, so I can't say things for sure. But if it didn't have EEPROM, I'd expect the .wipe() and .restore() and .save() functions could not even be in the compiled firmware or otherwise would have thrown errors. So I expect there is a mistake then somewhere, but I'm off for the day for a CD recording. I hope to get back to this by Tuesday unless someone else has a go at it first and can find the mistakes. The problem cannot be in the SX126x, as the restore function just looks at the nvm. It must be in the main logic (or something with the wipe or so). |
@nmaas87, if you can give Team @StevenCellist a bit of time to catch up with RL, I can furnish him with access to a Pico with an SX1262 (and SX1276) for debugging. In the meanwhile, can you look at what's different about your setup from the two setups that compile for the CI so we aren't chasing our tails on something unique. I can swap my radio test environment to use US so we can also verify that sometime middle of next week and then it can be tested, so anyone else with a burning ambition to run this PR will need to be patient! |
Thanks a lot for your help and best of luck for your recordings! :) I found the issue, the software EEPROM of the RP2040 was not handled correctly, please see this PR for the fix: #868 Also, please remove the duplicated getDownlinkDataRate function from the LoRaWANBands.cpp, otherwise it will not compile. |
That was already done, can you update and confirm |
Hey @HeadBoffin - sorry for skipping your comment, I really did not see it. As written, I have found the issue with the "not restoring a session" - it was due to a problem with the EEPROM handling and have already opened a fix/PR #868 . The error in my compiler aligns with the failed CI checks here (CI / arduino:avr:uno (pull_request)) and is due to these lines ( https://github.com/StevenCellist/RadioLib/blob/ccb28f3b7b554c1f59cdf85f3c85ca22e0947451/src/protocols/LoRaWAN/LoRaWANBands.cpp#L5-L13 ) being a duplicate from his last fix, so they need to be removed. Then it will just compile fine 👍🏻 Thanks and cheers, seems like the RAK11300 is working fine now - thanks a lot guys, just need to see if I can figure out how to better power the Antenna amplifier / only power it on RX and TX - but maybe |
Did some more research, and I think I start to understand more what is going on. So, the total RxTimeout value should equal 12.25 symbols + 20 bits, I assume. Then, the question is how I can convert from 20 bits to a number of symbols. From SX126x.cpp (nPreCodedSymbols), this conversion looks to be (20 + 4SF - 1) / 4SF. As this is an integer equation, this would always be just 1 symbol, which I don't really believe... |
@StevenCellist here's the relevant section from the SX1276 datasheet: Plugging in the known values (PL = 0, CRC = 0, IH = 0) I get
that simplifies to
Now notice how the part Curiously, adding the 12.25 preamble symbols, we get 20.25 symbols, which is the value you seem to have arrived at using trial and error. I'm not 100% certain the above calculation actually checks out, but I'm willing to call it good enough ;) |
That's actually pretty hilarious!! I think I know & understand enough to make this look and function decent including the scanGuard, thank you very much :) |
Delaying a day, getting some more tests done |
New commit with more & better examples, improved SX127x timeout, created a unified |
Hey @StevenCellist - thanks for the awesome work. Using the RAK11300 (RP2040+SX1262) I just tested LoRaWAN_End_Device Example and this looks good, as well as the persistent one - however persistent had some copy and paste doubeling here: I also tried the LoRaWAN_End_Device_Reference - it looks quite good, however (this one I don't understand) - it always receives a downlink with data in it. Looking at my Gateway and Chirpstack - they also send a downlink. The funny thing however is.. I did never enter anything into the downlink queue. So I am not really sure how that little bugger asks for the downlink and gets... whatever sent down to it - because it is not a confirmed uplink :). Other than this curiosity of mine - its looking quite good 👍🏻 |
@nmaas87 thanks for testing those all! Once I start working on it for too long it becomes a mess of what is normal and what may be a problem.. couldn't see it clear anymore. Gotcha on the copy-paste error. The saveSession() indeed stores everything that has changed since the last saveSession(), such as frame counters and other things that may have changed due to MAC commands (ADR setup, delays, ..). The reference does accidentally send confirmed uplinks all over the place - the |
@jgromes small heads-up that I'm waiting for seal of approval from @HeadBoffin as he has some equipment to test US915 as well. After his feedback is in I'll add the last commit with the fixes mentioned above :) |
I've exercised the changes for both AU915 & US915 in my secret underground bunker to shield the inhabitants of the UK from any illegal radio waves and found the code to be beta worthy. Because of all the little bitty corner cases that exist, as well as implementation habits of varying developers, I'd mark this as not suitable for deployment until we've all had a run at it for a few weeks with some devices that can be left alone as well as a few that are abused (power cycled, maybe hammered on downlinks (via private LNS) etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StevenCellist I reviewed the changes and left three minor comments. Other than that, I'll try to do some testing myself.
We can keep this open for a bit longer, though I have to admit I'm a bit in the "move fast and break stuff" camp. Unfortunately, for me release (and fast subsequent patches) are the only practical way to see how this behaves in variety of environments. The fact is, not many people are willing (or even know they have the option) to try out patches from PRs for a project of this comparatively small scale. |
I'm suggesting that it is clearly marked as beta and not yet proven for deployment aka production - so warning peeps not to go and make a pile of devices and put them places that are awkward to get to lest a showstopping issue arises. I'm not saying don't merge, just that to my mind it should be described as is - exercised but not yet run for more than a few hours so far. I'm in the process of setting up some devices to leave to run to test this plus some new sensors and a solar charger board, so it won't take long to build confidence in long term use. |
@jgromes if the one comment regarding the SX127x.h interface is something you can solve, I'd say you merge it (and probably add a warning on the main README that LoRaWAN support is in beta). Looking forward to having this integrated 🤩 |
... and its done! Huge thanks @StevenCellist and @HeadBoffin |
There's a lot to unpack here, as a lot has happened. But first and foremost a HUGE thank you to @HeadBoffin for supporting me through this adventure.
So.. what did happen? The largest change is the integration of ADR which required a complete rework of the Rx windows. Using CAD does not work for high datarate / low SF due to the speed required to catch a downlink. So I had to change to RxSingle with timeout to properly know if a receive is in progress at the end of an Rx window; but then there is a difference in IRQ handling between SX126x and SX127x, such that the logic ends up a bit awkward. There may be room for improvement here.
Here's a (non-exhaustive, I may have forgotten something!) list of what has new or changed:
This PR has been tested on the following devices:
And has been tested on the following regions:
Further investigation should be done on the timeout of the SX127x series. It does work almost completely, but I do not fully understand why and this is definitely not final. The device does seem to fail to catch empty downlinks (e.g. confirming an uplink or after ADR Ack Req). One more known problem is the handling of the ADR channel mask on fixed bands, where the index of the channel appears offset by 8 maybe due to swapped bytes. But my testing infrastructure makes it hard to debug this.
I know that there are merging problems due to some further development on the main stack.. don't know how to go about that honestly.