SARA-R510S keeps CTS line sporadically high if DTR is used for power saving #192

Open
eeFLis opened this issue Feb 1, 2024 · 94 comments
Comments

@eeFLis
Contributor

eeFLis commented Feb 1, 2024

Hi

This may not be a ubxlib-related problem, but perhaps you have already observed this behavior.

If the DTR pin is used for power saving, the SARA-R510S module sporadically holds the CTS line high and no longer releases it.
The module is then no longer responsive.

We had the problem several times with the module firmware version 03.15, A00.01.
We hoped that this problem would be solved with the current module firmware version 03.30, A00.01, but the problem still occurs, even if it is much less frequent (after hours).

Have you observed such behavior in your test environment?

@RobMeades
Contributor

Hi @eeFLis: can't say this is something we've noticed, @philwareublox is going to check internally.

@RobMeades
Contributor

@eeFLis: some questions. You say that the module allows the CTS line to float high when DTR is used for power saving: is the module known to be otherwise functional when this problem has occurred, e.g. is VINT high, do you maybe get URCs turning up, that kind of thing? Have you happened to be able to capture any HW traces (e.g. of the UART lines with a Saleae probe or similar) that show the relationship between the DTR line and CTS or other lines?

@eeFLis
Contributor Author

eeFLis commented Feb 1, 2024

Hi

I have attached the Saleae capture. It was recorded some time ago with module FW version 03.15, A00.01, but it should show the problem:
SARA-R510S-01B-00 keep CTS high if DTR is used for power save.zip

@RobMeades
Contributor

Can you confirm that you are using the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES, so AT+UPSV looks like this:

AT+UPSV=1,1300
OK
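
For reference, the second parameter of AT+UPSV is a timeout counted in GSM frames. Assuming the standard GSM TDMA frame length of 120/26 ms (about 4.615 ms), 1300 frames works out to the roughly 6 seconds discussed in this thread. A quick check of that arithmetic (the function name is purely illustrative):

```python
from fractions import Fraction

# Standard GSM TDMA frame duration: 120 ms per 26-frame multiframe (~4.615 ms).
GSM_FRAME_MS = Fraction(120, 26)


def gsm_frames_to_seconds(frames: int) -> float:
    """Convert a timeout expressed in GSM frames to seconds."""
    return float(frames * GSM_FRAME_MS / 1000)


# 1300 frames is the value in AT+UPSV=1,1300
print(gsm_frames_to_seconds(1300))
```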

@eeFLis
Contributor Author

eeFLis commented Feb 1, 2024

Yes, we use the default value.
At the time of the Saleae capture, we were certainly using an older version of ubxlib.
If the default value for U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES has never been changed in ubxlib, it was AT+UPSV=1,1300.

Were you able to gather all the information from the Saleae capture or do you need additional information?

@RobMeades
Contributor

RobMeades commented Feb 1, 2024

I've been staring at it with the relevant expert internally.

If you look at the picture below, at the green marker is where the module emits a +CSCON: 0, so it is no longer RRC-connected and hence could go to sleep at any time (FYI, I'm told that when DTR-controlled sleep is in play the timeout in the AT+UPSV=1,1300 doesn't apply, the module can go to sleep as soon as DTR permits it):

image

12 seconds later comes the problematic part, focusing on that:

image

You can see, in the middle of the picture, that the module has allowed CTS to float high, without there being any TXD UART activity, which must be because the module has decided to go to 32 kHz sleep. A short while later CTS goes low again, which is likely because some timed activity inside the module has caused it to wake up again, but only briefly. CTS goes high again, and it is at about this time that the MCU pulls DTR low in order to send the next AT command.

Pure speculation, of course, but a guess would be that something to do with the fact that the module is autonomously coming out of [EDIT: I meant going into] 32 kHz sleep at the same moment as DTR goes high [EDIT: I meant low] causes it to miss the DTR edge and remain in 32 kHz sleep.

Will continue to investigate...
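
One way to look for this pattern in other captures would be to export the CTS and DTR edges and flag every DTR falling edge that arrives shortly after CTS has gone high. A minimal sketch, assuming the edges are available as (time in seconds, new level) tuples; the function name, the data layout and the 5 ms window are illustrative assumptions, not something the Saleae software or ubxlib provides:

```python
from bisect import bisect_right


def suspicious_dtr_edges(cts_edges, dtr_edges, window_s=0.005):
    """Return the times of DTR falling edges that occur within window_s
    after a CTS rising edge (module apparently just went to sleep).

    Each edge list holds (time_s, level) tuples sorted by time, where
    level is the line state after the edge (1 = high, 0 = low).
    """
    cts_rises = [t for t, level in cts_edges if level == 1]
    hits = []
    for t, level in dtr_edges:
        if level != 0:
            continue  # only DTR falling edges (MCU asking for wake-up)
        # Find the most recent CTS rising edge at or before this DTR edge
        i = bisect_right(cts_rises, t)
        if i and t - cts_rises[i - 1] <= window_s:
            hits.append(t)
    return hits


# Made-up data: CTS floats high at 12.000 s, DTR is pulled low at 12.003 s
cts = [(11.900, 1), (11.950, 0), (12.000, 1)]
dtr = [(5.000, 1), (12.003, 0)]
print(suspicious_dtr_edges(cts, dtr))
```

Any hits would only be candidates to inspect by eye; the sketch does not check whether CTS dropped again inside the window.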

@eeFLis
Contributor Author

eeFLis commented Feb 1, 2024

You mean "at the same moment as DTR goes low", right?
That would make sense.

@RobMeades
Contributor

Sorry, yes at the same moment as DTR goes low.

@RobMeades
Contributor

Out of interest, are you able to read the state of the CTS pin from SW? I had a feeling that STM32 wouldn't allow you to read the state of a pin that had been assigned to the UART? Vaguely wondering if there might be a workaround of some form that you could do if you could read it.

@RobMeades
Contributor

> Were you able to gather all the information from the Saleae capture or do you need additional information?

If you have additional Saleae captures of the problem then we could check if they look the same but no matter if you don't.

@eeFLis
Contributor Author

eeFLis commented Feb 1, 2024

Unfortunately I have no additional capture.
I think the pin status can be read even if it is assigned to the UART.
Do you have an idea for a workaround if the pin could be read? Then I can test it shortly.

@RobMeades
Contributor

Yes, not exactly sure what the workaround would be really. Haven't been able to think of anything yet that would not introduce a new problem. Will carry on thinking...

@RobMeades
Contributor

One more question: do you know if the problem goes away if you set UPSV=0? I mean I assume it does, but I thought I'd ask.

@eeFLis
Contributor Author

eeFLis commented Feb 1, 2024

It already goes away when we set U_CFG_APP_PIN_CELL_DTR to -1.
However, this makes the application much less energy efficient.

@RobMeades
Contributor

Thanks for confirming, understood, the aim is of course to retain the power-saving goodness, just wanted to be sure of the problem.

@RobMeades
Contributor

RobMeades commented Feb 1, 2024

Just for my information: if you set U_CFG_APP_PIN_CELL_DTR to -1, so you aren't using the DTR pin to control UART power saving, you can still have UART power saving of course, provided you have implemented uPortUartCtsSuspend(), which should be there for STM32F4. It may be slightly more clumsy, since the ubxlib code will be waking the module up by prodding it with AT if the module has not been talked to for more than 6 seconds, but it should still save pretty much the same amount of power, unless your device is going in and out of UART power saving very frequently (e.g. every few tens of seconds, as opposed to every few minutes).
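
The "prod it with AT first" behaviour can be modelled roughly as below. This is only a sketch of the decision, not the actual ubxlib implementation; the function names are hypothetical and the 6000 ms value is assumed from the default AT+UPSV=1,1300 setting.

```python
UART_SLEEP_TIMEOUT_MS = 6000  # assumed: matches AT+UPSV=1,1300 (about 6 s)


def needs_wake_up(now_ms: int, last_activity_ms: int,
                  timeout_ms: int = UART_SLEEP_TIMEOUT_MS) -> bool:
    """True if the module may have dropped into UART power saving since
    we last talked to it, so it should be prodded before the real command."""
    return now_ms - last_activity_ms > timeout_ms


def commands_to_send(command: str, now_ms: int, last_activity_ms: int):
    """Return the AT strings to send: a wake-up "AT" first if required."""
    if needs_wake_up(now_ms, last_activity_ms):
        return ["AT", command]
    return [command]


# 1 s since the last exchange: module still awake, send the command directly
print(commands_to_send("AT+COPS?", now_ms=10_000, last_activity_ms=9_000))
# 11 s since the last exchange: module may be asleep, prod it first
print(commands_to_send("AT+COPS?", now_ms=20_000, last_activity_ms=9_000))
```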

Is this an approach you have tried?

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

Yes we are using this approach.

That's what I was trying to say: we set U_CFG_APP_PIN_CELL_DTR to -1 instead of switching off power saving control completely with UPSV=0. Sorry, that was not quite clear.

The additional consumption is mainly noticeable after the RRC connection is released and timer T3324 is running.
Any communication over the UART during this time wakes up the module for 6 seconds, which is completely unnecessary.

I understand that the extra consumption doesn't look like much, but if you're focusing on low power it's crucial.
The low-power features of the SARA-R510S were one reason why we chose it.

@philwareublox

For my own clarity: are you saying that, without using the DTR line to drop the consumption of the UART, you're seeing the basic 6 second timeout for the UART, whereas if you were able to use the DTR line successfully every time without this CTS issue, you could immediately control the UART on/off state as you want to?

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

Yes.

As an example: during the active time (T3324), when data can still be received, ubxlib is polling for new incoming data.
This means that the 6 second timeout for the UART is never reached and the module does not enter power saving.

If the DTR line is used for power saving control, the module can switch to power save mode immediately after the response.

@RobMeades
Contributor

RobMeades commented Feb 2, 2024

@philwareublox and I have been discussing this, maybe there is a way forward. Background first:

  • ubxlib is designed to present a sockets interface,
  • sockets [I think] doesn't have a concept of an "is there any data on a socket" type callback, so ubxlib doesn't have one,
  • u-blox modules will send a URC on the AT interface if there is received data available on a socket,
  • however, when uSockRead() or uSockReceiveFrom() is called, ubxlib always just asks the module for data, it doesn't simply trust a data count from the URC as the URC is not emitted again until there has been a read, and whether a read has been done is an application matter, out of ubxlib's control, there would be faaaar too much opportunity for misunderstandings,
  • you have no other option than to call uSockRead() or uSockReceiveFrom(), and that shags power saving in your scenario.

A fix, then, might be to offer a function something like uSockDataAvailableCallback(): if you set such a callback, we will call it when the URC lands. You can hook that in and not bother calling uSockRead() or uSockReceiveFrom() until your callback has been called.

Could that work?

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

I think ubxlib already offers a solution for this. We use uCellSockGetBytesPending() to get the pending bytes reported by URC.
Only when some data has been received do we call uSockRead() or uSockReceiveFrom().
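
That pattern can be modelled as follows. This is a sketch only: `FakeModem` and its methods are hypothetical stand-ins for the ubxlib calls named above, with the assumption (taken from this thread) that checking the pending count uses the value already reported by the URC and generates no AT traffic, whereas a read always goes to the module.

```python
class FakeModem:
    """Hypothetical stand-in for the modem plus the ubxlib socket layer."""

    def __init__(self):
        self.pending = 0          # byte count last reported by the URC
        self.at_transactions = 0  # number of times the UART was used

    def bytes_pending(self) -> int:
        # Assumed: answered from the cached URC value, no UART traffic
        return self.pending

    def read(self) -> bytes:
        # A read always goes to the module, restarting the sleep timeout
        self.at_transactions += 1
        data = b"x" * self.pending
        self.pending = 0
        return data


def poll_once(modem: FakeModem) -> bytes:
    """Only touch the UART when the URC said there is something to read."""
    if modem.bytes_pending() > 0:
        return modem.read()
    return b""


modem = FakeModem()
for _ in range(10):           # ten idle polls: no UART traffic at all
    poll_once(modem)
modem.pending = 4             # URC arrives: 4 bytes waiting
received = poll_once(modem)
print(received, modem.at_transactions)
```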

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

However, this only solves the problem for the example I mentioned.
Any communication with the module during the active time (T3324) prevents the module from switching to power save mode for an unnecessarily long time.

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

Or do you see a problem with the use of uCellSockGetBytesPending() for the case described in the example?

@RobMeades
Contributor

RobMeades commented Feb 2, 2024

It should work, I guess the only problem might be if there is a lag between when the URC arrived to update the number and when the application called uCellSockGetBytesPending(); that might cause the uSockRead() to fall into sleepy-time. But even in the best case, where you read the data immediately, you will still get the 6 second lag. And reducing the value of U_CELL_PWR_UART_POWER_SAVING_GSM_FRAMES will result in more laggy behaviour from the ubxlib APIs in general, since ubxlib will have to send an AT to wake the module up on more occasions.

We will continue thinking.

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

OK thanks. In the meantime, are there any further findings as to where the error comes from when DTR is active?

@RobMeades
Contributor

The expert internally is trying to reproduce the problem, just now looking to find the same FW version as you are using, to be quite sure about it. It is a strange case: the CTS line was allowed to float high, indicating that the module has entered 32 kHz sleep, fully 3 milliseconds before the DTR line is asserted: that's quite a lot of time in module terms, hard to see what the relationship might be.

@RobMeades
Contributor

RobMeades commented Feb 2, 2024

FYI, the expert internally has your FW version now and is trying to script some form of sliding-delay-window of "AT/OK followed by a delay and then DTR raised" to see if he can make the problem occur; nothing from the few stabs he's had at this yet.
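
A sliding-delay-window test of this kind could be scripted roughly as below. This is only a sketch of the idea, not the script used internally: `send_at` and `set_dtr` are hypothetical callables that the test rig would supply (for example wrappers around a serial port), `FakeRig` exists purely to dry-run the loop, and the delay range, step and settle time are arbitrary.

```python
import time


def sliding_delays(start_ms, stop_ms, step_ms):
    """Yield delays from start_ms up to and including stop_ms."""
    steps = int(round((stop_ms - start_ms) / step_ms))
    for i in range(steps + 1):
        yield start_ms + i * step_ms


def run_sweep(send_at, set_dtr, delays_ms, settle_ms=100.0, sleep=time.sleep):
    """AT/OK, wait delay_ms, raise DTR, then check the module wakes again.

    send_at(cmd) must return True if "OK" came back; set_dtr(level) drives
    the DTR line. Returns the delays after which the module failed to wake.
    """
    failures = []
    for delay_ms in delays_ms:
        set_dtr(0)                      # DTR low: keep the module awake
        send_at("AT")                   # the AT/OK exchange
        sleep(delay_ms / 1000.0)        # the sliding delay under test
        set_dtr(1)                      # DTR high: module may now sleep
        sleep(settle_ms / 1000.0)
        set_dtr(0)                      # DTR low again: module should wake
        if not send_at("AT"):
            failures.append(delay_ms)   # no OK: module failed to wake up
    return failures


class FakeRig:
    """Fake module that fails to wake if DTR rises 3 ms after the OK."""

    def __init__(self):
        self.dtr = 0
        self.last_sleep_s = 0.0
        self.stuck = False

    def sleep(self, seconds):
        self.last_sleep_s = seconds     # record instead of really sleeping

    def set_dtr(self, level):
        rising = level == 1 and self.dtr == 0
        if rising and abs(self.last_sleep_s - 0.003) < 1e-9:
            self.stuck = True
        self.dtr = level

    def send_at(self, command):
        ok = not self.stuck
        self.stuck = False              # pretend the rig reset the module
        return ok


rig = FakeRig()
failures = run_sweep(rig.send_at, rig.set_dtr,
                     sliding_delays(1.0, 5.0, 1.0), sleep=rig.sleep)
print(failures)
```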

Wondering idly: I guess you have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback()? I understand that the +CSCON:x URC, which that callback is triggered by, marks the start of T3324 - is there something (admittedly probably hacky and complicated) you could do with that and some "usually correct" guess at T3324 to hold off or bounce modem-access as appropriate?

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

Thanks for the update.
Yes, we have a callback hooked into uCellNetSetBaseStationConnectionStatusCallback(), which marks the start of T3324.
If I understand you correctly, you are thinking about a solution in case the DTR problem cannot be solved and we have to fall back on the UART timeout, right? That could be a possible option, but then we would still have the 6 second lag.

@RobMeades
Contributor

RobMeades commented Feb 2, 2024

> That could be a possible option, but then we would still have the 6 second lag.

True: if it is the case [as I think it is] that SARA-R5 cannot run the UART from a 32 kHz clock then every AT communication pulls the modem out of 32 kHz sleep and the only way to get back in is by letting DTR rise or by waiting for N (by default 6) seconds.

I was thinking that, if you could know that the module was not going to enter 32 kHz sleep anyway (which might be after +CSCON: 1) then, for those periods, you could just do anything on the AT interface, while for the times when the module could enter 32 kHz sleep (which might be after +CSCON: 0) you could, somehow, hold-off/batch-up/manage calls into ubxlib that might generate AT commands in order to maximize the power saved.

All a bit complex, and vague, I know.
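
To make the idea slightly more concrete, here is a rough model of such a hold-off. All names are hypothetical and nothing like this is known to exist in ubxlib; it assumes the application can route its modem requests through one place and that the +CSCON state reaches it through a callback. The 30 second maximum hold time is arbitrary.

```python
class AtBatcher:
    """Hold back modem requests while the module could be in 32 kHz sleep."""

    def __init__(self, send, max_hold_s=30.0):
        self._send = send             # callable that performs one request
        self._max_hold_s = max_hold_s
        self._rrc_connected = False   # updated from +CSCON URCs
        self._queue = []
        self._oldest_s = None         # time the oldest queued request arrived

    def on_cscon(self, connected: bool):
        self._rrc_connected = connected
        if connected:
            self.flush()              # module is awake anyway: send everything

    def request(self, item, now_s: float):
        if self._rrc_connected:
            self._send(item)          # no sleep to disturb, send immediately
            return
        if not self._queue:
            self._oldest_s = now_s
        self._queue.append(item)
        self.tick(now_s)

    def tick(self, now_s: float):
        """Call periodically: flush once the oldest request has waited enough."""
        if self._queue and now_s - self._oldest_s >= self._max_hold_s:
            self.flush()              # one wake-up serves the whole batch

    def flush(self):
        for item in self._queue:
            self._send(item)
        self._queue.clear()
        self._oldest_s = None


sent = []
batcher = AtBatcher(sent.append, max_hold_s=30.0)
batcher.on_cscon(False)               # +CSCON: 0, module may sleep
batcher.request("read signal", 0.0)
batcher.request("read cell id", 10.0)
held = list(sent)                     # nothing sent yet
batcher.tick(31.0)                    # hold time expired: batch goes out
print(held, sent)
```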

@eeFLis
Contributor Author

eeFLis commented Feb 2, 2024

I have done a few more tests.
I think the first assumption, that a DTR edge is being missed, cannot be confirmed.
The module is no longer responsive in this state, even after the DTR line is pulled high and low again.
But I still have to verify this with a scope.

@RobMeades
Contributor

Hi @eeFLis: that's an excellent question. I've been prodding the HW people periodically and the last thing they said was that they would arrange a meeting with me to discuss the right way forward. I've just asked them again and added this question to the agenda.

I can only apologize for the extreme delay here, will try to increase the pressure.

@eeFLis
Contributor Author

eeFLis commented Jun 18, 2024

Many thanks for your help.
We will produce another series of prototypes, that's why I asked about the SARA-R52 series.

@RobMeades
Contributor

RobMeades commented Jun 20, 2024

Just had a call with one of the project managers on the HW side and some of the module SW team. Not a huge update, but the guess is that SARA-R52 won't make a difference to the problem, since the problem appears to be on the baseband side and that is not changed in SARA-R52.

One thing that one of the SW guys has proposed is that maybe an external HW circuit could provide a fix: something which would gate DTR according to the state of CTS and delay the onwards transmission of DTR towards the module, a kind of HW workaround, hiding the problem from the module. It would likely require something like this:

https://www.digikey.co.uk/en/products/detail/onsemi/MC74HC02ADR2G-Q/23329595

Of course, we have not actually tried this ourselves yet, just wondering if it is something that you could do within the constraints of your product or not?

@eeFLis
Contributor Author

eeFLis commented Jun 23, 2024

To be honest, I don't understand how this would help, since the error occurs without the level of DTR changing:
#192 (comment)
Is it clear what exactly is causing the problem?

@RobMeades
Contributor

Hmmm, I had forgotten that case; the expert here made the problem state occur by toggling DTR constantly. When he did this he found a window in which the module would fail to return from sleep (indicated by CTS staying high). The implication of your trace is that something about being in "sleep if DTR lets you" mode can cause the problem, irrespective of the state of DTR, though I guess that would be more difficult to reproduce reliably.

I will add this specific data point to the internal ticket.

@RobMeades
Contributor

[lack of an] update: another internal meeting, no significant progress to report I'm afraid, am persisting...

@RobMeades
Contributor

@eeFLis: question - do you happen to recall if, when the problem occurs, the device ever spontaneously reboots (due to a watchdog timer going off) after a minute or not?

@eeFLis
Contributor Author

eeFLis commented Jul 17, 2024

I don't think I've ever seen a reboot but I can't say that for sure as I cropped all my recordings. The longest time recorded after the problem occurred was 20 seconds.

@philwareublox

Hi,
Sorry if I am missing something here which has already been discussed, but if you are using flow control with the DTR line, why not use +UPSV=3 mode, which uses the DTR line for power saving, instead of +UPSV=1 mode, which waits for the UART timeout?
(We are still looking into this).
Phil.

@philwareublox

In our testing we are finding that the watchdog resets the 'hung' module when in this state. The watchdog timer is about 45-60 seconds. Could you retest and wait for 90 seconds to see if the module automatically recovers?

@eeFLis
Contributor Author

eeFLis commented Aug 12, 2024

Hi @philwareublox
I can do that but what is the intention?
To see if the module behaves the same in our tests or is it meant as a workaround?

@philwareublox

If you are happy with your resetting of the module within 20 seconds then this doesn't matter.
We have tested this issue and found it does occur, but also recovers with a watchdog reboot after 60 seconds.

@eeFLis
Contributor Author

eeFLis commented Aug 12, 2024

Unfortunately, we can't use either option. Resetting the module or having the watchdog expire could cause issues in the upper layers of the communication protocol.

Could a firmware update for the module resolve this problem?

@philwareublox

Thanks for confirming. Yes it would be expensive and annoying to have to go through the communication yet-again because of a random reboot.

We are still looking into this.

@philwareublox

philwareublox commented Aug 14, 2024

Hi @eeFLis,

We have some low level improvements which show our reproduction of the issue is reduced, and continue to investigate. I will ask R&D if this is in a state we could share an ENG version with you.

Out-of-interest, can I ask what the application is? What is the protocol to send the data over the UBXLIB Socket API?
Is there something you could implement that keeps track of the data already sent?

@eeFLis
Contributor Author

eeFLis commented Aug 19, 2024

Hi @philwareublox

I can provide details about our application, but it would need to be done through a private channel to comply with company policy.

We use LWM2M at the application layer, and I'm concerned that any workaround at this level would increase power consumption, which is the opposite of our goal for this Module feature.

@philwareublox

On the subject of the watchdog, we can check what the reset/reboot reason was and then know if what you are seeing is the same thing we are seeing in R&D.
I have your email address from previously... I'll email.

@vaussard

Hi @RobMeades @philwareublox

Sorry for jumping in, but I have the exact same issue as reported by @eeFLis with a SARA-R510 when using UPSV=3 and the DTR line. We are not using ubxlib, but we implemented the same workaround as U_CELL_PWR_UART_POWER_SAVING_DTR_HYSTERESIS_MS with some extra toggling of the DTR line when detecting that the modem is not waking up. This helps in some cases, but the modem still gets unresponsive a few times per day, which leads to annoying resets. We do not have such issues with the other u-blox modems or when disabling UPSV on the R5 (which is not a viable option for us).

If I understand correctly, you may have a FW update that mitigates this issue? I would be more than happy to test on my side if this can help you. Do not hesitate to reach out to me if you need any information or more testing, since it is one of the biggest issues that we have with this modem.
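
For anyone wanting to try the same thing, the recovery logic described above (extra DTR toggling, then a reset as a last resort) can be sketched as follows. This is an illustration, not our production code: `set_dtr`, `cts_is_low` and `reset_modem` are hypothetical hooks into the board support layer, and the retry count and timings are arbitrary.

```python
import time


def wake_modem(set_dtr, cts_is_low, reset_modem,
               retries=3, pulse_s=0.01, wait_s=0.02, sleep=time.sleep):
    """Try to wake the modem via DTR; reset it as a last resort.

    Returns "awake" if CTS went low, or "reset" if the modem had to be reset.
    """
    for _ in range(retries):
        set_dtr(0)               # DTR low: request wake-up
        sleep(wait_s)
        if cts_is_low():
            return "awake"       # CTS low: module is ready for AT commands
        set_dtr(1)               # no reaction: release DTR and try again
        sleep(pulse_s)
    reset_modem()
    return "reset"


# Dry run with fakes: the modem only reacts on the third DTR pulse
log = []
cts_answers = iter([False, False, True])
result = wake_modem(set_dtr=log.append,
                    cts_is_low=lambda: next(cts_answers),
                    reset_modem=lambda: log.append("reset"),
                    sleep=lambda seconds: None)
print(result, log)
```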

@vaussard

Thank you @philwareublox I will try to do that today, but this needs some adaptations in our firmware.
Is the hex dump sent before or after the greeting text?

@u-blox u-blox deleted a comment from vaussard Aug 22, 2024
@philwareublox

Hi @vaussard - sorry, it seems these commands are for development and not available/working for production devices.

@vaussard

I still gave it a try, but it indeed has no effect with my production modem. Even the watchdog does not seem to be enabled, because there is no activity on the RX line after 5 minutes of "unresponsiveness".

Here is an example capture of this issue:

The last 2 commands before the modem becomes unresponsive. Even if DTR is toggled multiple times, the CTS line stays high forever:
image

Last command before becoming unresponsive:
image

The last command was an AT+COPS? with an "OK" response, although we see the issue happening after pretty much any command:
image

@philwareublox what could we do to move forward with this issue?

@vaussard

I updated to the latest released FW (SARA-R510S-01B-01 / 03.30) but it made no difference in the number of times per day that we have to reset the modem to keep it working.

This time it got stuck after AT+UMETRIC?:

image

image

Unresponsive when toggling DTR:
image

I can easily test any new FW, once it is ready to be shared, on the prototype that I have on my desk.
Best regards

@philwareublox

Thanks for the update. We have a meeting tomorrow about this issue.
It is very interesting that there is no watchdog reset after 60-90 seconds though. Thanks for waiting up to 5 minutes to check this.

@eeFLis
Contributor Author

eeFLis commented Sep 5, 2024

Hi @philwareublox
Is there any update for us following the meeting?

@philwareublox

Hi @eeFLis, @vaussard

Unfortunately there are no plans for new firmware on the -01B modules.
Support is now continued with the -02B modules. The next schedule for firmware on -02B is mid next year. -01B modules cannot be upgraded to -02B because of HW changes. There is also the SARA-R520 module to consider.

We are continuing to investigate this issue on the R510-02B and R520 modules.

@vaussard

Hello @philwareublox

Thank you for your answer, even if it is not what I was hoping for. We will look into the -02B once it is available.

Regarding the existing -01B modules, do you know of any workaround?

We are currently using UPSV=3, but could mode 1 or 2 avoid or alleviate the issue? This would need some significant changes in our code, so I would prefer to make sure that it is not a dead end before jumping into it.

@eeFLis
Contributor Author

eeFLis commented Sep 12, 2024

Hi @philwareublox

In this case, we will equip the upcoming prototypes with the SARA-R520 series, as it appears to be pin-compatible with the SARA-R510. Or is there anything specific we should take into account?

@RobMeades
Is the SARA-R520 series fully supported by the ubxlib?

@RobMeades
Contributor

> @RobMeades Is the SARA-R520 series fully supported by the ubxlib?

Yes, it is.

@eeFLis
Contributor Author

eeFLis commented Sep 20, 2024

Hi @RobMeades @philwareublox

I saw in the half-year results report that u-blox has stopped the development of their own cellular chips.
Does this mean the SARA-R52 will be the last series to feature this chipset?
What can we expect regarding long-term support?
