[RF internals] Modules that work with NonOS 2.2.x don't work with RTOS SDK (GIT8266O-777) #1200

Open
2 tasks done
CarlosGS opened this issue Oct 13, 2022 · 14 comments

Comments

@CarlosGS

  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

Environment

  • Development Kit: ESP-01
  • IDF version: d48c4c1
  • Development Env: Make
  • Operating System: Ubuntu
  • Power Supply: USB, AC-DC supply, tested with Battery too

Problem Description

TL;DR: Many ESP8266 modules that work with NonOS SDK 2.2.x don't work with RTOS SDK unless someone is touching the chip. Please investigate what changed in the internal RF parameters since that NonOS version.

Recently we found that some ESP-01 modules just wouldn't connect (prusa3d/Prusa-Firmware-Buddy#2303). Extensive testing included speed changes (clock, flash frequency, serial baud rate), flash parameters (qio/dout/etc.), network changes (all parameters), power supply changes, erasing flash, etc. Prusa devs set up a spreadsheet to record which modules work (spreadsheet).

  • The problem replicates with the bare RTOS SDK example "getting started/station". Simply running that example and audibly pinging from a computer ping 192.168.1.11 -a -i 0.1 made evident that the connection only works when touching the module. The ping stops when removing the finger, and resumes instantly when touching the chip.
  • The exact same setup with the Arduino example, and also with the PlatformIO example, both of which rely on NonOS SDK 2.2.x, makes the same module perform flawlessly. So it's not a hardware issue.

I found that many projects are still using an old version of NonOS SDK because of connectivity problems (esp8266/Arduino#6724). Moreover, @d-a-v and @Aircoookie pointed out that it depends on the hardware (esp8266/Arduino#5736 (comment) and esp8266/Arduino#5784 (comment)) and is fixed by using NonOS SDK 2.2.x.

I then found a discussion by @TD-er (esp8266/Arduino#8163) where he also noticed that the module would work while being touched. He pointed to the RF calibration and antenna tuning, so I investigated the initialization routine and found some differences (see below). However, I have not been able to solve the problem by changing any RF parameters. Let me know your suggestions or whether any more testing is needed.

Extra info

In case it was related to floating GPIO, I also tested PRs #1071 and #1024. In case it was a rogue timer, I also tested #1102. In case it was an SPI issue, I also tested #1128. They work but made no difference here :-/

Tried changing RF parameters. From Arduino to RTOS, there are a few changes in RF initialization:

  • this byte is an 8 instead of 0 (RTOS SDK). Tried changing it.
  • TX power settings have slightly different values. Tested this too.
  • rf_cal_use_flash is the same (1) in arduino and RTOS. I've also tried setting it to 3 to force full RF calibration.
  • freq_correct_en is the same (0) in arduino and RTOS. I've also tried setting it to 1 and 3, to allow frequency correction as suggested by @TD-er.

Expected Behavior

Regular ping responses with ping 192.168.1.11 -a -i 0.3

Actual Behavior

No connection, no ping responses. It only works while keeping a finger on top of the ESP8266 chip.
The exact same setup, compiled with NonOS SDK 2.2.x, works flawlessly.

@github-actions github-actions bot changed the title [RF internals] Modules that work with NonOS 2.2.x don't work with RTOS SDK [RF internals] Modules that work with NonOS 2.2.x don't work with RTOS SDK (GIT8266O-777) Oct 13, 2022
@CarlosGS
Author

Also, the Prusa Board looks like this:
image

Using jumpers to place the module away allows it to have some (very poor) connectivity. I think the Prusa Buddy board just makes this RF issue more evident because the ESP8266 is close to other metal components.

But again, this is a problem in RTOS SDK. Using NonOS SDK 2.2.x instead connects flawlessly. I hope these findings help pinpoint the underlying difference in RF settings.

@TD-er

TD-er commented Oct 13, 2022

This also affects other boards like some Sonoff Basic units.

One other thing you may want to try is to reduce the TX power.
On some boards this also has some positive effect. But this then has to do with signal reflections.
Since you have quite a lot of metal close to the antenna, this may be comparable to the effects seen on an antenna with bad VSWR.

One way to check whether the proximity of the metal parts is an issue is to stack a number of pin headers so the board sits further away.

Anyway, kudos for the amount of work researching the problem, given how many topics you already mentioned.

@CarlosGS
Author

Thanks! TX power was set to 20 throughout the tests, did try 19 when looking into power rail stability but it didn't change much. Nice clue that it had a good effect in some boards. It definitely must be something within Espressif's libraries 🤔

@TD-er

TD-er commented Oct 13, 2022

Keep in mind the max. set TX power is exactly what its name states... the max.

When connecting via 802.11b it can go up to that TX power.
But 802.11g already is limited to 17.5 (not sure, just what I remember) and 802.11n is max 14.

Setting the TX power "10" points lower, will reduce the power by a factor of 10.
Thus setting it to 0 will be 100x lower TX power than when 20 would have been used.

From 20 dBm to 19 dBm will not make much of a difference, and nothing at all when using WiFi standards from the last 15 years :)
Better set it to the max. minus 10, based on the used connection protocol.
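
As a rough illustration of the arithmetic above, the sketch below converts dBm to milliwatts using the standard definition (0 dBm = 1 mW). The function name is only illustrative and is not part of any SDK.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


# Every 10 dB step is a factor of 10 in power; a 1 dB step is only about 26%.
for level in (20, 19, 10, 0):
    print(f"{level:>2} dBm = {dbm_to_mw(level):6.1f} mW")

ratio = dbm_to_mw(20) / dbm_to_mw(19)
print(f"20 dBm vs 19 dBm: factor {ratio:.2f}")
```

This is why stepping from 20 to 19 barely changes anything, while stepping down by 10 cuts the transmitted power to a tenth.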

@TD-er

TD-er commented Oct 13, 2022

Lowering the TX power may only have an effect when:

  • Voltage regulation is bad
  • Lots of the sent radio signals reflect back into the ESP.

If the antenna is badly tuned (metal close to the antenna, as well as some plastics like ABS and your hand may also de-tune an antenna) a lot of the sent out RF energy is reflected back into the radio.
This way you may make the radio less sensitive as it can actually become "deaf" by receiving strong signals.
This can be temporary, but also permanent with enough power. (e.g. sending without any antenna)

So by reducing the TX power, you may end up getting less energy sent back into the ESP.
Still if the antenna is badly tuned, you will actually be less efficient in sending and thus the access point will also receive less signal.
The RSSI value as reported by the AP will be less.
N.B. Do not treat the RSSI value as a single indicator of WiFi connection quality, as it has barely any correlation with the SNR. Sometimes it is even the opposite: with a really strong RSSI it may still be impossible to keep a good connection, because the signal strength from the ESP to the AP also has to be good.

@CarlosGS
Author

Nice! bringing TX power down to 12dB makes it work 🎉 (13dB just barely, and the default 20dB is out of the question).
Thank you for explaining it in such detail, I really appreciate your help. Definitely the right direction!

Could it be then, NonOS SDK 2.2.x had some internal routine to automatically throttle down TX power? (either during calibration or at runtime). And that routine got changed and arrived broken at RTOS SDK? 🤔

PS: Also checked with an oscilloscope every signal, voltages are rock solid at 3.3V and data pulses are very clean.

@TD-er

TD-er commented Oct 13, 2022

I have no idea how the ESP internally may compensate for the antenna length/tuning.
Basically this type of antenna must have a length of 1/4 wavelength, or (1+2k)/4 wavelength, with k = 0, 1, 2, ..... to have the best resonance response.
But there is a number of factors affecting the actual resonance frequency of the antenna. The theoretical length is when the antenna is in ideal conditions.
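
To put numbers on the theoretical lengths, here is a minimal sketch that evaluates (1+2k)/4 of a wavelength in free space. The 2.442 GHz frequency (WiFi channel 7) is just an example choice, and a real PCB antenna will differ because of the factors listed below.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def resonant_lengths_mm(freq_hz: float, count: int = 3) -> list[float]:
    """Free-space lengths (1+2k)/4 * wavelength for k = 0 .. count-1, in mm."""
    wavelength_mm = SPEED_OF_LIGHT_M_S / freq_hz * 1000.0
    return [(1 + 2 * k) / 4 * wavelength_mm for k in range(count)]


# Example frequency: 2.442 GHz, the centre of 2.4 GHz WiFi channel 7.
for k, length in enumerate(resonant_lengths_mm(2.442e9)):
    print(f"k={k}: {length:.1f} mm")
```

The k = 0 result is roughly 31 mm, which is the ideal-conditions figure only; the dielectric of the PCB and nearby materials shifts the real resonance.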

A few factors affecting the resonance frequency of an antenna:

  • surrounding materials and their dielectric properties. (even the orientation of the fibers in the PCB material have an effect)
  • Conductive material near the antenna (that's how metal detectors work)
  • Orientation and size of the Ground plane

The effect of a mismatch in antenna length is that some of the signal will reflect back into the transmitter.
This is measured as VSWR. Optimal is a ratio of 1.
Up to a VSWR of 1.5 is often considered quite good, as this only results in a reflection of 4% of the energy. Getting higher will significantly increase the losses.
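
The 4% figure follows from the standard relation between VSWR and the reflection coefficient, |Γ| = (VSWR − 1)/(VSWR + 1), with the reflected power fraction being |Γ|². A minimal sketch (the function name is illustrative):

```python
def reflected_power_fraction(vswr: float) -> float:
    """Fraction of forward power reflected back for a given VSWR (>= 1)."""
    if vswr < 1:
        raise ValueError("VSWR cannot be below 1")
    gamma = (vswr - 1) / (vswr + 1)  # magnitude of the reflection coefficient
    return gamma ** 2


for vswr in (1.0, 1.5, 2.0, 3.0):
    print(f"VSWR {vswr:.1f}: {reflected_power_fraction(vswr):.1%} reflected")
```

At a VSWR of 3 a quarter of the power already comes back into the transmitter, which shows how quickly the losses grow.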

A very simple way of tuning an antenna to its perfect length is simply by changing the length of the antenna.
However that's very hard to do on a PCB etched antenna.
The resonance frequency of an antenna will be lower when it has a "capacitive" mismatch. It will be higher when it has an inductive mismatch. (I hope I did not mix this up in my mind....)

There are several ways to implement an antenna matching network.
The way how such a network is implemented depends on how the inductance or capacitance of the antenna must be corrected to get the best impedance.

I can imagine the ESP internally has a number of parts available to create such an antenna matching network, which can be "connected" or "disconnected" or "shorted".
However, the algorithm for detecting what's needed could have changed, or maybe it is trying in the wrong direction as it may be in a "local minimum". I can imagine such an algorithm may be very sensitive to initial parameters.

Also the amount of reflected energy back into the transmitter may have an effect on such a tuning algorithm.

Since the ESP typically does perform the RF tuning when starting the WiFi, you could start the WiFi with a low TX power and then after the calibration increase the TX power.
You can change the TX power while being connected.
In my project (ESPEasy) I have added an option to make the TX power dynamic.
It does base the TX power on the RSSI value from the AP.
Just assuming the AP will use 14 dBm when sending in 802.11n mode, you can compute the attenuation based on the RSSI value.
Then I just send with enough power to hit the AP with an RSSI right in the "sweet spot" of -60 ... -70.
You can then set some offset to compensate for the losses in your own setup due to imperfect antenna VSWR. (I named it "WiFi Sensitivity Margin")
See ESPEasy documentation
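
The idea described above can be sketched roughly as follows. This is not ESPEasy's actual implementation: the function name, the -65 dBm target, the clamping limits and the default margin are all assumptions made for illustration, and the 14 dBm AP transmit power is the same guess as in the text.

```python
ASSUMED_AP_TX_DBM = 14.0  # assumption: the AP sends at 14 dBm in 802.11n mode


def suggest_tx_power_dbm(
    rssi_dbm: float,
    target_rssi_at_ap: float = -65.0,  # middle of the -60 ... -70 "sweet spot"
    margin_db: float = 0.0,            # extra offset for losses in your own setup
    min_dbm: float = 0.0,
    max_dbm: float = 20.0,
) -> float:
    """Estimate the TX power needed to reach the AP at the target RSSI."""
    # Attenuation of the link, assuming it is the same in both directions.
    path_loss_db = ASSUMED_AP_TX_DBM - rssi_dbm
    wanted_dbm = target_rssi_at_ap + path_loss_db + margin_db
    # Never ask for less than min_dbm or more than max_dbm.
    return max(min_dbm, min(max_dbm, wanted_dbm))


for rssi in (-50, -70, -80):
    print(f"RSSI {rssi} dBm -> send at {suggest_tx_power_dbm(rssi):.1f} dBm")
```

The margin parameter plays the role of the offset mentioned above: raising it compensates for energy lost to an imperfect antenna match.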

However, this is not enough to make all ESP WiFi related issues disappear.
But I'm glad to know our dearly beloved printers will perhaps get a properly working WiFi interface by reducing the TX power.
I am still waiting for some R'pi zero W units I ordered right away when you introduced the beta for Prusa Connect.

@CarlosGS
Author

Not Prusa staff though! I'm just helping with this annoying bug, also for the longevity of plenty ESP8266s still around.

Thanks again for going into detail, it makes a lot of sense as you explain it. Really like your adaptive approach and will give it a try in future implementations 👌

> Since the ESP typically does perform the RF tuning when starting the WiFi, you could start the WiFi with a low TX power and then after the calibration increase the TX power.

Sounds like a good next step. Will give it a try this weekend and see.

@TD-er

TD-er commented Oct 13, 2022

> Not Prusa staff though! I'm just helping with this annoying bug, also for the longevity of plenty ESP8266s still around.

Still, it may result in a proper interface for the Prusa printers :)

Just make sure to also test the WiFi performance with the enclosure closed as the presence of it may also affect the antenna's resonance frequency.

@CarlosGS
Author

Tests done! It turns out esp_wifi_set_max_tx_power() is capped to the global ESP8266_PHY_MAX_WIFI_TX_POWER from sdkconfig, so it's not possible to calibrate at a lower power to later increase it.

However, it's interesting that leaving 20dB in sdkconfig (the max), and then setting esp_wifi_set_max_tx_power(48) (eq 12dB) at runtime allows the device to work.
That TX power change happens AFTER calibration... so the problem is not in the calibration routine, but in the low level RF transmission part 🤔

@CarlosGS
Author

@espressif Please look into this: the problem is that the WiFi connection doesn't work until TX power is reduced.
This is an issue with the RF driver in RTOS SDK. It occurs in ESP8266 modules with poor antenna matching that did work correctly with the old NonOS SDK 2.2.x. It seems your previous gain controller was more robust?

@TD-er

TD-er commented Nov 10, 2022

> Tests done! It turns out esp_wifi_set_max_tx_power() is capped to the global ESP8266_PHY_MAX_WIFI_TX_POWER from sdkconfig, so it's not possible to calibrate at a lower power to later increase it.
>
> However, it's interesting that leaving 20dB in sdkconfig (the max), and then setting esp_wifi_set_max_tx_power(48) (eq 12dB) at runtime allows the device to work. That TX power change happens AFTER calibration... so the problem is not in the calibration routine, but in the low level RF transmission part 🤔

Hmm I had missed this post somehow.

Anyway, can you check by measuring the current consumption (best to use some lab power supply as this regulates the voltage after the current measurement) to see if you actually increase the TX power?
It would not surprise me if the actual setting results in a lower TX power rather than a higher one.

@CarlosGS
Author

Note that the call esp_wifi_set_max_tx_power(48) uses different units: 48 corresponds to a TX power of 12dB, so it reduces the power as we wanted.
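
For reference, the unit conversion can be sketched as below. It assumes the argument is expressed in steps of 0.25 dBm, which is consistent with 48 being equivalent to 12 in this thread. The helper names are hypothetical, and the cap function is only a model of the behaviour reported earlier (runtime values limited by the sdkconfig maximum), not SDK code.

```python
def dbm_to_api_units(dbm: float) -> int:
    """Convert dBm to the argument value, assuming 0.25 dBm per unit."""
    return round(dbm * 4)


def api_units_to_dbm(units: int) -> float:
    """Convert an argument value back to dBm, assuming 0.25 dBm per unit."""
    return units / 4


def effective_max_tx_dbm(requested_units: int, sdkconfig_max_dbm: float = 20.0) -> float:
    """Model of the reported cap: the runtime value cannot exceed the sdkconfig max."""
    return min(api_units_to_dbm(requested_units), sdkconfig_max_dbm)


print(dbm_to_api_units(12))          # value to pass for 12 dBm
print(api_units_to_dbm(48))          # power corresponding to 48
print(effective_max_tx_dbm(48))      # below the cap, so unchanged
print(effective_max_tx_dbm(100))     # above the cap, so limited
```

Under these assumptions a request can only lower the power relative to the sdkconfig value, which matches the observation that calibrating low and raising the power afterwards was not possible.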

@CarlosGS
Author

Got confirmation from a couple of users: both the problem and the workaround reproduce exactly.
Now awaiting @espressif to tackle this bug when possible.
