Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core 2.5.0] Loop Count much lower than with 2.4.2 #5513

Closed
5 of 6 tasks
TD-er opened this issue Dec 17, 2018 · 60 comments
Closed
5 of 6 tasks

[core 2.5.0] Loop Count much lower than with 2.4.2 #5513

TD-er opened this issue Dec 17, 2018 · 60 comments
Assignees
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.

Comments

@TD-er
Copy link
Contributor

TD-er commented Dec 17, 2018

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: [ESP-12|ESP-01|ESP-07|ESP8285 device|other]
  • Core Version: [0fd86a0]
  • Development Env: [Platformio]
  • Operating System: [Windows|Ubuntu]

Settings in IDE

  • Module: [Generic ESP8266 Module|other]
  • Flash Mode: [dio]
  • Flash Size: [4MB/1MB]
  • lwip Variant: [PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY_LOW_FLASH]
  • Reset Method: [-]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz]
  • Upload Using: [OTA|SERIAL]
  • Upload Speed: [115200] (serial upload only)

Problem Description

Loop Count in loops per sec. is much lower than with core 2.4.2. It was also mentioned here: #5476 (comment)

I have several test setups running core 2.4.2 and core 2.5.0 (see latest nightly builds of ESPeasy)
For example 2 Sonoff POW r2 nodes running the exact same configuration, the loopcount (per sec) of both differs quite a lot:

Core 2.4.2:
image

Core 2.5.0 beta1:
image

N.B. Not sure yet why in the node running 2.4.2 less things are being shown.

In general (among my test nodes) the loop-count is:

  • 4000 - 11000 when using core 2.4.2
  • 1100 - 2300 when using core 2.5.0 (4200 when doing nothing)

As can be seen from both Sonoff POW r2 nodes, the loop-count is:

  • 4200 for core 2.4.2
  • 1148 for core 2.5.0
@d-a-v
Copy link
Collaborator

d-a-v commented Dec 17, 2018

SensorSendTask and sendData are highlighted in 2.5 but not present in 2.4, but you posted them before.

What's strange to me is

lwip Variant: [PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY_LOW_FLASH]

because it is Arduino IDE's option "lwIP-v2 (lower memory) (no features)", which has the same/default settings as in release 2.4.2 (although upstream lwIP has changed from 2.0 to 2.1.2 in between).

Do you have the same results with PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY (which is current default Arduino IDE configuration)?

You can also try to use previous lwIP version in core 2.5:

cd tools/sdk/lwip2
make download
cd builder
git checkout arduino-2.4.2
cd ..
make clean
make install

and replace liblwip-536 by liblwip_536 in tools/platformio-build.py L164

Next step would be to try the same SDK version as we had in 2.4.2.

@d-a-v d-a-v self-assigned this Dec 17, 2018
@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Dec 17, 2018
@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

SensorSendTask and sendData are highlighted in 2.5 but not present in 2.4, but you posted them before.

I already found why that was in my (new installed) core 2.4.2 node.
I forgot to set the plugin interval timer for that one (ESPeasy config related), after setting that it is now reporting the timers for these function calls too.
image

and replace liblwip-536 by liblwip_536 in tools/platformio-build.py L164

Isn't that the same or do I need some more coffee?
Can I also use a different define to switch LWIP or do I need to build it like you described?

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 17, 2018

Isn't that the same or do I need some more coffee?

Get more coffee :) (edit: dash / underscore)

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

Running with PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY is the same:

Loop   244840 1187.64 0.391 0.816 512.279

I will now test with previous LWIP in core 2.5.0 (just finished my coffee ;) )

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

Hmm, I cannot run make download like described.

gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2$ make
all               builder/          clean             download          downloadmaster    downloadupstream  install           latestmaster      latestupstream
gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2$ make download
make: Nothing to be done for 'download'.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 17, 2018

Then is it already downloaded in builder/
Is this directory empty ?

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

No it is not empty
Seems I don't have the complete build toolchain?

gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2$ cd builder/
gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder$ git checkout arduino-2.4.2
M       lwip2-src
Previous HEAD position was c434c6f less verbose make (use standard V=1 to make it verbose)
HEAD is now at 163bb82 fix setting down interface on the esp side (ip must not be set to 0, and properly re/set FLAG_UP)
gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder$ cd ..
gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2$ make clean
make -C builder -f Makefile.arduino clean
make[1]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
make -f makefiles/Makefile.build-lwip2 \
                target=arduino \
                SDK=../.. \
                LWIP_ESP=../../lwip/include \
                LWIP_LIB=liblwip2.a \
                LWIP_LIB_RELEASE=../../lib/liblwip2.a \
                LWIP_INCLUDES_RELEASE=../include \
                TOOLS=../../../xtensa-lx106-elf/bin/xtensa-lx106-elf- \
                TCP_MSS=536 BUILD=build-536 \
                clean
make[2]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
rm -rf build-536 liblwip2.a glue-lwip/lwip-err-t.h glue-lwip/lwip-git-hash.h
make[2]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
make -f makefiles/Makefile.build-lwip2 \
                target=arduino \
                SDK=../.. \
                LWIP_ESP=../../lwip/include \
                LWIP_LIB=liblwip2_1460.a \
                LWIP_LIB_RELEASE=../../lib/liblwip2_1460.a \
                LWIP_INCLUDES_RELEASE=../include \
                TOOLS=../../../xtensa-lx106-elf/bin/xtensa-lx106-elf- \
                TCP_MSS=1460 BUILD=build-1460 \
                clean
make[2]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
rm -rf build-1460 liblwip2_1460.a glue-lwip/lwip-err-t.h glue-lwip/lwip-git-hash.h
make[2]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
make[1]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
gijs@Imagine:~/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2$ make install
make -C builder -f Makefile.arduino install
make[1]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
make -f makefiles/Makefile.build-lwip2 \
                target=arduino \
                SDK=../.. \
                LWIP_ESP=../../lwip/include \
                LWIP_LIB=liblwip2.a \
                LWIP_LIB_RELEASE=../../lib/liblwip2.a \
                LWIP_INCLUDES_RELEASE=../include \
                TOOLS=../../../xtensa-lx106-elf/bin/xtensa-lx106-elf- \
                TCP_MSS=536 BUILD=build-536 \
                install
make[2]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
( \
echo "// script-generated, extracted from espressif SDK's lwIP arch/cc.h"; \
echo "#define LWIP_NO_STDINT_H 1"; \
grep -e LWIP_ERR_T -e ^typedef ../../lwip/include/arch/cc.h \
) > glue-lwip/lwip-err-t.h
makefiles/make-lwip-hash
make -f makefiles/Makefile.glue-esp BUILD=build-536 TCP_MSS=536
make[3]: Entering directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
../../../xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc -std=gnu99 -Wall -Wextra -c -Os -g -Wpointer-arith -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -ffunction-sections -fdata-sections -D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ -DLWIP_OPEN_SRC -DLWIP_BUILD -DUSE_OPTIMIZE_PRINTF -DTARGET=arduino -DTCP_MSS=536 -Ibuild-536 -I../../include -I../../lwip/include -Iglue glue-esp/lwip-esp.c -o build-536/glue-esp/lwip-esp.o
make[3]: ../../../xtensa-lx106-elf/bin/xtensa-lx106-elf-gcc: Command not found
makefiles/Makefile.rules:7: recipe for target 'build-536/glue-esp/lwip-esp.o' failed
make[3]: *** [build-536/glue-esp/lwip-esp.o] Error 127
make[3]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
makefiles/Makefile.build-lwip2:47: recipe for target 'liblwip2.a' failed
make[2]: *** [liblwip2.a] Error 2
make[2]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
Makefile.arduino:6: recipe for target 'install' failed
make[1]: *** [install] Error 2
make[1]: Leaving directory '/home/gijs/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/tools/sdk/lwip2/builder'
Makefile:3: recipe for target 'install' failed
make: *** [install] Error 2

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 17, 2018

Ah this is platformio, builder/Makefile.arduino would need an update of this variable
TOOLS=../../../xtensa-lx106-elf/bin/xtensa-lx106-elf- pointing to the right gcc directory.

You can also take already compiled binaries from 2.4.2 release (liblwip2.a - sorry in fact there is no dash/underscore, I'll get some coffee too)

In any case you need to add this line in tools/sdk/ld/eagle.app.v6.common.ld.h

    *liblwip2.a:(.literal .text .literal.* .text.*)                <- to add
    *liblwip2-536.a:(.literal .text .literal.* .text.*)            <- already present

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

I have downloaded the binary, added it to the ld.h file and also changed this line

and replace liblwip-536 by liblwip_536 in tools/platformio-build.py L164
, only then into 'liblwip2
Also reset the PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY_LOW_FLASH parameter.

But the resulting binary does not boot.
Any other tips?

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 17, 2018

Any other tips?

I tried myself, and indeed it is not as simple as it seems. I will find some time to provide a lwIP-2.0.3 binary compatible with core 2.5.0 if the other leads are not conclusive.

Can you point to the source code of these sendData and SensorSendTask to check which parts of the core are involved ?

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

Can you point to the source code of these sendData and SensorSendTask to check which parts of the core are involved ?

These are my own functions and they take about the same time on both cores.

SensorSendTask does call sendData. So if sendData takes quite some time, the other one will too.

In short:

SensorSendTask does collect the data from a plugin and performs some string replacements + calculates some formulas if needed.
This collecting data can also involve some reading from the sensor. Not all plugins in ESPeasy are yet converted to do the reading in the background, so some may do quite a lot while reading.

sendData does send the sensor data to one or more selected controllers and triggers some rules.
The rules are our own script language. At least it is about reading from SPIFFS and parsing rules (could benefit from some optimizations).
The rules parsing may also do a lot using String objects.
Apart from the rules, some parts of the settings file are also read from SPIFFS.

A controller is something which sends data to a remote computer, formatted for a specific use case (e.g. Domoticz MQTT, OpenHAB, Http, UDP, ESPeasy-p2p, etc.)
For almost all controllers, I have added some kind of buffer system to allow the controllers to send data (and retry) at some predefined pace specific for that controller. (e.g. Thingspeak only allows a message per N seconds for the free accounts and you don't want to wait for that)
So for most controllers, this sendData is mainly about storing it into a buffer.
The stored messages will later be processed by our own scheduler, so the actual network traffic is not being counted in the sendData call (except for the few plugins not yet using a buffer)

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

About the impact of those functions:
image
These timings are covering about 7900 seconds (2h19m)
There were 132 calls to these functions, which is about once a minute.
That's about 100 msec per minute. The rest of the time is spent in other areas of the code.

In a loop() it is mainly a call for the scheduler.
This scheduler does use STL containers. You had moved some stuff of that to flash? Could that affect performance?
See msecTimerHandlerStruct which is essentially the scheduler's heart.

By the way, these functions should be in the standard core lib:

long timeDiff(unsigned long prev, unsigned long next);
long timePassedSince(unsigned long timestamp);
boolean timeOutReached(unsigned long timer);
long usecPassedSince(unsigned long timestamp);
boolean usecTimeOutReached(unsigned long timer);

@TD-er
Copy link
Contributor Author

TD-er commented Dec 17, 2018

As a test, I have changed the flash frequency to 80 MHz

Frequency Description #calls call/sec min (ms) Avg (ms) max (ms)
40 MHz Loop 10806208 1367.80 0.516 0.668 896.423
80 MHz Loop 441145 2554.65 0.303 0.374 258.775

Same node running same config @ core 2.5.0

@earlephilhower
Copy link
Collaborator

This scheduler does use STL containers. You had moved some stuff of that to flash? Could that affect performance?

No, std::* was always in flash until I made it appear in IRAM it when I added the -nox library. That wouldn't explain a difference.

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 19, 2018

Can you instrument separately in sendData sub-bocks like (at least) this one CPlugin_ptr[event->ProtocolIndex](CPLUGIN_PROTOCOL_SEND, event, dummyString); and these two ones LoadTaskSettings(event->TaskIndex); (they are potentially taking time).

@TD-er
Copy link
Contributor Author

TD-er commented Dec 19, 2018

Adding timers you mean, so we can compare those parts?

@d-a-v
Copy link
Collaborator

d-a-v commented Dec 19, 2018

Yes

@TD-er
Copy link
Contributor Author

TD-er commented Dec 19, 2018

I added a bit more (commit will be pushed in a minute)
image

image

As can be seen, the backgroundTasks() and handle_schedule() function are taking quite a bit longer.

@TD-er
Copy link
Contributor Author

TD-er commented Dec 19, 2018

Running for about 10 minutes:

Core 2.5.0:

Description Function #calls call/sec min (ms) Avg (ms) max (ms)
Load File   11 0.02 19.153 19.551 21.246
Loop   842552 1309.42 0.479 0.737 391.926
Plugin call 50 p/s   32103 49.89 2.291 2.385 2.727
Plugin call 10 p/s   6429 9.99 2.572 2.633 2.924
Plugin call 10 p/s U   6429 9.99 0.276 0.299 0.340
Plugin call 1 p/s   644 1.00 2.928 3.301 22.692
SensorSendTask()   11 0.02 98.892 102.294 107.163
sendData()   11 0.02 96.084 99.625 104.560
Compute formula   11 0.02 0.014 0.014 0.015
setNewTimerAt()   42468 66.00 0.175 0.197 0.269
Delay queue MQTT   11 0.02 1.847 1.868 1.885
WiFi.isConnected()   2488246 3867.00 0.015 0.024 0.113
rulesProcessing()   55 0.09 14.259 15.177 17.924
backgroundtasks()   1643099 2553.55 0.138 0.189 391.306
handle_schedule() idle   800107 1243.45 0.243 0.317 391.463
handle_schedule() task   42446 65.97 0.383 2.767 107.564

Core 2.4.2:

Description Function #calls call/sec min (ms) Avg (ms) max (ms)
Load File   10 0.02 18.154 18.473 19.223
Loop   2454654 4049.56 0.151 0.237 259.272
Plugin call 50 p/s   30267 49.93 1.323 1.409 1.856
Plugin call 10 p/s   6060 10.00 1.456 1.550 2.033
Plugin call 10 p/s U   6060 10.00 0.155 0.210 0.262
Plugin call 1 p/s   606 1.00 2.017 2.295 15.314
SensorSendTask()   10 0.02 67.307 72.162 78.346
sendData()   10 0.02 65.414 70.261 76.438
Compute formula   10 0.02 0.010 0.010 0.011
setNewTimerAt()   40034 66.05 0.146 0.177 0.248
Delay queue MQTT   10 0.02 1.567 1.615 1.644
WiFi.isConnected()   7326792 12087.36 0.004 0.004 0.111
rulesProcessing()   50 0.08 7.650 9.592 13.117
backgroundtasks()   4869693 8033.77 0.031 0.059 258.065
handle_schedule() idle   2414639 3983.55 0.065 0.098 3.018
handle_schedule() task   40014 66.01 0.264 1.707 78.615

@TD-er
Copy link
Contributor Author

TD-er commented Dec 19, 2018

And the same nodes, running for 24'447 sec:

image

image

Some counters probably have had some overflow, since the average value displayed is lower than the minimum.
But the trend is very well visible.
Running code on the core 2.5.0 is slower than on core 2.4.2

@earlephilhower
Copy link
Collaborator

Just a note, the graphic image posts are much harder to use than the text ones you started with. Can't dump it into a .ODS or .XLS and compare automatically.

That said, since you know your code better than we do, do you have one specific section that's egregious and can be isolated? I know your app is reporting fewer loops/s, but doing perf analysis at that level from the outside is not a fast process.

If there's some small, isolated thing (i.e. this routine X takes 100 numbers, averages them, and sends to a MQTT endpoint) where it's more approachable we'd stand a better chance. Especially if it's something you can hack out into a separate sketch (even if the input and out data its sending is garbage), that means it can be repro'd and instrumented at the low-level needed to identify what's up and if it's in the core, a lib, of the SDK. The smaller the code bit and fewer libs needed, the better, for this kind of thing.

@TD-er
Copy link
Contributor Author

TD-er commented Dec 19, 2018

I understand.
One of the striking things in these timing results to me is the delay in the rules handling.
The active rules are not communicating to the outside world, just reacting on an event and setting a GPIO pin.
There are 2 main things at work then:

  • Reading from SPIFFS
  • Parsing Strings (and lots of String operations)

Those String operations can do a lot of memory allocations too, which could also explain the difference when sending data to a controller, since it is then placed in some buffer.

So my 2 main suspects right now are:

  • SPIFFS handling
  • Memory allocation

I think for both a fairly small test could be written.
Did the SDK change or the core 2.5.0 update also include some changes in moving code to flash?
If more code is now executed from flash, that could also explain these slow-downs.

@earlephilhower
Copy link
Collaborator

To make it clear, code was always in flash. I introduced a bug in the -noexceptions option where instead of placing those std:: functions in flash it ate up IRAM. IRAM is 16KB total, ~10KB taken by the hidden SDK functions, and the rest is used for interrupt-safe routines, certain library routines that may be called from the WiFi stack, etc. Today the code is all landing in flash unless you say "place in IRAM" which is the correct behavior, and matches prior releases.

Memory management lives in cores/esp8266/umm_malloc/umm_malloc.c and is pretty much untouched in the past 2+ years AFAICT. Couple bug fixes in the log, but not seeing any red flags.

SPIFFS can be very variable and slow in my experience. I didn't dig into it, but it seems to have very high variability in response time even on the same hardware (so it's not flash speed related). The git log's (cores/esp8266/spiffs_api.c) is also pretty much untouched for 3 years now.

Would love to get a test that can be used for isolated A/B comparisons, because I think it's elsewhere...

@devyte
Copy link
Collaborator

devyte commented Dec 20, 2018

@earlephilhower could something be hitting the SDK exception handler for reading unaligned addresses?

@igrr
Copy link
Member

igrr commented Dec 20, 2018

Might be worth disabling unaligned access handler, at least in debug builds?

@earlephilhower
Copy link
Collaborator

@devyte, if @TD-er's running the same code on both builds then the code would crash hard on the older version, so I think it's probably not too likely.

@igrr's point about disabling in debug builds might be good, but @TD-er's app most likely can't even build in debug mode (umm_malloc grows a lot w/heap checking and there are many IRAM/IRQ user routines in his build).

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 3, 2019

I measured loop itself (time between end of loop and beginning of next loop and it is unchanged, around 5.8us, with 2.4.2 and 2.5.0beta). Then the cause is not the firmware update. It is somewhere inside our core library.

@TD-er can you isolate one single relevant test or function call for us to reproduce and dig into ?

@TD-er
Copy link
Contributor Author

TD-er commented Feb 4, 2019

@d-a-v
Sadly today I didn't have the time I hoped to have for these tests, so for now I don't have a single test which shows the impact of the delay.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 17, 2019

Will you be able to try with #5763 ?

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

So I just build with a define for NONOSDK221 ?

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

I tried building it several times (PlatformIO staged, so it should use your master branch), but I keep getting this as a build: ESP82xx Core 2.6.0-dev, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
N.B. The "Core 2.5.0-dev" is something I add in the build.
I added the NONOSDK221 as a define in the build.
I also tried to add board_build.sdk = NONOSDK221 to the platform.ini section for this build, but it keeps reporting the NONOS SDK 3.0.0-dev.

So can you give me some hints on what I need to do to make it building a NONOS SDK 2.2.1?

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 17, 2019 via email

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

Not sure why, but the files I get when using PlatformIO feature/stage is not having the latest changes you have in that PR.

Sometimes I just feel a bit lost in all abstractions made for PlatformIO to make it all "easier" to work with.
I still don't know what is configured where, what files are defined where and collected from where.
It is a bit too obfuscated to see through at once and I feel like I need to make some drawing or something to get where all parts are coming from.
These abstractions are great for normal workflows, but if you want to use a specific part from the core libs, it isn't that clear where to set things to get it right.

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

Ah, now I get it, it wasn't merged yet.
Will fetch this PR manually and try to build then

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

OK, I did a checkout of your PR and built it.
But still the "NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support"
So what do I need to change, and where is it located?
This is the top of my git log :

commit d0e49a65500c0528bc4063f841971bfd1b203b55 (HEAD)
Merge: 14f1b1d4 3dd54fd3
Author: david gauchard <[email protected]>
Date:   Sun Feb 17 04:29:06 2019 +0000

    Merge 3dd54fd3689636d95fa3eb3328d39f2f536979b9 into 14f1b1d4a74a2e65207f23aedabcc7038ffae652

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 17, 2019

With platformio,
you need to change in tools/platformio-build.py the two occurences of NONOSDK3V0 to NONOSDK221.
I don't know how it could be made better, not being a pio user.

edit: maybe by using boards.txt.py like for arduino ide

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

That worked :)

ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support

I will let it run for a few minutes to see what the loop-count is and get you an Excel sheet with before- and after timings of that node.

Edit:
First impression:

  • core 2.4.2: 12.90% (LC=6134)
  • SDK 3.0.0: 19.20% (LC=2471)
  • SDK 2.2.1: 12.40% (LC=6902)

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

And here the comparison of the 3 versions I already mentioned in the previous post.
Compare_core250_SDK3-0-0_vs_SDK2-1-1.zip

And test build used for this

@Swiftnesses
Copy link

Do you also get slow ping responses on SDK 3.0.0?

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

@Swiftnesses
Both nodes run the same hardware & config, main difference is the core lib version.
These are both Sonoff POW r2 nodes.

C:\Users\gijs>ping -n 10 192.168.1.112

Pinging 192.168.1.112 with 32 bytes of data:
Reply from 192.168.1.112: bytes=32 time=2ms TTL=255
Reply from 192.168.1.112: bytes=32 time=2ms TTL=255
Reply from 192.168.1.112: bytes=32 time=2ms TTL=255
Reply from 192.168.1.112: bytes=32 time=3ms TTL=255
Reply from 192.168.1.112: bytes=32 time=4ms TTL=255
Reply from 192.168.1.112: bytes=32 time=5ms TTL=255
Reply from 192.168.1.112: bytes=32 time=2ms TTL=255
Reply from 192.168.1.112: bytes=32 time=18ms TTL=255
Reply from 192.168.1.112: bytes=32 time=3ms TTL=255
Reply from 192.168.1.112: bytes=32 time=2ms TTL=255

Ping statistics for 192.168.1.112:
    Packets: Sent = 10, Received = 10, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 2ms, Maximum = 18ms, Average = 4ms

C:\Users\gijs>ping -n 10 192.168.1.98

Pinging 192.168.1.98 with 32 bytes of data:
Reply from 192.168.1.98: bytes=32 time=4ms TTL=255
Reply from 192.168.1.98: bytes=32 time=1ms TTL=255
Reply from 192.168.1.98: bytes=32 time=1ms TTL=255
Reply from 192.168.1.98: bytes=32 time=3ms TTL=255
Reply from 192.168.1.98: bytes=32 time=1ms TTL=255
Reply from 192.168.1.98: bytes=32 time=3ms TTL=255
Reply from 192.168.1.98: bytes=32 time=2ms TTL=255
Reply from 192.168.1.98: bytes=32 time=1ms TTL=255
Reply from 192.168.1.98: bytes=32 time=3ms TTL=255
Reply from 192.168.1.98: bytes=32 time=2ms TTL=255

Ping statistics for 192.168.1.98:
    Packets: Sent = 10, Received = 10, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 4ms, Average = 2ms
  • .98 => ESP82xx Core 2_4_2, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3 PUYA support
  • .112 => ESP82xx Core 2.5.0-beta2, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 17, 2019

First impression:
    * core 2.4.2: 12.90% (LC=6134)
    * SDK 3.0.0: 19.20% (LC=2471)
    * SDK 2.2.1: 12.40% (LC=6902)

Per #5736, this thread, my personal experience with esp8266Audio (with which performance difference can be heard), I think we should revert.

However current sdk-v3 brings some wifi fixes especially noticed by tasmota project @ascillato .

If we had to revert, it could be with #5763 (default: sdk2.2.1, allowing to use sdk-pre-3.0.0 when needed).

@TD-er
Copy link
Contributor Author

TD-er commented Feb 17, 2019

Is it possible to create some 'board' definition which will force it to use the SDK 2.2.1?
Then we can build both and just switch between the SDK versions.
This way the building of both will be tested for a longer period and users will be able to test both.
Or some flag in PlatformIO definition to make it easy to define a build to use a specific SDK version.

What specific WiFi changes/patches are included in SDK 3.x, which are so useful?
I know @arendst mentioned deep sleep mode a few times, are these fixes related to that?

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 18, 2019

Is it possible to create some 'board' definition which will force it to use the SDK 2.2.1?

A menu only for generic esp8266 board could be added. And sdk back to 2.2.1 for all other boards.

Or some flag in PlatformIO definition to make it easy to define a build to use a specific SDK version.

A flag can be added in pio.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 19, 2019

#5763 is ready to test.

some flag in PlatformIO definition to make it easy to define a build to use a specific SDK version.

Included (search for SDK in platformio-build.py). I am not able to test with PIO.

@TD-er
Copy link
Contributor Author

TD-er commented Feb 19, 2019

Juist building based on PR #5763 results in a build with this SDK:
ESP82xx Core 2.6.0-dev, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.1.2 PUYA support

And by just adding this single define in the platformio.ini PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK3 I get a build with this SDK:
ESP82xx Core 2.6.0-dev, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support

Just like this in our platformio.ini file:
build_flags = ${esp82xx_2_5_0.build_flags} -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK3

So it seems to be working and quite easy to switch.
As soon as this has been merged, I will add the SDK3.0 to our nightly builds (marked as 2.6.0 alpha)

Will this merge lead to a "2.5.1" label?

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 19, 2019

Thanks for testing

Will this merge lead to a "2.5.1" label?

Yes I think there will be a fix release, though I can't exactly say when

d-a-v added a commit that referenced this issue Feb 19, 2019
…eneric board only (#5763)

This commit allows switching SDK firmware:

nonos-sdk-pre-v3 shipped with release 2.5.0 has issues:

    * Some boards show erratic behavior (radio connection is quickly lost), with an unknown cause.
      These boards work well with previous nonos-sdk-2.2.1 firmware (#5736)

    * Overall performances seem to have decreased (#5513)

This PR restores sdk2.2.1 (as in core-2.4.2).

SDK-pre-3.0 - which has brought long awaited fixes (WiFi sleep modes) - is still available through a menu option available only with generic board.

BREAKING

    * new define `-DNONOSDK221=1` or `-DNONOSDK3V0=1`

    * for external build systems: new library directory: `tools/sdk/lib/<version>/lib`

    * PIO: variable `PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK3` is needed for sdk-pre-v3.


Fix #5736
@dm5xx
Copy link

dm5xx commented Apr 6, 2019

Hi guys. I tried to change to 2.5.0 today and realized the same thing. 2.4.1 all was fast, switching to 2.5.0, wemos d1 mini was slooooow and with high latency. mike.

@devyte
Copy link
Collaborator

devyte commented Nov 10, 2019

@TD-er this issue doesn't apply to our current builds due to our using SDK 2.2.x . It would apply if we were to switch to SDK 3.x, but I rather cross that bridge when we get to it.
I'm closing this for now. Let's revisit when we look at SDK 3.x again.

@devyte devyte closed this as completed Nov 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Projects
None yet
Development

No branches or pull requests

7 participants