
ESP8266WebServer hangs after updating to 2.5.0 #5736

Closed
4 of 6 tasks
moose4lord opened this issue Feb 7, 2019 · 22 comments · Fixed by #5763

Comments

@moose4lord

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP8266
  • Core Version: 2.5.0
  • Development Env: Arduino IDE
  • Operating System: MacOS

Settings in IDE

  • Module: NodeMCU 1.0
  • Flash Mode: n/a
  • Flash Size: 4MB/3MB
  • lwip Variant: v2 Lower Memory
  • Reset Method: n/a
  • Flash Frequency: n/a
  • CPU Frequency: 80MHz
  • Upload Using: SERIAL
  • Upload Speed: 115200

Problem Description

After updating to v2.5.0 my sketch can't get a reliable connection to the ESP8266WebServer server. Trying to connect to the ESP from a browser succeeds occasionally, but most of the time the browser hangs and eventually gives up.

I loaded the simple HelloServer example sketch that comes with ESP8266WebServer and it does the same thing: the browser just sits there. I tried various lwIP variants, but none helped. Oddly, turning on all debug options made the problem happen less often.

It seems related to issue #5725, although for me ping requests constantly time out.

Reverting to v2.4.2 makes the problem go away.

MCVE Sketch

Use the HelloServer example sketch provided with ESP8266WebServer.

Debug Messages

Problem goes away (or at least gets much better) when all debug options are enabled.

@moose4lord
Author

Another clue: if I comment out the MDNS.begin("esp8266") and MDNS.update() statements in the HelloServer sketch, the problem goes away. Here's the sketch:

#include <ESP8266WiFi.h>
#include <WiFiClient.h>
#include <ESP8266WebServer.h>
#include <ESP8266mDNS.h>

#ifndef STASSID
#define STASSID "your-ssid"
#define STAPSK  "your-password"
#endif

const char* ssid = STASSID;
const char* password = STAPSK;

ESP8266WebServer server(80);

const int led = 13;

void handleRoot() {
  digitalWrite(led, 1);
  server.send(200, "text/plain", "hello from esp8266!");
  digitalWrite(led, 0);
}

void handleNotFound() {
  digitalWrite(led, 1);
  String message = "File Not Found\n\n";
  message += "URI: ";
  message += server.uri();
  message += "\nMethod: ";
  message += (server.method() == HTTP_GET) ? "GET" : "POST";
  message += "\nArguments: ";
  message += server.args();
  message += "\n";
  for (uint8_t i = 0; i < server.args(); i++) {
    message += " " + server.argName(i) + ": " + server.arg(i) + "\n";
  }
  server.send(404, "text/plain", message);
  digitalWrite(led, 0);
}

void setup(void) {
  pinMode(led, OUTPUT);
  digitalWrite(led, 0);
  Serial.begin(115200);
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);
  Serial.println("");

  // Wait for connection
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("");
  Serial.print("Connected to ");
  Serial.println(ssid);
  Serial.print("IP address: ");
  Serial.println(WiFi.localIP());

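  // On 2.5.0, commenting out this MDNS.begin() block together with
  // MDNS.update() in loop() makes the hang go away.
  if (MDNS.begin("esp8266")) {
    Serial.println("MDNS responder started");
  }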

  server.on("/", handleRoot);

  server.on("/inline", []() {
    server.send(200, "text/plain", "this works as well");
  });

  server.onNotFound(handleNotFound);

  server.begin();
  Serial.println("HTTP server started");
}

void loop(void) {
  server.handleClient();
  MDNS.update();
}

Can someone try this out on their hardware/network? I'd like to know whether the problem is reproducible elsewhere or is some quirk of my hardware/network. That said, everything works fine with v2.4.2, so I don't think it's something I did.

@d-a-v
Collaborator

d-a-v commented Feb 9, 2019

Have you tried "Erase Flash" in the Tools menu?

@moose4lord
Author

Yes, I've tried the "Erase Flash: All Flash Contents" option. No change.

@d-a-v
Collaborator

d-a-v commented Feb 10, 2019

Reproduced in master with the HelloServer example.
@LaborEtArs @hreintke: initializing m_GotIPHandler and m_DisconnectedHandler in MDNS.begin() breaks something. Everything, including mDNS resolution, works fine when these initializations are commented out.

@moose4lord
Author

Thanks @d-a-v. I'm relieved I wasn't doing something dumb. :)

@hreintke
Contributor

@d-a-v: test results are on the mDNS Gitter; please take a look.

@LaborEtArs
Contributor

@moose4lord @d-a-v: Is one of the registered event callbacks ever called?

@TD-er
Contributor

TD-er commented Feb 10, 2019

OK, also good to know I'm not losing my mind too :)
It seems to be related to some change made in the last month (or maybe even less; I haven't had much time for my projects in the last few weeks).
In the latest builds I have running here, the webserver crashes the node on several specific pages (of course one of them is a page related to something I was working on), and when I went back to a build from around Jan 10, it was working fine.

@moose4lord
Author

@LaborEtArs: initially the requests from the browser are accepted by the ESP, and the callback is called and runs successfully, but the response back to the browser is very delayed. If I keep hitting refresh in the browser, the ESP eventually gets overwhelmed, the callbacks don't get called, and the ESP stops responding altogether. Sometimes it recovers and starts working again, but only for a short time.

It seems to get better if I add a small delay(10) in the callback. If I set the debug level to log everything, it seems to get better too.

Sorry I can't be more specific. It's a pretty erratic fail.

@d-a-v
Collaborator

d-a-v commented Feb 11, 2019

I have news, long story short:

  • I have an ESP module on which HelloServer.ino works (LEAmDNS responding, webserver responding)
  • I have another one on which it doesn't work with the following symptoms:
    (at least the CORE debug option needs to be enabled)
IP address: 10.0.1.154
MDNS responder started
HTTP server started
bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
IP address: 10.0.1.154
MDNS responder started
HTTP server started
bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
...

However, the webserver works and the disconnections stop when I comment out MDNS.update() in loop().
That doesn't mean LEA mDNS is broken (it works on the other ESP).

In every case I used the Generic board, DOUT, 4MB (both modules are 4MB) and flashed the same binary onto both of them: esptool.py, erase flash, flash with verify, then watch it run on the serial console. I also tried an external power supply.

edit:


update: the issue disappears when using the previous nonos-sdk FW (SDK:2.2.1(cfd48f3))


update2: to test (this keeps the 2.5.0/master core sources; only the FW is downgraded):

git remote add dav https://github.com/d-a-v/Arduino.git
git fetch dav
git checkout --track dav/fwregress

back to master:

git checkout master

@TD-er
Contributor

TD-er commented Feb 11, 2019

Could it be somehow related to the issues seen when the wifi gets disconnected, but the open sockets are not closed because the disconnect is not (always) being noticed outside the core libraries?

The crashes I've seen were due to Exception 28, which is why I was looking into my own code first.

@d-a-v
Collaborator

d-a-v commented Feb 11, 2019

Could it be somehow related to the issues seen when the wifi gets disconnected, but the open sockets are not closed because the disconnect is not (always) being noticed outside the core libraries?

#5703 is merged and included in 2.5.0.

In this issue, WiFi is not supposed to go off, and there's no freeze, reboot, or error: just WiFi losing its connection and trying to reconnect, and only on some boards.

@moose4lord
Author

Different results for different boards sounds weird. Are you sure you have the Debug Port set the same way when you're testing? I find that changing Debug Port from "Disabled" to "Serial" (even while keeping Debug Level set to "none") makes a big difference. With Debug Port="Disabled", HTTP requests almost always time out. But with Debug Port="Serial", the HTTP connection kind of works, although sometimes it takes a long time to respond.

Just my observations. I have no idea why it would behave that way.

@d-a-v
Collaborator

d-a-v commented Feb 11, 2019

Different results for different boards sounds weird.

Yes. I will try to see if I can force RF calibration.

Are you sure you have the Debug Port set the same way when you're testing?

Yes:

and the same binary flashed onto both of them

@hreintke
Contributor

@moose4lord: Are you making your web requests from a Windows system?

@moose4lord
Author

No, I'm on MacOS. But I do have a Windows 10 box that I tried it on, and it does the same thing. The HTTP server hangs on Chrome or Firefox.

@LaborEtArs
Contributor

LaborEtArs commented Feb 11, 2019

I tried it on a Wemos D1 mini now and changed a lot of settings, but wasn't able to reproduce the issue. Using Safari on MacOS.

@denis-stepanov

I observe a similar problem. Board: Witty Cloud Dev ("D1 R2 & mini" in Arduino). Removing mDNS puts things back in order, but adding some other code (digitalRead() in my case) to the loop next to server.handleClient() makes the ESP unresponsive again (around 70-100% ping loss). The code not related to networking seems to keep running fine.

@moose4lord
Author

moose4lord commented Feb 11, 2019

Well, I'll be a monkey's uncle. @d-a-v is right, it does depend on the hardware you're using. I have three NodeMCU development boards and two of them have the issue, but one does not. The two that are "bad" have an AI Thinker ESP8266 (ESP-E) soldered on the development board, and the "good" one has a DOIT ESP8266 (ESP-F) soldered on the dev board. The "bad" boards have a red LED_BUILTIN and the "good" board has a blue LED_BUILTIN, but other than that, they look identical.

@denis-stepanov

update: Issue disappears by using previous nonos-sdk FW (SDK:2.2.1(cfd48f3))

@d-a-v I confirm that your branch solves the connectivity issue. However, I could not make mDNS work with either 2.5.0 copy.

d-a-v added a commit to d-a-v/Arduino that referenced this issue Feb 15, 2019
This commit allows switching SDK firmware:

Some boards show erratic behavior (radio connection is quickly lost), with
an unknown cause, when using nonos-sdk-pre-v3 (shipped with release 2.5.0).

These boards work well with previous nonos-sdk-2.2.1 firmware.

Current firmware, which has brought long awaited fixes (WiFi sleep modes),
stays as default.

To switch:
           ./tools/boards.txt.py --sdk=NONOSDK221 --allgen
(default)  ./tools/boards.txt.py --sdk=NONOSDK3V0 --allgen

BREAKING for external build systems:
    new directories to add
        lib:     tools/sdk/lib/<version>/lib
        include: tools/sdk/lib/<version>/include

Fix esp8266#5736
@Swiftnesses

@d-a-v I guess this might also be related to the issue I reported when testing the 2.5.0 beta. Can this option be made available via the Arduino IDE menu in a new release? I have a feeling the new SDK is causing a myriad of problems for many boards...

@d-a-v
Collaborator

d-a-v commented Feb 17, 2019

I have a feeling the new SDK is causing a myriad of problems for many boards...

It seems to.

Can this option be made available via the Arduino IDE menu in a new release?

The possibility of a new menu has been internally discussed and is not advised (too many menus already, and an option to make a board work is not a good option). However #5763 allows switching.
Discussion about reverting for good (for now) is happening in #5513.

d-a-v added a commit that referenced this issue Feb 19, 2019
…eneric board only (#5763)

This commit allows switching SDK firmware:

nonos-sdk-pre-v3 shipped with release 2.5.0 has issues:

    * Some boards show erratic behavior (radio connection is quickly lost), with an unknown cause.
      These boards work well with previous nonos-sdk-2.2.1 firmware (#5736)

    * Overall performance seems to have decreased (#5513)

This PR restores sdk2.2.1 (as in core-2.4.2).

SDK-pre-3.0 - which has brought long awaited fixes (WiFi sleep modes) - is still available through a menu option available only with generic board.

BREAKING

    * new define `-DNONOSDK221=1` or `-DNONOSDK3V0=1`

    * for external build systems: new library directory: `tools/sdk/lib/<version>/lib`

    * PIO: variable `PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK3` is needed for sdk-pre-v3.


Fix #5736

7 participants