
spiffs fwrite "bad file number" after mqtt disconnect (IDFGH-693) #3149

Closed
ghost opened this issue Mar 9, 2019 · 5 comments

Comments


ghost commented Mar 9, 2019

Environment

  • Development Kit: ESP32-Wrover-Kit
  • Kit version: v4
  • Module or chip used: ESP32-WROOM-32
  • IDF version: v3.3-beta1-512-g89ae5908d
  • Build System: Make
  • Compiler version: (crosstool-ng-1.22.0-80-g6c4433a5) 5.2.0
  • Operating System: Windows 10
  • Power Supply: USB

Problem Description

When MQTT disconnects (esp-tls failure), something goes wrong with SPIFFS during a file write running on a different task: fwrite fails (returns 0) and errno reports "Bad file number".

The MQTTS PEM file is embedded in flash using COMPONENT_EMBED_TXTFILES := key.pem, so accessing it should not affect SPIFFS file operations. If I reopen the SPIFFS file after the failure, I can write to it again.

The problem does not occur after MQTT has already disconnected; it occurs only when the file is being written at the moment of the disconnect.

Steps to reproduce

I've attached a project in which I stripped the code down to the minimum.

1- Change TEST_WIFI_SSID and TEST_WIFI_PASS in main.cpp to match your settings.
2- Run make -j8 erase_flash flash

You don't need to change the hard-coded MQTT server. I've set up a server whose firewall allows port 8443 but where nothing is listening on that port, which is the condition under which the bug shows itself. I'll keep the server running until this issue is resolved.

WARNING: the code has an infinite flash write loop. DO NOT leave it running or it will wear out the flash on your device.

Things I tried

Blocking the port with a firewall, or changing the server to an invalid one, produces different MQTT disconnect codes, and those don't seem to affect fwrite operations. I do occasionally see fwrite fail with errno "No such file or directory", but that happens very rarely and I'm not sure it's relevant.

Adding a delay between fwrite operations works most of the time, but only because the race condition is hit less often.

If MQTT successfully connects to the server and I then disconnect manually (from either the server or the client), it works as expected.

Running an infinite loop that connects and disconnects MQTT while fwrite runs in parallel also works as expected.

The only situation in which I could trigger the problem is when the connection to the port is rejected.

Code to reproduce this issue

esp32_bug.zip

Debug Logs

fwrite ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
E (7108) esp-tls: Failed to connnect to host (errno 104)
E (7128) esp-tls: Failed to open new connection
E (7128) TRANS_SSL: Failed to open a new connection
E (7128) MQTT_CLIENT: Error transport connect
fwrite ok
E (7138) TEST: fwrite error: Bad file number <<<<<<<<<<<<<<<<<<<<
fopen ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
fwrite ok
@ghost ghost changed the title spiffs fwrite "bad file number" after mqtt disconnect "errno 104" spiffs fwrite "bad file number" after mqtt disconnect Mar 11, 2019
@projectgus projectgus changed the title spiffs fwrite "bad file number" after mqtt disconnect spiffs fwrite "bad file number" after mqtt disconnect (IDFGH-693) Mar 12, 2019
igrr (Member) commented Mar 14, 2019

Could you please try enabling CONFIG_USE_ONLY_LWIP_SELECT https://docs.espressif.com/projects/esp-idf/en/latest/api-reference/kconfig.html#config-use-only-lwip-select and see if the issue still occurs? This will help narrow down the possible cause.

@ghost ghost closed this as completed Mar 19, 2019
@ghost ghost reopened this Mar 19, 2019
ghost (Author) commented Mar 19, 2019

I tried enabling CONFIG_USE_ONLY_LWIP_SELECT: same result.

I closed and reopened the issue by mistake, sorry about that.

lebtron commented Mar 24, 2019

I deleted my old GitHub account and forgot that it was linked to this issue. I'm still here.

igrr (Member) commented Apr 8, 2019

Thanks for the code @lebtron, we were able to reproduce the issue. Will update this thread when the fix is available.

@igrr igrr closed this as completed in 7764547 Apr 17, 2019
lebtron commented Apr 18, 2019

Thank you :)
