Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NRF52 default append write mode of files #5858

Merged
merged 3 commits into from
Jan 15, 2025
Merged

Fix NRF52 default append write mode of files #5858

merged 3 commits into from
Jan 15, 2025

Conversation

thebentern
Copy link
Contributor

@esev suggestion on the truncate after the seek, instead of deleting the whole file first.

Seems to be working well so far between nodedb resets and normal writes. I'll continue to monitor, but I'm cautiously optimistic.

Copy link
Contributor

@todd-herbert todd-herbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried using truncate in a few different ways here too but no luck so far.
It feels like the trouble starts when a file hits that 512 byte littlefs block size.

When testing this PR, collecting one node in nodedb and then shutting down was enough to set off the bad block problem. In this case, the LoRa config was dropped, but the exact symptoms seem to vary depending on exactly what data is in the fs at the time.

INFO  | 01:21:50 107 Shut down from admin command
INFO  | 01:21:50 107 Shutting down
INFO  | 01:21:50 107 Enter deep sleep forever
INFO  | 01:21:50 107 Disable NRF52 bluetooth
INFO  | 01:21:50 107 NRF52 bluetooth disconnecting handle 0
INFO  | 01:21:50 107 OFF: NeoPixel Ambient lighting
DEBUG | 01:21:50 107 SX126x entering sleep mode
INFO  | 01:21:50 107 Turn off screen
DEBUG | 01:21:50 107 Opening /prefs/config.proto, fullAtomic=1
INFO  | 01:21:50 107 Save /prefs/config.proto
DEBUG | 01:21:50 107 FromRadio=STATE_SEND_PACKETS
DEBUG | 01:21:50 107 phone downloaded packet (id=0xba696da8 fr=0xe1924e80 to=0xffffffff, WantAck=0, HopLim=1 Ch=0x0 Portnum=67 rxtime=1736904110 priority=10)
lfs debug:617: Bad block at 4
lfs debug:617: Bad block at 18
lfs debug:617: Bad block at 20
lfs debug:617: Bad block at 21
lfs debug:617: Bad block at 22
lfs debug:617: Bad block at 23
INFO  | 01:21:55 112 BLE Disconnected, reason = 0x22
DEBUG | 01:21:55 112 PhoneAPI::close()
INFO  | 01:21:55 112 BLE Connected to TH-Phone
lfs debug:639: Relocating 4 5 to 28 5
lfs debug:2526: Found desync 28 5
DEBUG | 01:21:56 113 Opening /prefs/module.proto, fullAtomic=1
INFO  | 01:21:56 113 Save /prefs/module.proto
INFO  | 01:21:56 113 BLE connection secured
DEBUG | 01:21:56 113 Opening /prefs/channels.proto, fullAtomic=1
INFO  | 01:21:56 113 Save /prefs/channels.proto
DEBUG | 01:21:56 113 Opening /prefs/db.proto, fullAtomic=0
INFO  | 01:21:56 113 Save /prefs/db.proto
Disconnected (ClearCommError failed (PermissionError(13, 'The device does not recognize the command.', None, 22)))
Reconnecting to COM4 ....        Connected!
��@INFO  | ??:??:?? 2 

//\ E S H T /\ S T / C

INFO  | ??:??:?? 2 Wait for peripherals to stabilize
DEBUG | ??:??:?? 3 Filesystem files:
DEBUG | ??:??:?? 3  prefs (directory)
DEBUG | ??:??:?? 3    config.proto (170 Bytes)
DEBUG | ??:??:?? 3    channels.proto (57 Bytes)
DEBUG | ??:??:?? 3    db.proto (490 Bytes)
DEBUG | ??:??:?? 3    module.proto (93 Bytes)
DEBUG | ??:??:?? 3  adafruit (directory)
DEBUG | ??:??:?? 3  bond_prph (directory)
DEBUG | ??:??:?? 3  bond_cntr (directory)
DEBUG | ??:??:?? 3 Power::lipoInit lipo sensor is not ready yet
DEBUG | ??:??:?? 3 Use analog input 4 for battery level
INFO  | ??:??:?? 3 Scan for i2c devices
DEBUG | ??:??:?? 3 Scan for I2C devices on port 2
DEBUG | ??:??:?? 3 Scan for I2C devices on port 1
INFO  | ??:??:?? 3 No I2C devices found
DEBUG | ??:??:?? 3 acc_info = 0
INFO  | ??:??:?? 3 S:B:69,2.5.20.dd9ab7f0
DEBUG | ??:??:?? 3 Reset reason: 0x0
DEBUG | ??:??:?? 3 Set random seed 2702563647
INFO  | ??:??:?? 3 Init NodeDB
INFO  | ??:??:?? 3 Load /prefs/db.proto
INFO  | ??:??:?? 3 Loaded /prefs/db.proto successfully
INFO  | ??:??:?? 3 Loaded saved devicestate version 23, with nodecount: 2
INFO  | ??:??:?? 3 Load /prefs/config.proto
INFO  | ??:??:?? 3 Set default channel and radio preferences!
DEBUG | ??:??:?? 3 Expand short PSK #1
INFO  | ??:??:?? 3 Wanted region 0, using UNSET
INFO  | ??:??:?? 3 Load /prefs/module.proto
INFO  | ??:??:?? 3 Loaded /prefs/module.proto successfully
INFO  | ??:??:?? 3 Loaded saved moduleConfig version 23
INFO  | ??:??:?? 3 Load /prefs/channels.proto
INFO  | ??:??:?? 3 Loaded /prefs/channels.proto successfully
INFO  | ??:??:?? 3 Loaded saved channelFile version 23
ERROR | ??:??:?? 3 Could not open / read /prefs/uiconfig.proto
DEBUG | ??:??:?? 3 cleanupMeshDB purged 0 entries
DEBUG | ??:??:?? 3 Use nodenum 0xe1924e80
INFO  | ??:??:?? 3 Generate new PKI keys
DEBUG | ??:??:?? 3 Generate Curve25519 keypair
DEBUG | ??:??:?? 3 Expand short PSK #1
INFO  | ??:??:?? 3 Wanted region 0, using UNSET

@thebentern
Copy link
Contributor Author

thebentern commented Jan 15, 2025

When testing this PR, collecting one node in nodedb and then shutting down was enough to set off the bad block problem. In this case, the LoRa config was dropped, but the exact symptoms seem to vary depending on exactly what data is in the fs at the time.

Strange. I have a device that previously was throwing a CF#12 and lost its lora region. On this branch after a nodedb reset and a save of the config, it's not throwing any spurious LFS logs or other issues. It's been going for a couple of hours now, but I'll leave it on tio and see if I can capture anything suspicious.

@esev
Copy link
Contributor

esev commented Jan 15, 2025

FWIW: I think the "bluetooth issue" might be separate from the size doubling issue. I think this PR is aimed at fixing the latter.

@todd-herbert
Copy link
Contributor

I'll see if I can reproduce the issue using the CLI only, to rule out bluetooth as a factor.

src/SafeFile.cpp Outdated
@@ -57,6 +59,9 @@ bool SafeFile::close()
{
if (!f)
return false;
#ifdef ARCH_NRF52
f.truncate();
#endif

spiLock->lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this lock happen before the truncate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Pushed the change

@todd-herbert todd-herbert dismissed their stale review January 15, 2025 02:41

Unrelated issue

@todd-herbert
Copy link
Contributor

Okay yeah, seems like Bluetooth reconnect is the culprit for the corruption I've been seeing. I didn't expect that!
On the plus side, seems like I might have stumbled onto a reliable way to reproduce that Bluetooth issue.

@esev
Copy link
Contributor

esev commented Jan 15, 2025

seems like I might have stumbled onto a reliable way to reproduce that Bluetooth issue.

Yes! I noticed that too. :) You're issuing a shutdown command?

Edit: Let's move that discussion to #5839, if that's okay with you.

@esev
Copy link
Contributor

esev commented Jan 15, 2025

Finally got logs over USB :)

This PR results in a lot fewer flash operations for me.

Before (with the temp file name logic):

INFO | 04:53:22 52 Save /prefs/db.proto
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#

After (with basically the same nodes)

INFO | 04:59:08 58 Save /prefs/db.proto
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#
[SOC ] NRF_EVT_FLASH_OPERATION_SUCCESS#

@thebentern thebentern merged commit 85de193 into master Jan 15, 2025
54 checks passed
@thebentern thebentern deleted the fix-append branch January 15, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants