Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve STM32F4 Flash Behavior #17946

Merged

Conversation

sjasonsmith
Copy link
Contributor

Description

Improves several issues related to FLASH_EEPROM_EMULATION for STM32F4 environments using HAL/STM32.

Hopefully several people will volunteer to test this prior to it being merged. While I believe this resolves the issues stated, I have only tried it on my SKR Pro board, and perhaps it causes issues for other configurations of non-STM32F4 boards using this HAL.

1. Clear flash error bits prior to writing to flash

Apparently STM32F4 MCUs come up with flash error bits set, that need to be cleared prior to flash operations. This caused the erase operation to fail, resulting in no data being stored. A second M500 was previously needed to succesfully store settings.

This is discussed in this post on the ST forums

2. Disallow PRINTCOUNTER when using FLASH_EEPROM_EMULATION with STM32F4

It takes just over one second on my SKR Pro to erase the 128kB flash sector used for EEPROM emulation. During this time the flash cannot be accessed at all, including to execute code from other sectors. This greatly interferes with timing and would be a generally bad idea during prints.

3. Pause servo output during flash erase

As mentioned above, erasing the flash sector takes just over one second on my board. This disrupts servo (BLTOUCH) output to the extent that it does not recover.

Prior to performing a flash erase, this now pauses all active Servo output and disables interrupts, to avoid scenarios where interrupts misbehave if they run while flash is inaccessible. Previous Servo output is restored after the flash operation is complete. My BLTouch seems perfectly happy with this arrangement.

The following image demonstrates the duration of the gap in Servo output:
image

4. Add sanity check to report error if FLASH_EEPROM_LEVELING is specified incorrectly

FLASH_EEPROM_LEVELING is very useful to reduce/hide the impacts of the issues this PR addresses. This is because it only erases the flash every 33rd time it is written. I wanted to be sure people could not specify it hoping for those benefits, and have it be silently disabled.

Benefits

This makes FLASH_EEPROM_EMULATION much more usable on STM32F4 boards, in the two primary ways:

  • Does not require two M500 commands to store settings for the first time after boot.
  • Does not kill BLTouch after an M500.

Related Issues

#17627 - SKR PRO, M502 M500 twice to update EEPROM
#16169 - SKR PRO Bltouch doesn't work after M502/M500/M501 (flash emulation related)

FLASH_EEPROM_EMULATION takes about a second to erase the flash page
on STM32F4 hardware, which will interfere with any ongoing activities.
Disallow PRINTCOUNTER and FLASH_EEPROM_EMULATION to co-exist for
STM32F4 hardware using HAL/STM32, to avoid issues during prints.
@sjasonsmith
Copy link
Contributor Author

I'm hoping some of the users who have reported issues will be able to help test this and report back. All of these users have clearly indicated in one of the referenced issues that they encountered one or more of the problems this addresses.
@DroneMang @BastR @dch1921 @Dev5994 @danym21 @GhostlyCrowd

Note that FLASH_EEPROM_LEVELING hides these issues because it doesn't erase the flash as often. The problems probably still occurred on every 33rd write. Make sure you have that disabled in your pins file if you are testing these.

@GhostlyCrowd
Copy link
Contributor

I'm hoping some of the users who have reported issues will be able to help test this and report back. All of these users have clearly indicated in one of the referenced issues that they encountered one or more of the problems this addresses.
@DroneMang @BastR @dch1921 @Dev5994 @danym21 @GhostlyCrowd

Note that FLASH_EEPROM_LEVELING hides these issues because it doesn't erase the flash as often. The problems probably still occurred on every 33rd write. Make sure you have that disabled in your pins file if you are testing these.

Ill give this a test tomorrow when i get to my office and printers with SKR Pro's Does this change effect the SD eeprom emulation? Or did you want me to switch to flash emulation to test this?

Also, to clarify you need us to undefine "FLASH_EEPROM_LEVELING" in the pin, the SKR pro doesnt appear to have this Only the FYSETC S6 does as far as i can tell on the STM32F4 platform.

@sjasonsmith
Copy link
Contributor Author

@GhostlyCrowd

Ill give this a test tomorrow when i get to my office and printers with SKR Pro's Does this change effect the SD eeprom emulation? Or did you want me to switch to flash emulation to test this?

All of these changes only impact FLASH_EEPROM_EMULATION.

Also, to clarify you need us to undefine "FLASH_EEPROM_LEVELING" in the pin, the SKR pro doesnt appear to have this Only the FYSETC S6 does as far as i can tell on the STM32F4 platform.

That is currently correct. I was going to enable it by default as part of this PR, but realized that would interfere with testing. I fully expect all of these issues would have existed on the S6, GTR, or any other STM32F4 board.

@thisiskeithb
Copy link
Member

The BTT002 is printing right now, so I’ll give this PR a go later today.

@GhostlyCrowd
Copy link
Contributor

GhostlyCrowd commented May 10, 2020

@GhostlyCrowd

Ill give this a test tomorrow when i get to my office and printers with SKR Pro's Does this change effect the SD eeprom emulation? Or did you want me to switch to flash emulation to test this?

All of these changes only impact FLASH_EEPROM_EMULATION.

Also, to clarify you need us to undefine "FLASH_EEPROM_LEVELING" in the pin, the SKR pro doesnt appear to have this Only the FYSETC S6 does as far as i can tell on the STM32F4 platform.

That is currently correct. I was going to enable it by default as part of this PR, but realized that would interfere with testing. I fully expect all of these issues would have existed on the S6, GTR, or any other STM32F4 board.

Understood, I will test tomorrow Morning when i get to the hardware.

@randellhodges
Copy link
Contributor

I wonder if maybe the printcounter should not be flushed to the underlying eeprom storage (regardless of eeprom type) if there are any moves in the planner? Maybe a way to accumulate that data in memory and flush it out once only at the end.

@randellhodges
Copy link
Contributor

randellhodges commented May 11, 2020

FYI, I did find this in the documentation, confirming what you said:

"Any attempt to read the Flash memory on STM32F4xx while it is being written or erased,
causes the bus to stall. Read operations are processed correctly once the program
operation has completed. This means that code or data fetches cannot be performed while
a write/erase operation is ongoing
"

I wonder if instead of using the last 128k sector, if we can use the linker to adjust the start of the code to Sector 1. Then store the EEPROM in Sector 0.

The layouts of sectors are:
0 thru 3: 16k
4: 64k
5 thru 11: 128k

While we'd lose the ability to have like 100, 10x10 meshes stored and flash leveling wrapping around for every 32nd time, but, we'd be erasing a much smaller and hopefully faster sector?

@thinkyhead
Copy link
Member

I wonder if maybe the printcounter should not be flushed to the underlying eeprom storage (regardless of eeprom type) if there are any moves in the planner?

It sounds like it would be wise in-general to only write that once per Z height (i.e., once per layer) and then again if the print ends for any reason, and just accept the potential of the stored values being one layer off if the power goes out.

@randellhodges
Copy link
Contributor

Writing it out once per Z height would actually increase the EEPROM writes. Currently it writes it out once per hour, which align with people reporting layer shifts and other issues on the hour...

The quick fix I was thinking was change saveInterval to something like 5/10 seconds and in the tick, have it look at the planner, and if it is empty, do nothing, otherwise save.

I'm assuming the start/stop methods are already appropriately being called to persist the data if the print ends for any other reason....

Now that I type this all out, I realize doing it this way could be negated by only persisting when stop is called and remove it from being an interval save.

Just my random thoughts.

@thinkyhead
Copy link
Member

if it is empty, do nothing, otherwise save.

That's a good idea.

@thinkyhead thinkyhead merged commit 25aade1 into MarlinFirmware:bugfix-2.0.x May 11, 2020
@GhostlyCrowd
Copy link
Contributor

So far everything seems to be fine for me SKR Pro STM32F4 Bltouch. Flash eprom Emu.
It was pretty apparent rapidly since I didn't have to M500 twice to save settings and my Bltouch worked afterwords 👍

@DroneMang
Copy link

I think you need to add #define FLASH_EEPROM_EMULATION outside the if statement as well because if you #define EEPROM_SETTINGS then how would you guarantee that FLASH_EEPROM_EMULATION will be turned on?

#if NO_EEPROM_SELECTED
//#define SRAM_EEPROM_EMULATION // Use BackSRAM-based EEPROM emulation
#define FLASH_EEPROM_EMULATION // Use Flash-based EEPROM emulation
#endif

I propose to add the below line in pins for the SKR PRO:
#define FLASH_EEPROM_EMULATION // Use Flash-based EEPROM emulation

Great work with this, will save me from crashing into the bed after an M500!

@DroneMang
Copy link

DroneMang commented May 12, 2020

In octoprint when setting up new eeprom, I get these errors. So how many slots for meshes are available now? And why does it say eeprom saved to slot 26? What happened to the mesh I have/had stored in slot 6? Is it gone? Thanks!

Send: G29 L6
Recv: ?Invalid storage slot.
Recv: ?Use 0 to 2

Send: M500
Recv: echo:Settings Stored (671 bytes; crc 50497)
Recv: EEPROM saved to slot 26.

@sjasonsmith
Copy link
Contributor Author

#if NO_EEPROM_SELECTED
//#define SRAM_EEPROM_EMULATION // Use BackSRAM-based EEPROM emulation
#define FLASH_EEPROM_EMULATION // Use Flash-based EEPROM emulation
#endif

I propose to add the below line in pins for the SKR PRO:
#define FLASH_EEPROM_EMULATION // Use Flash-based EEPROM emulation

It's already there, it just only happens if you haven't already defined some other EEPROM or EEPROM Emulation. If the user has already configured some other form of EEPROM, you don't want to define FLASH_EEPROM_EMULATION.

@sjasonsmith
Copy link
Contributor Author

So how many slots for meshes are available now? ... What happened to the mesh I have/had stored in slot 6? Is is gone? Thanks!

Apparently there are three slots now. If you need to store more than that you will need to turn FLASH_EEPROM_LEVELING back off. I didn't realize that with that disabled it was giving you access to the entire 128KB as EEPROM space.

It is safe to assume that they are gone if you have done an M500.

And why does it say eeprom saved to slot 26?

I enabled FLASH_EEPROM_LEVELING by default because erasing the 128 KB sector take a long time, and 4K is the normal size reserved for EEPROM use on most boards. FLASH_EEPROM_LEVELING stores multiple times into the same sector and only requires an erase cycle when space is exhausted. This should be an overall better experience for most people and for the life of the hardware.

It did not occur to me when I made the change that people might have a bunch of UBL meshes saved in there. Sorry about that.

@DroneMang
Copy link

Its ok, I had done a G29 S-1 on that mesh so I just read it back in into slot 0.

@sjasonsmith
Copy link
Contributor Author

sjasonsmith commented May 12, 2020

Its ok, I had done a G29 S-1 on that mesh so I just read it back in into slot 0.

If you need more UBL slots, it looks like you can keep FLASH_EEPROM_LEVELING enabled and define EEPROM_SIZE to something larger. This is how it is defined in the flash code, so it looks like it is configured to be easily overridden.

  #ifndef EEPROM_SIZE
    #define EEPROM_SIZE           0x1000  // 4kB
  #endif

@randellhodges
Copy link
Contributor

randellhodges commented May 12, 2020

Without FLASH_EEPROM_LEVELING you are using the framework EEPROM emulation, which I believe defaults to 16kB, at least for some chips.

I don't believe meshes would have access to the entire sector, unless FLASH_PAGE_SIZE was set to a much larger size (for the non-FLASH_EEPROM_LEVELING out of the box framework eeprom code)

Edit: Keep in mind, the EEPROM data is cached in memory. It seems boards using the F4 chip usually have it to spare, but if you wanted a bunch of meshes and set the EEPROM_SIZE to 64kB, that is going to hoard 64kB of memory

@DroneMang
Copy link

In updating to the newest bugfix from the non-bugfix production version, I am now getting a BLTouch error:

Changing monitoring state from "Error: !! STOP called because of BLTouch error - restart with M999" to "Offline (Error: !! STOP called because of BLTouch error - restart with M999)"
Connection closed, closing down monitor

This happens when I home and run G28. X and Y home just fine with sensorless homing.

The BLTouch deploys and raises for 2 cycles and then throws this error.

@GhostlyCrowd
Copy link
Contributor

In updating to the newest bugfix from the non-bugfix production version, I am now getting a BLTouch error:

Changing monitoring state from "Error: !! STOP called because of BLTouch error - restart with M999" to "Offline (Error: !! STOP called because of BLTouch error - restart with M999)"
Connection closed, closing down monitor

This happens when I home and run G28. X and Y home just fine with sensorless homing.

The BLTouch deploys and raises for 2 cycles and then throws this error.

What board, can you upload your configs and also how is your probe connected? to Z min end-stop?

@DroneMang
Copy link

DroneMang commented May 12, 2020 via email

@DroneMang
Copy link

Here are the current configs and they produce the M999 issue.
Marlin-bugfix-2.0.x-11-5160.zip

@GhostlyCrowd
Copy link
Contributor

Here are the current configs and they produce the M999 issue.
Marlin-bugfix-2.0.x-11-5160.zip

Interesting you have SW mode enabled and force 5v mode both of which shouldn't be needed on the SKR Pro, you also don't have any delay defined for your probe #define BLTOUCH_DELAY 500 should be defined the probe needs some sort of time to react.

I assumed this worked before? I'd suggest enabling #define DEBUG_LEVELING_FEATURE in configuration.h Turn on with the command 'M111 S32' and seeing what it complains about when it fails.

@DroneMang
Copy link

DroneMang commented May 13, 2020 via email

@GhostlyCrowd
Copy link
Contributor

My thought with the 5V mode and force sw mode was to remove as much noise as possible. This is on an FT6 and the cable run is kind of long plus goes through the Folger Tech breakout boards for Ethernet and doing a 15x15 mesh I need to do multiple probes it seems (10 right now, takes 12 hrs!) in order to get an accurate bed mesh. Do you think the 5V and sw mode would help with these? I know that doing multiple probes helps with the bed accuracy but perhaps there are some settings that I could enable or disable to reduce the number of multiple probes. I already have turn heaters off and wait for bed heater and turn steppers off and fans off plus I unplug the hot end cooling fan during probing. I'll try the settings above and see. Thanks.

On Wed, May 13, 2020, 8:46 AM GhostlyCrowd @.***> wrote: Here are the current configs and they produce the M999 issue. Marlin-bugfix-2.0.x-11-5160.zip https://github.com/MarlinFirmware/Marlin/files/4619286/Marlin-bugfix-2.0.x-11-5160.zip Interesting you have SW mode enabled and force 5v mode both of which shouldn't be needed on the SKR Pro, you also don't have any delay defined for your probe #define BLTOUCH_DELAY 500 should be defined the probe needs some sort of time to react. I assumed this worked before? I'd suggest enabling #define DEBUG_LEVELING_FEATURE in configuration.h Turn on with the command 'M111 S32' and seeing what it complains about when it fails. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#17946 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIZZJSAYGKWOR6AR2SCKZ5DRRKJCBANCNFSM4M5MMGFQ .

The fastest way for us to see what's going on is to debug the probe with the mentioned settings.
But yes the SKR pro doesn't need sw mode or 5v mote exclusively so try without it.

@DroneMang
Copy link

DroneMang commented May 13, 2020 via email

@DroneMang
Copy link

DroneMang commented May 13, 2020 via email

@GhostlyCrowd
Copy link
Contributor

GhostlyCrowd commented May 14, 2020

I figured it out just needed to turn off Z_MIN_PROBE_USES_Z_MIN_ENDSTOP_PIN. If that is off do I need this on?: #define USE_PROBE_FOR_Z_HOMING On Wed, May 13, 2020 at 3:47 PM Arion Mangio [email protected] wrote:

I believe you're correct. If you're not using Z min for the probe you have to tell the firmware it's using the probe as zmin or it will assume you still have a Z min endstop.

@DroneMang
Copy link

Its working now.

One curious thing I have noticed is that when I save the EEPROM, it increments to a smaller and smaller number, say after an M502 M500 M501 sequence. But if I just do M500 then it says it saved the current mesh in the current slot. So will the eeprom slots run out of storage? Or does FLASH_EEPROM_LEVELING take care of this?

@sjasonsmith
Copy link
Contributor Author

@DroneMang if you do M500 without changing anything it will say it stored settings, but it actually skips the unnecessary write.

When the slots run out it will erase the flash sector and start over.

@randellhodges
Copy link
Contributor

randellhodges commented May 14, 2020

@DroneMang The last sector is 128kB and we "use" 4kB for EEPROM. The leveling subdivides this 4kB into 32 "slots". Assuming the 128kB sector is empty (all 0's) it starts at the back and works forward so it can find the first non-empty slot. It starts at the beginning of the sector, scans until it hits non-empty data, that's how it knows what the current "slot" is (no other way figure it out), then subtracts one and fills that slot. Once it runs out, it erases the entire sector (can only erase full sectors) and starts over again.

When you execute an M500, it evaluates each byte in "memory" vs what is stored in the EEPROM. If no bytes change, then nothing is done. No sense to "waste" a slot and an erase/write cycle if nothing changed.

I stole this idea from the LPC code by p3p and just hacked it into the STMF4 HAL.

Edit: Note that mesh data and "slots" used by the meshes is a subset of the 4kB EEPROM. When it says what EEPROM "slot" it saved into, that doesn't mean mesh slot.

@DroneMang
Copy link

DroneMang commented May 14, 2020 via email

@randellhodges
Copy link
Contributor

randellhodges commented May 14, 2020

Make note of your settings, then in your configuration file,

Add
#define EEPROM_SIZE 0x4000 // 16kB
That'll give you a bunch of meshes, but then you'd have only have 8 EEPROM writes before the sector is erased. Still, better than nothing:

Or
#define EEPROM_SIZE 0x2000 // 8kB
That'll still still give you a lot more mesh slots and you'll get 16 EEPROM writes before the sector is erased.

Just keep EEPROM_SIZE a multiple of 4kB. If you do not, I do not know how it will respond to the change. It should evenly divide into 128 and be more than 4.

Edit: You only get 3 meshes with the default 4kB? I am using an LPC1769 and I thought it was doing 4kB and gave a few more meshes. I'll have to double check, now I'm curious. How many points are you using?

@sjasonsmith sjasonsmith deleted the PR/STM32_FlashError branch May 25, 2020 22:31
vgadreau pushed a commit to vgadreau/Marlin that referenced this pull request May 29, 2020
jmp0x0000 pushed a commit to jmp0x0000/Marlin that referenced this pull request Aug 7, 2020
njibhu pushed a commit to njibhu/Marlin that referenced this pull request Aug 24, 2020
HairingX pushed a commit to HairingX/Marlin that referenced this pull request Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants