Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] STM32F1 watchdog is not enabled. Boards freeze with heaters on. #18226

Closed
minosg opened this issue Jun 8, 2020 · 33 comments
Closed

[BUG] STM32F1 watchdog is not enabled. Boards freeze with heaters on. #18226

minosg opened this issue Jun 8, 2020 · 33 comments

Comments

@minosg
Copy link

minosg commented Jun 8, 2020

Bug Description

The watchdog initialization function in the HAL for STM32F is commented out, resulting in non operational watchdog. When Marlin crashes on those boards, they stop mid-print with the heaters on. This affects among others the following board:

  • BTT SKR Mini e3
  • Malyan M200
  • GTM32 Pro
  • MKS Robin

Steps to Reproduce

  1. Compile latest Marlin release (2.5.3.0 or bugfix release)
  2. Run a high retraction model with low E-Jerk ( Extruder Jerk set to 2) or do anything else that is guaranteed to crash the FW.
  3. Check the nozzle stuck in the model and verify that heaters and fans are running.

Expected behavior:

The watchdog should kick in and reset the board

Actual behavior:

The nozzle bakes some pla cake

Additional Information

The code can be seen at:

watchdog.cpp

I can confirm that on BTT SKR mini e3 the watchdog can be enabled by removing comments from this line

/**
 * @brief  Initialized the independent hardware watchdog.
 *
 * @return No return
 *
 * @details The watchdog clock is 40Khz. We need a 4 seconds interval, so use a /256 preescaler and 625 reload value (counts down to 0)
 */
void watchdog_init() {
  iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
}

I have no knowledge why this line was commented out but it appears it has been so since that particular has been introduced in e9acb63290f068cb9dd75acb3c042bc6bccd6616

@xC0000005
Copy link
Contributor

The maple core hard hangs if the watchdog is enabled via option bytes (the ST core does not). That's the root cause of this, but I agree, that should be enabled.

@xC0000005
Copy link
Contributor

If you have a way to test a change, I have a branch with a fix that reads the option byte and enables it correctly.

@minosg
Copy link
Author

minosg commented Jun 21, 2020

I will be testing the change from your branch. I will report back on how it worked on my system but some Stm core users may have to test it as well

@boelle boelle closed this as completed Jun 23, 2020
@thisiskeithb
Copy link
Member

I think this is important enough to keep open for now.

@thisiskeithb thisiskeithb reopened this Jul 12, 2020
@thisiskeithb thisiskeithb added the no-locking Prevent Lock-bot from locking the issue. label Jul 12, 2020
@MarlinFirmware MarlinFirmware deleted a comment from boelle Jul 12, 2020
@MarlinFirmware MarlinFirmware deleted a comment from boelle Jul 12, 2020
@minosg
Copy link
Author

minosg commented Jul 12, 2020

@thisiskeithb thanks. When I'm back to civilization I will check @xC0000005's fork and verify if the changes he is bringing in works in my boards

@xC0000005
Copy link
Contributor

xC0000005 commented Jul 12, 2020 via email

@thinkyhead
Copy link
Member

thinkyhead commented Jul 12, 2020

I can confirm that on BTT SKR mini e3 the watchdog can be enabled…

Which hardware permits the use of…

iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);

…and which hardware does not? It sounds like a new flag is needed in pins_MY_BOARD.h or as a -D option on the appropriate targets.

@thinkyhead thinkyhead removed the no-locking Prevent Lock-bot from locking the issue. label Jul 12, 2020
@xC0000005
Copy link
Contributor

xC0000005 commented Jul 12, 2020 via email

@minosg
Copy link
Author

minosg commented Jul 19, 2020

@xC0000005 As discussed offline, I enabled the hardware watchdog using option bytes, and and can confirm that the maple core hung as expected. Which means that enabling this line in boards with the hardware watchdog is not an option.

@xC0000005
Copy link
Contributor

xC0000005 commented Jul 19, 2020 via email

@minosg
Copy link
Author

minosg commented Jul 20, 2020

@xC0000005

I may fail to see the scope of the above patch. According to this document the OPTCR and FLASH_OPTCR_WDG_SW register references are for the STM32F2 not the F1

For this platform shouldn't it be written as:

void watchdog_init() {
  if (FLASH_BASE->OBR) & FLASH_OBR_WDG_SW == 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }
}

?

And would the idea be that if the hardware watchdog is enabled by option bytes do nothing ( not activate the watchdog) while if it is not, use the software wathdog init?

At any rate I have tested the proposed code which fails to compile on latest bugfix, and my version of the equivalent which did compile but not work. Enabling the option byte still stopped the board from booting up.

@minosg
Copy link
Author

minosg commented Jul 25, 2020

Ok coming back after having worked on this issue a bit more something is very off. For this platform you can use the ST's Software or Hardware watchdog, which is enable by Option Bytes set in in the OBR register. The problem which @xC0000005 described occurs when you enable the hardware watchdog which hangs.

Now the fun part. What he proposed was a simple test, to verify if the option byte is not-set,indicative of using a hardware watchdog, and if so do nothing. Or if the option byte is set, using a software watchdog to initialize it using the following line of code:

wdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);

This could be achieved by a test described above as:

  if (((FLASH_BASE)->OBR& (FLASH_OBR_WDG_SW)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

The problem

The problem lies that the above code does not work. And the reason which it does not work is that for at least my platform, the flash reg map is malformed, probably missing one register entry

/** @brief STM32F1 Flash register map type */
typedef struct flash_reg_map {
    __IO uint32 ACR;            /**< Access control register */
    __IO uint32 KEYR;           /**< Key register */
    __IO uint32 OPTKEYR;        /**< OPTKEY register */
    __IO uint32 SR;             /**< Status register */
    __IO uint32 CR;             /**< Control register */
    __IO uint32 AR;             /**< Address register */
    __IO uint32 OBR;            /**< Option byte register */
    __IO uint32 WRPR;           /**< Write protection register */
} flash_reg_map;

#define FLASH_BASE                      ((struct flash_reg_map*)0x40022000)

Since all members are 32bit registers there is no padding involved.

The OptionByte register should be in 0x4002201C location while it in our case it points to 0x40022018

This can be manually verified by using pointed to peek at the register and toggling the nRST_STOP ( it is a safe bit to test with )

volatile uint32_t * obr_shortcut = (volatile uint32_t *)0x4002201C;
volatile uint32_t wgd_val= *obr_shortcut & FLASH_OBR_WDG_SW;
volatile uint32_t_nrst_val= *obr_shortcut & FLASH_OBR_nRST_STOP;

Rewriting the above conditional with the incorrect structre member(which points to the right address) and using the nRST as a check, the logic has been verified to be working .

  if (((FLASH_BASE)->WRPR & (FLASH_OBR_nRST_STOP)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

But trying the same for the watchdog big still crashes maple core

  if (((FLASH_BASE)->WRPR & (FLASH_OBR_WDG_SW)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

Proposal

I am not fully aware of which platforms come with the option byte and weather that is worth the safety risk of disabling the watchdog the most common platforms we currently have. We could guide the users on how to disable it using ST's software if their new platform crashes on boot-up.

But I am not quite sure if it worth untangling and debugging yet another maple core framework issue, which is intended to be phased out gradually if I am not wrong.

Again though I am a Marlin noob, so I am just sharing my findings.

@github-actions
Copy link

This issue is stale because it has been open 30 days with no activity. Remove stale label / comment or this will be closed in 5 days.

@Plasmarobo
Copy link

This seems to effect Creality v4.2.2 boards (Ender 3, Ender 5, Ender 5 Pro).

Any chance of a fix? I really don't want to burn my house down, my printer hung in the middle of a print with both the headed bed and extruder on, when I power cycled it Thermal Runaway was triggered, but too little too late.

I plan to experiment with these settings/hack a fix into the Maple core, but this seems pretty critical to me: not fixing it can and will cause a fire eventually.

@sjasonsmith
Copy link
Contributor

Now that we have fixed a major cause of hangs, I think we need to come back to this and try to get the watchdog working everywhere. Re-opening.

@Plasmarobo please make sure you are using Marlin 2.0.7 (or latest bugfix). Hangs involving serial ports were recently fixed for these boards.

@qwewer0
Copy link
Contributor

qwewer0 commented Oct 8, 2020

Are there settings or a sequence of tasks that could bring the issue out, for testing?

@sjasonsmith
Copy link
Contributor

@qwewer0 if you want a way to trigger the watchdog, you could modify some GCode to simply disable interrupts then spin in a loop. If Marlin is behaving properly it will never hang on it’s own, so you have to contrive your own hang.

@minosg
Copy link
Author

minosg commented Oct 8, 2020

@sjasonsmith I believe the core issue is the enabling the watchdog part not the testing it actually works when enabled.

As @xC0000005 described above you can have this plaftom configured to use a hardware or a software watchdog.

The commented out line will work just fine and enable watchdog in platforms configured for software watchdog, but it will not even boot marlin if you use a hardware wd.

The idea is to create a logic fork which works like that:

  • If config flags are set to use software watchdog --> use the commented out code to init it
  • if config flags are set to hardware watchdog - - > use a different way or disable it and display warning.

@sjasonsmith
Copy link
Contributor

Do any actual boards in use enable the HW Watchdog in the option bytes? When I enable it on my SKR E3 DIP is immediately hangs on boot, it doesn't matter whether or not I am initializing the watchdog through software.

If we are disabling watchdogs for every STM32F1 user just so that one person can run a printer from a Bluepill, then we should just turn it on leave it up to that one Bluepill user to fix it for their board, or to just change their option bits.

If we actually know about real 3D printer boards enabling the HW Watchdog, I'd like to know what they are.

I have checked all the following boards, and none of them enable HW Watchdog in option bits:

  • SKR E3 DIP
  • SKR Mini V1.1
  • MKS Robin Nano V2.0
  • MKS Robin Lite3 V1.0
  • MKS Robin E3D V1.0
  • MKS Robin E3 V1.0

The following were inconclusive because I wasn't able to easily connect to them over SWD. Most likely their firmwares have debug pins disabled and without reset buttons I cannot easily catch them in a reboot.

  • FYSETC AIO II
  • Creality 4.2.7

@sjasonsmith
Copy link
Contributor

Personally, I am willing to risk hanging some boards at boot as long as we are not bricking them. Nobody is going to start a fire because their printer won't boot.

@minosg
Copy link
Author

minosg commented Oct 9, 2020

@sjasonsmith I am with you on this one. BTT and MKS appear to be the most popular boards, and most likely to attract new users because of the support by youtube tutorials. Those users are not guaranteed to be familiar with a watchdog and it's purpose so the risk is non uniformly distributed.

I tried to find out which issue commented out that line, to follow any past discussions, but I believe that it was inherited as is from Marlin4ST.

Is it possible to enable the watchdog and then add a health-check to display a Platform Specific warning, letting users know that if their board comes with the hardware watchdog, do xyz?

@Plasmarobo
Copy link

I can try to connect to my Creality 4.2.2 via SWD, think I have a hw debugger lying around someplace.

I will say I built a version of bugfix with the proposed changes to watchdog.cpp (the if-statement that reads the option bytes). the USE_WATCHDOG flag is set in my config, but I'm not experienced enough to know if it's actually using the hardware watchdog. Could be that codepath is just never being hit.

However it doesn't hang on boot, and I'm currently testing it out to see if it'll print benchy.

@xC0000005
Copy link
Contributor

xC0000005 commented Oct 9, 2020 via email

@sjasonsmith
Copy link
Contributor

@xC0000005 are there Lerdge boards using this HAL? I thought they all used HAL/STM32.

I don’t have an F1-based Malyan board to play with. The one I have appears to be dead.

@rhapsodyv
Copy link
Member

The M200 and Lerdge boot loaders enable the boot loader even if the option byte doesn’t, so perhaps what is needed is a board-specific #define that enables only those that need it to skip init if so.

But we already have a user option to enable/disable watchdog, right? The board pins file could undef it, and warn the user.

#define USE_WATCHDOG

@sjasonsmith
Copy link
Contributor

But we already have a user option to enable/disable watchdog, right? The board pins file could undef it, and warn the user.

I believe for the boards in question we only need to disable initializing the watchdog. We still need to service it. Someone please correct me if I have misunderstood.

@xC0000005
Copy link
Contributor

xC0000005 commented Oct 9, 2020 via email

@rhapsodyv
Copy link
Member

rhapsodyv commented Oct 11, 2020

I took the @xC0000005 suggestion and created a PR adding an option to disable the init, by the board: #19693

@sjasonsmith
Copy link
Contributor

I think we should wait to close this until some users try it and verify that it works on a few different boards. If you have verified that the watchdog is now working for you, please reply with the hardware you tested with.

@sjasonsmith
Copy link
Contributor

sjasonsmith commented Oct 12, 2020

I just tested and verified the watchdog works on a BTT SKR E3 DIP.

I posted a PR to make watchdog testing easier. If you add the change from #19697, you can enable MARLIN_DEV_MODE and simply send a D100 command through serial. It will either reboot in about 5 seconds, or print an error after ~10-15 seconds.

@qwewer0
Copy link
Contributor

qwewer0 commented Oct 12, 2020

Without #19693 changes, for (#19697) D100, the printer hangs, then I get FAILURE: Watchdog did not trigger board reset., without reboot, and with #19693 changes, after D100 the printer hangs for a few sec, then reboots, so it looks good.

SKR Mini E3 v1.2

@sjasonsmith
Copy link
Contributor

We've seen it working on a few different MKS and BTT boards now. I am going to close this bug.

If anyone encounter issues related to the watchdog on an STM32F1 boards, please report a new issue specific to your board.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants