[BUG] Firmware resets while homing #24927
Comments
Using the same firmware build, does recompiling with Input Shaping disabled fix the issue? |
Yes, it does.
|
Hi, I had this problem and it was related to the lack of RAM. The end of compilation showed me 97% usage and it was not enough. |
Already checked that from provided configs
|
I have the same board but also have about 10 hours of prints with input shaping enabled, and I have seen no resets. I do see the homing grind into the edge when it starts close to the edge or if I abort a print using the menu, but I see that with input shaping not compiled in as well, and it's not 100% reproducible. The main difference I saw in the configs is that I do not have UBL enabled, so it might help to isolate the problem if you test with bilinear enabled. Also, just an FYI: I noticed you had a typo in the BLTOUCH_HS_MODE define, so I'm not sure how that is being treated during compile. |
@cbagwell and @thespooniest are you both using sensorless homing? @thespooniest Is homing speed the same with and without IS enabled? And when the firmware resets, do you get a message saying "homing failed"? @thespooniest What default_envs do you have in platformio.ini? And once we have the answer, @cbagwell do you have the same? |
I am not using sensorless homing. As part of this process I did discover that I'd forgotten to remove the DIAG jumper on my mainboard (as you're supposed to do with the SKR Mini e3 v3), but removing it does not seem to have fixed the problem: the motor still grinds if homing starts too close to zero, and the board still resets after a second or two either way. Homing speed seems to be the same with and without IS enabled, at least on X and Y. I don't know about Z, because the firmware resets before reaching that part. I don't get any message, either onscreen or through OctoPrint: the firmware just resets. My default_env is default_envs = STM32G0B1RE_btt |
I'm not using sensorless homing either, and I also have default_envs = STM32G0B1RE_btt. My wild guess is that disabling QUICK_HOME might resolve the homing issue (but I haven't taken time to test that idea yet). That's because it moves diagonally until it hits the X end stop, then Y movement stops while X starts grinding. I haven't reviewed the code yet, but I assume that means it has entered some state where it should be backing up but is confused, and maybe it's that concurrent Y movement that is causing the confusion. I see the reference BTT Mini E3 V3 config has QUICK_HOME commented out, but I carried it over from my CrealityV422 config file. Maybe it's commented out for a reason? In my case, I eventually get a homing failed message but I do not see a reset and have to power cycle manually to recover. But mine of course is just an inconvenience, since an average home does work and I can use IS. |
I tried disabling QUICK_HOME. It didn't solve the reset issue. The motor grinding changes, but isn't fixed. If I start far from the zero point, the board resets a second or so after the homing starts, even though it hasn't hit the zero point, just like before. If I start a medium distance from the zero point, the head eventually hits zero and begins to grind, but the board resets a second or two after homing starts, again like before. But if the head starts very close to zero, it will grind until it generates a "Homing Failed - Please Reset" error. This halts the printer, which prevents the automatic reset that the other situations have. |
Possible clue. As I attempted to change the nozzle, I told the printer to move the extruder -10mm. This also caused a spontaneous reset. Perhaps it has something to do with input shaping while moving an axis into the negatives? (And I also broke my thermistor, so I may not be able to do tests for myself for a couple days. New one arrives Thursday ). |
I have a SKR Mini v3 coming my way, so will try to replicate. |
But for now @cbagwell can you build with @thespooniest's exact config and see if it replicates for you? |
I was able to reproduce the reboot on first try with attached config files. Still the biggest difference between my config is UBL is enabled so that would be my next guess to toggle off. I'll try to do more testing in a few hours or at least upload a diff of my working config. |
My printer ended up being used for prints so I wasn't able to debug... so I'll attach diff between my no-reset config and the reset config. In addition to UBL, here are some items that jumped out at me that are X/Y related:
|
Okay, here's a hypothesis: ADAPTIVE_STEP_SMOOTHING causes the stepper ISR to be called so often that when you add extra ISR calls for input shaping, the CPU spends all its time in ISRs. Consequently endstop logic is not processed and the watchdog timer never gets reset, resulting in a reboot. What happens if you disable |
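Editor's sketch of the starvation argument above. This is a hypothetical model, not Marlin code: the function names, the 64 MHz clock, and the rates and cycle costs are all illustrative assumptions. The idea is that every interrupt source consumes `rate * cycles_per_call` cycles each second, and once the sum reaches the clock rate nothing is left for lower-priority work such as endstop handling or refreshing the watchdog.

```cpp
#include <cstdint>

// Hypothetical model, not Marlin code: cycles per second consumed by one
// interrupt source firing rate_hz times a second at cycles_per_call each.
constexpr uint64_t isr_cycles_per_second(uint32_t rate_hz, uint32_t cycles_per_call) {
  return uint64_t(rate_hz) * cycles_per_call;
}

// Lower-priority work (endstop handling, watchdog refresh) only runs in the
// cycles the interrupts leave over; with none left, it is starved.
constexpr bool lower_priority_work_starved(uint32_t f_cpu_hz, uint64_t total_isr_cycles) {
  return total_isr_cycles >= f_cpu_hz;
}

// Illustrative numbers only (assumed, not measured on any board).
constexpr uint32_t kClockHz = 64000000;
constexpr uint64_t kStepperOnly = isr_cycles_per_second(20000, 2000);
constexpr uint64_t kWithShaping = kStepperOnly + isr_cycles_per_second(20000, 1500);

static_assert(!lower_priority_work_starved(kClockHz, kStepperOnly),
              "stepper ISR alone leaves spare cycles in this model");
static_assert(lower_priority_work_starved(kClockHz, kWithShaping),
              "adding the extra ISR calls exhausts the CPU in this model");
```

With these made-up figures the stepper ISR alone fits in the budget, while adding a second stream of ISR calls does not, which is the shape of the failure being proposed.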
It will probably be tomorrow before I have access to my printer to test, but I wanted to add some support to the hypothesis. ADAPTIVE_STEP_SMOOTHING was recently disabled in the config files for this specific board because that appeared to solve some cases of BLTOUCH failing. Others have found another solution that allows ADAPTIVE_STEP_SMOOTHING with BLTOUCH by increasing the interrupt priority of the PWM timer (#24922 ), while others needed to disable both ADAPTIVE_STEP_SMOOTHING and ENDSTOP_INTERRUPTS_FEATURE (mentioned in #24351 ) for BLTOUCH stability. The common theme in those solutions is to reduce or rearrange interrupts to solve BLTouch issues, so the board is likely already struggling with the interrupts it has, and adding INPUT_SHAPING on top of ADAPTIVE_STEP_SMOOTHING may be pushing it into watchdog territory. I bought a MAX7219 at the same time as this board specifically to use it with your profiling code, but it's currently collecting dust in a corner. That would likely be useful info in this case, so I'll try to make that my next project. |
Oh a MAX7219 would really help. I am glad you have one. |
I was able to run some more tests and you were right.
So now it's clear that enabling both ADAPTIVE_STEP_SMOOTHING and INPUT_SHAPING causes the board to spend too much time in interrupts. I think I may have also gotten some clues that my home grinding, even when I have both ADAPTIVE_STEP_SMOOTHING and INPUT_SHAPING disabled, could be from lost interrupts. Perhaps there is a burst of interrupts right when you initiate a home sequence, which could explain why it often has issues when you start the home really close to zero. |
Excellent. Now with both INPUT_SHAPING and ADAPTIVE_STEP_SMOOTHING, is the problem resolved by adding
before Marlin/Marlin/src/module/stepper.cpp Line 2468 in 6b4d7b9
|
That did indeed allow me to home X/Y without rebooting! Sample size of 1 though. |
It halves the ASS ISR rate when IS is enabled. I think we may have solved this one. |
No sooner did I replace the thermistor than I broke the cable for my BLTouch. But I was able to confirm that the motors don't grind and the firmware doesn't reset, so I think this is fixed. |
I'm chasing down some other things at present but in due course I'll get this or a similar fix into my is_improvements branch. Thanks for reporting it and helping find the bug. |
I think there is probably another bug related to ADAPTIVE_STEP_SMOOTHING: the replay buffer is too small if ASS actually engages (and that depends on your mainboard speed) when printing or moving at high speed. This will result in some crude speed limiting kicking in in the stepper module, which will slow your print. If you also have LINEAR_ADVANCE enabled, it will not know about the speed change and will overextrude. |
I was browsing around stepper.cpp looking for other things that might be contributing to the ISR running too much. I wonder how accurate ISR_S_CURVE_CYCLES needs to be? The STM32G0 is a Cortex-M0 and is missing the umull instruction, so it's falling back to the unoptimized logic. ISR_S_CURVE_CYCLES is hard coded to the 40-ish cycles of the optimized version, though, and I think this board is running closer to 200 cycles. I looked at the generated assembly, and the 3 64-bit multiplications are causing __aeabi_lmul() to be called, and that function is 40 instructions. I've seen versions of that function done in 25 instructions, so there is probably room for long-term improvement. Short term, I'm wondering if it's worth updating the define to use a higher value for this board? |
I think that makes sense (as a separate PR!). Make it depend on the same condition, i.e. the opposite of |
I retested with the is_improvements branch and INPUT_SHAPING+ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION, and unfortunately homing still was not working. It was still grinding into the end stop and appears to reboot from the watchdog after a couple of seconds. I even tested with and without my update to the S-curve cycles from 40 to 500, hoping that would back off the CPU usage enough, but there was no difference. I did finally hook up my MAX7219 and used it during this time. It appears that the instant the homing started, the display also stopped being updated (7 out of 8 LEDs lit on profiling, no LEDs changed, and the alive LED stopped blinking). Once I reinstall without ADAPTIVE_STEP_SMOOTHING, things go back to normal. It stays very idle during homing (7 out of 8), but I get hints of some kind of bursty operations going on (at the very end of Z home, I get a blip down to only 1 LED, for example). |
In case it helps, here is the value of ISR_EXECUTION_CYCLES for these boards (with my +500 cycles for SCURVE instead of +40). I'm still trying to understand the usage of the CPU frequency divided by some fixed number, since it will result in a smaller cycle count the slower your CPU. I'd think the value would be used to multiply against some fixed cycle count and not simply added in.
|
I now have an SKR Mini V3 so I can play around with this myself. How did you connect your MAX7219, i.e. which connector and what config? |
There is a set of pins labelled SPI1 you can get 5V and GND from but the remaining pins are also used by SD card so you can't use them freely. There is also a row of 5 pins labeled I/O that are free to use and I used the 3 closest to SPI1 power pins.
|
Okay, I have Marlin on my BTT SKR Mini E3 V3.0 with INPUT_SHAPING, ADAPTIVE_STEP_SMOOTHING and S_CURVE_ACCELERATION and it runs fine with G28. My MAX7219 is connected and shows that the processor is fairly quiet during homing. What processor does your board have? Mine has the STM32G0B1RET6. I do not have a UI. I wonder whether that might make a difference. Could you post your config? |
Interesting. The original reporter and I both have Ender 3 Pros, so we are using that style of UI. The original poster posted a config zip at the top, and here is a link to the repo I keep my config in as well. My default config has ADAPTIVE_STEP_SMOOTHING disabled, so please enable it if you start from that; it does also have MAX7219 enabled with the above pins. |
One other thing I didn't have compiled was a probe. So I plugged in my new CRTouch and it burnt! Damned Z probe connector on the BTT was backwards. So dispiriting. |
Frustrating! I would be happy to donate a new one as thanks for all the new features. |
So it turns out the probe was not related to the problem anyway. But the display is, sort of. When I disable CR10_STOCKDISPLAY homing no longer triggers the watchdog timer. The stepper ISR is taking pretty much all the processor time and just updating the display pushes it over the edge. I have also established that both your changes and mine together change MIN_STEP_ISR_FREQUENCY from 27681 to 17021 which is less than the factor of 2 that is needed to give the CPU spare cycles. Even when I force ASS to run at half the frequency, by moving CodeProfiler from So the ASS calculation is far too inaccurate for the STM32G0B1RE. I wonder how much Stepper::calc_timer_interval() contributes to this. It calls
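Editor's note on the arithmetic above: a minimal sketch that checks the "less than a factor of 2" claim for the two MIN_STEP_ISR_FREQUENCY values quoted in the comment. The 64 MHz clock and the reading of the frequency limit as `clock / estimated cycles per ISR` are assumptions made for illustration, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumption for illustration: a 64 MHz core clock.
constexpr uint32_t kAssumedClockHz = 64000000;

// Cycle budget per ISR implied by a frequency limit, assuming the limit is
// derived as clock / estimated cycles per ISR.
constexpr uint32_t implied_cycles(uint32_t clock_hz, uint32_t isr_hz) {
  return clock_hz / isr_hz;
}

// True only if the new limit is at most half of the old one.
constexpr bool at_least_halved(uint32_t before_hz, uint32_t after_hz) {
  return uint64_t(after_hz) * 2 <= before_hz;
}

// The two values quoted in the thread: 27681 -> 17021 is a ratio of about
// 1.63, so the limit has not been halved.
static_assert(!at_least_halved(27681, 17021),
              "the quoted change is less than a factor of 2");
```

Under those assumptions the two limits correspond to budgets of roughly 2300 and 3760 cycles per ISR, so the estimate would need to grow further before the limit is actually halved.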
That's very kind, Chris, but it was already a spare which was donated to me and I was happy to have it on that basis for testing purposes. My actual printer uses a different sensor so I don't really need it for myself. It's just frustrating to have a perfectly good piece of hardware destroyed by one incompetent component placement in another piece of hardware. |
@cbagwell I have measured Also, I measure the call to the SCURVE logic at 544 cycles. So that's another 44. I am still measuring but there is still quite a lot more time taken in the ISR than is being counted. |
Will do.
I also see some calls to |
I suspect that will be
and the same for X. I have a change in the pipeline that will eliminate this code. Just need to resolve the SKR Mini and TMC2208 issues first. PS these divisions only happen once per segment so I am not overly concerned about them. |
Hmm, I suspect SCURVE, on the other hand, has a much lesser effect because it only occurs during accel and decel ramps. And it turns out using a float as a boolean is really expensive. I'll fix some of these and see where we end up. |
Hopefully, this patch will one day land in gcc to be better optimized for cortex-m0: https://www.mail-archive.com/[email protected]/msg294320.html We are just checking off each of those functions one-by-one in this ISR. It's still a long, long way off until that would be visible in platformio's gcc though. It, for example, optimized _ |
@cbagwell have a look at https://github.com/tombrazier/Marlin/tree/scurve_stm32g which prompts the compiler to use |
Clever. I was concentrating on how to optimize the original assembly and how small I could make a 32x32=64. I didn't consider optimizing the C path and taking advantage of the 32bit shift. Probably by using assembly, we can also use STM32G0's 32x32=32 instruction and feed it the 16bit values to reduce a bit more. I'll give it a test. |
I did not have as much free time as I hoped today, so I tested it as-is. I rebased my config branch to the latest bugfix-2.1.x, then added is_improvements+scurve_stm32g and my scurve-cycles-cortexm0 branch. I also enabled ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION+INPUT_SHAPING. The good news is that I could successfully home with no reboots, so I think we are on the right track with increased cycle counts and improved S-curve speed. I notice the MAX7219 has what looks like a 1.5 second freeze right when I initiate homing, and again after X/Y is done, the head has moved to the center of the bed, and it is getting ready to lower the probe. That might be something to investigate more deeply later on, as the freeze might explain some probe failures. I also successfully printed a hollow calibration cube I use for flow tests. It prints slooow because Cura defaults to nothing faster than 10 seconds a layer. I then printed the ringing print from the input shaping pull request (I did all the stuff to get full speed). This is the bad news. That print had some issues. The print would freeze every once in a while for something like 60 seconds and then just resume on its own for unknown reasons. I also had severe under extrusion. I need to do some troubleshooting tomorrow to narrow down the cause, since this wasn't a small change since my last successful print. |
I reverted the scurve_stm32g change and the occasional 60 second pauses went away. I still have under extrusion on my ringing tower at 90mm/s but I'm now thinking that might be related to my extruder stepper motor going bad earlier this month and I replaced it with a backup stepper+extruder that I've never been happy with and likely needs more tuning. I'll switch back to testing at normal speeds until I get that worked out. |
Shouldn't this:
be this (high part shifted back up to high part)?
|
I don't think so. You need 2 16 bits shifts to make up the 32 bit shift. I have not tested this logic to see that it gives the same answers. It should be almost exactly the same with an additional tiny rounding error. |
I see where I went wrong. I was thinking about doing a 32bitx32bit=64 followed by >> 32 like this:
and since we know we are going to do that shift why not skip the full 32-bit shift in first place. I somehow translated that to still doing a 16 bit shift. For what I was thinking to work, it would reduce to this:
In my quick gcc test on Intel, none of the 3 versions match the original (uint64_t(f) * t) >> 32 as closely as I'd assumed they would. So I'll play around a bit more to understand why. |
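Editor's sketch of the arithmetic being debated. The code snippets from the comments above did not survive, so this is not the code from the scurve_stm32g branch; the function names are hypothetical. It compares the reference `(uint64_t(f) * t) >> 32` with an exact version built from four 16x16 products using only 32-bit arithmetic, and with a cheaper approximation that shifts each operand right by 16 bits first (two 16-bit shifts standing in for the 32-bit shift).

```cpp
#include <cstdint>

// Reference: full 64-bit product, keep the high 32 bits.
constexpr uint32_t mulhi_ref(uint32_t f, uint32_t t) {
  return uint32_t((uint64_t(f) * t) >> 32);
}

// Exact high word using only 32-bit arithmetic (four 16x16 products).
constexpr uint32_t mulhi_exact(uint32_t f, uint32_t t) {
  const uint32_t fl = f & 0xFFFF, fh = f >> 16;
  const uint32_t tl = t & 0xFFFF, th = t >> 16;
  const uint32_t ll = fl * tl, lh = fl * th, hl = fh * tl, hh = fh * th;
  // Sum of everything that lands in bits 16..31; its carry feeds the high word.
  const uint32_t mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF);
  return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

// Approximation: drop the low half of each operand before multiplying.
// It never overestimates, and the shortfall stays below 2^17.
constexpr uint32_t mulhi_approx(uint32_t f, uint32_t t) {
  return (f >> 16) * (t >> 16);
}

static_assert(mulhi_exact(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu,
              "exact version matches the 64-bit high word at the extreme");
static_assert(mulhi_exact(0x12345678u, 0x9ABCDEF0u) ==
              mulhi_ref(0x12345678u, 0x9ABCDEF0u),
              "exact version matches the reference");
```

The approximation discards the two cross products and the low product, which is where its error comes from; whether that error is acceptable depends on how the result is scaled afterwards.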
My first version throws away the product |
I got a new stepper motor and now my under extrusions are gone. I'm also sure I did something wrong that caused the long pauses so I started fresh with latest is_improvements+scurve_stm32g branches and my scurve-cycles-cortexm0 branch. I also enabled ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION+INPUT_SHAPING. I do not normally have ASS enabled but this was to stress the board while doing IS ringing tower tests. This test run was the best looking and most stable print of a ringing tower to date on this board. This board has some homing issues that pre-date IS being merged in (it would sometimes grind into an end stop and that was with ASS disabled). With current is_improvements, I am now unable to reproduce those grinding issues and it used to be pretty easy to reproduce. And finally, I reviewed the assembly from scurve_stm32g and it dropped from 500 cycles down to 257 cycles. Nice reduction. There are 3 64-bit multiplications left and they are accounting for 170 of those remaining cycles. |
This was fixed by #24951 . |
Did you test the latest bugfix-2.1.x code?
Yes, and the problem still exists.
Bug Description
The firmware resets early in the homing process. If the axes started near their zero point, the firmware actually acts as though the switches weren't there, continuing to grind into the zero point until the firmware resets. But if the axes start far from their zero points, the firmware will reset anyway, a few seconds after they begin to move, even if they haven't hit their endstops yet.
Bug Timeline
This started on 10/24/2022, not long after Input Shaping merged in.
Expected behavior
I expected the homing process to complete normally.
Actual behavior
If the axes start close to the zero point, they will grind against the endstops for a second or two before the firmware resets. If the axes start far from the zero point, the firmware still resets a second or two after movement begins, even if they have not reached the zero point.
Steps to Reproduce
Version of Marlin Firmware
bugfix-2.1.x, 2022-10-24
Printer model
Creality Ender 3 Pro
Electronics
BigTreeTech SKR Mini e3 v3
Add-ons
BLTouch 3.1
Bed Leveling
UBL Bilinear mesh
Your Slicer
Cura
Host Software
OctoPrint
Don't forget to include Configuration.h and Configuration_adv.h.
Additional information & file uploads
Marlin.zip