Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Bad print results with certain options combined #11205

Closed
Bergerac56 opened this issue Jul 5, 2018 · 70 comments
Closed

[BUG] Bad print results with certain options combined #11205

Bergerac56 opened this issue Jul 5, 2018 · 70 comments
Labels
T: Question Questions, generally redirected to other groups.

Comments

@Bergerac56
Copy link

As I had a spare MKS SBASE, I decided to give it a try and replaced the RE-ARM/RAMPS from my printer. I also gave a try to some options I did not really test till now. With the SBASE, I got a more stable temperature reading (no coils needed anymore) and a stable full graphic LCD (no lines appearing when the SD card is accessed) BUT... I got also very strange print results when playing with the 4 options mentionned below.

Hardware: MKS SBASE (no other changes to the printer which was fully functionnal using the RE-ARM and has already printed a lot !)
Software : Bugfix 2.0 8299ac1

The "set" of options tested:
S_CURVE_ACCELERATION
JUNCTION_DEVIATION
ADAPTIVE_STEP_SMOOTHING
LIN_ADVANCE

When none of them (or JUNCTION+ADAPTIVE or LIN_ADVANCE alone or ADAPTIVE alone or JUNCTION alone or S_CURVE alone) are activated, I get the "expected" print (not cleaned up)
image

When all are enabled (or ADAPTIVE+JUNCTION+LIN or ADAPTIVE+JUNCTION+S_CURVE), I get:
image

I made 8 prints with various combinations of the options. Basically all "bad" prints have the same distorsions along X (mainly) and Y (a bit less). It does not seem to affect Z

To summarize:
The 4 options unabled or only one of them or JUNCTION + ADAPTIVE = good (acceptable) print
At least 3 options or the 4 ones = bad print

It could be related to #11047. I do not know.

This was not tested with my old RE-ARM configuration (and difficult to replace the MKS SBASE only for tests)

Does this ring a bell?

My config files (all options enabled):
Config.zip

@ejtagle
Copy link
Contributor

ejtagle commented Jul 5, 2018

Try increasing MINIMUM_STEPPER_PULSE to 3, and test again. I know we have an issue there, it's very hard to predict the optimizations that are done to the code by GCC when compiling, so it is very hard to predict the pulse width we will get - I have already disassembled the compiled versions with those options enabled/disabled, and the code creating the pulses is exactly the same, so it is still subject to speculation the cause of the shortened pulses...

@Bergerac56
Copy link
Author

I will try tonight.

@Bergerac56
Copy link
Author

Bergerac56 commented Jul 6, 2018

@ejtagle As a test print takes more than one hour, I tested with a MINIMUM_STEPPER_PULSE = 4 (to be sure).
The print succeded and is as expected.

It seems to be the same problem than #11047.

In my case, with an expected MINIMUM_STEPPER_PULSE = 2, and when more than 2 of the 4 options are activated, it seems to corrupt the stepper_pulse.

I just ordered a logic analyser to be able to have a better view on what is going on.

@ejtagle
Copy link
Contributor

ejtagle commented Jul 6, 2018

The main dilemma I have with this, is that it is very, very hard to ensure (execution) timing on code that it optimized by GCC. Part of the pulse time is done by taking into account the time that the set/clear IO pin code takes to execute.

In 32bit boards, adding an extra delay is not as problematic, but on AVR, it's the difference between success and failure, and we cannot afford adding extra delays of any kind.

The only way to ensure timing would probably be to rewrite the pulse generation code in assembler, as, counting on the way GCC optimizes code, is a very fragile approach - prone to break as soon as a new version of GCC is released

I have already tried to tame the GCC optimizer by proper usage of the volatile keyword for several variables, but then, the code is too much suboptimized (but the timings stabilize, undoubtedly) ... Right now i am trying to find a combination of options that could work and ensure proper timing and also timing predictability, but trust me, it is not an easy task!

@p3p
Copy link
Member

p3p commented Jul 6, 2018

I've also been trying a few things, forcing 16byte alignment of the functions, changing the optimisation to 03 instead of 0s for just the stepper methods, even moved the ISR functions into RAM just to see, investigation and testing continues slowly and is difficult to verify,

In order to avoid the issue completely I also played around with having an unstep isr, works really well with very clean 50% duty cycle step clock, I did a quick hack implementation to investigate whether the idea was tenable, unstep only appears to take about 3us on AVR, < 1us on ARM, .. hacks here if you want to look, obviously I broke features it was just a ISR test and the entire idea may be a dead end, and we would still need to fix the pulse timing for double/quad step.

@ejtagle
Copy link
Contributor

ejtagle commented Jul 6, 2018

@p3p: At some point, we considered with @thinkyhead to just double the ISR rate and start the step pulse at one ISR and stop it at the second. On 32bits boards it is possible, and would completely avoid delays in the ISR. On AVR it would severely impact performance... :( ...
Another option (but probably unfeasible) was to use a PWM to create the pulse width by hw. But unfortunately, most boards are not mapping step pins to hw pwms...
I am inclined to rewrite just the loop that produces the stepping in assembler..

@ejtagle
Copy link
Contributor

ejtagle commented Jul 6, 2018

Grbl IS using the unstep ISR idea that you are trying, for slow drivers is the only answer...

@p3p
Copy link
Member

p3p commented Jul 6, 2018

Removing the logic I did from the Pulse isr actually took more time from the ISR time than was added by the unstep so it was overall a performance improvement.. for some reason, but like I said I'm pretty sure I broke absolutely everything apart from standard single pulse operation.

@ejtagle
Copy link
Contributor

ejtagle commented Jul 6, 2018

In ARM, the overhead of entering an ISR is just 12 cycles, so the unstep is probably a winning strategy there... On AVR the overhead is more than 200 cycles, UNLESS we write it in assembler to JUST stop the pulses... And even if we do, it would be a loss. Don´t take me wrong: This is one of the cases when ARM is a very superior solution... All approaches (unstep, isr doubling rate) are a win...

@p3p
Copy link
Member

p3p commented Jul 6, 2018

My previous comment was actually in reference to AVR, but I agree the timing shouldn't have come back that way and was probably in relation to a different change, but the unstep ISR was consistently 3us, where the step ISR was reduced by ~10us. Using the bitset for PULSE_START/PULSE_STOP may have been somewhat of an optimisation (~4us tested with just that change). (of course the extra 200 cycles are not taken into account in the timings, so that's an additional 12.5us)

@ejtagle
Copy link
Contributor

ejtagle commented Jul 6, 2018

Basically you don´t need at all to check any conditions in the unstep isr. Just turn off all step pulses. Probably would be even faster

@thinkyhead
Copy link
Member

And the unstep ISR doesn't have to be the same ISR as the stepper ISR. If there are enough spare timers, it can just be a "fire once" timer started from the stepper ISR around the point where pulses are being started.

@ejtagle
Copy link
Contributor

ejtagle commented Jul 7, 2018

@thinkyhead : Grbl uses a 2nd timer to "turn off" the stepping pulses. But we are using it for the temperature ISR, so, i don´t know if we have any extra unused timers on AVR.

I completely agree that probably that could save tons of cycles inside the stepper ISR, at least for slower drivers (DRV88xx, TBxxxx), because for all the other ones, they admit pulses so small that a delay in the stepper ISR is the best solution.

MKS Sbase "sports" DRV8825 that require 2us to reliably work, and it uses a 32bit MPU.

I have analyzed the disassembled code, i have tweaked it a lot to ensure timing (on a private branch), but i still have to figure out a way to avoid impacting the AVR generated code with a fix for the 32bit code...

@thinkyhead
Copy link
Member

Sorry, didn't mean to imply AVR.

@ejtagle
Copy link
Contributor

ejtagle commented Jul 7, 2018

The problem is mostly that... Trying to share code between both... :S

@Stephane-80
Copy link

hello, is there a solution for this problem?
it does not seem corrected in the last bugfix?

@Bergerac56
Copy link
Author

I suppose the problem is still there. I solved it by setting MINIMUM_STEPPER_PULSE = 4 (for RE-ARM and SBASE). When set, I am able to combine the different options without printing problems

@ejtagle
Copy link
Contributor

ejtagle commented Jul 28, 2018

The problem is still there. I was working on a solution, but some other (unrelated) problems happened...
Eventually i will do the fix

@Stephane-80
Copy link

ok thank you
I'll wait
in the meantime I will use
MINIMUM_STEPPER_PULSE = 4

@Glod76
Copy link

Glod76 commented Oct 3, 2018

I'm getting very bad layer shifts and unreliable extrusion with s-curve, lin advance, JUNCTION_DEVIATION and ADAPTIVE_STEP_SMOOTHING all at once as well. does setting MINIMUM_STEPPER_PULSE=4 still fix this? should i be putting in drv8852 setting for MINIMUM_STEPPER_DIR_DELAY, MAXIMUM_STEPPER_RATE aswell or leave then to auto? also, I'm having lag around smaller circles. any help would be great or if someone could give me a working configuration.h and configuration_adv.h would be much appreciated. been working on this for a week now.

@thinkyhead
Copy link
Member

@Glod76 — Are you slicing with Simplify3D?

@AnHardt
Copy link
Contributor

AnHardt commented Oct 3, 2018

@Glod76
Configs please. Very short segments?

@Glod76
Copy link

Glod76 commented Oct 3, 2018

Yes with Simplify3D and segments @ 6. I've also noticed the with junction deviation on, my extruder motor is screeching. junction deviation was set to 0.02

@Glod76
Copy link

Glod76 commented Oct 3, 2018

My configs
Configuration.h.txt
Configuration_adv.h.txt

@thinkyhead
Copy link
Member

There's a known bug in Simplify3D that can cause strange pauses and potentially other issues. See if using another slicer gives better results.

@Glod76
Copy link

Glod76 commented Oct 3, 2018

I have tried cura with octoprint same weird layer shifts and stuttering on smaller curves as well as my extruder screeches. and it's not the motor, I've tried different motors all make the same screeching sounds.

@Glod76
Copy link

Glod76 commented Oct 3, 2018

I thought something was wrong with the board so switched back to smoothieware to compare and it works flawlessly, no layer shifts, slowdowns or extruder noises. so has to do with my marlin configs somehow. and I really don't want to use smoothieware. I like marlin much better. When it's working lol

@Glod76
Copy link

Glod76 commented Oct 3, 2018

I'm using this on my anet a8 atm and need it working so I can print my parts for my hypercube im building. i have the 2020 coming today and need to get printing lol. i have a FT-5 R2 aswell but its booked with prints and cant spare the down time for this project.

@Bergerac56
Copy link
Author

Bergerac56 commented Oct 3, 2018

@Glod76 Could you try to use the overwrites in configuration_adv.h uncomment them and try those values):

#define MINIMUM_STEPPER_DIR_DELAY 650
#define MINIMUM_STEPPER_PULSE 4
#define MAXIMUM_STEPPER_RATE 250000

@Glod76
Copy link

Glod76 commented Oct 3, 2018

Okay, thanks. also, why would Junction deviation cause my extruder to make horrible sounds?

@Bergerac56
Copy link
Author

@boelle No it is not solved. At least not on my side. I just completed some tests:

TEST 1: If the 4 options mentionned at start of this discussion (included LIN_ADVANCE) are active, the correct type of drivers (here DVR8825) are declared in configuration.h and no overrides for pulse are done in configuration_adv: It does not work. Still huggly prints.

TEST 2: Correct type of drivers declared in configuration.h, no overrides in configuration_adv.h and even no LIN_ADVANCE active (but well the 3 other options): Bad results. Same as test 1

TEST 3: correct type of drivers in configuration.h and #define MINIMUM_STEPPER_PULSE 4 declared: Good print. No noticable error. With or without LIN_ADVANCE or any mix of the 4 options described at the beginning of this discussion.

Conclusion: Problem not solved yet. It seems that, when a mix of more than 2 options among S_CURVE_ ACCELERATION, JUNCTION_DEVIATION (or the new setup for this option) ADAPTIVE_STEP_SMOOTHING and LIN_ADVANCE is used (does not seem specifically related to LIN_ADVANCE for me), you need absolutely to have #define MINIMUM_STEPPER_PULSE 4 (at least for SBASE)

Or perhaps I am missing something somewhere ?

@boelle
Copy link
Contributor

boelle commented Nov 24, 2019

Lack of Activity
This issue is being closed due to lack of activity. If you have solved the
issue, please let us know how you solved it. If you haven't, please tell us
what else you've tried in the meantime, and possibly this issue will be
reopened.

@boelle boelle closed this as completed Nov 24, 2019
@Bergerac56
Copy link
Author

@boelle Hello,
As indicated above, the issue is not solved. You closed it. Fine. Let's hope that somebody else will solve it...

@boelle boelle reopened this Nov 25, 2019
@sjasonsmith
Copy link
Contributor

@boelle, could you attach updated configs that reproduce the issue for you with the latest state of bugfix-2.0.x?

@boelle
Copy link
Contributor

boelle commented Nov 25, 2019

i dont have any issue

@sjasonsmith
Copy link
Contributor

sjasonsmith commented Nov 25, 2019

Oops, I meant to tag @Bergerac56.

@Bergerac56
Copy link
Author

@sjasonsmith Hello Jason,
Here are the config files based on commit 55595d4 (Last bugfix)
Those configurations files were tested on my printer (MKS SBASE) and produce a bad print (similar with the pictures at the beginning of this "issue").

Config.zip

@sjasonsmith
Copy link
Contributor

I have been playing with this today on an MKS sBase, where the problem was originally reported.
I am seeing pulses as low as 990 nanoseconds, when we are expecting 2 microseconds. They are almost always 990ns or 1.95us, there is rarely any in-between. This indicates to me that something is context switching out at a bad time, and interfering with the timing. I believe the issue is that the systick ISR is interleaving itself with the timing calculations. This causes half of the pulse time to be consumed by that ISR, before we have even changed the state of the pins.

I'm currently playing with some different ways to get around the issue. I'll post an update once I have a little more detail.

@sjasonsmith
Copy link
Contributor

I am seeing pulses as low as 990 nanoseconds

I was actually only measuring X when I said this. If I measure E I'm seeing pulses as narrow as 856 ns (Of the intended 2ms).

@sjasonsmith
Copy link
Contributor

@Bergerac56, I've reworked the ISR to ensure the proper minimum pulse length. This is only an experiment right now, more care would be needed to prepare it for general consumption. I haven't tried it with any other platforms or drivers so can't guarantee what the timing behavior will be there.

The quality problems were not as extreme on my printer as they are in your picture, so I'm not positive this is the only problem you are facing. You might just be printing a lot faster than me, which might exaggerate the problem. I am getting smoother prints with this change, and validated on the logic analyzer that the short pulses are gone. The minimum I'm seeing for pulses with this change is about 2.05 microseconds.

https://github.com/sjasonsmith/Marlin/tree/11205_ISR_Step_Length

@gloomyandy
Copy link
Contributor

@sjasonsmith Interesting, with your changes what is the longest pulse you see? How does that compare with the old code (I'm not saying there is anything wrong with your changes, I'm just curious).

@sjasonsmith
Copy link
Contributor

I'm sure there are some things wrong with it, but it works for this one particular platform right now.
The longest pulse is about 3 microseconds. Basically, we are still losing a microsecond to the systick (my theory, at least), but the timers use differently so that it will err by extending the pulse rather than shortening it.

There is a behavior change there. The original code looks at the timer at the beginning of the loop, and assumes it will never be interrupted. If you lose control for any period of time, it should't change the duration of the overall loop, it would just add jitter to the pulse edges controlled within the loop, leading to the original problem we faced.

With my change, the duration of the systick may be added to the overall loop duration, depending on when it occurs. While that provides a better pulse length, it can change the overall ISR timing. I don't know how much that microsecond matters here. I knew I was messing in timing sensitive code, which is why I'm only considering this an experiment and not necessarily a solution.

Thinking about this more, software serial can probably totally destroy this minimum step time in the original code, especially on AVR. I believe the serial ISR can execute while this ISR is active (I may be wrong). A single byte at 115200 takes about 87 microseconds to receive, which would probably cause this code to act as if there is no MINIMUM_STEPPER_PULSE at all. I guess that is yet another reason not to monitor TMC status during a print...

@gloomyandy
Copy link
Contributor

gloomyandy commented Nov 26, 2019

Yep software serial during printing on the AVR is a very bad idea. But anyway my understanding is that the original code did...

   end_time = calc_end_time_relative_to_now
   start_pulse
   wait_for_end_time
   stop_pulse

Your version does...

   start_pulse
   end_time = calc_end_time_relative_to_now
   wait_for_end_time
   stop_pulse

Which means that if there is an interrupt just before setting the start of the pulse then the old version will reduce the pulse length, with your version if you get an interrupt just after the pulse is started (but before sampling the timer), then the pulse length will be extended. Things get a little more funky if multiple pulses are generated in one go, but again I think your new code should give better minimum pulse lengths.

As you say one side effect of this is that the total time that the isr can execute may go up. I wonder if this could be a problem? I think there is code that calculates when the next timer interrupt should happen and also decides it it needs to generate multiple pulses per interrupt. It could be that with your new version this calculation needs to be revisited? Any thoughts?

EDIT: Fixed stupid copy and paste error!

@boelle
Copy link
Contributor

boelle commented Nov 26, 2019

looks the same?

end_time = calc_end_time_realtive_to_now
start_pulse
wait_for_end_time
stop_pulse



Your version does...

end_time = calc_end_time_realtive_to_now
start_pulse
wait_for_end_time
stop_pulse

@gloomyandy
Copy link
Contributor

Fixed it!

@Bergerac56
Copy link
Author

Bergerac56 commented Nov 26, 2019

@gloomyandy I gave a try to your solution by replacing the "stepper" files in my version of marlin. This solves the print problem. I get the same result than when I override the MINIMUM_STEPPER_PULSE to 3us.

@boelle
Copy link
Contributor

boelle commented Dec 2, 2019

@gloomyandy so does marlin have an error or is it a config error?

@sjasonsmith
Copy link
Contributor

@gloomyandy so does marlin have an error or is it a config error?

This is a Marlin problem. It can be worked around by increasing the minimum stepper pulse.
I am actively working on proposed changes to fix this, but it is taking time to make sure I don't worsen anything on AVR at the same time.

@boelle
Copy link
Contributor

boelle commented Dec 2, 2019

oki doke :-) one more will soon be goneski, patience is the keyword :-D

@boelle
Copy link
Contributor

boelle commented Dec 10, 2019

i think this one could continue on #16128

me and @Vertabreak have all features enabled and no issues at all

any protest against moving the discussion and testing to #16128 ?

@Vertabreak
Copy link
Contributor

yep Ive been using all the listed settings above plus many many more with no issue for nearly a year.

@Bergerac56
Copy link
Author

Bergerac56 commented Dec 11, 2019

On the MKS SBASE, the problem is not solved for the moment. But @sjasonsmith is working on it in #16128 .

Then effectively, we can close this one

@github-actions
Copy link

github-actions bot commented Jul 3, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T: Question Questions, generally redirected to other groups.
Projects
None yet
Development

No branches or pull requests