Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix int32_t microseconds overflow in navigation. #6806

Merged
merged 1 commit into from
Apr 18, 2021

Conversation

pieniacy
Copy link
Contributor

@pieniacy pieniacy commented Apr 8, 2021

Bug and reproduction

There is a severe bug in navigation. To reproduce: after 40mins of waiting arm the quad in the althold mode and the motors will instantly spin at full throttle. Needless to say, do not try that with the props on (and don't ask how I found out about this issue).

Cause

The navigation uses something called deltaMicrosPositionUpdate and it is calculated from zero at first update or since last update. This delta can get big either if you wait before first arm or between arms. This value is held in int32_t type, and 2^31 microseconds equals ~36 minutes, so in the range from ~36 to ~72 minutes the value will overflow to a negative number. There is a check if the delta is not too big, but a negative number obviously always satisfies this check. Then, the negative value of delta is taken into computations causing a mess and causing altitude controller to output full throttle.

To be precise deltaMicrosPositionUpdate is of type timeDelta_t and its definition looks like so:

// time difference, 32 bits always sufficient
typedef int32_t timeDelta_t;

Well, not always sufficient.

Naive solution

The naive solution would be to just change timeDelta_t to int64_t, but as this type is widely used in the project this would introduce performance degradation, for example this shows that 32 bits value of timeDelta_t is handled by the FPU while 64 bits is delegated to the soft emulation!
Almost all uses of timeDelta_t deal with very small changes which do not present much risk of overflow. Moreover, for example in the scheduler performance is quite important. Therefore just changing timeDelta_t to int64_t does not seem right.

This solution

Important changes are at the top of time.h and in lines where timeDeltaLarge_t is added in navigation_*.

Firstly, the comment of timeDelta_t was changed to accurately reflect the capabilities of the type and a TIMEDELTA_MAX define was added. Secondly there is a new timeDeltaLarge_t which typedefs to int64_t, which overflows at 300 000 years, so this one can definitely can hold any practical interval.

Secondly, in navigation_fixedwing.c, navigation_multicopter.c and navigation_rover_boat.c appropriate deltaMicrosPositionUpdates are changed to timeDeltaLarge_t and since they are checked for maximum possible interval anyway they will not be passed as timeDelta_t if they are big enough to cause overflow.

The last somewhat important change is in navigation_private.h where the maximum interval is asserted to fit in the timeDelta_t.

By the way

I also looked at other occurences of timeDelta_t to figure out where overflow is possible/dangerous and apart from navigation I haven't found any cases requiring more than 32 bits, so I only did some minor refactors of related code here and there.

@pieniacy
Copy link
Contributor Author

pieniacy commented Apr 8, 2021

It seems CI build failed to setup its environement (compiler download failed) and write access to the repository is required to rerun checks, can I have a rerun please?

@JulioCesarMatias
Copy link
Collaborator

good job, but replacing in32_t with uint32_t would be better, wouldn't it?

@pieniacy
Copy link
Contributor Author

pieniacy commented Apr 8, 2021

It would change the overflow point from ~36mins to ~72mins, so this would only push the problem a little bit further, not solve it.
I would say that if you deal with intervals that can span minutes, you should already use 64 bits (or timeDeltaLarge_t) anyway.

And in terms of the delta being signed, I think that this may have some pros, for example:
If we have thing1 and thing2 and we expect thing2 happen after thing1 we would do: delta = thing2Time - thing1Time. Now what if thing1 happened just after thing2? Our delta would indicate 72 minutes if we used uint32_t and -1us if we used a signed type. Then consider things like:

if (delta > MAX_DELTA) { // this would wrongly trigger if order is reversed and type is unsigned
    report_too_big_delay();
}
if (delta < 0) {
    report_reversed_order();
}

I think the possibility of negative delta can be useful, while doubling the range with unsigned is not a huge gain - if you need it you should use bigger type anyway.

Copy link
Member

@digitalentity digitalentity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DzikuVx DzikuVx added this to the 3.0 milestone Apr 15, 2021
@DzikuVx DzikuVx merged commit 1fe9389 into iNavFlight:master Apr 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants