Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix time gap with RTC millis() overflow #1167

Merged

Conversation

jvasileff
Copy link
Contributor

@jvasileff jvasileff commented Nov 16, 2024

The millis state was internally represented with a 32 bit number, using timer_overflow_count as the high 16 bits, and RTC.CNT as the low 16 bits. The max value before overflow was 0xFFFFFFFF (or 4294967295) which corresponded to about 4194303999ms given the timer's frequency of 1024Hz. But then, one millisecond later, after overflow, the internal value 0x0 would correspond to 0ms, skipping all millisecond values between (inclusive) 4194304000 and 2^32-1, causing an apparent time warp into the future by 100663296ms or 28 hours.

After this commit, the internal representation is in milliseconds, which results in no time warp.


To test, the following code can be used, which both checks for gaps in the returned value of millis() and the accuracy of millis() over repeating intervals.

#include <Arduino.h>

const int INTERVAL_MS = 2000; // must be < 4195 for 16MHz clock
const int TOLLERANCE_MS = INTERVAL_MS / 20 + 1;

uint32_t lastIntervalMillis = 0;

void startTimerA0() {
    // raise TCA_SINGLE_OVF_bm every INTERVAL_MS

    uint16_t top = (F_CPU / 1000) * INTERVAL_MS / 1024; // Assuming a prescaler of 1024
    TCA0.SINGLE.CTRLA = 0; // Stop the timer
    TCA0.SINGLE.CTRLD = 0; // Ensure the timer is not in split mode
    TCA0.SINGLE.PER = top; // Set the period value
    TCA0.SINGLE.CNT = 0; // Initialize the count to 0
    TCA0.SINGLE.INTFLAGS = TCA_SINGLE_OVF_bm; // Clear the overflow flag
    TCA0.SINGLE.CTRLA = TCA_SINGLE_CLKSEL_DIV1024_gc | TCA_SINGLE_ENABLE_bm; // Set the prescaler and enable the timer
}

void setup() {
    Serial.begin(115200);
    Serial.println("Starting up");
    Serial.flush();

    startTimerA0();
    lastIntervalMillis = millis();
}

boolean timerTriggered() {
    if (TCA0.SINGLE.INTFLAGS & TCA_SINGLE_OVF_bm) {
        TCA0.SINGLE.INTFLAGS = TCA_SINGLE_OVF_bm;
        return true;
    }
    return false;
}

void loop() {
    static uint32_t lastMessageMillis = millis() / 1000UL * 1000UL;
    static uint32_t lastLoopMillis = millis();
    uint32_t currentLoopMillis = millis();

    if (currentLoopMillis - lastMessageMillis >= 1000) {
        Serial.printf("%lums: Running, all ok\n", currentLoopMillis);
        lastMessageMillis += 1000;
    }

    boolean error = false;

    if (timerTriggered()) {
        uint32_t elapsed = currentLoopMillis - lastIntervalMillis;
        lastIntervalMillis = currentLoopMillis;
        if ((elapsed > INTERVAL_MS + TOLLERANCE_MS)
                || (elapsed < INTERVAL_MS - TOLLERANCE_MS)) {
            Serial.println("Error: elapsed per millis() does not match interval");
            Serial.printf("    elapsed: %lu, interval: %i\n", elapsed, INTERVAL_MS);
            error = true;
        }
    }

    if (currentLoopMillis - lastLoopMillis > 1) {
        Serial.println("Error: millis() jumped");
        error = true;
    }

    if (error) {
        Serial.printf("    lastLoopMillis:    %lu\n", lastLoopMillis);
        Serial.printf("    currentLoopMillis: %lu\n", currentLoopMillis);
        Serial.println("halting.");
        while (true);
    }

    lastLoopMillis = currentLoopMillis;
}

Since the rollover problem only occurs after about 48 days, the following patch can be applied to wiring.c to queue rollover in 64 seconds:

diff --git i/megaavr/cores/megatinycore/wiring.c w/megaavr/cores/megatinycore/wiring.c
index 4fa43ae4..55ef9ecf 100644
--- i/megaavr/cores/megatinycore/wiring.c
+++ w/megaavr/cores/megatinycore/wiring.c
@@ -1431,6 +1431,7 @@ void __attribute__((weak)) init_millis()
       pTCD->INTCTRL        = 0x01; // enable interrupt
       pTCD->CTRLA          = TIMERD0_PRESCALER | 0x01; // set clock source and enable!
     #elif defined(MILLIS_USE_TIMERRTC)
+      timingStruct.timer_overflow_count = 0xffff;
       while(RTC.STATUS); // if RTC is currently busy, spin until it's not.
       // to do: add support for RTC timer initialization
       RTC.PER             = 0xFFFF;

With this, running the test program produces the following output:

...
4194300000ms: Running, all ok
4194301000ms: Running, all ok
4194302000ms: Running, all ok
4194303000ms: Running, all ok
0ms: Running, all ok
Error: millis() jumped
    lastLoopMillis:    4194303999
    currentLoopMillis: 0
halting.

After this PR is applied, the following can be used to queue rollover to verify the fix:

diff --git i/megaavr/cores/megatinycore/wiring.c w/megaavr/cores/megatinycore/wiring.c
index 3cd8ae59..7d09aaab 100644
--- i/megaavr/cores/megatinycore/wiring.c
+++ w/megaavr/cores/megatinycore/wiring.c
@@ -1391,6 +1391,7 @@ void __attribute__((weak)) init_millis()
       pTCD->INTCTRL        = 0x01; // enable interrupt
       pTCD->CTRLA          = TIMERD0_PRESCALER | 0x01; // set clock source and enable!
     #elif defined(MILLIS_USE_TIMERRTC)
+      timingStruct.timer_millis = 0xffffffffUL - 10000UL; // start 10 seconds before overflow
       while(RTC.STATUS); // if RTC is currently busy, spin until it's not.
       // to do: add support for RTC timer initialization
       RTC.PER             = 0xFFFF;

and the test program produces the following output:

...
4294964000ms: Running, all ok
4294965000ms: Running, all ok
4294966000ms: Running, all ok
4294967000ms: Running, all ok
704ms: Running, all ok
1704ms: Running, all ok
2704ms: Running, all ok
3704ms: Running, all ok
...

The millis state was internally represented with a 32 bit number, using
timer_overflow_count as the high 16 bits, and RTC.CNT as the low 16
bits. The max value before overflow was 0xFFFFFFFF (or 4294967295) which
corresponded to about 4194303999ms given the timer's frequency of
1024Hz. But then, one millisecond later, after overflow, the internal
value 0x0 would correspond to 0ms, skipping all millisecond values
between (inclusive) 4194304000 and 2^32-1, causing an apparent time warp
into the future by 100663296ms or 28 hours.

After this commit, the internal representation is in milliseconds, which
results in no time warp.
@SpenceKonde
Copy link
Owner

That is a nasty bug you found. Time travel is not permitted! Especially not by 28 hours, that's long enough to buy a winning lotto ticket.

@SpenceKonde SpenceKonde merged commit 786bdfa into SpenceKonde:master Dec 14, 2024
4 of 81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants