Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Millis() Function Count Drift #3078

Closed
mrwgx3 opened this issue Mar 23, 2017 · 32 comments
Closed

Millis() Function Count Drift #3078

mrwgx3 opened this issue Mar 23, 2017 · 32 comments

Comments

@mrwgx3
Copy link
Contributor

mrwgx3 commented Mar 23, 2017

Basic Infos

Hardware

Hardware:     ESP12, Adafruit Huzzah Breakout and Feather Boards  
Core Version: 2.3.0

OS, IDE, and SDK

OS:   Windows 10 Professional and Linux Ubuntu 16.04LTS   
IDE:  1.6.12  
SDK:  1.5.3_16_04_18

Description

While studying how millis() and micros() work in order to extending them to 64-bits, I noticed an approximation in the millis() function which causes the returned millisecond value to incrementally drift 296us / ~71 minutes (2^32 us) of operation. This drift is cumlative, i.e., does not reset upon roll-over of millis(). The sketch included at the end of this post demonstrates the drift, and gives some corrections.

Settings in IDE

Module:          Adafruit HUZZAH ESP8266  
Flash Size:      4MB (3M SPIFFS)  
CPU Frequency:   80Mhz  
Debug Settings:  None presented by IDE  
Module:          Generic ESP8266 Module  
Flash Size:      4MB (3M SPIFFS)  
CPU Frequency:   80Mhz  
Flash Mode:      dio  
Flash Frequency: 40Mhz  
Upload Using:    Disabled or SERIAL  
Reset Method:    nodemcu  
Debug Port:      None
Debug Level:     None

Sketch Output

Millis() Drift Correction

1st Millis() overflow at ~49 days,  us overflow count = 1000
us_ovflow err: 296.00  ms
     Original     Corrected        Altn    Corr - Altn    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
          88           384           384             0        -296       49328
        1095          1391          1391             0        -296       49352
        2095          2391          2391             0        -296       49352

At (1) year,  us overflow count = 7342
us_ovflow err: 2173.23  ms
     Original     Corrected        Altn    Corr - Altn    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
  1468880042    1468882215    1468882215             0       -2173       49352
  1468881049    1468883222    1468883222             0       -2173       49352
  1468882049    1468884222    1468884222             0       -2173       49352

*** End ***

Sketch

//
// Millis() Drift Example
// This code illustrates how the ESP8266 implementation of Millis()
// incrementally drifts 296us / ~71 minutes (2^32 us) of operation.
// This drift is cumlative, i.e., does not reset upon roll-over of
// millis()
//
// 03/24/2017:  Edit to fix typo in routine 'us_overflow_tick()'
//                     Does not affect sketch output
//

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <stdio.h>

// Include API-Headers
extern "C" {                  // SDK functions for Arduino IDE access
#include "osapi.h"
#include "user_interface.h"
#include "espconn.h"
}  //end, 'extern C'

//---------------------------------------------------------------------------
// Globals

const int  LedBlu = 2, LedRed = 0, LedOff = 1, LedOn = 0;

// For 64-bit usec ring counter
static os_timer_t  us_ovf_timer;
static uint32_t    us_cnt = 0;
static uint32_t    us_ovflow = 0;
static uint32_t    us_at_last_ovf = 0;

//---------------------------------------------------------------------------
// Interrupt code lifted directly from "cores/core_esp8266_wiring.c",
// with some variable renaming
//---------------------------------------------------------------------------

// Callback for usec counter overflow timer interrupt
void  us_overflow_tick ( void* arg )
{
  us_cnt = system_get_time();

  // Check for usec counter overflow
  if ( us_cnt < us_at_last_ovf ) {
//  ++us_cnt;               // TYPO: should be ++us_ovflow
    ++us_ovflow;
  } //end-if
  us_at_last_ovf = us_cnt;

} //us_overflow_tick

//---------------------------------------------------------------------------
#define  REPEAT 1
void us_count_init ( void )
{
  os_timer_setfn( &us_ovf_timer, (os_timer_func_t*)&us_overflow_tick, 0 );
  os_timer_arm( &us_ovf_timer, 60000, REPEAT );

} //us_count_init

//---------------------------------------------------------------------------
// Original millis() function
unsigned long ICACHE_RAM_ATTR millis_orig ( void )
{
  uint32_t  m  = system_get_time();
  uint32_t  c  = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);
  return ( c * 4294967 + m / 1000 );

// Above equation appears to be an approximation for
//
// (c * 4294967296 + m) / 1000  ~= c * 4294967 + m / 1000
//
// Does this simplify the compilers work, i.e., is the multiply
// 'c * 4294967' less complex than 'c * 4294967296'? Both would
// seem to require a 64-bit result, although the former could 
// discard the most significant 32 bits.

} //millis_orig

//---------------------------------------------------------------------------
// Corrected millis(), 64-bit arithmetic, truncated to 32-bits by return
unsigned long ICACHE_RAM_ATTR millis_corr( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);
  return ( (c * 4294967296 + m) / 1000 );

} //millis_corr

//---------------------------------------------------------------------------
//  Alternate form, corrected millis()
//
//  Convert 64-bit form into 32-bit arithmetic, assuming that the compiler
//  is smart enough to return the least-significant 32-bit word for the
//  64-bit multiply, 'c * 4294967'
//
//  ( c * 4294967296 +  m ) / 1000
//  c * ( 429496700 + 296 ) / 1000  + (m / 1000)
//  c * 4294967 + ( m + c*296 ) / 1000
//
//  noting  c =  65536 * chi + clo,  where chi & clo are uint16_t quantities...
//
//  c*4294967 +            ( m + (65536 * chi + clo) * 296 )/1000
//    .....   + (m/1000) + ( clo*296 )/1000  +  chi*(65536*296/1000)
//    .....   + (m/1000) + ( clo*296 )/1000  +  chi*(19398 + 0.656)
//    .....   + (m/1000) + ( clo*296 )/1000  + (chi*1000*0.656)/1000 + chi*19398
//    .....   + (m/1000) + ( clo*296 )/1000  + (chi*656)/1000        + chi*19398
//    .....   + (m/1000) + ( clo*296 + chi*656 )/1000 + chi*19398
//
//   Original millis() form                 Additional terms
//
//  Truncated
//  64-bit?      32-bit              32-bit                 32-bit
//  c*4294967 + (m/1000)  +  ((clo*296 + chi*656 )/1000) + chi*19398
//
unsigned long ICACHE_RAM_ATTR millis_altn ( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  // High/low words of 'c'
  uint16_t chi = c >> 16, clo = c & 0xFFFF;

  // Note that truncation of 'c * 4294967' is explicitly enforced
  return (
           (uint32_t)(c * 4294967) + (m / 1000) +
           ((clo * 296 + chi * 656) / 1000) + chi * 19398
         ); //end-return

} //millis_altn

//---------------------------------------------------------------------------
// Output routines
//---------------------------------------------------------------------------

void millis_header ( void )
{
  Serial.print( "us_ovflow err: " ); Serial.print( 0.296 * us_ovflow );  Serial.println( "  ms" );
  Serial.println( F("     Original     Corrected        Altn    Corr - Altn    Orig - Corr") );
  Serial.println( F("      mills()      millis()     millis()    Difference     Difference     Heap") );

} //millis_header

//---------------------------------------------------------------------------
void millis_compare ( void )
{
  uint32_t ms_orig  = millis_orig();
  uint32_t ms_corr  = millis_corr();
  uint32_t ms_altn  = millis_altn();
  uint16_t heap     = ESP.getFreeHeap();

  char buf[60] = "";
  sprintf( buf, "%12d  %12d  %12d",
           ms_orig,  ms_corr,  ms_altn
         ); //end-sprint
  Serial.print( buf );

  Serial.print( "    " );

  sprintf( buf, "%10d  %10d     %7d",
           (int32_t)(ms_corr - ms_altn),
           (int32_t)(ms_orig - ms_corr),
           heap
         ); //end-sprint
  Serial.println( buf );

} //millis_compare

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
void setup ()
{
  pinMode(LedRed, OUTPUT);         // Turn off LEDs
  pinMode(LedBlu, OUTPUT);         // ...
  digitalWrite(LedRed, LedOff);    // ...
  digitalWrite(LedBlu, LedOff);    // ...

  Serial.begin(115200);
  while (!Serial) delay(250);      // Wait until Arduino Serial Monitor opens
  Serial.println( " " );
  Serial.println(F("Millis() Drift Correction"));

  WiFi.mode( WIFI_OFF );

  // Interrupt for overflow detection, millis() function tests
  us_count_init();

  // Test A
  Serial.println();
  Serial.println( F("1st Millis() overflow at ~49 days,  us overflow count = 1000") );
  us_at_last_ovf = 0;
  us_ovflow = 1000;      // Error is 0.296 ms * overflow count = 296 ms

  millis_header();
  for ( int i = 0; i < 3; i++ ) {
    millis_compare();
    delay(1000);
  } //end-for

  // Test B
  Serial.println();
  Serial.println( F("At (1) year,  us overflow count = 7342") );
  us_at_last_ovf = 0;
  us_ovflow = 7342;      // Error is 0.296 ms * overflow count = 2173 ms

  millis_header();
  for ( int i = 0; i < 3; i++ ) {
    millis_compare();
    delay(1000);
  } //end-for

  Serial.println();
  Serial.println( "*** End ***" );

} //setup

//---------------------------------------------------------------------------
void loop(void)
{
  yield();
} //loop

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
@bebo-dot-dev
Copy link

It sounds like you're observing inaccuracy drift in the onboard RTC which is a fairly widely documented and commented thing.

If you've not found a solution, you might find this interesting (and the linked espressif RTC sample): http://www.esp8266.com/viewtopic.php?p=38750#p38750

@igrr
Copy link
Member

igrr commented Mar 29, 2017

I don't think that millis drift has to do anything with RTC... We don't use RTC clock for anything other than deep sleep. The observation about the rounding error in the current code is the correct one.

@holla2040
Copy link

holla2040 commented May 23, 2017

I'm confirming the same drift in core code. Now trying the code above in my app.

@devyte
Copy link
Collaborator

devyte commented Aug 25, 2017

I confirm the same drift.

@devyte
Copy link
Collaborator

devyte commented Sep 7, 2017

@holla2040 were you able to test the above? It will be a while before I can get to it myself.

@holla2040
Copy link

Hate to say this, but I can't remember if I tested this. Nor do I remember what project I was working on where drift was important. Sorry for being a loser. Kinda busy around here.

@devyte
Copy link
Collaborator

devyte commented Sep 8, 2017

@holla2040 not a loser, you found a very subtle thing.
I understand about being busy, but can you test the above to confirm the proposed fix? If you confirm it, I think making a PR and merging would be rather straighforward.

@holla2040
Copy link

fair enough, I'll try working on it tomorrow.

@devyte
Copy link
Collaborator

devyte commented Jan 3, 2018

@holla2040 did you get around to testing this?

@holla2040
Copy link

No but thanks for reminding me of my slackerness. I have a esp8266 connected for another project. I'll work on it right now.

@holla2040
Copy link

I'm still taking data.

@holla2040
Copy link

Here's my testing approach. get time from ntpdate function as reference, then build table with ref time, millis_orig, millis_corr, millis_altn. see attached. I was expecting to see a drift over time of millis_orig, but didn't. So I think my approach is flawed or my setup is incorrect or millis() has been corrected in esp8266 source. I do have a question though, what does setting us_ovflow do?

esp8266NTP.ino.txt
ntpdate.txt

@devyte
Copy link
Collaborator

devyte commented Jan 7, 2018

Unfortunately, I can't test this anymore (my engineer who tested before no longer works for me).
I do remember seeing the code described above, and I don't remember any relevant fixes recently.
@mrwgx3 could you rerun your test with latest git?

@holla2040
Copy link

lastest git of esp8266?

@devyte
Copy link
Collaborator

devyte commented Jan 7, 2018

I meant for OP to rerun the original tests with latest git of the repo and compare results against what was reported in the first post.

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 11, 2018

Hi @devyte and @holla2040,

Got you messages.

@mrwgx3 could you rerun your test with latest git?

By 'git', I'm assuming that you mean clone the arduino/esp8266 master repository into the appropriate Arduino IDE directories. Alas, I haven't figured out how to do that yet, as I did the esp8266 install as a 'portable' installation. Still learning the ropes...

It wouldn't matter anyhow, as the variables required to test the code are inaccessible. This is why I created the test sketch with duplicated timing routines from 'core_esp8266_wiring.c'. To simulate impractically long elapsed times, I only have to manipulate the now exposed (and renamed) 'us_overflow'.

Given that I'm working with the 'master' branch, I was surprised to see that no changes to the millis() function or its support routines have been made. I did see a new 'micro64()' function (#3830), but its used elsewhere.

Since millis() wasn't changed, my guess is that @iggr decided that for the timescales that millis() is used over, the reported drift is too small to bother with. So please confirm that I'm looking in the right repo branch, and if so, ascertain the reason millis() was not changed (and if it will ever be changed).

If everything is cool, I would would just close out this issue, as there is nothing to test.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 11, 2018

According to what I understand in @holla2040's test, there is no observable drift over ~40hours (a shift of ~300ms exists at the beginning and at the end of this test). During that time we would have had a drift of 10ms which cannot be observable without strong spectacles.
We would have to wait for 166 days (5~6 months) to observe a 1s drift.
This is all summarized in the OP's output section.

I do not question the validity of this issue, and it is interesting to clarify it. It is obviously worth the change or addition, for the sake of precision, but we need first to evaluate the increased cost of calling millis() in terms of cputime since it is extensively used.

Would it be interesting to make a new function available in the core (in a way it would not increase sketchs size if it is not used) with a different name like millis_caesium_133() ?

About how to use the current version of the core, it is explained there: (link). It is advised to make an install from scratch.
It will be useful when such function will be added.

@5chufti
Copy link
Contributor

5chufti commented Jan 11, 2018

for the problem of the OP it is not necessary to prove by sketch, it is a pure systematic (rounding) error that can be proven mathematically on paper as igrr allready stated.
So the "correction" is not trying to compensate a temperature drift or crystal inaccuracy but simple correcting a shortcut in (q'n'd) programming during the early days of arduino esp core.
The necessary changes to git would just be to replace this

// Original millis() function
  return ( c * 4294967 + m / 1000 );

by this

// Corrected millis(), 64-bit arithmetic, truncated to 32-bits by return
  return ( (c * 4294967296 + m) / 1000 );

the rest of the initial posted setch is only for proof and demonstration of error and effect

@devyte
Copy link
Collaborator

devyte commented Jan 12, 2018

@mrwgx3 yes, by latest git I meant master branch with latest changes, which (usually) means installing via git. Instructions are on readthedocs as linked by @d-a-v .
The relevant code hasn't been fixed due to other priorities, not because it was determined that a fix isn't worth it. I looked at your changes a while back, and I confirmed as @5chufti said: I looked at your explanation and at the code, studied the problem, and realized that you are correct. The drift may be tiny, but that doesn't mean it should be overlooked.
What I didn't quite understand was your test, which is what my (ex) engineer handled, hence my request to rerun in light of @holla2040 's report.
Also, I'm not sure which of the two solutions proposed by you is better. As stated, we should choose the one that is lighter in terms of execution time. The first seems to smaller code, but calculate partly with 64bits, while the second looks larger in code, but does only 32bit calculations. Is that correct?
To decide on one, how about executing each in a loop for a certain number of iterations, and measuring the system clock cycles around the loop?

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 12, 2018

@devyte commented:

Also, I'm not sure which of the two solutions proposed by you is better. As stated, we should
choose the one that is lighter in terms of execution time. The first seems to smaller code, but
calculate partly with 64 bits, while the second looks larger in code, but does only 32bit
calculations. Is that correct?

To decide on one, how about executing each in a loop for a certain number of iterations, and
measuring the system clock cycles around the loop?

Neither do I. Normally, I look at the compiler's assembly source to make a decision, but I'm not up to speed on the Tensillica? processor the ESP8266 uses. You are correct that 'millis_altn()' tries to confine itself to 32-bit word arithmetic, with maybe (1) 64-bit interm generated by 'c*4294967'. The original 'millis()' deals with that issue as well. The intent was to avoid the 128-bit result from a 64bit x 64bit multiply.

Your proposed benchmark would resolve which approach is faster. I probably can get to it sometime next week.

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 16, 2018

Description

As promised, here are some benchmarks comparing the run-times for various Millis() function candidates (see 'Sketch Output'). After some experimentation, I settled on the following function:

//---------------------------------------------------------------------------
// Alternate form, corrected millis, 2nd pass               (1.13 x Original)
//
//    One (1/1000) divide converted to right-shifts
//
//    (1.024 * N) * ((clo * 296 + chi * 656) / (N * 1024)  ) 
//  = ((clo * (1.024 * N) * 296 + chi * (1.024 * N) * 656) >> (N+10) ) 
//  = ((clo * 2425 + chi * 5374) >> 13)  for N = 3
//
unsigned long ICACHE_RAM_ATTR millis_test( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  // High words of 'c'
  uint16_t chi = c >> 16;    // clo = c & 0xFFFF;

  // Note that truncation of 'c * 4294967' is explicitly enforced
  return (
           (uint32_t)(c * 4294967) + (m / 1000) +
           (( (c & 0xFFFF) * 2425 + chi * 5374) >> 13) + chi * 19398
         ); //end-return

} //millis_test

It runs about 15% slower than the original function, and is still reasonably compact.
I have not tested it exhaustively, however.

Sketch Output

Millis() Drift Correction / Runtime Estimates

1st Millis() overflow at ~49 days, us overflow count = 1000
us_ovflow err: 296.00  ms
     Original     Corrected        Test    Corr - Test    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
     4305321       4305617       4305617             0        -296       48592
     4306328       4306624       4306624             0        -296       48624
     4307328       4307624       4307624             0        -296       48624

At   1 year, us overflow count = 7348
us_ovflow err: 2175.01  ms
     Original     Corrected        Test    Corr - Test    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
  1498955077    1498957252    1498957252             0       -2175       48624
  1498956084    1498958259    1498958259             0       -2175       48624
  1498957084    1498959259    1498959259             0       -2175       48624

At  10 years, us overflow count = 73476
us_ovflow err: 21748.90  ms
     Original     Corrected        Test    Corr - Test    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
  2048694332    2048716081    2048716081             0      -21749       48624
  2048695339    2048717088    2048717088             0      -21749       48624
  2048696340    2048718089    2048718089             0      -21749       48624

At 100 years, us overflow count = 734758
us_ovflow err: 217488.37  ms
     Original     Corrected        Test    Corr - Test    Orig - Corr
      mills()      millis()     millis()    Difference     Difference     Heap
  3259682386    3259899874    3259899874             0     -217488       48624
  3259683393    3259900881    3259900881             0     -217488       48624
  3259684394    3259901882    3259901882             0     -217488       48624

*** End, Drift Test ***

Millis RunTime Benchmark

      usec   x Orig   Comment
Orig: 3.27   1.00     Original code
Corr: 9.80   3.00     64-bit reference code
Altn: 4.82   1.47     32-bit trunc. code
Test: 3.71   1.13     32-bit trunc. code, some divides rescaled to shifts

*** End, Bench Test ***```

Sketch

//
// Millis() Runtime Benchmark
//
// a) Include code from Millis_Drift
//    This code illustrates how the ESP8266 implementation of Millis()
//    incrementally drifts 296us / ~71 minutes (2^32 us) of operation.
//    This drift is cumlative, i.e., does not reset upon roll-over of
//    millis()
//
// b) Add code to determine the runtime in 'usec' of various millis()
//    function candidates.
//

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <stdio.h>

// Include API-Headers
extern "C" {                  // SDK functions for Arduino IDE access
#include "osapi.h"
#include "user_interface.h"
#include "espconn.h"
}  //end, 'extern C'

//---------------------------------------------------------------------------
// Globals

const int  LedBlu = 2, LedRed = 0, LedOff = 1, LedOn = 0;

// For 64-bit usec ring counter
static os_timer_t  us_ovf_timer;
static uint32_t    us_cnt = 0;
static uint32_t    us_ovflow = 0;
static uint32_t    us_at_last_ovf = 0;

//---------------------------------------------------------------------------
// Interrupt code lifted directly from "cores/core_esp8266_wiring.c",
// with some variable renaming
//---------------------------------------------------------------------------

// Callback for usec counter overflow timer interrupt
void  us_overflow_tick ( void* arg )
{
  us_cnt = system_get_time();

  // Check for usec counter overflow
  if ( us_cnt < us_at_last_ovf ) {
    ++us_ovflow;
  } //end-if
  us_at_last_ovf = us_cnt;

} //us_overflow_tick

//---------------------------------------------------------------------------
#define  REPEAT 1
void us_count_init ( void )
{
  os_timer_setfn( &us_ovf_timer, (os_timer_func_t*)&us_overflow_tick, 0 );
  os_timer_arm( &us_ovf_timer, 60000, REPEAT );

} //us_count_init

//---------------------------------------------------------------------------
// Original millis() function                                    (1.0 x Orig)
unsigned long ICACHE_RAM_ATTR millis_orig ( void )
{
  uint32_t  m  = system_get_time();
  uint32_t  c  = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);
  return ( c * 4294967 + m / 1000 );

// Above equation appears to be an approximation for
//
// (c * 4294967296 + m) / 1000  ~= c * 4294967 + m / 1000
//
// Does this simplify the compilers work, i.e., is the multiply
// 'c * 4294967' less complex than 'c * 4294967296'? Both would
// seem to require a 64-bit result, although the former could 
// discard the most significant 32 bits.

} //millis_orig

//---------------------------------------------------------------------------
// Corrected millis(), 64-bit arithmetic gold standard           (3.0 x Orig)
// truncated to 32-bits by return
unsigned long ICACHE_RAM_ATTR millis_corr( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);
  return ( (c * 4294967296 + m) / 1000 );

} //millis_corr

//---------------------------------------------------------------------------
//  Alternate form, corrected millis()                           (1.5 x Orig)
//
//  Convert 64-bit form into 32-bit arithmetic, assuming that the compiler
//  is smart enough to return the least-significant 32-bit word for the
//  64-bit multiply, 'c * 4294967'
//
//  ( c * 4294967296  +  m ) / 1000
//  c * ( 4294967000 + 296 ) / 1000  + (m / 1000)
//  c * 4294967 + ( m + c*296 ) / 1000
//
//  noting  c =  65536 * chi + clo,  where chi & clo are uint16_t quantities...
//
//  c*4294967 +            ( m + (65536 * chi + clo) * 296 )/1000
//    .....   + (m/1000) + ( clo*296 )/1000  +  chi*(65536*296/1000)
//    .....   + (m/1000) + ( clo*296 )/1000  +  chi*(19398 + 0.656)
//    .....   + (m/1000) + ( clo*296 )/1000  + (chi*1000*0.656)/1000 + chi*19398
//    .....   + (m/1000) + ( clo*296 )/1000  + (chi*656)/1000        + chi*19398
//    .....   + (m/1000) + ( clo*296 + chi*656 )/1000 + chi*19398
//
//   Original millis() form                 Additional terms
//
//  Truncated
//  64-bit?      32-bit              32-bit                 32-bit
//  c*4294967 + (m/1000)  +  ((clo*296 + chi*656 )/1000) + chi*19398
//
unsigned long ICACHE_RAM_ATTR millis_altn ( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  // High/low words of 'c'
  uint16_t chi = c >> 16, clo = c & 0xFFFF;

  // Note that truncation of 'c * 4294967' is explicitly enforced
  return (
           (uint32_t)(c * 4294967) + (m / 1000) +
           ((clo * 296 + chi * 656) / 1000) + chi * 19398
         ); //end-return

} //millis_altn

//---------------------------------------------------------------------------
// Corrected millis(), Replace 1st mult by shift                 (3.2 x Orig)
unsigned long ICACHE_RAM_ATTR millis_test_LSH( void )
{
  uint32_t  m = system_get_time();
  uint32_t  c = us_ovflow +  ( (c < us_at_last_ovf) ? 1 : 0);
  return(  ( ((uint64_t)c << 32) + m ) / 1000 );    

} //millis_test_LSH

//---------------------------------------------------------------------------
// Corrected millis(), Eliminate 1st mult by little endian union (3.4 x Orig)
unsigned long ICACHE_RAM_ATTR millis_test_LEU( void )
{
  union {
     uint64_t  sum;
     uint32_t  mc[2];
  } a;
  a.mc[0] = system_get_time();
  a.mc[1] = us_ovflow +  ( (a.mc[0] < us_at_last_ovf) ? 1 : 0);
  return( a.sum / 1000 );    

} //millis_test_LEU

//---------------------------------------------------------------------------
// Alternate form, corrected millis, 2nd pass               (1.13 x Original)
//
//    One (1/1000) divide converted to right-shifts
//
//    (1.024 * N) * ((clo * 296 + chi * 656) / (N * 1024)  ) 
//  = ((clo * (1.024 * N) * 296 + chi * (1.024 * N) * 656) >> (N+10) ) 
//  = ((clo * 2425 + chi * 5374) >> 13)  for N = 3
//
unsigned long ICACHE_RAM_ATTR millis_test( void )
{
  uint32_t m = system_get_time();
  uint32_t c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  // High words of 'c'
  uint16_t chi = c >> 16;    // clo = c & 0xFFFF;

  // Note that truncation of 'c * 4294967' is explicitly enforced
  return (
           (uint32_t)(c * 4294967) + (m / 1000) +
           (( (c & 0xFFFF) * 2425 + chi * 5374) >> 13) + chi * 19398
         ); //end-return

} //millis_test

//---------------------------------------------------------------------------
// Output routines
//---------------------------------------------------------------------------

void millis_header ( void )
{
  Serial.print( "us_ovflow err: " ); Serial.print( 0.296 * us_ovflow );  Serial.println( "  ms" );
  Serial.println( F("     Original     Corrected        Test    Corr - Test    Orig - Corr") );
  Serial.println( F("      mills()      millis()     millis()    Difference     Difference     Heap") );

} //millis_header

//---------------------------------------------------------------------------
void millis_compare ( void )
{
  uint32_t ms_orig  = millis_orig();
  uint32_t ms_corr  = millis_altn();
  uint32_t ms_test  = millis_test();
  uint16_t heap     = ESP.getFreeHeap();

  char buf[60] = "";
  sprintf( buf, "%12u  %12u  %12u",
           ms_orig,  ms_corr,  ms_test
         ); //end-sprint
  Serial.print( buf );

  Serial.print( "    " );

  sprintf( buf, "%10d  %10d     %7d",
           (int32_t)(ms_corr - ms_test),
           (int32_t)(ms_orig - ms_corr),
           heap
         ); //end-sprint
  Serial.println( buf );

} //millis_compare

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------

void Millis_DriftTest ( char *cmnt )
{
  Serial.println();
  Serial.print( cmnt );
  Serial.print( ", us overflow count = " );
  Serial.println( us_ovflow );
  
  us_at_last_ovf = 0xFFFFFFFE;  // Set close to max 
  millis_header();
  for ( int i = 0; i < 3; i++ ) {
    millis_compare();
    delay(1000);
  } //end-for

} //Millis_DriftTest

//---------------------------------------------------------------------------
void Millis_DriftAD ( void )
{
  // Test A
  us_at_last_ovf = 0xFFFFFFFE;
  us_ovflow = 1000;
  Millis_DriftTest( "1st Millis() overflow at ~49 days" );

  // Test B
  us_at_last_ovf = 0xFFFFFFFE;
  us_ovflow = 7348;
  Millis_DriftTest( "At   1 year" );

  // Test C
  us_at_last_ovf = 0xFFFFFFFE;
  us_ovflow = 73476;
  Millis_DriftTest( "At  10 years" );

  // Test D
  us_at_last_ovf = 0xFFFFFFFE;
  us_ovflow = 734758;
  Millis_DriftTest( "At 100 years" );

  Serial.println();
  Serial.println( F("*** End, Drift Test ***") );

} //Millis_DriftAD

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------

void  Millis_RunTimes ( void )
{
  Serial.println();
  Serial.println( "Millis RunTime Benchmark" );
  
  uint32_t lc = 100000;          // Samples
  float    nsf = 1 / float(lc);  // Normalization
  uint32_t bgn, cnto = 0, cntc = 0, cnta = 0, cntx = 0; 

  us_at_last_ovf = 0xFFFFFFE;    // Slightly less than max
  us_ovflow = 1000;
  for (uint32_t i = 0; i < lc; i++ )
  {
    bgn = system_get_time();
    millis_orig();
    cnto += system_get_time() - bgn;

    bgn = system_get_time();
    millis_corr();
    cntc += system_get_time() - bgn;

    bgn = system_get_time();
    millis_altn();
    cnta += system_get_time() - bgn;

    bgn = system_get_time();
    millis_test();
    cntx += system_get_time() - bgn;
    
    yield();
  } //end-for

  Serial.println();
  Serial.println( "      usec   x Orig   Comment" );

  Serial.print( "Orig: " );
  Serial.print( nsf * (float)cnto );
  Serial.print( "   " );
  Serial.print( 1.00 );
  Serial.println( "     Original code" );
 
  Serial.print( "Corr: " );
  Serial.print( nsf * (float)cntc );
  Serial.print( "   " );
  Serial.print( (float)cntc / (float)cnto );
  Serial.println( "     64-bit reference code" );
  
  Serial.print( "Altn: " );
  Serial.print( nsf * (float)cnta );
  Serial.print( "   " );
  Serial.print( (float)cnta / (float)cnto );
  Serial.println( "     32-bit trunc. code" );

  Serial.print( "Test: " );
  Serial.print( nsf * (float)cntx );
  Serial.print( "   " );
  Serial.print( (float)cntx / (float)cnto );
  Serial.println( "     32-bit trunc. code, some divides rescaled to shifts" );

  Serial.println();
  Serial.println( F("*** End, Bench Test ***") );

} //Millis_RunTimes

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
void setup ()
{
  pinMode(LedRed, OUTPUT);         // Turn off LEDs
  pinMode(LedBlu, OUTPUT);         // ...
  digitalWrite(LedRed, LedOff);    // ...
  digitalWrite(LedBlu, LedOff);    // ...

  Serial.begin(115200);
  while (!Serial) delay(250);      // Wait until Arduino Serial Monitor opens
  Serial.println( " " );
  Serial.println(F("Millis() Drift Correction / Runtime Estimates"));

  WiFi.mode( WIFI_OFF );

  // Interrupt for overflow detection, millis() function tests
  us_count_init();

  Millis_DriftAD();       // Drift test, 49 days, 1, 10, & 100 years
  Millis_RunTimes();      // Runtime benchmarks

} //setup

//---------------------------------------------------------------------------
void loop(void)
{
  yield();
} //loop

//---------------------------------------------------------------------------
//---------------------------------------------------------------------------

@devyte
Copy link
Collaborator

devyte commented Jan 16, 2018

This is awesome. I didn't even know the trick of converting to right shift for speed.
I think a 13% slowdown in this function is acceptable for the correction.

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 17, 2018

Hi @devyte
Did some exhaustive testing tonight, proposed solution breaks at around 7 years.
Wouldn't commit just yet. Will get back to you all when I understand why

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 28, 2018

Hi All

It turns out that I had (2) problems with the proposed 'millis()' solution:

a) For high values of 'm' and 'c', the calculation exceeded the 32-bit word
   length used. This is why things seem to blow up around  (7) years; one
   which I consider unacceptably short.

b) The approximation occasionally did not match the gold-standard code, and was
   typically off by +/- 1, pretty much throughout the ranges for both 'm' and 'c'.

The only thing left to explore was to find a fast implementation for the 64-bit arithmetic used by the slow 'gold-standard' routine. After doing a little research, I found that one can do fixed-point division by a constant simply multiplying by a scaled multiplicative inverse of the divisor. These 'magic numbers' are often used by compilers when dividing by small constants. See this link for a good discussion:

http://ridiculousfish.com/blog/posts/labor-of-division-episode-i.html

Given a magic number of 64-bits precision and a divisor of 10-bits precision (1000), one can accurately divide a dividend with up to 54-bits precision simply by multiplying by the magic number. This corresponds a dividend range of (54-32 = 22 bits), or 0x400000, which translates to about 570 years of usec counter overflows.

//---------------------------------------------------------------------------
// millis() 'magic multiplier' approximation
// Input:
//    'm' - 32-bit usec counter,           0 <= m <= 0xFFFFFFFF
//    'c' - 16-bit usec overflow counter   0 <= c <  0x00400000
// Output:
//    Returns milliseconds in modulo 0x10000000  ( 0 to 0xFFFFFFFF)
//
// A 'magic multiplier', equal to '(2^n) / const', is used to
// approximate fixed-point division by a constant.  For this
// application, k = Ceiling[ 2^64 / 1000 ] (max. bits we can use)
//
// A distributed multiply with offset-summing is used find k( 2^32 c + m):
//  prd = (2^32 kh + kl) * ( 2^32 c + m )  
//      = 2^64 kh c + 2^32 kl c + 2^32 kh m + kl m
//           (d)         (c)         (b)       (a)
//
// Important: ** Don't mess with (uint64_t) type castings **

#define  MAGIC_1E3_wLO  0x4bc6a7f0    // LS part
#define  MAGIC_1E3_wHI  0x00418937    // MS part

unsigned long ICACHE_RAM_ATTR millis_test ( void )
{
  uint32_t  a[3];  // 96-bit accumulator, little endian
  a[2] = 0;        // Zero high-acc

  // Get usec system time, usec overflow counter
  uint32_t  m = system_get_time();
  uint32_t  c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  ((uint64_t *)(&a[0]))[0]  =              // (a) Init. low acc
     ( m * (uint64_t)MAGIC_1E3_wLO );

  ((uint64_t *)(&a[1]))[0] +=              // (b) Offset sum, mid-acc
     ( m * (uint64_t)MAGIC_1E3_wHI );

  ((uint64_t *)(&a[1]))[0] +=              // (c) Offset sum, mid-acc
     ( c * (uint64_t)MAGIC_1E3_wLO );
  
  ((uint32_t *)(&a[2]))[0] +=              // (d) Truncated sum, high-acc
     (uint32_t)( c * (uint64_t)MAGIC_1E3_wHI );

  return ( a[2] );  // Extract result, high-acc

} //millis_test

I have exhaustively tested the above code using the Borland C++ Builder v4.0 compiler for 0 < m < (2^32-1) at c = 0x400000, and did sampled testing for 0 < 'c' <= 0x400000. Benchmark times for the ESP8266 Huzzah Feather were:

Millis RunTime Benchmark

         usec   x Orig   Comment
 Orig:   3.18   1.00     Original code

 Corr:  13.94   4.38     64-bit reference code, DEBUG
 Test:   5.61   1.76     64-bit magic multiply, 4x32 DEBUG

 Corr:  13.21   4.15     64-bit reference code
 Test:   4.66   1.46     64-bit magic multiply, 4x32

*** End, Bench Test ***

The magic multiplier ran ~3x faster than the gold-standard. Execution times, however, vary considerably with the numbers being multiplied, so one should probably lower this factor to the expected worst case of around x2.

Given the trade-offs between complexity and speed, this is best algorithm I've been able to come up with. Got any ideas for improvements? How do you wish to proceed?

@5chufti
Copy link
Contributor

5chufti commented Jan 28, 2018

NERD please take this as polite compliment <8)

@devyte
Copy link
Collaborator

devyte commented Jan 29, 2018

@mrwgx3 awesome work. It took me a while to understand the init of a[], it's... strange usage, but the init does look correct.
My one question, and it's one off the top of my head without testing: why the [0] index, instead of * dereference? E.g:

 *((uint64_t *)(&a[0]))  =              // (a) Init. low acc
     ( m * (uint64_t)MAGIC_1E3_wLO );

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 29, 2018

@schufti

NERD please take this as polite compliment <8)

Thank you, I do have to work at it!

@devyte

My one question, and it's one off the top of my head without testing: why
 the [0] index, instead of * dereference?

Just a matter of personal preference. When proofing code, I find it much harder to spot a missing '*' de-reference at the beginning of a line. Both are equivalent, pick the one you like best.

It took me a while to understand the init of a[], it's... strange usage,
but the init does look correct.

The initialization / summing order is dictated by the little endian format. Here is the summing alignment presented graphically:

Little endian  LS -> MS
        32       64       96      128
|  a[0]  |  a[1]  |  a[2]  |  a[3]  |
|       m kl      |    0   |    0   |
|        |       m kh      |        |
|        |       c kl      |        |
|        |        |       c kh      |

If one replaces the 'm kl' product with

  ((uint64_t *)(&a[1]))[0]  =              // (a) Init. mid. acc
     ((uint64_t)( m * (uint64_t)MAGIC_1E3_wLO ) >> 32);     // 'uint64_t' typecast before shift is REQUIRED

the incidental save to a[0] could be eliminated, and the accumulator ultimately shortened by 32-bits. The right-shift, however, comes with a modest time penalty. If you need the ram space, it might be worth it. Is this what you meant by your question?

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 29, 2018

Some additional comments after reviewing last night's post.

As C or C++ does not have an intrinsic test for addition overflow, one has to code specifically to detect one. As proposed millis() contains none of this because of speed requirements, it means that:

(highword( m kl ) + m kh + c kl)  < (2^64 -1), i.e., this addition can't overflow
(We don't have to worry about ant 'c kh' addition overflows, as we're discarding a[3])

To meet this criteria, not only do we have to pick 'k' to achieve our desired precision, we also have to split 'k' appropriately to avoid any addition overflows.

Ideally, 'k' must be also chosen as to align the various products on byte boundaries to avoid any 64-bit word shifts before additions. These shifts impose major time penalties for shifts not equal to multiples of 8, and modest ones for those which are, The 'k' chosen for this specific division by 1000 was picked primarily to avoid shifts as well as meeting reasonable precision requirements.

Hence this routine is NOT a general one, as picking another divisor may break the addition overflow requirement or greatly slow down the algorithm.

@devyte
Copy link
Collaborator

devyte commented Jan 29, 2018

@mrwgx3 My prior comment was literal: it's been a while since I've done pointer casting voodoo, so my voodoo eyes are rusty, hence why I found it strange and difficult to understand. The graphic in your previous comment helps, I think we should keep that as a comment close to the implementation.

Also, we should keep your explanation about how the code was derived, why this implementation was picked over the others (performance numbers, breaking at 7years, etc), and especially the criteria for picking the divisor to avoid shifts. I have no trouble imagining somebody (myself included) taking a look at this code in the future and having a hard time understanding it, so having as much detail explained in the code comments is a Good Thing.

Can you please prepare a PR with all this? I'd like to merge soon after 2.4.1 is out the door, to allow time for testing in the wild.

@mrwgx3
Copy link
Contributor Author

mrwgx3 commented Jan 31, 2018

@devyte

Can you please prepare a PR with all this? I'd like to merge soon after 2.4.1 is out the door,
 to allow time for testing in the wild.

Per your request, I've generated PR #4264 containing the corrected millis() function. I did go ahead and reduce the accumulator size, per the discussion a couple of comments back.

//---------------------------------------------------------------------------
unsigned long ICACHE_RAM_ATTR millis_test ( void )
{
  uint32_t  a[2];  // 96-bit accumulator, little endian
  a[1] = 0;        // Zero high-acc

  // Get usec system time, usec overflow counter
  uint32_t  m = system_get_time();
  uint32_t  c = us_ovflow + ((m < us_at_last_ovf) ? 1 : 0);

  // (a) Init. low-acc with high-word of 1st product. The right-shift
  //     falls on a byte boundary, hence is relatively quick.
  ((uint64_t *)(&a[0]))[0]  =
     ( (uint64_t)( m * (uint64_t)MAGIC_1E3_wLO ) >> 32 );

  ((uint64_t *)(&a[0]))[0] +=              // (b) Offset sum, low-acc
     ( m * (uint64_t)MAGIC_1E3_wHI );

  ((uint64_t *)(&a[0]))[0] +=              // (c) Offset sum, low-acc
     ( c * (uint64_t)MAGIC_1E3_wLO );
  
  ((uint32_t *)(&a[1]))[0] +=              // (d) Truncated sum, low-acc
     (uint32_t)( c * (uint64_t)MAGIC_1E3_wHI );

  return ( a[1] );  // Extract result, high-acc

} //millis_test

I was successful in installing the Arduino/esp8266 repo in my Arduino install, and tested the new millis() function using the 'BlinkWithoutDelay' example sketch.

Millis_BenchD_ino.txt

devyte pushed a commit that referenced this issue Mar 12, 2018
* Correct millis() drift, issue 3078

* Add 'test_millis_mm.ino' runtime benchmark

* Eliminate 'punning' warning

Add union 'acc' in millis() to eliminate 'punning' compiler warning

* Correct minor typo

* Eliminate 'punning' warning

 Add union 'acc' to eliminate 'punning' compiler warning  in 'millis_test'_DEBUG() and 'millis_test()'
bryceschober pushed a commit to bryceschober/Arduino that referenced this issue Apr 5, 2018
* Correct millis() drift, issue 3078

* Add 'test_millis_mm.ino' runtime benchmark

* Eliminate 'punning' warning

Add union 'acc' in millis() to eliminate 'punning' compiler warning

* Correct minor typo

* Eliminate 'punning' warning

 Add union 'acc' to eliminate 'punning' compiler warning  in 'millis_test'_DEBUG() and 'millis_test()'

(cherry picked from commit 3934d10)
@devyte devyte modified the milestones: 2.5.0, 2.4.2 Jul 6, 2018
@devyte
Copy link
Collaborator

devyte commented Jul 6, 2018

PR #4264 is already merged, pulling this into 2.4.2.

@Tonguc-Endem
Copy link

Hi, Sorry to barge in like this but I just can't find this anywhere else.
I only need to be able to reset a esp8266 microsecond timer once in a while.

with arduino uno i can do the following:

extern volatile unsigned long timer0_overflow_count;
unsigned long ticks() {return((timer0_overflow_count << 8) + TCNT0)*(64/16);}

to reset ticks(), which is practically same as micros(), I do a:
timer0_overflow_count = 0;

This way I don't have to care about the overflows.
The problem is that this only works with arudoino and I need to have this in ESP8266.

I would really appreciate if someone could help with this.

Thnx a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants