[RFE] support chrony or support NTPD as default instead of sntpd for AWS ami's #1340

Closed
shankar-vng opened this issue Feb 5, 2024 · 9 comments
Labels
kind/feature A feature request

Comments

@shankar-vng

shankar-vng commented Feb 5, 2024

Current situation

The Flatcar AMI released for AWS uses SNTP for time synchronization by default, instead of chrony or NTP, which resolve time to within a few milliseconds.

We checked 2 instances and noticed an offset of up to about 250 ms. We did not notice any SNTP configuration in use, at least based on the OS config. The problem we noticed was that, with SNTP, the interface in the path has a resolution of seconds rather than milliseconds.

Flatcar uses systemd-timesyncd, and I am unable to find any flag or config option that brings the offset between nodes down to millisecond accuracy. We could not find any way to set the time precision with SNTP, but in any case the OS should resolve time to millisecond accuracy by default.

$ timedatectl show-timesync --all
LinkNTPServers=
SystemNTPServers=
RuntimeNTPServers=
FallbackNTPServers=0.flatcar.pool.ntp.org 1.flatcar.pool.ntp.org 2.flatcar.pool.ntp.org 3.flatcar.pool.ntp.org
ServerName=0.flatcar.pool.ntp.org
ServerAddress=167.172.70.21
RootDistanceMaxUSec=5s
PollIntervalMinUSec=32s
PollIntervalMaxUSec=34min 8s
PollIntervalUSec=4min 16s
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=2, Precision=-23, RootDelay=1.296ms, RootDispersion=47.546ms, Reference=6D31CFAE, OriginateTimestamp=Thu 2024-02-01 11:23:41 UTC, ReceiveTimestamp=Thu 2024-02-01 11:23:41 UTC, TransmitTimestamp=Thu 2024-02-01 11:23:41 UTC, DestinationTimestamp=Thu 2024-02-01 11:23:41 UTC, Ignored=no, PacketCount=3, Jitter=20.034ms }
Frequency=-12022283

Impact

The time offset between machines varies, up to about 250 ms.

Ideal future situation

Support chrony or enable ntpd by default in the AWS AMI to resolve the accuracy issue.

Additional information

Additional GitHub issues reported & references

@shankar-vng shankar-vng added the kind/feature A feature request label Feb 5, 2024
@shankar-vng shankar-vng changed the title [RFE] support chrony as default instead of sntpd for AWS ami's [RFE] support chrony or support NTPD as default instead of sntpd for AWS ami's Feb 5, 2024
@jepio
Member

jepio commented Feb 5, 2024

hi @shankar-vng - this seems weird.

how did you determine that the instance clocks are off by 250ms?

have you checked if the situation is better when using ntpd? if so, please consider opening an issue with https://github.com/systemd/systemd because that may be an upstream issue.

@jepio
Member

jepio commented Feb 5, 2024

can you paste timedatectl timesync-status from both instances?

@pothos
Member

pothos commented Feb 6, 2024

We had a similar topic with Azure where we documented how to use chrony through docker:
https://www.flatcar.org/docs/latest/installing/cloud/azure/#use-the-azure-hyper-v-host-for-time-synchronisation-instead-of-ntp

@shankar-vng
Author

shankar-vng commented Feb 7, 2024

@jepio Thanks for your response. Replies inline:

  • how did you determine that the instance clocks are off by 250ms?

Logs from our containers running on different machines showed timestamp differences of up to about 200 ms (not always 200 ms). The offset varies based on resolution and DNS. Here is the requested status.

Machine 1
$ timedatectl timesync-status
       Server: 167.71.195.165 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 3
    Reference: 907EF2B0
    Precision: 1us (-24)
Root distance: 26.923ms (max: 5s)
       Offset: -732us
        Delay: 1.468ms
       Jitter: 1.238ms
 Packet count: 243
    Frequency: +6.202ppm
______________________________________
Machine 2
$ timedatectl timesync-status
       Server: 47.241.41.246 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: 64643D58
    Precision: 1us (-24)
Root distance: 63.178ms (max: 5s)
       Offset: +1.901ms
        Delay: 2.661ms
       Jitter: 1.852ms
 Packet count: 243
    Frequency: +7.312ppm
_______________________________________
Machine 3
$ timedatectl timesync-status
       Server: 172.104.44.120 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: 768F1153
    Precision: 1us (-25)
Root distance: 38.428ms (max: 5s)
       Offset: +528us
        Delay: 1.391ms
       Jitter: 609us
 Packet count: 243
    Frequency: +7.618ppm
_______________________________________
Machine 4
$ timedatectl timesync-status
       Server: 106.10.186.200 (0.flatcar.pool.ntp.org)
Poll interval: 34min 8s (min: 32s; max 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 2
    Reference: 6A0A9885
    Precision: 1us (-25)
Root distance: 221us (max: 5s)
       Offset: -627us
        Delay: 1.884ms
       Jitter: 1.540ms
 Packet count: 243
    Frequency: +20.541ppm

I understand that this is a systemd issue, but when it comes to AMIs for the cloud, it would be wise to use some of the cloud provider defaults used in their AMIs.
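
As a rough cross-check of the four status outputs above, the reported Offset values can be converted to a common unit and compared. This is a minimal sketch and not part of the original report: the names `REPORTED_OFFSETS`, `UNIT_TO_MS`, and `offset_to_ms` are made up for illustration, and the parser only handles single-unit values such as the ones shown here. Each offset is measured against a different pool server, so the spread only indicates how far each clock is from its own server; it is not a direct measurement of skew between the machines.

```python
import re

# Offsets copied by hand from the four `timedatectl timesync-status` outputs above.
REPORTED_OFFSETS = {
    "Machine 1": "-732us",
    "Machine 2": "+1.901ms",
    "Machine 3": "+528us",
    "Machine 4": "-627us",
}

# Multipliers that convert each unit suffix to milliseconds.
UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def offset_to_ms(text: str) -> float:
    """Convert a single-unit value such as '+1.901ms' or '-732us' to milliseconds."""
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)(us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised offset: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_MS[unit]


offsets_ms = {name: offset_to_ms(raw) for name, raw in REPORTED_OFFSETS.items()}

# Difference between the most positive and most negative reported offset.
spread_ms = max(offsets_ms.values()) - min(offsets_ms.values())

for name, value in offsets_ms.items():
    print(f"{name}: {value:+.3f} ms")
print(f"Largest difference between reported offsets: {spread_ms:.3f} ms")
```

For the values above, the largest difference between reported offsets comes out at about 2.6 ms. The Root distance lines in the same outputs (up to 63.178 ms) are not taken into account by this sketch.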

@jepio
Member

jepio commented Feb 7, 2024

I see the issue now: systemd-timesyncd only syncs with a single ntp server, and it implements SNTP not NTP. From man systemd-timesyncd:

       The systemd-timesyncd service implements SNTP only. This
       minimalistic service will step the system clock for large offsets
       or slowly adjust it for smaller deltas. Complex use cases that
       require full NTP support (and where SNTP is not sufficient) are
       not covered by systemd-timesyncd.

@pothos how about we rethink the default configuration? We might even want to add chrony to the Azure OEM for PTP, and on AWS switch sync to the local NTP/PTP source https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#ptp-hardware-clock-requirements.

@shankar-vng
Author

@jepio please let me know if I can help in any way in pushing the changes to the AWS AMIs before the end of Q1. Kindly point me to the relevant documentation 🙏

@jepio
Member

jepio commented Feb 14, 2024

It's a matter of figuring out how to implement the change in the AWS OEM sysext without disrupting other platforms. To start you would need to build your own images for testing: https://www.flatcar.org/docs/latest/reference/developer-guides/sdk-modifying-flatcar/.

I can't promise that anyone will have time to look at this in Q1, we're all busy with other issues.

@jepio
Member

jepio commented Mar 28, 2024

We merged flatcar/scripts#1792, which implements this change for GCP/AWS/Azure. This will be released in the alpha channel in April.

@dongsupark dongsupark moved this from 📝 Needs Triage to ✅ Testing / in Review in Flatcar tactical, release planning, and roadmap Mar 28, 2024
@sayanchowdhury sayanchowdhury moved this from ✅ Testing / in Review to Implemented in Flatcar tactical, release planning, and roadmap Jul 10, 2024
@jepio
Member

jepio commented Aug 8, 2024

@shankar-vng this reached stable just now (3975.2.0).

3 participants