Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-blocking mode, notifications, retry, unittests. #41

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mMerlin
Copy link
Contributor

@mMerlin mMerlin commented Oct 6, 2024

Fixes #40 (EDIT by @dhalbert)

Overview

This pull request introduces significant enhancements to the adafruit_ntp.py module, aiming to improve its functionality, efficiency, and usability in both CircuitPython and CPython environments. The key changes include:

Non-Blocking Mode: Allows the NTP client to operate with less blocking of main application thread, essential for real-time embedded systems.
Event Notification Callbacks: Enables users to register callbacks for specific NTP events, facilitating responsive behavior and power management strategies.
Exponential Backoff for Retry Logic: Implements exponential backoff timing for retries, reducing unnecessary load on networks and NTP servers during failure conditions.
Improved Documentation and Examples: Adds comprehensive docstrings, usage examples, and unittests to aid developers and users in understanding and utilizing the new features.

Motivation

The enhancements address several limitations in the existing NTP helper class:

Blocking Operations: The original class performed blocking network operations, which could lead to prolonged interruptions in applications, especially in resource-constrained embedded systems.
Excessive Network Load on Failures: In failure scenarios (e.g., network down, server busy), the lack of retry management could cause the class to repeatedly attempt synchronization without delay, adding unnecessary load to the network and NTP servers.
Lack of Event Notifications: Users had no way to respond programmatically to specific NTP events, such as synchronization success or failure, limiting the ability to manage power consumption or network connectivity dynamically.

By introducing non-blocking operations, implementing exponential backoff, and providing event notifications, the updated class offers greater flexibility, efficiency, and control to developers.

Detailed Changes

  1. Non-Blocking Mode
  • Implementation: Added a blocking parameter to the constructor, defaulting to True for backward compatibility. When set to False, NTP operations proceed in a non-blocking manner.
  • Behavior: In non-blocking mode, the class performs minimal (time consuming) operations per time request, returning control to the main application promptly. It maintains internal state to continue synchronization over subsequent calls. This is only 'less' blocking. A single operation can still take over 10 seconds to complete.
  1. Event Notification Callbacks
  • Implementation: Introduced the register_ntp_event_callback method, allowing users to register callbacks for specific events (SYNC_COMPLETE, SYNC_FAILED, LOOKUP_FAILED).
  • Usage: Callbacks can be used to manage power (e.g., turning off the radio after synchronization) or to initiate network connections upon failures.
  • Example:
from adafruit_ntp import NTP, EventType

ntp = NTP(socketpool, blocking=False)

def on_sync_complete(event_type, next_time):
    print(f"Synchronization complete. Next sync at {next_time}")
    wifi.radio.enabled = False
    # Set status (state) flag that allows main application to know this has happened.

ntp.register_ntp_event_callback(on_sync_complete, EventType.SYNC_COMPLETE)
  1. Exponential Backoff for Retry Logic
  • Implementation: Added internal methods _initialize_backoff_timing, _get_socket, and _server_dns_lookup to manage retries with exponential backoff.
    Behavior: On failures (e.g., DNS lookup failure), the class schedules the next retry based on an exponential backoff algorithm, reducing the load on networks and servers. Subsequent time request calls, prior to the end of that delay period, will return immediately.
  • Configuration: The backoff timing parameters (FIRST_WAIT, MAX_WAIT, WAIT_FACTOR) are configurable constants within the class.
  1. Improved Documentation and Examples
  • Docstrings: Updated all classes and methods with comprehensive docstrings, including parameters, return values, exceptions, and usage examples.
  • Unittests: Included extensive unittests covering various scenarios, ensuring the reliability and correctness of the new features. Much of the unittest code is 'brittle', in that it relies on knowledge of the internal state of the NTP class. The unittest code is written to run in both CPython and CircuitPython (with https://github.com/mytechnotalent/CircuitPython_Unittest)
    • Once the needed files are copied to the CIRCUITPY drive, executing the following sequence of commands in repl runs the tests:
import unittest; unittest.main('tests.test_eventtype')
import unittest; unittest.main('tests.test_mock')
import unittest; unittest.main('tests.test_ntp_instantiation')
import unittest; unittest.main('tests.test_ntp_registration')
import unittest; unittest.main('tests.test_ntp')
  • Examples: Added an example demonstrating the use of non-blocking mode and event notifications, providing practical guidance to users.

Specific Questions for Reviewers

Exponential Backoff Timing Configuration

The exponential backoff timing parameters are currently set as:

FIRST_WAIT = 3 * NS_PER_SEC (3 seconds)
MAX_WAIT = 60 * NS_PER_SEC (60 seconds)
WAIT_FACTOR = 1.5

Question: Are these values appropriate for typical network conditions and NTP server responsiveness, or should they be adjusted?

  • Context: The initial wait time and growth factor determine how aggressively the client retries synchronization after failures. Balancing these values is crucial to minimize unnecessary load while ensuring timely synchronization.
  • Request: We seek input from maintainers and reviewers with networking expertise to recommend optimal values or strategies for these parameters.

Requirements

Is there a standard way to specify requirements that only apply to the testing and development environment, that are CircuitPython specific modules? To run unittesting, in either environment, requires the adafruit_logging module. To run unittest in the CircuitPython environment requires the unittest module. The requirements.txt files are only for CPython dependencies.

Additional Notes

  • Time Zone Handling: The module accounts for the lack of time zone awareness in CircuitPython by applying a tz_offset. In CPython, it uses gmtime to ensure consistent UTC time. This is a change when using CPython, but aligns better with the CircuitPython functionality. Now the tz_offset parameter works the same in either environment.
  • Compatibility: The updated module maintains compatibility with existing applications by defaulting to blocking mode and preserving method signatures. Blocking mode will initially act the same as the current code, but will be different when the cached offset expires. During subsequent synchronization attemts, this version will silently continue to use the previous value on network operation failures.
  • Custom _IntFlag Class: Due to the absence of the enum module in CircuitPython, a custom _IntFlag class is used internally for event types. This class is tightly coupled with the EventType class and is intended for internal use.
  • Local testing has been done in venv with python 3.11.9, and CircuitPython 9.1.3 on 2024-08-29; Adafruit-Qualia-S3-RGB666 with ESP32S3

Conclusion

These enhancements aim to provide developers with greater control over NTP synchronization, improve efficiency, and reduce unnecessary network load. We believe these changes will significantly benefit users working with time synchronization in embedded systems.

We appreciate your time in reviewing this pull request and welcome any feedback or suggestions, especially regarding the exponential backoff timing configuration.

@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 6, 2024

My question about specifying circuitpython module requirements just became 'important'. The CI run is failing because adafruit_logging is not being found. How to get it to be loaded?
E ModuleNotFoundError: No module named 'adafruit_logging'

@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 6, 2024

From message I got from the CI run, something else needs updating in the CI actions:

The following actions use a deprecated Node.js version and will be forced to run on node20: adafruit/circuitpython-action-library-ci-failed@v1. For more info: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/

Also reported, and may or may not be related:

Run adafruit/circuitpython-action-library-ci-failed@v1
with:
token: ***
Error: Cannot read properties of undefined (reading 'number')

@FoamyGuy
Copy link
Contributor

FoamyGuy commented Oct 6, 2024

E ModuleNotFoundError: No module named 'adafruit_logging'

@mMerlin I think if you add adafruit-circuitpython-logging into the optional_requirements.txt file in the root of the repo it should install the library inside of the actions container.

@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 6, 2024

That worked. I did not expect the cpython based requirements files to work for circuitpython libraries. More accurately, I did not expect pip install to work for circuitpython modules.

After a few more iterations, the CI run is clean. That process found some things to add to my 'pre' testing in the local environment. Time for a review, to see what else I didn't notice was needed. And if these enhancements fit with CircuitPyton goals. It adds functionality, but is bigger. I minimized breaking changes, but there are going to be some run time differences, even when using the default (compatibility) settings.

@dhalbert
Copy link
Contributor

dhalbert commented Oct 6, 2024

That worked. I did not expect the cpython based requirements files to work for circuitpython libraries. More accurately, I did not expect pip install to work for circuitpython modules.

The pip install is to run type-checking and other checks on the code -- it's run in a blinka environment, not a naative circuitpython environment.

@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 7, 2024

Understood. I just did not expect pip install to find circuitpython modules. That must be in pypi as well. I know about using circup to install for circuitpython which is AFAIK using a totally different source. I also note that adafruit_logging will technically work in cpython. I did some testing previously for something else. But for that, I put it in the search path manually, not using pip install.

@dhalbert
Copy link
Contributor

dhalbert commented Oct 7, 2024

Yes, nearly all CircuitPython libraries are in pypi. This is so they can be used with Blinka, and also for the checking reasons I mentioend above.

Copy link
Contributor

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Re the non-blocking capability: are there other network libraries that are non-blocking, and are they in the same style? Do they often use asyncio?

Exponential back-off: would it make sense to factor this out into a separate library (or add to, say, ConnectionManager) that could be used for other network functionality? I have not looked at it enough to judge.

.pylintrc Outdated Show resolved Hide resolved
.pylintrc Outdated
@@ -159,7 +159,7 @@ ignored-classes=optparse.Values,thread._local,_thread._local
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=board
ignored-modules=board,socketpool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding socketpool should also not be necessary

Suggested change
ignored-modules=board,socketpool
ignored-modules=board

adafruit_ntp.py Outdated Show resolved Hide resolved
@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 7, 2024

As mentioned, my background is not networking. I do not know of other librararies capabilities, styles, underlying technology. I started this work because I had another project that needed the capabilities. That rabbit hole lead to enhancing the existing library instead of implementing the needed functionality directly into my own project.

Using asyncio looks (conceptually) like a good solution. However, my understanding is that would require the calling application to be asyncio aware as well. That would limit where the NTP class could be used. If that understanding is not correct, then an asyncio compatible network stack would be needed. Outside of my background. Remembering that the (a) specific goal for the NTP helper, is to return a value (or abort) as quickly as possible (in non-blocking mode). So that it can safely be used in an event loop that is doing time sensitive operations.

I do not believe that factoring out exponential backoff is practical. That is tied to failures in the underlying network operations. The NTP class is instatiated with a socket«pool», but the source of the connection is not know internally to the NTP class. All NTP can do is respond to failures on operations that use that socketpool and socket. The user of NTP can use connection manager to provide the socket, but that still does not help with failures using the socket. Having another wrapper around the network operations, so that for example the DNS lookup did wait and retry would defeat the non-blocking intent. To provided non-blocking operations, NTP 'must' deal with the failures itself. That said, if the getaddrinfo, socket.sendto, socket.recv_into were guaranteed non-blocking, and to have built in backoff logic, NTP could just ignore the issue. That would add additional contraints on what socketpool could be used to create an NTP instance. It would also make getting a valid time reference more complex. The existing logic uses absolute time references before and after the sendto and recv_into calls combined with the returned NTP information to calculate the offset. If retries were happening invisibly, the logic would break.

init-hook: Correct. the initial purpose was to get cpython unittests to find the circuitpython library in the lib folder. With a duplicate of the library installed from pypi, that is no longer needed. cpython does not need access to the circuitpython unittest.py module.

ignore-modules: What is the purpose of having entries here? It would seem that 'board' is not needed either. Initially I added socketpool to reduce complaints from static checking in the IDE (vscode). Once I started doing the dual cpython and circuitpython imports, plus the SocketProtocol class, the complaints went away. Pylance still complains though in the example code, Import "«lib»" could not be resolved Pylance(reportMissingImports). That complaint is for rtc, socketpool, wifi and adafruit_wiznet5k.adafruit_wiznet5k. Those warning can simply be ignored, but is that the correct answer? We 'know' that the warnings are bogus, because those are standard circuitpython libraries. Adding the libraries to ignored-modules just tells pylance that we know. Unless those libraries also exist in pypi, and should be added to the requirements.

_IntFlag: That was another rabbit hole. I was annoyed by the lack of enum that would make the event type handling so much cleaner, and wanted to see how close I could get with a custom implementation. Pretty close, but as noted, the benefit is minimal for the extra code footprint. Removing it though is going to take a bit of time. The unittesting code references it, and will need updating too. Plus typehints. Working on it.

@mMerlin mMerlin requested a review from dhalbert October 8, 2024 02:15
@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 8, 2024

I removed board from ignore-modules as well. That only suppresses messages about unrecognized attributes of the module, not the module itself. I added type:ignore comments to suppress the complaints about not being able to load the modules.

@dhalbert
Copy link
Contributor

I read your issue #40, which I hadn't read and didn't realize motivated this PR.

I am trying to understand the initial issues you had with request timeouts, which provoked your additions to the library. My impression is that at least some NTP servers will discard requests that come in too frequently. There are minimum intervals between requests (I did a websearch for ntp retry intervals).

Were your initial retries occurring too frequently, and the NTP server(s) you contacted were dropping your requests? Could this be fixed simply by respecting some longer intervals in the user code, and not necessarily in the library? In other words, could we provide documentation guidance for using this library that would help?

I looked at the general-purpose NTP modules in pypi.org, and did not find backoff mechanisms in the code for those. So that makes me think that backoff could be moved to user code or another library.

You also added non-blocking requests, and added the callback mechanism to handle that. In the long run, we would like to provide some general mechanism for handling non-blocking network requests, maybe inspired by what is commonly done in CPython.

We have generally tried to keep the libraries small, since they run on memory-limited machines. Is there a minimal set of functionality that you think should be added to the library to make it easier to use but still small?

I think it would be fine to have a "Fancy NTP" library available, but I also am worried making this library a lot bigger for functionality that might not be used for the most common use case.

@dhalbert
Copy link
Contributor

I added type:ignore comments to suppress the complaints about not being able to load the modules.

We haven't had to do type: ignore for other libraries. I'm assuming you looked at other CircuitPython libraries and took guidance from which modules they ignore, which they require, how they do imports for type-checking, etc. Would the problems you ran into without type: ignore also show up in other libraries?

@mMerlin
Copy link
Contributor Author

mMerlin commented Oct 22, 2024

The initial timeout issues were with the DNS lookup (getaddrinfo) calls. Probably made worse by weak WiFi signal at my desk. Not with the ntp server calls. The initial library and app mostly worked once an address was acquired. The mostly likely from poor wifi. The AP I was using was just about at the limit of usable range.

The existing library works perfectly fine on the 'happy path'. Except maybe for its default choice of polling intervals (see below). As long as the network is perfect, so no network errors occur. Temporary lack of WiFi, DNS lookup failures (or delays), NTP Server response failures break it. To handle failures needs a wrapper around the time request calls to handle recovery, which needs some knowledge of recovery paths for different cases. I moved as much of that as I could into the library. Once it gets the first time synchronization, it will no longer raise exceptions that the calling app needs to deal with. If the network is good when it is first called, the 'simpletest' path will continue to work.

For ntp retry intervals, if there are better values, the cache interval in the NTP library should be set (defaulted) accordingly. I did some reading as well. The existing code is using the polling interval for the cache timeout. Once it gets a successful synchronization. Before that, it will just keep making requests one after the other. Neither case seems appropriate.
The comment in the existing code says that a cache_seconds value of zero will respect the NTP servers minimum. However, that 'minimum' is taken from the NTP packet poll field. My reading says that is not a minimum or recommended value. It is a representation of the past polling intervals. A 'history' value.
One of the changes in the new code is to enforce an initial minimum interval of 1 minute.

The backoff is for general network operations, not specifically NTP. My initial (triggering) case was around DNS lookups. If you found some numbers with that websearch, the existing default cache_seconds=0, or my MINIMUM_RETRY_DELAY=60 should be adjusted. Part of the PR was asking for better numbers to use for the backoff pattern as well. That level of network operations is not in my background.

I looked at moving backoff somewhere else, but could not see a scenario where doing that would keep a simple interface for the NTP library user. The NTP library requires a socket or socket pool to function. To keep the NTP usage simple, the backoff logic would need to be implemented in a way that did not break NTP time request calls. That would seem to need at least 3 separate implementations. For CPython, wifi.radio, and wiznet. To be transparent to NTP, the backoff logic would need to be in the socket. A wrapper for the socket that is customized to respond with appropriate error status conditions during the backoff intervals. I suspect those wrappers would be more complex than the implementation I used, because they would not have the context information that the NTP library code is using. Even that would need additional intelligence in the NTP library code, to return reasonable time values to the user, if a fallback time can be calculated.

The non-blocking implemented here is a first step. It is not truly non-blocking. More reduced blocking. It can still block until any specified network timeout value expires. Fully non blocking network operations would be much preferred. Until then …

For 'most common use cases', I was aiming at users that do not know a lot about network operations, but want to be able to use a reasonably accurate time value. Without worrying about non blocking mode, a simple try-except block in a loop to get the first synchronization is all that should be needed. After that, no NTP specific error handling is needed. The notifications and non blocking options allow users with additional requirements deal with state outside of the normal application flow. Using helper methods.

'Minimal set of functionality' depends on what the scope of the helper library should be. The whole functionality is wrapped around 3 network calls. DNS lookup, send NTP request, get NTP response. With a bit of math to calculate the time offset. For 'happy path', that is easy to implement directly in an application. If the details for those calls is known. It is appropriate handling of the exception cases that is needed, in a way that lets the application continue to run, and not pound on the network or servers when there is a problem.

  • use existing information to provide a time value even if a network error occurs.
  • limit network requests even if time requests keep comming after an error.
  • provide a method for the app to discover there was a network problem that was bypassed.
  • better default or source for cache timeout intervals.

The type:ignore is not 'needed'. It was included to prevent complaints (from pylance) when working with CPython. Or more specifically, in the IDE that is using CPython for the static checks. In my case vscode. It might be possible to suppress those a different way, by adding a requirement to the project that pulls in the empty templates for the modules. Not blinka. circuitpython-stubs. If the CI, pre-commit, etc do not see those errors, the ignore comments could be removed, and the warnings ignored in the local environment. They just iritate me when vscode keeps showing them. The comment lets me tell vscode one time that I know about it (and don't care). The type:ignore entries are 'useful', not 'needed' when editing any code that imports circuitpython libraries that are not visible to the IDE. installing circuitpython-stubs in the virtual envirionment may cover the cases, or including the circuitypthon libraries in the requirements. I'm fairly new to circuitpython environments. I don't have enough knowlege there to know exactly what works or is appropriate. Neither of those alternatives were in the requirements for the existing NTP library, so I used type:ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

timeout and retry logic
3 participants