Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nwis_client iv context handler #257

Merged
merged 2 commits into from
Oct 8, 2024
Merged

Add nwis_client iv context handler #257

merged 2 commits into from
Oct 8, 2024

Conversation

JoshCu
Copy link
Contributor

@JoshCu JoshCu commented Sep 13, 2024

When using the IVDataService class to get usgs data, the rest client threads keep running after the main program has finished executing.

I managed to "fix" it by adding a context handler, but I'm not familiar enough with asyncio to make it work without the with syntax.

Test

from hydrotools.nwis_client import IVDataService
service = IVDataService()
print("getting data")
usgs_data = service.get(sites="10154200", startDT="2001-01-01", endDT="2001-01-02")
print("done, exiting")
exit(1)
# Console doesn't return until I hit Ctrl+C
getting data
done, exiting
^CException ignored in: <module 'threading' from '/usr/lib/python3.12/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1622, in _shutdown
    lock.acquire()
KeyboardInterrupt: 
^C

Fix

from hydrotools.nwis_client import IVDataService
with IVDataService() as service:
    print("getting data")
    usgs_data = service.get(sites="10154200", startDT="2001-01-01", endDT="2001-01-02")
    print("done, exiting")
    exit(1)
getting data
done, exiting

Note

To clarify: This does not entirely fix the original issue, the threads will not close unless you use the with context handler. The "test" code will cause the bug with and without this change.

@jarq6c jarq6c requested a review from aaraney September 13, 2024 18:56
Copy link
Member

@aaraney aaraney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition, @JoshCu! This issue looks similar to #237 (comment) and requests-cache/aiohttp-client-cache#173. I want to investigate the root cause of this, but I don't think that should hold up your addition. Could you add a "slow" unit test that creates a IVDataService instance and uses the context manager? Use test_nwis_client_cache_path for inspiration. You should more or less be able to copy and paste the test and just use the context manager instead.

Thanks again for contributing!

@JoshCu
Copy link
Contributor Author

JoshCu commented Sep 13, 2024

Thanks for the links! Is asserting that the _restclient._session is closed sufficient? I tried an assertion that the event loop was closed too but it fails as the _loop is still open. Unless it's instantiating a new one by accessing the property?

@jarq6c
Copy link
Collaborator

jarq6c commented Sep 16, 2024

@aaraney
Copy link
Member

aaraney commented Sep 16, 2024

@aaraney @JoshCu Shall we update to 3.4.0 ?

https://github.com/NOAA-OWP/hydrotools/blob/main/python/nwis_client/src/hydrotools/nwis_client/_version.py

Ill take care of that, @jarq6c!

@jarq6c
Copy link
Collaborator

jarq6c commented Oct 2, 2024

@aaraney Did you want to merge then update or update then merge?

@aaraney
Copy link
Member

aaraney commented Oct 8, 2024

Sorry, i'm just now getting back to this.

I don't think the behavior you found @JoshCu is avoidable on our end (more on that in a bit). The easiest way to get around this is by wrapping your code in a function. For example:

#!/usr/bin/env python
from hydrotools.nwis_client import IVDataService

def main():
  service = IVDataService()
  print("getting data")
  usgs_data = service.get(sites="10154200", startDT="2001-01-01", endDT="2001-01-02")
  print("done, exiting")
  return usgs_data

if __name__ == "__main__":
  df = main()

The above will not hang.

Now as to why I think this is happening. Ive run into weird reference counting issues with python's asyncio package in the past in particular when writing "script" like code (so using a global scope, no function declarations, and no if __name__ == "main"). More often than not the issue is cyclic references, garbage collection, and / or finalizer semantics.

Without getting too much into the weeds, I think this is happening because the IVDataService is creating and object that both holds a reference to the asyncio event loop and a reference to a task running on the event loop. My suspicion is python's exit semantics don't guarantee that the IVDataService's reference counter is decremented in the same order as it would be if it were dropped from a function for example. This leads to the garbage collector collecting the asyncio loop before work that needs scheduling and running can be added.

We can manually "drop" our reference to IVDataService (analogous to returning from a function) by setting the variable to None. This also gets around the issue:

from hydrotools.nwis_client import IVDataService

service = IVDataService()
print("getting data")
usgs_data = service.get(sites="10154200", startDT="2001-01-01", endDT="2001-01-02")

# explicitly drop ref to IVDataService; this decrements the ref count to 0 and schedules `__del__` which calls `close()`
service = None

print("done, exiting")

@aaraney aaraney merged commit f49397f into NOAA-OWP:main Oct 8, 2024
3 checks passed
@aaraney
Copy link
Member

aaraney commented Oct 8, 2024

Thanks for adding this, @JoshCu! 🎉

@JoshCu
Copy link
Contributor Author

JoshCu commented Oct 8, 2024

What a strange edge case, thanks for the explanation! When I first encountered it I was testing a super basic functionless script. This also explains why I was having such a hard time reproducing the issue within the tests. When I was looking at it over the weekend, I thought I fixed it at least 5 times only for it not to work when I tested it outside of pytest. I've had multiprocessing module and flask also complain when there's no main function. Maybe the real solution is for me to not be lazy and actually use functions.

@aaraney
Copy link
Member

aaraney commented Oct 9, 2024

What a strange edge case, thanks for the explanation!

@JoshCu, no kidding 😅! For sure! Kudos for trying to reproduce it in pytest and other environments! TBF, it took me forever to track this down when I ran into it a few years ago haha.

I've had multiprocessing module and flask also complain when there's no main function.

Likewise, don't get me started on multiprocessing haha. So many platform and version specific edge cases.

Maybe the real solution is for me to not be lazy and actually use functions.

Hey, but if you would have done that, you never would have found and fixed this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants