-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix ystore start flow #40
Conversation
Nice find @jzhang20133! I think this removes the wrong line though. I think we should remove the initialization that happens in the |
In the test_version method, we want to be able to restart ystore after its db is closed. Hence we are creating a new Event() at bottom of stop method so when it is restarted, db can be initialized again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the YStore should be started before any read/write operation is done, this is not an issue.
I would do the opposite though: remove the event creation in __init__
and keep it in __start__
. This way any attempt at reading/writing before starting the YStore will fail.
@davidbrochart This PR is in response to another issue we are seeing where documents never open (i.e. spinning wheel forever). We haven't been able to reproduce with vanilla JupyterLab + jupyter-collaboration + pycrdt-websocket yet; though, admittedly we haven't had a lot of time to try to create an example in the vanilla case. The only thing that is different on our end is we do some authentication that could take 1-2 seconds before connecting to a room. I think we're seeing this latency cause some issues with the db_initialization. |
Would you mind opening an issue, that this PR references? Otherwise it gets difficult understanding what this PR is supposed to fix. |
@davidbrochart I have updated the PR to follow your suggestions to remove self.db_initialized in init method. Would you like to review it one more time? cc @Zsailer |
With current implementation, the room initialization method in YDocWebsocketHandler which load contents from ystore will fail directly instead of waiting for ystore to start. Due to that failure, websocket is teared down and ystore got stopped and there is no time to initialize db and file can't be opened.
We would need to start ystore and then await for db_initialized in prepare method to get out of this race condition. |
Have you verified that this works? |
After adding a line |
I also have to change ystore to await |
pycrdt_websocket/ystore.py
Outdated
self.db_initialized = Event() | ||
if from_context_manager: | ||
assert self._task_group is not None | ||
self._task_group.start_soon(self._init_db) | ||
await self._init_db() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jzhang20133 is there a specific reason you removed the start_soon
here? Was this causing a race condition?
I ask because I think keeping this as part of the top level task_group is the "proper" way to structure concurrency here in an AnyIO world. Scheduling one-off tasks like this outside of the task_group might cause some issues, though I'm not experienced enough in AnyIO to know for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._task_group.start_soon(self._init_db)
will schedule this self._init_db task to run later in task group. And self._task_group.start_soon(self._init_db)
will not wait for _init_db to run and once task is scheduled, it moves to next line and finish this method. For caller of ystore, when they call ystore.start(), after start() finish, it only make sure self._init_db is scheduled and there is no guarantee that it has been run. Hence I changes it to wait here so if caller also await ystore.start(), it can guarantee that self._init_db has been run and finished and after that, when other method of ystore is called, they don't throw runtimeError due to ystore is not running. In DocumentRoom, when loading contents from file or ystore, it calls ystore.apply_updates without knowing whether ystore is db_initialized or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then wouldn't self.db_initialized
not be set, so any read/write calls would wait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a test and it looks like there is no need to wait self._init_db here. As long as we await ystore.start call right after the ystore initialization in jupyter-collaboration and make sure self.db_initialized = Event() is called before any other methods of ystore are called, then we are good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ystore class, after initialization, we also need to call start first before calling any other methods otherwise other methods will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR accordingly now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jzhang20133
added an open issue here #41 cc @davidbrochart @Zsailer |
Looks good to me! Thanks @jzhang20133 for the great work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jzhang20133 Could you send a follow-up PR with my suggested changes?
@@ -361,7 +359,7 @@ async def start( | |||
|
|||
async def stop(self) -> None: | |||
"""Stop the store.""" | |||
if self.db_initialized.is_set(): | |||
if hasattr(self, "db_initialized") and self.db_initialized.is_set(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that, db_initialized
should be set to None
in __init__
and you should check for None
here instead of looking up for an attribute on the class instance:
if self.db_initialized is not None and self.db_initialized.is_set():
@@ -415,6 +413,8 @@ async def read(self) -> AsyncIterator[tuple[bytes, bytes, float]]: | |||
Returns: | |||
A tuple of (update, metadata, timestamp) for each update. | |||
""" | |||
if not hasattr(self, "db_initialized"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.db_initialized is None:
@@ -438,6 +438,8 @@ async def write(self, data: bytes) -> None: | |||
Arguments: | |||
data: The update to store. | |||
""" | |||
if not hasattr(self, "db_initialized"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.db_initialized is None:
I opened #42. |
This line
self.db_initialized = Event()
in start() method replace self.db_initialized with a new Event() and when there is another call to ystore.read() or ystore.write() ahead of db initialization, this another call could be wait for the old self.db_initialized Event (created in initialize method) which is never set. In UI, file access is blocked and the loading sign keeps spinning forever.Address this open issue:
#41