Provide a lightweight solution to speed up session reload or create new session #2879
Comments
I recall there's some issue about
That's more related to orchestration and it requires a way to pass a unique identifier when the run is spread to multiple KedroSession
Arguably if we kept a private session_id and exposed a parameterisable one that would be sufficient
Uh, we're sorting by session_id? Maybe we should store the datetime instead, but this might be a bit of a digression.
Moving this to the Session milestone.
Description
As I do a lot of development work in IPython or Jupyter, I often want to make small changes and test whether they work. `%reload_kedro` can be quite slow, and the development experience is frustrating because that cost is paid on every change.
This is also potentially related to #1853, #2134, #2182.
`kedro ipython` takes > 20s to start and `%reload_kedro` takes
Context
1 session = 1 run
#1329
After this PR, `session` can only be run once. The easiest way to create a new session is `%reload_kedro`. While `%reload_kedro` works, it is considerably slow with big projects for a few reasons:
`session`, `context`, `pipelines`, `catalog`.
What's the minimal effort to recreate session?
If we look into the code, there is a `self._run_called` attribute, and every time we call `session.run` it checks whether that attribute is `True`.
kedro/kedro/framework/session/session.py, lines 434 to 438 in 6913acd
kedro/kedro/framework/session/session.py, lines 366 to 371 in 6913acd
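The embedded snippets are not reproduced here, so the following is only a rough sketch of the run-once pattern described above, not the actual Kedro source. `SketchSession`, its error message, and the example `session_id` value are hypothetical stand-ins; only the `_run_called` attribute name comes from the issue.

```python
class SketchSession:
    """Hypothetical stand-in for a session that may only be run once."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False  # becomes True after the first run

    def run(self) -> str:
        # Guard: refuse to run a second time in the same session.
        if self._run_called:
            raise RuntimeError("A run has already been completed in this session.")
        # ... the pipeline would be executed here ...
        self._run_called = True
        return self.session_id


session = SketchSession("2023-08-01T10.00.00.000Z")
session.run()

second_run_error = None
try:
    session.run()  # second call hits the guard
except RuntimeError as exc:
    second_run_error = str(exc)
```

With a guard like this, the only supported way to run again is to build a fresh session object, which is what makes the reload cost matter.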
Why do we need this check?
Mainly because `session_id` needs to be a unique value; otherwise it can cause errors in experiment tracking (kedro-viz), which relies on it as a unique id. If we simply override `session._run_called = False` and call `session.run()`, almost everything will work.
Experiment tracking is not a core feature of kedro (it belongs to kedro-viz). Is there any other obvious reason to protect a `session_id` from being run twice?
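To make the trade-off concrete, here is a small sketch of the override workaround using a hypothetical stand-in class (`SketchSession` is not the Kedro class, and the id value is made up). It shows that resetting the flag allows a second run, but both runs then share one `session_id`.

```python
class SketchSession:
    """Hypothetical stand-in for a session with a run-once flag."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False

    def run(self) -> str:
        if self._run_called:
            raise RuntimeError("session already ran")
        self._run_called = True
        return self.session_id


session = SketchSession("2023-08-01T10.00.00.000Z")
first_id = session.run()

# The override discussed above: clear the private flag, then run again.
session._run_called = False
second_id = session.run()

# Both runs report the same id, which is what a consumer that expects
# one unique id per run (such as experiment tracking) would trip over.
ids_collide = first_id == second_id
```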
It could be related to the timestamp used for saving versioned data. However, this is unclear to me: `catalog` gets `save_version` from `session_id`, but there is another function that you can find in most dataset implementations:
`save_version = self.resolve_save_version()`
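The interaction between the two sources of `save_version` can be sketched as follows. This is an assumption about the general shape of the logic, not a copy of Kedro's code: the standalone `resolve_save_version` function, the `generate_timestamp` helper, and the timestamp format are all illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def generate_timestamp() -> str:
    """Hypothetical helper: a UTC timestamp usable as a version string."""
    now = datetime.now(tz=timezone.utc)
    # Trim microseconds to milliseconds and mark the value as UTC.
    return now.strftime("%Y-%m-%dT%H.%M.%S.%f")[:-3] + "Z"


def resolve_save_version(pinned: Optional[str]) -> str:
    """Use the pinned version if one was passed in, else a fresh timestamp."""
    return pinned or generate_timestamp()


session_id = "2023-08-01T10.00.00.000Z"

# Case 1: the catalog hands the session_id down as the save version.
from_session = resolve_save_version(session_id)

# Case 2: nothing was pinned, so the dataset generates its own version.
fallback = resolve_save_version(None)
```

If the real logic resembles this, rerunning a session with the same `session_id` would write versioned data under the same version string, which may be one reason for the run-once check.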
Possible Implementation
Source: #1551 (comment)
Maybe implement a `session.clear()` or `session.reset()` method.
Possible Alternatives
`reload_kedro` so the overhead is insignificant.
`session._run_called` checks
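Returning to the `session.reset()` idea from Possible Implementation, a minimal sketch might look like the following. Everything here is hypothetical: `SketchSession` is a stand-in class, `reset()` does not exist in Kedro as far as this issue states, and using `uuid` for the id is only a convenient way to get unique values.

```python
import uuid


class SketchSession:
    """Hypothetical stand-in for a session that can be reset cheaply."""

    def __init__(self) -> None:
        self.session_id = uuid.uuid4().hex
        self._run_called = False

    def run(self) -> str:
        if self._run_called:
            raise RuntimeError("session already ran; call reset() first")
        self._run_called = True
        return self.session_id

    def reset(self) -> None:
        # Issue a new unique id and clear the flag, while keeping any
        # expensive state (context, catalog, pipelines) untouched.
        self.session_id = uuid.uuid4().hex
        self._run_called = False


session = SketchSession()
first_id = session.run()
session.reset()
second_id = session.run()
```

The design choice is that uniqueness of `session_id` is preserved per run, so the guard can stay in place while the costly reload is avoided.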