-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(telemetry): Run telemetry outside of kedro projects #775
feat(telemetry): Run telemetry outside of kedro projects #775
Conversation
Signed-off-by: Dmitry Sorokin <[email protected]>
…nds-outside-kedro-project-folder
@@ -168,26 +168,23 @@ def before_command_run( | |||
): | |||
"""Hook implementation to send command run data to Heap""" | |||
|
|||
if not project_metadata: # in package mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we also track packaged mode now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that line isn't related to package mode:
#729 (comment)
As I understand it, in package mode, execution will occur without the CLI. Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A packaged project will use the __main__.py
as entrypoint. It will then look for the run method to execute.
I think you're right this line here in telemetry actually isn't related to package mode.. It seems to only be reached if you're not in a kedro project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packaged mode will execute the CLI through the entrypoint. I believe the commend about package mode
because it was assumed that it is the only situation that project_metadata
will be missing. This is not exactly true now because it would be either:
- packaged mode
- kedro command outside of project (i.e. kedro new)
I believe bot situations should be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @noklam. Could you please provide more details about packaged mode? From what I see in the code, if I package a Kedro project, it will have an entry point project_name.__main__:main
. When I execute project_name
in the terminal, the kedro/framework/cli/project.py:run()
function will be executed directly, without calling the before_command_run()
hook. This is because that hook is only called in the KedroCLI()
class during Kedro CLI execution. I'm trying to understand what we need to handle in the telemetry hooks code concerning packaging mode.
if not self._consent: | ||
self._opt_out_notification() | ||
return | ||
|
||
# get KedroCLI and its structure from actual project root | ||
cli = KedroCLI(project_path=project_metadata.project_path) | ||
cli = KedroCLI(project_path=project_path if project_path else Path.cwd()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is, does that project_path
here actually matters? Where is it used as the command_args
are coming from hook and KedroCLI
which was created in kedro
framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also trying to understand why KedroCLI is dependent on a specific Kedro project. I believe the CLI structure should be the same for all projects. Do you think we can just use Path.cwd() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the project_path
matters but we should add a comment to explain this. I have questions about the _recurse_cli
used by the function, does it takes care of the LazyCommand we added lately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have questions about the
_recurse_cli
used by the function, does it takes care of the LazyCommand we added lately?
I checked the Lazy Loading PR, looks like _recurse_cli()
func should work as previously. Also I tested current PR manually with latest kedro main branch on fresh virtual env, everything works well from the 1st command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dig into this a bit and I think project_path
is not an arbitrary value here and it matters.
The output of recurse_cli
change when I provide different value of project_path
(mlflow etc is missing), they are not important here but I am not sure if this will trigger the imports.
When project_path is set as root correctly:
I am also concerned would this revert the effort of LazyCommand
since this seems to try to load the CLI group (hopefully not the dependencies). How could we test this properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Nok is right, the recursion through the CLI definitely structure triggers the loading of lazy the CLI command groups -
It uses get_command()
:
cli_element.get_command(ctx, command_name), |
Which if you look at the implementation of the
LazyGroup
, triggers the lazy load:https://github.com/kedro-org/kedro/blob/8bbbfb6c7257005bef841143690308f05d4b829b/kedro/framework/cli/utils.py#L504-L509
I believe we need to find a way to do the masking without loading relying on KedroCLI to not defeat the purpose of introducing lazy loading
if project_path: | ||
pyproject_path = project_path / PYPROJECT_CONFIG_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overwriting the supplied project_path
here I'd just create a new variable to make it easier to follow what happens and debug if something is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's already a new variable?
Signed-off-by: Dmitry Sorokin <[email protected]>
…nds-outside-kedro-project-folder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with notes. #775 (comment), upon some investigation, I think there are issues to follow up:
project_path
is not arbitary as it affects the result of recursive call- The recursive call undo the effort of LazyCommand as it eagerly import all the commands.
I will open a new issue and I suggest we fix both problem together (1. may not needed if we end up throwing away the recursive call)
Thanks for your review, @ankatiyar and @noklam. I agree with you; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…nds-outside-kedro-project-folder
* Run telemetry outside of kedro projects --------- Signed-off-by: Dmitry Sorokin <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Merel Theisen <[email protected]>
Description
Closes #729,
Telemetry will be executed if a CLI command is run outside of a Kedro project (including the
kedro new
command). In this case, the same information will be sent, except for the project-related data.Checklist
RELEASE.md
file