-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(telemetry): Run telemetry outside of kedro projects #775
Changes from 2 commits
2f92844
6fb1930
de3abe8
63a0c91
1b63f74
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -168,26 +168,23 @@ def before_command_run( | |||
): | ||||
"""Hook implementation to send command run data to Heap""" | ||||
|
||||
if not project_metadata: # in package mode | ||||
return | ||||
project_path = project_metadata.project_path if project_metadata else None | ||||
|
||||
self._consent = _check_for_telemetry_consent(project_metadata.project_path) | ||||
self._consent = _check_for_telemetry_consent(project_path) | ||||
if not self._consent: | ||||
self._opt_out_notification() | ||||
return | ||||
|
||||
# get KedroCLI and its structure from actual project root | ||||
cli = KedroCLI(project_path=project_metadata.project_path) | ||||
cli = KedroCLI(project_path=project_path if project_path else Path.cwd()) | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess my question is, does that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am also trying to understand why KedroCLI is dependent on a specific Kedro project. I believe the CLI structure should be the same for all projects. Do you think we can just use Path.cwd() here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I checked the Lazy Loading PR, looks like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I dig into this a bit and I think The output of When project_path is set as root correctly: I am also concerned would this revert the effort of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think Nok is right, the recursion through the CLI definitely structure triggers the loading of lazy the CLI command groups -
Which if you look at the implementation of the LazyGroup , triggers the lazy load:https://github.com/kedro-org/kedro/blob/8bbbfb6c7257005bef841143690308f05d4b829b/kedro/framework/cli/utils.py#L504-L509 I believe we need to find a way to do the masking without loading relying on KedroCLI to not defeat the purpose of introducing lazy loading |
||||
cli_struct = _get_cli_structure(cli_obj=cli, get_help=False) | ||||
masked_command_args = _mask_kedro_cli( | ||||
cli_struct=cli_struct, command_args=command_args | ||||
) | ||||
|
||||
self._user_uuid = _get_or_create_uuid() | ||||
|
||||
event_properties = _get_project_properties( | ||||
self._user_uuid, project_metadata.project_path / PYPROJECT_CONFIG_NAME | ||||
) | ||||
event_properties = _get_project_properties(self._user_uuid, project_path) | ||||
event_properties["command"] = ( | ||||
f"kedro {' '.join(masked_command_args)}" if masked_command_args else "kedro" | ||||
) | ||||
|
@@ -224,7 +221,7 @@ def after_catalog_created(self, catalog): | |||
|
||||
if not self._event_properties: | ||||
self._event_properties = _get_project_properties( | ||||
self._user_uuid, self._project_path / PYPROJECT_CONFIG_NAME | ||||
self._user_uuid, self._project_path | ||||
) | ||||
|
||||
project_properties = _format_project_statistics_data( | ||||
|
@@ -274,12 +271,16 @@ def _is_known_ci_env(known_ci_env_var_keys: set[str]): | |||
return any(os.getenv(key) for key in known_ci_env_var_keys) | ||||
|
||||
|
||||
def _get_project_properties(user_uuid: str, pyproject_path: Path) -> dict: | ||||
project_id = _get_or_create_project_id(pyproject_path) | ||||
package_name = PACKAGE_NAME or UNDEFINED_PACKAGE_NAME | ||||
hashed_project_id = ( | ||||
_hash(f"{project_id}{package_name}") if project_id is not None else None | ||||
) | ||||
def _get_project_properties(user_uuid: str, project_path: Path) -> dict: | ||||
DimedS marked this conversation as resolved.
Show resolved
Hide resolved
|
||||
if project_path: | ||||
pyproject_path = project_path / PYPROJECT_CONFIG_NAME | ||||
Comment on lines
+266
to
+267
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of overwriting the supplied There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe it's already a new variable? |
||||
project_id = _get_or_create_project_id(pyproject_path) | ||||
package_name = PACKAGE_NAME or UNDEFINED_PACKAGE_NAME | ||||
hashed_project_id = ( | ||||
_hash(f"{project_id}{package_name}") if project_id is not None else None | ||||
) | ||||
else: | ||||
hashed_project_id = None | ||||
|
||||
properties = { | ||||
"username": user_uuid, | ||||
|
@@ -291,7 +292,8 @@ def _get_project_properties(user_uuid: str, pyproject_path: Path) -> dict: | |||
"is_ci_env": _is_known_ci_env(KNOWN_CI_ENV_VAR_KEYS), | ||||
} | ||||
|
||||
properties = _add_tool_properties(properties, pyproject_path) | ||||
if project_path: | ||||
properties = _add_tool_properties(properties, pyproject_path) | ||||
|
||||
return properties | ||||
|
||||
|
@@ -353,21 +355,23 @@ def _send_heap_event( | |||
) | ||||
|
||||
|
||||
def _check_for_telemetry_consent(project_path: Path) -> bool: | ||||
def _check_for_telemetry_consent(project_path: Path | None) -> bool: | ||||
""" | ||||
Use telemetry consent from ".telemetry" file if it exists and has a valid format. | ||||
Telemetry is considered as opt-in otherwise. | ||||
""" | ||||
telemetry_file_path = project_path / ".telemetry" | ||||
|
||||
for env_var in _SKIP_TELEMETRY_ENV_VAR_KEYS: | ||||
if os.environ.get(env_var): | ||||
return False | ||||
if telemetry_file_path.exists(): | ||||
with open(telemetry_file_path, encoding="utf-8") as telemetry_file: | ||||
telemetry = yaml.safe_load(telemetry_file) | ||||
if _is_valid_syntax(telemetry): | ||||
return telemetry["consent"] | ||||
|
||||
if project_path: | ||||
telemetry_file_path = project_path / ".telemetry" | ||||
if telemetry_file_path.exists(): | ||||
with open(telemetry_file_path, encoding="utf-8") as telemetry_file: | ||||
telemetry = yaml.safe_load(telemetry_file) | ||||
if _is_valid_syntax(telemetry): | ||||
return telemetry["consent"] | ||||
return True | ||||
|
||||
|
||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we also track packaged mode now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that line isn't related to package mode:
#729 (comment)
As I understand it, in package mode, execution will occur without the CLI. Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A packaged project will use the
__main__.py
as entrypoint. It will then look for the run method to execute.I think you're right this line here in telemetry actually isn't related to package mode.. It seems to only be reached if you're not in a kedro project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packaged mode will execute the CLI through the entrypoint. I believe the commend about
package mode
because it was assumed that it is the only situation thatproject_metadata
will be missing. This is not exactly true now because it would be either:I believe bot situations should be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @noklam. Could you please provide more details about packaged mode? From what I see in the code, if I package a Kedro project, it will have an entry point
project_name.__main__:main
. When I executeproject_name
in the terminal, thekedro/framework/cli/project.py:run()
function will be executed directly, without calling thebefore_command_run()
hook. This is because that hook is only called in theKedroCLI()
class during Kedro CLI execution. I'm trying to understand what we need to handle in the telemetry hooks code concerning packaging mode.