-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingestion/looker): Add view file-path as option in view_naming_pattern config #8713
feat(ingestion/looker): Add view file-path as option in view_naming_pattern config #8713
Conversation
) | ||
n_mapping: NamingPatternMapping = self.get_mapping(config) | ||
# / is not urn friendly | ||
if n_mapping.file_path is not None: # to silent the lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_mapping is not used anywhere else. Can we move this file_path specific handling inside get_mapping ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, maybe create a function named preprocess_file_path
. Some other suggestions for this preprocessing:
- remove .lkml and/or .view.lkml extensions
- handle path for imported projects correctly (refer
test_looker.py::test_looker_ingest_external_project_view
for example of imported project tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like tests are not updated for this. Can you please update golden files ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) -> List[LookmlModelExploreField]: | ||
return list(fields) if fields is not None else [] | ||
|
||
lkml_fields.extend(empty_list(explore.fields.dimensions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lkml_fields.extend(empty_list(explore.fields.dimensions)) | |
lkml_fields.extend(list(explore.fields.dimensions) or []) |
same for below two statements. can remove nested function empty_list. It is already ensured earlier in code flow that explore.fields is not None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty_list will avoid empty list creation logic three places in the function.
empty_list is handling Not None for fields which are inside explore.fields (i.e. dimensions, measures, and parameters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, I prefer the or []
syntax over a new function
also, nesting functions like this is generally an antipattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is avoiding duplicates of or []
logic at three statement and more readable
file_path: str = ( | ||
cast(str, self.upstream_views_file_path[view_ref.include]) | ||
if self.upstream_views_file_path[view_ref.include] is not None | ||
else ViewFieldType.UNKNOWN.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we create a new unknown variable for this ? ViewFieldType.UNKNOWN feels misplaced here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -81,6 +81,7 @@ class NamingPatternMapping: | |||
project: str | |||
model: str | |||
name: str | |||
file_path: Optional[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_path shows up as an option in explore_naming_pattern and explore_browse_pattern as well whereas we don't need them there. See - https://docs-website-1dask8vf5-acryldata.vercel.app/docs/next/generated/ingestion/sources/looker
Probably create a subclass ViewNamingPatternMapping
with file_path:str
and remove it from parent class ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not able to see file_path in docs against view naming pattern and view_browse_path, can you please check ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is doc generation issue, need help from @hsheth2
logger.debug( | ||
f"base_folder_path({base_folder_path}) and absolute_file_path({absolute_file_path}) not matching" | ||
) | ||
return ViewFieldType.UNKNOWN.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as new unknown variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new enum ViewFieldValue
to address this
…iquebagwan-gslab/datahub-fork into master+ing-160-incomplete-urn-issue
|
||
str_to_replace: Dict[str, str] = { | ||
f"imported_projects/{self.project_name}/": "", | ||
"/": "_", # / is not urn friendly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I personally prefer
.
over_
for readability. Also_
are common in folder names. Might lead to urn conflicts. e.g. if folder names are likea_b / a_c.view.lkml
anda / b_a / c.view.lkml
- Need accuracy check for imported_projects handling.
@hsheth2 - can you please review and confirm ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - let's do /
-> .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
for remove in str_to_remove: | ||
new_file_path = new_file_path.rstrip(remove) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't use rstrip here. It works on individual characters and not complete string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
str_to_replace: Dict[str, str] = { | ||
f"imported_projects/{self.project_name}/": "", | ||
"/": "_", # / is not urn friendly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - let's do /
-> .
|
||
upstream_views_file_path[view_name] = file_path | ||
|
||
logger.debug("Exit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these debug lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return None | ||
|
||
|
||
def create_upstream_views_file_path_map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't do this now, but I think it will make sense to more cleanly separate the looker and lookml code paths in the future e.g. this stuff really should live in looker_source.py, not looker common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I think it is happening because from_api is implemented in common
) -> List[LookmlModelExploreField]: | ||
return list(fields) if fields is not None else [] | ||
|
||
lkml_fields.extend(empty_list(explore.fields.dimensions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, I prefer the or []
syntax over a new function
also, nesting functions like this is generally an antipattern
@@ -457,6 +571,9 @@ class LookerExplore: | |||
upstream_views: Optional[ | |||
List[ProjectInclude] | |||
] = None # captures the view name(s) this explore is derived from | |||
upstream_views_file_path: Dict[str, Optional[str]] = dataclasses_field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this Optional[str]
? it looks like it should be str
since every view has a file path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please go through function create_upstream_views_file_path_map
and get_view_file_path
. The function is looking for view-name on lkml_fields
if view-name is found then return view-file-path othrewise return None
# set file_path to ViewFieldType.UNKNOWN if file_path is not available to keep backward compatibility | ||
# if we raise error on file_path equal to None then existing test-cases will fail as mock data doesn't have required attributes. | ||
file_path: str = ( | ||
cast(str, self.upstream_views_file_path[view_ref.include]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because self.upstream_views_file_path[view_ref.include] is optional, None check is there but still lint was raising error
@@ -791,12 +925,20 @@ def _to_metadata_events( # noqa: C901 | |||
upstreams = [] | |||
observed_lineage_ts = datetime.datetime.now(tz=datetime.timezone.utc) | |||
for view_ref in sorted(self.upstream_views): | |||
# set file_path to ViewFieldType.UNKNOWN if file_path is not available to keep backward compatibility | |||
# if we raise error on file_path equal to None then existing test-cases will fail as mock data doesn't have required attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we update the test mock data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether mock data is written as per production scenario. The fields are marked as Optional in looker sdk, so to avoid any surprises in production I preferred to keep existing test as is
No description provided.