feat(ingestion/looker): Add view file-path as option in view_naming_pattern config #8713

siddiquebagwan-gslab
Contributor

No description provided.

@github-actions github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) Aug 24, 2023
)
n_mapping: NamingPatternMapping = self.get_mapping(config)
# / is not urn friendly
if n_mapping.file_path is not None: # to silent the lint
Collaborator

n_mapping is not used anywhere else. Can we move this file_path-specific handling inside get_mapping?

Collaborator

Separately, maybe create a function named preprocess_file_path. Some other suggestions for this preprocessing:

  1. remove .lkml and/or .view.lkml extensions
  2. handle path for imported projects correctly (refer test_looker.py::test_looker_ingest_external_project_view for example of imported project tests)
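
The first suggestion above could be sketched roughly as follows. This is only an illustration: the name preprocess_file_path comes from the comment, but the body is an assumption and is not the code that was merged. It does not attempt the imported-project handling from point 2.

```python
def preprocess_file_path(file_path: str) -> str:
    """Drop the LookML extension from a view file path (illustrative sketch)."""
    # Check the longer suffix first so "x.view.lkml" does not become "x.view".
    for suffix in (".view.lkml", ".lkml"):
        if file_path.endswith(suffix):
            return file_path[: -len(suffix)]
    # Paths without a LookML extension are returned unchanged.
    return file_path
```

Checking the longer suffix first matters, since every ".view.lkml" path also ends in ".lkml".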

Collaborator

Looks like the tests are not updated for this. Can you please update the golden files?

Contributor Author

done

) -> List[LookmlModelExploreField]:
return list(fields) if fields is not None else []

lkml_fields.extend(empty_list(explore.fields.dimensions))
Collaborator

Suggested change
lkml_fields.extend(empty_list(explore.fields.dimensions))
lkml_fields.extend(list(explore.fields.dimensions) or [])

Same for the two statements below. The nested function empty_list can be removed; it is already ensured earlier in the code flow that explore.fields is not None.

Contributor Author

empty_list avoids repeating the empty-list creation logic in three places in the function.
empty_list handles None for the fields that are inside explore.fields (i.e. dimensions, measures, and parameters).
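
A small sketch of the difference being discussed, assuming (as the reply says) that dimensions itself can be None. The element type str is a stand-in for LookmlModelExploreField. Note that the literal form `list(x) or []` fails on None, because list(None) raises before the `or` is evaluated; the `or` has to sit inside the call.

```python
from typing import List, Optional, Sequence


def empty_list(fields: Optional[Sequence[str]]) -> List[str]:
    # Same logic as the nested helper in the diff, with str as a stand-in type.
    return list(fields) if fields is not None else []


dimensions: Optional[Sequence[str]] = None

# list(None) raises TypeError, so the trailing "or []" is never reached.
try:
    list(dimensions) or []  # type: ignore[arg-type]
    raised = False
except TypeError:
    raised = True

# Placing the fallback inside the call handles None without a helper.
via_or = list(dimensions or [])
```

Both empty_list(dimensions) and list(dimensions or []) give an empty list here; the choice between them is the style question debated in this thread.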

Collaborator

ah, okay.

Contributor Author

done

Collaborator

in this case, I prefer the or [] syntax over a new function

also, nesting functions like this is generally an antipattern

Contributor Author

It avoids duplicating the or [] logic in three statements and is more readable.

file_path: str = (
cast(str, self.upstream_views_file_path[view_ref.include])
if self.upstream_views_file_path[view_ref.include] is not None
else ViewFieldType.UNKNOWN.value
Collaborator

Should we create a new unknown variable for this? ViewFieldType.UNKNOWN feels misplaced here.

Contributor Author

done

@@ -81,6 +81,7 @@ class NamingPatternMapping:
project: str
model: str
name: str
file_path: Optional[str]
Collaborator

@mayurinehate mayurinehate Aug 29, 2023

file_path shows up as an option in explore_naming_pattern and explore_browse_pattern as well whereas we don't need them there. See - https://docs-website-1dask8vf5-acryldata.vercel.app/docs/next/generated/ingestion/sources/looker

Probably create a subclass ViewNamingPatternMapping with file_path: str and remove it from the parent class?
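
A minimal sketch of the suggested split, assuming NamingPatternMapping is a plain dataclass with the three fields shown in the diff context. The subclass body is an illustration of the idea, not the merged code.

```python
from dataclasses import dataclass, fields


@dataclass
class NamingPatternMapping:
    # Fields taken from the diff context above.
    project: str
    model: str
    name: str


@dataclass
class ViewNamingPatternMapping(NamingPatternMapping):
    # Only view patterns get file_path, so explore patterns no longer expose it.
    file_path: str


parent_fields = [f.name for f in fields(NamingPatternMapping)]
view_fields = [f.name for f in fields(ViewNamingPatternMapping)]
```

With this layout, anything that derives its allowed pattern variables from the parent class would no longer list file_path for the explore patterns.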

Contributor Author

done

Collaborator

Not able to see file_path in the docs against view naming pattern and view_browse_path. Can you please check?

Contributor Author

It is a doc generation issue; need help from @hsheth2.

logger.debug(
f"base_folder_path({base_folder_path}) and absolute_file_path({absolute_file_path}) not matching"
)
return ViewFieldType.UNKNOWN.value
Collaborator

Same comment as before about creating a new unknown variable.

Contributor Author

Added new enum ViewFieldValue to address this


str_to_replace: Dict[str, str] = {
f"imported_projects/{self.project_name}/": "",
"/": "_", # / is not urn friendly
Collaborator

  1. I personally prefer . over _ for readability. Also, _ is common in folder names, so it might lead to urn conflicts, e.g. if the paths are a_b/a_c.view.lkml and a/b_a/c.view.lkml.
  2. Need an accuracy check for the imported_projects handling.
    @hsheth2 - can you please review and confirm?
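
The conflict from point 1 can be reproduced with the two example paths. The flatten helper below is hypothetical and only mirrors the "/" replacement from the diff.

```python
def flatten(path: str, separator: str) -> str:
    # Hypothetical helper: replace "/" because it is not urn friendly.
    return path.replace("/", separator)


first = "a_b/a_c.view.lkml"
second = "a/b_a/c.view.lkml"

# With "_" both paths flatten to the same string, so the urns would clash.
underscore_collides = flatten(first, "_") == flatten(second, "_")

# With "." the two paths stay distinct.
dot_collides = flatten(first, ".") == flatten(second, ".")
```

Using "." avoids this particular clash; folder names that themselves contain dots could in principle still collide.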

Collaborator

Agree - let's do / -> .

Contributor Author

done

}

for remove in str_to_remove:
new_file_path = new_file_path.rstrip(remove)
Collaborator

Can't use rstrip here. It strips a set of individual characters, not a complete suffix string.
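
A short demonstration of the pitfall. The file name is made up for illustration, and the remove_suffix helper is a sketch rather than the fix that was merged.

```python
def remove_suffix(text: str, suffix: str) -> str:
    # Remove the suffix only when the string actually ends with it.
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


file_name = "customer_model.lkml"

# rstrip treats ".lkml" as the character set {'.', 'l', 'k', 'm'} and keeps
# stripping, so the trailing "l" of "model" is removed as well.
stripped = file_name.rstrip(".lkml")  # "customer_mode"

# Suffix-aware removal leaves the stem intact.
trimmed = remove_suffix(file_name, ".lkml")  # "customer_model"
```

On Python 3.9 and later, str.removesuffix does the same job as the helper.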

Contributor Author

done


upstream_views_file_path[view_name] = file_path

logger.debug("Exit")
Collaborator

remove these debug lines

Contributor Author

done

return None


def create_upstream_views_file_path_map(
Collaborator

we shouldn't do this now, but I think it will make sense to more cleanly separate the looker and lookml code paths in the future e.g. this stuff really should live in looker_source.py, not looker common

Contributor Author

@siddiquebagwan-gslab siddiquebagwan-gslab Sep 8, 2023

Agree. I think it is happening because from_api is implemented in common

@@ -457,6 +571,9 @@ class LookerExplore:
upstream_views: Optional[
List[ProjectInclude]
] = None # captures the view name(s) this explore is derived from
upstream_views_file_path: Dict[str, Optional[str]] = dataclasses_field(
Collaborator

Why is this Optional[str]? It looks like it should be str, since every view has a file path.

Contributor Author

Please go through the functions create_upstream_views_file_path_map and get_view_file_path. The function looks for the view name in lkml_fields; if the view name is found it returns the view file path, otherwise it returns None.
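
The lookup described here might look roughly like the sketch below. ExploreField is a hypothetical stand-in for LookmlModelExploreField, and the attribute names view and source_file are assumptions made for illustration; only the function name and the "return the path or None" behavior come from the comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExploreField:
    # Hypothetical stand-in for LookmlModelExploreField; attribute names assumed.
    view: Optional[str]
    source_file: Optional[str]


def get_view_file_path(
    lkml_fields: List[ExploreField], view_name: str
) -> Optional[str]:
    # Return the file path of the first field that belongs to the view.
    for field in lkml_fields:
        if field.view == view_name:
            return field.source_file
    # No field references the view, so the path is unknown.
    return None
```

Because a view may not appear in any field, the return type has to stay Optional[str], which is why the map is typed Dict[str, Optional[str]].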

# set file_path to ViewFieldType.UNKNOWN if file_path is not available to keep backward compatibility
# if we raise error on file_path equal to None then existing test-cases will fail as mock data doesn't have required attributes.
file_path: str = (
cast(str, self.upstream_views_file_path[view_ref.include])
Collaborator

Why do we need this cast?

Contributor Author

Because self.upstream_views_file_path[view_ref.include] is Optional. The None check is there, but the linter was still raising an error.
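
One common way to avoid the cast is to bind the looked-up value to a local variable, which a type checker can narrow after the None check. This is a sketch only: the UNKNOWN constant and the function name are hypothetical, and it uses dict.get, so a missing key falls back to the placeholder instead of raising KeyError as the indexing in the diff would.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"  # hypothetical placeholder value


def resolve_file_path(paths: Dict[str, Optional[str]], view_name: str) -> str:
    # A local variable can be narrowed from Optional[str] to str.
    candidate = paths.get(view_name)
    if candidate is None:
        return UNKNOWN
    return candidate
```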

@@ -791,12 +925,20 @@ def _to_metadata_events( # noqa: C901
upstreams = []
observed_lineage_ts = datetime.datetime.now(tz=datetime.timezone.utc)
for view_ref in sorted(self.upstream_views):
# set file_path to ViewFieldType.UNKNOWN if file_path is not available to keep backward compatibility
# if we raise error on file_path equal to None then existing test-cases will fail as mock data doesn't have required attributes.
Collaborator

Why can't we update the test mock data?

Contributor Author

@siddiquebagwan-gslab siddiquebagwan-gslab Sep 8, 2023

I am not sure whether the mock data reflects the production scenario. The fields are marked as Optional in the Looker SDK, so to avoid any surprises in production I preferred to keep the existing tests as is.

@hsheth2 hsheth2 added the "merge-pending-ci" label (A PR that has passed review and should be merged once CI is green) Sep 8, 2023
@anshbansal anshbansal merged commit 95b2d43 into datahub-project:master Sep 11, 2023
57 checks passed