-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dbt-ingestion): add documentation link from dbt source to institutionalMemory #8686
feat(dbt-ingestion): add documentation link from dbt source to institutionalMemory #8686
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial comments from a first pass
we'll need docs for this in the dbt.md file too
stateful_ingestion: | ||
enabled: false | ||
account_id: '107298' | ||
job_id: '399798' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you merge this stuff into the dbt sample recipe instead
run_id: # set to your dbt cloud run id. This is optional, and defaults to the latest run |
@@ -12,6 +14,13 @@ | |||
OwnershipTypeClass, | |||
) | |||
|
|||
# Imports for metadata model classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Imports for metadata model classes |
|
||
if Constants.ADD_DOC_LINK_OPERATION in operation_map: | ||
try: | ||
docs_dic = ast.literal_eval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ast eval feels strange here - could we just use json.loads
or similar instead?
also, the ownership one supports ownership types without ever doing any parsing - can we use the same approach of a nested object here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, went with the second option. Parsing was indeed unnecessary
|
||
if Constants.ADD_DOC_LINK_OPERATION in operation_map: | ||
try: | ||
docs_dic = ast.literal_eval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs_dic = ast.literal_eval( | |
docs_dict = ast.literal_eval( |
now = int(time.time() * 1000) # milliseconds since epoch | ||
institutional_memory_element = InstitutionalMemoryMetadataClass( | ||
url=docs_dic["link"], | ||
description=docs_dic["descr"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
descr -> description
Deployment failed with the following error:
|
account_id: # set to your dbt cloud account id | ||
project_id: # set to your dbt cloud project id | ||
job_id: # set to your dbt cloud job id | ||
account_id: '107298' # set to your dbt cloud account id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it clear that this is not something they can just copy
account_id: '107298' # set to your dbt cloud account id | |
account_id: "${DBT_ACCOUNT_ID}" # set to your dbt cloud account id |
match: ".*" | ||
operation: "add_link" | ||
config: | ||
link: "https://en.wikipedia.org/wiki/Anchor_text" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's change this example to use a pattern
in reality, we don't expect people to just add the same link to every asset. instead, the link is going to be in their meta section and we have to extract it out
|
||
if Constants.ADD_DOC_LINK_OPERATION in operation_map: | ||
try: | ||
docs_dict = operation_map[Constants.ADD_DOC_LINK_OPERATION].pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using pop() here doesn't seem right
}, | ||
}, | ||
) | ||
# we require a description, so this should stay empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if description is missing, do we issue a generic error message or a detailed one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we give a detailed one 👍
processor = OperationProcessor( | ||
operation_defs={ | ||
"documentation_link": { | ||
"match": ".*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also test that match param works correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this one for me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something similar to the test case here
"match": "^@(.*)", |
e.g. match=(.+)(?:#.*)?
and raw props =http://example.com/my-page#stuff-to-ignore
should produce an aspect with link = http://example.com/my-page
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
CI passed on a prior commit, so going to merge @ethan-cartwright - if CI is green, there's usually no need to hit the "update branch" button |
oh okay thanks! |
Description
We want the ability to add a link in the
+Add Link
section from the DBT source side. This PR adds dbt-side specifieddocumentation_link
as an institutionalMemory aspect on a dataset with the datahub-sidemeta_mapping
specifieddescription
in theconfig
(see the dbt.md file for details).NOTE: right now it replaces the current institution memory link with the new link. If we'd like the ability to add multiple links, or to check the dataset's current institutionalMemory and add our link to it, let me know.
Example
adds:
to:
Testing
Results:
Checklist