
fix(ingest/unity): Remove metastore from ingestion and urns; standardize platform instance; add notebook filter #8943

Merged

Conversation

@asikowitz (Collaborator) commented Oct 4, 2023

It's not pretty but it gets the job done...

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added, a Usage Guide has been added for it.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion and docs labels Oct 4, 2023
@hsheth2 (Collaborator) left a comment

some notes about messaging, but otherwise lgtm


### Breaking Changes

- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
- #8943 - All Unity Catalog urns are changed if a new config param, `include_metastore`, is set to `false`.
This is set to `true` by default at the moment, but this default will be changed in the future.
To handle the change in urns, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft`.
Collaborator:

add a note that if you have stateful ingest enabled, you don't need to do anything

" All databricks urns will change if you re-ingest with this disabled."
" We recommend soft deleting all databricks data and re-ingesting with include_metastore set to False."
)
logger.warning(msg)
Collaborator:

let's also call add_global_warning

Collaborator (Author):

What's that do?
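For context, a minimal sketch of how the two calls could be combined. The import path `datahub.utilities.global_warning_util` is an assumption, as is the first sentence of the message (the diff above only shows the tail of the string); `add_global_warning` appends to a process-wide list of warnings that the CLI surfaces in the end-of-run summary, on top of the regular log output.

```python
import logging

# Assumed import path: add_global_warning appends to a process-wide list of
# warnings that the DataHub CLI prints in the summary at the end of the run.
from datahub.utilities.global_warning_util import add_global_warning

logger = logging.getLogger(__name__)

msg = (
    "Ingesting with `include_metastore: true`; this default will change in the future."  # hypothetical lead sentence
    " All databricks urns will change if you re-ingest with this disabled."
    " We recommend soft deleting all databricks data and re-ingesting with include_metastore set to False."
)
logger.warning(msg)      # written to the ingestion log, as in the diff above
add_global_warning(msg)  # also surfaced in the end-of-run summary
```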

```diff
@@ -247,10 +251,14 @@ def build_service_principal_map(self) -> None:

     def process_notebooks(self) -> Iterable[MetadataWorkUnit]:
         for notebook in self.unity_catalog_api_proxy.workspace_notebooks():
+            if not self.config.notebook_pattern.allowed(notebook.path):
```
Collaborator:

was this just missing before?

Collaborator (Author):

I added the pattern as part of this PR, forgot it last time
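For readers unfamiliar with the config type: `notebook_pattern` is a standard DataHub `AllowDenyPattern`, i.e. regex allow/deny lists matched against notebook paths. A minimal sketch of the filtering behavior; the deny rule and paths are made-up examples:

```python
from datahub.configuration.common import AllowDenyPattern

# Allow everything except notebooks under a scratch/ directory (made-up rule).
notebook_pattern = AllowDenyPattern(allow=[".*"], deny=[".*/scratch/.*"])

for path in ["/Users/alice/etl_daily", "/Users/alice/scratch/tmp"]:
    print(path, "->", notebook_pattern.allowed(path))
# /Users/alice/etl_daily -> True
# /Users/alice/scratch/tmp -> False
```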


### Breaking Changes

- #8943 - All Unity Catalog urns are changed if a new config param, `include_metastore`, is set to `false`.
Collaborator:

The phrasing is a bit confusing, and splitting this across the two sections isn't ideal.

maybe let's do "The unity catalog ingestion source has a new option include_metastore, which will cause all urns to change when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting include_metastore: false will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: datahub delete --platform databricks --soft and then reingesting with include_metastore: false."
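A hypothetical recipe illustrating the suggested migration path. The `unity-catalog` source type, `workspace_url`/`token` fields, and `stateful_ingestion` block follow DataHub's usual recipe shape, but the exact Unity Catalog config fields should be checked against the source docs:

```yaml
# Hypothetical migration recipe sketch.
pipeline_name: unity_catalog_prod   # a stable pipeline name is required for stateful ingestion
source:
  type: unity-catalog
  config:
    workspace_url: https://<workspace>.cloud.databricks.com
    token: ${DATABRICKS_TOKEN}
    include_metastore: false        # the new option introduced by this PR
    stateful_ingestion:
      enabled: true                 # lets the next run clean up the old urns automatically
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```

Without stateful ingestion, the old urns must be removed manually first (`datahub delete --platform databricks --soft`), then re-ingest with `include_metastore: false`.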

@asikowitz added the merge-pending-ci label Oct 4, 2023
@asikowitz merged commit 26bc039 into datahub-project:master Oct 6, 2023
52 of 53 checks passed
@asikowitz deleted the databricks-remove-metastore branch October 6, 2023 03:23
@maggiehays added the hacktoberfest-accepted label Oct 31, 2023