refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI #10828

jjoyce0510 · 2024-07-02T02:39:37Z

Summary

In this PR, we add a new structured_logs field and refactor the warning / failure APIs to take type, message, and context.
A number of method references still need updating; I will do that refactoring once we are aligned on the approach.

We also add support for throwing well-specified exception types, and for mapping those into a standard set of types. This lets a source EITHER raise a standard exception OR call report_failure and return.
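
A minimal sketch of the reporting shape described above, for illustration only. `StructuredLogLevel`, `StructuredLog`, `report_warning`, `report_failure`, and `structured_logs` are names that appear in this PR; the enum members, the field layout, the `SketchReport` class, and the choice to group entries by level and type are assumptions, not the merged implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class StructuredLogLevel(Enum):
    # Member names and values are assumptions made for this sketch.
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class StructuredLog:
    level: StructuredLogLevel
    type: str
    message: Optional[str] = None
    context: List[str] = field(default_factory=list)


class SketchReport:
    """Hypothetical stand-in for a source report, not DataHub's actual class."""

    def __init__(self) -> None:
        # One entry per (level, type); repeated reports extend its context.
        self._entries: Dict[Tuple[StructuredLogLevel, str], StructuredLog] = {}

    def _report(
        self,
        level: StructuredLogLevel,
        type: str,
        message: Optional[str],
        context: Optional[str],
    ) -> None:
        entry = self._entries.setdefault(
            (level, type), StructuredLog(level=level, type=type, message=message)
        )
        if context is not None:
            entry.context.append(context)

    def report_warning(
        self, type: str, message: Optional[str] = None, context: Optional[str] = None
    ) -> None:
        self._report(StructuredLogLevel.WARN, type, message, context)

    def report_failure(
        self, type: str, message: Optional[str] = None, context: Optional[str] = None
    ) -> None:
        self._report(StructuredLogLevel.ERROR, type, message, context)

    @property
    def structured_logs(self) -> List[StructuredLog]:
        return list(self._entries.values())


report = SketchReport()
report.report_warning(type="Missing Report Token", message="Report token is missing", context="report-1")
report.report_warning(type="Missing Report Token", message="Report token is missing", context="report-2")
report.report_failure(type="permission-error", message="No access to warehouse")
```

In this sketch the two warnings collapse into one entry whose context holds both report identifiers, which is one plausible way to keep the UI list short when the same problem repeats.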

QA

I QA'd this locally by testing various failure and warning scenarios to confirm that the UI displays them.

Status

Review

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Summary by CodeRabbit

Refactor
- Updated structured report creation to handle different types of log entries, improving the accuracy and detail of generated reports.
- Changed types in various components to align with new structured report log entry format.
Bug Fixes
- Corrected type inconsistencies in report item handling to ensure smoother data processing and display.

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai · 2024-07-02T02:39:47Z

Walkthrough

The recent updates to DataHub's code involve a refactoring of the structured report handling functionality. This refactoring primarily revolves around changing data types in various TypeScript files to ensure consistency and reliability, particularly shifting from StructuredReportItem to StructuredReportLogEntry. These changes streamline the process of creating and managing structured reports based on log entries.

Changes

| File Path | Change Summary |
| --- | --- |
| `datahub-web-react/src/app/ingest/source/utils.ts` | Refactored imports and functions to use `StructuredReportLogEntry` instead of `StructuredReportItem`, updating report creation logic. |
| `.../executions/reporting/StructuredReportItem.tsx`, `.../executions/reporting/StructuredReportItemContext.tsx` | Updated type usage in `Props` interfaces from `StructuredReportItem` to `StructuredReportLogEntry`. |
| `datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx` | Modified `items` prop type from `StructuredReportItemType[]` to `StructuredReportLogEntry[]` in the `Props` interface. |

A rabbit in the data
Hops through logs and code,
Building reports with ease,
On paths new and old.

From items to entries,
The change flows like the stream,
In DataHub's garden,
Where structure is the dream.

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8d5f0f3 and 856731c.

Files selected for processing (4)

metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
metadata-ingestion/src/datahub/ingestion/api/source.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/run/pipeline.py (7 hunks)
metadata-ingestion/tests/unit/test_nifi_source.py (4 hunks)

Additional context used

Ruff

metadata-ingestion/src/datahub/ingestion/run/pipeline.py

103-104: Use a single if statement instead of nested if statements

(SIM102)

541-543: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)
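
For readers unfamiliar with the two Ruff rules cited above, here is a small sketch of the shapes they flag and the equivalent simplified forms. The function names and arguments are hypothetical and do not come from `pipeline.py`.

```python
def nested(flag_a: bool, flag_b: bool) -> str:
    # Shape flagged by SIM102: an `if` whose only body is another `if`.
    if flag_a:
        if flag_b:
            return "both"
    return "not both"


def combined(flag_a: bool, flag_b: bool) -> str:
    # Equivalent single condition.
    if flag_a and flag_b:
        return "both"
    return "not both"


def has_items_verbose(items: list) -> bool:
    # Shape flagged by SIM210: `True if ... else False`.
    return True if items else False


def has_items(items: list) -> bool:
    # Equivalent form using bool(...).
    return bool(items)
```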

Additional comments not posted (30)

metadata-ingestion/src/datahub/ingestion/api/exception.py (13)

4-5: LGTM!

The ScanUnauthorizedException class looks good.

8-9: LGTM!

The LineageUnauthorizedException class looks good.

12-13: LGTM!

The UsageUnauthorizedException class looks good.

16-17: LGTM!

The ProfilingUnauthorizedException class looks good.

20-21: LGTM!

The LineageQueryParsingFailedException class looks good.

24-25: LGTM!

The UsageQueryParsingFailedException class looks good.

28-29: LGTM!

The ConnectionFailedCoordinatesException class looks good.

32-33: LGTM!

The ConnectionFailedCredentialsException class looks good.

36-37: LGTM!

The ConnectionFailedServiceUnavailableException class looks good.

40-41: LGTM!

The ConnectionFailedServiceTimeoutException class looks good.

44-45: LGTM!

The ConnectionFailedUnknownException class looks good.

48-60: LGTM!

The StructuredReportLogType enum looks good.

63-75: LGTM!

The EXCEPTION_TO_REPORT_TYPE dictionary looks good.
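
A hedged sketch of how an exception-to-type mapping like the one reviewed above could be used. The class names `ScanUnauthorizedException`, `ConnectionFailedCredentialsException`, `StructuredReportLogType`, and `EXCEPTION_TO_REPORT_TYPE` appear in this review; the enum members, the `UNKNOWN_ERROR` fallback, and the `report_type_for` helper are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Type


class ScanUnauthorizedException(Exception):
    pass


class ConnectionFailedCredentialsException(Exception):
    pass


class StructuredReportLogType(Enum):
    # Member names are assumptions made for this sketch.
    SCAN_UNAUTHORIZED = "SCAN_UNAUTHORIZED"
    CONNECTION_FAILED_CREDENTIALS = "CONNECTION_FAILED_CREDENTIALS"
    UNKNOWN_ERROR = "UNKNOWN_ERROR"


EXCEPTION_TO_REPORT_TYPE: Dict[Type[Exception], StructuredReportLogType] = {
    ScanUnauthorizedException: StructuredReportLogType.SCAN_UNAUTHORIZED,
    ConnectionFailedCredentialsException: StructuredReportLogType.CONNECTION_FAILED_CREDENTIALS,
}


def report_type_for(exc: BaseException) -> StructuredReportLogType:
    """Hypothetical helper: look up the exact exception class, else fall back."""
    return EXCEPTION_TO_REPORT_TYPE.get(type(exc), StructuredReportLogType.UNKNOWN_ERROR)
```

Note that this lookup matches on the exact class only; subclasses of a mapped exception would fall through to the fallback unless the helper walked the class hierarchy.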

metadata-ingestion/src/datahub/ingestion/api/source.py (9)

65-68: LGTM!

The StructuredLogLevel enum looks good.

71-76: LGTM!

The StructuredLog dataclass looks good.

99-115: LGTM!

The structured_logs property method looks good.

147-164: LGTM!

The report_warning method looks good.

165-167: LGTM!

The warning method looks good.

169-186: LGTM!

The report_failure method looks good.

187-189: LGTM!

The failure method looks good.

190-206: LGTM!

The report_info method looks good.

208-210: LGTM!

The info method looks good.

metadata-ingestion/tests/unit/test_nifi_source.py (4)

337-337: LGTM!

The test_single_user_auth_failed_to_get_token test function looks good.

356-356: LGTM!

The test_kerberos_auth_failed_to_get_token test function looks good.

376-376: LGTM!

The test_client_cert_auth_failed test function looks good.

396-396: LGTM!

The test_failure_to_create_nifi_flow test function looks good.

metadata-ingestion/src/datahub/ingestion/run/pipeline.py (4)

512-513: LGTM!

The run method looks good.

623-623: LGTM!

The _approx_all_vals method looks good.

653-653: LGTM!

The pretty_print_summary method looks good.

721-732: LGTM!

The _handle_uncaught_pipeline_exception method looks good.

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 10

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 856731c and fd7357a.

Files selected for processing (7)

metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
metadata-ingestion/src/datahub/ingestion/api/source.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/run/pipeline.py (6 hunks)
metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py (1 hunks)
metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py (1 hunks)
metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py (8 hunks)
metadata-ingestion/tests/unit/test_nifi_source.py (4 hunks)

Files skipped from review as they are similar to previous changes (4)

metadata-ingestion/src/datahub/ingestion/api/exception.py
metadata-ingestion/src/datahub/ingestion/api/source.py
metadata-ingestion/src/datahub/ingestion/run/pipeline.py
metadata-ingestion/tests/unit/test_nifi_source.py

Additional context used

Ruff

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

77-77: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

99-99: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

121-121: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

149-149: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

175-176: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

201-202: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

232-232: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

254-254: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
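
The SIM118 findings above all rest on the same fact: membership tests on a dict already check its keys, so `.keys()` adds nothing. A tiny sketch, using a hypothetical `report_errors` dict in place of the report's internal mapping:

```python
# Hypothetical stand-in for the report's internal errors mapping.
report_errors = {"permission-error": ["context-1"], "usage-permission-error": []}

with_keys = "permission-error" in report_errors.keys()
without_keys = "permission-error" in report_errors
missing = "lineage-permission-error" in report_errors
```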

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

@@ -96,7 +96,7 @@ def test_snowflake_missing_warehouse_access_causes_pipeline_failure(
        )
        pipeline = Pipeline(snowflake_pipeline_config)
        pipeline.run()
-        assert "permission-error" in pipeline.source.get_report().failures.keys()
+        assert "permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

assert "permission-error" in pipeline.source.get_report()._errors.keys()

assert "permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

99-99: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

@@ -118,7 +118,7 @@
        )
        pipeline = Pipeline(snowflake_pipeline_config)
        pipeline.run()
-        assert "permission-error" in pipeline.source.get_report().failures.keys()
+        assert "permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors

Committable suggestion

Suggested change

assert "permission-error" in pipeline.source.get_report()._errors.keys()

assert "permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

121-121: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

            "Failed to get primary key for table"
-            in pipeline.source.get_report().warnings.keys()
+            in pipeline.source.get_report()._warnings.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- "Failed to get primary key for table" in pipeline.source.get_report()._warnings.keys()
+ "Failed to get primary key for table" in pipeline.source.get_report()._warnings

Committable suggestion

Suggested change

"Failed to get primary key for table"
in pipeline.source.get_report().warnings.keys()
in pipeline.source.get_report()._warnings.keys()

"Failed to get primary key for table"
in pipeline.source.get_report()._warnings

Tools

Ruff

201-202: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

@@ -146,7 +146,7 @@

        pipeline = Pipeline(snowflake_pipeline_config)
        pipeline.run()
-        assert "permission-error" in pipeline.source.get_report().failures.keys()
+        assert "permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors

Committable suggestion

Suggested change

assert "permission-error" in pipeline.source.get_report()._errors.keys()

assert "permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

149-149: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

-        assert (
-            "lineage-permission-error" in pipeline.source.get_report().failures.keys()
-        )
+        assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "lineage-permission-error" in pipeline.source.get_report()._errors

Committable suggestion

Suggested change

assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()

assert "lineage-permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

232-232: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

@@ -253,4 +251,4 @@
        )
        pipeline = Pipeline(snowflake_pipeline_config)
        pipeline.run()
-        assert "usage-permission-error" in pipeline.source.get_report().failures.keys()
+        assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "usage-permission-error" in pipeline.source.get_report()._errors

Committable suggestion

Suggested change

assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()

assert "usage-permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

254-254: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

@@ -74,7 +74,7 @@

        pipeline = Pipeline(snowflake_pipeline_config)
        pipeline.run()
-        assert "permission-error" in pipeline.source.get_report().failures.keys()
+        assert "permission-error" in pipeline.source.get_report()._errors.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- assert "permission-error" in pipeline.source.get_report()._errors.keys()
+ assert "permission-error" in pipeline.source.get_report()._errors

Committable suggestion

Suggested change

assert "permission-error" in pipeline.source.get_report()._errors.keys()

assert "permission-error" in pipeline.source.get_report()._errors

Tools

Ruff

77-77: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:42Z

metadata-ingestion/tests/integration/snowflake/test_snowflake_failures.py

            "Failed to get columns for table"
-            in pipeline.source.get_report().warnings.keys()
+            in pipeline.source.get_report()._warnings.keys()


Optimize dictionary key check.

Use key in dict instead of key in dict.keys().

- "Failed to get columns for table" in pipeline.source.get_report()._warnings.keys()
+ "Failed to get columns for table" in pipeline.source.get_report()._warnings

Committable suggestion

Suggested change

"Failed to get columns for table"
in pipeline.source.get_report().warnings.keys()
in pipeline.source.get_report()._warnings.keys()

"Failed to get columns for table" in pipeline.source.get_report()._warnings

Tools

Ruff

175-176: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai · 2024-07-02T18:39:43Z

metadata-ingestion/src/datahub/ingestion/source/sql/vertica.py

+            missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
            if (
                missing_column_info_warn is not None
-                and dataset_name in missing_column_info_warn
+                and dataset_name in missing_column_info_warn.context


Access the _errors property instead of _warnings.

The _warnings property should be _errors based on the AI-generated summary.

- missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
+ missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)

Committable suggestion

Suggested change

missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
if (
    missing_column_info_warn is not None
    and dataset_name in missing_column_info_warn
    and dataset_name in missing_column_info_warn.context

missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)
if (
    missing_column_info_warn is not None
    and dataset_name in missing_column_info_warn.context
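
To make the lookup in this diff easier to follow, here is a minimal sketch that assumes the report keeps a mapping from a type key to an entry carrying a `context` list. That layout is inferred from the diff, and `WarningEntry`, `has_missing_column_info`, and the value of `MISSING_COLUMN_INFO` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MISSING_COLUMN_INFO = "missing column information"  # hypothetical value


@dataclass
class WarningEntry:
    # Hypothetical entry shape: one message plus the items it applies to.
    message: str
    context: List[str] = field(default_factory=list)


def has_missing_column_info(warnings: Dict[str, WarningEntry], dataset_name: str) -> bool:
    # Fetch the entry for the key, then test membership in its context list.
    entry = warnings.get(MISSING_COLUMN_INFO)
    return entry is not None and dataset_name in entry.context
```

Under this assumption, testing `dataset_name in entry` directly would no longer work once entries stop being plain lists, which is why the diff moves the membership test to `.context`.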

coderabbitai · 2024-07-02T18:39:43Z

metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py

+            missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
            if (
                missing_column_info_warn is not None
-                and dataset_name in missing_column_info_warn
+                and dataset_name in missing_column_info_warn.context


Access the _errors property instead of _warnings.

The _warnings property should be _errors based on the AI-generated summary.

- missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
+ missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)

Committable suggestion

Suggested change

missing_column_info_warn = self.report._warnings.get(MISSING_COLUMN_INFO)
if (
    missing_column_info_warn is not None
    and dataset_name in missing_column_info_warn
    and dataset_name in missing_column_info_warn.context

missing_column_info_warn = self.report._errors.get(MISSING_COLUMN_INFO)
if (
    missing_column_info_warn is not None
    and dataset_name in missing_column_info_warn.context

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fd7357a and b5bfe6c.

Files selected for processing (11)

metadata-ingestion/src/datahub/ingestion/api/exception.py (1 hunks)
metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (5 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
metadata-ingestion/src/datahub/ingestion/source/metabase.py (15 hunks)
metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
metadata-ingestion/src/datahub/ingestion/source/redash.py (2 hunks)
metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)

Files skipped from review as they are similar to previous changes (1)

metadata-ingestion/src/datahub/ingestion/api/exception.py

Additional context used

Ruff

metadata-ingestion/src/datahub/ingestion/source/openapi.py

340-340: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

Additional comments not posted (48)

metadata-ingestion/src/datahub/ingestion/source/openapi.py (3)

187-212: LGTM!

The changes to the report_bad_responses function improve error message clarity and are well-structured.

279-279: LGTM!

The changes to the get_workunits_internal function ensure consistent usage of the type parameter.

Also applies to: 301-303, 328-332, 361-363, 393-395

340-340: Simplify dictionary key check.

Use key not in dict instead of key not in dict.keys().

- if endpoint_k not in config.forced_examples.keys():
+ if endpoint_k not in config.forced_examples:

Tools

Ruff

340-340: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

metadata-ingestion/src/datahub/ingestion/source/mongodb.py (5)

321-323: LGTM!

The changes to the get_pymongo_type_string function improve the clarity of warning messages by providing additional context.

347-349: LGTM!

The changes to the get_field_type function improve the clarity of warning messages by providing additional context.

425-427: LGTM!

The changes to the construct_schema_metadata function improve the clarity of warning messages by providing additional context.

Line range hint 539-541: LGTM!

The changes to the get_native_type function improve the clarity of warning messages by providing additional context.

Line range hint 556-558: LGTM!

The changes to the get_field_type function improve the clarity of warning messages by providing additional context.

metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (4)

469-471: LGTM!

The changes to the construct_schema_metadata function improve the clarity of warning messages by providing additional context.

539-541: LGTM!

The changes to the get_native_type function improve the clarity of warning messages by providing additional context.

556-558: LGTM!

The changes to the get_field_type function improve the clarity of warning messages by providing additional context.

556-558: LGTM!

The changes to the get_datasource_urn function improve the clarity of warning messages by providing additional context.

Also applies to: 571-573

metadata-ingestion/src/datahub/ingestion/source/metabase.py (8)

214-215: LGTM!

The changes to the setup_session function improve the clarity of failure messages by providing additional context.

226-227: LGTM!

The changes to the close function improve the clarity of failure messages by providing additional context.

260-262: LGTM!

The changes to the emit_dashboard_mces function improve the clarity of failure messages by providing additional context.

287-289: LGTM!

The changes to the construct_dashboard_from_api_data function improve the clarity of warning messages by providing additional context.

351-353: LGTM!

The changes to the _get_ownership function improve the clarity of warning messages by providing additional context.

Also applies to: 358-360

392-394: LGTM!

The changes to the emit_card_mces function improve the clarity of failure messages by providing additional context.

415-417: LGTM!

The changes to the construct_card_from_api_data function improve the clarity of warning messages by providing additional context.

Also applies to: 425-427, 434-436

556-558: LGTM!

The changes to the get_datasource_urn function improve the clarity of warning messages by providing additional context.

Also applies to: 571-573

metadata-ingestion/src/datahub/ingestion/source/redash.py (2)

662-677: LGTM!

The changes to use report_warning for structured logging are appropriate and align with the new logging approach.

716-717: LGTM!

The changes to use report_warning for structured logging are appropriate and align with the new logging approach.

metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3)

581-582: LGTM!

The changes to use report_warning for structured logging are appropriate and align with the new logging approach.

838-840: LGTM!

The changes to use report_warning for structured logging are appropriate and align with the new logging approach.

986-988: LGTM!

The changes to use report_warning for structured logging are appropriate and align with the new logging approach.

metadata-ingestion/src/datahub/ingestion/source/mode.py (5)

314-315: Add structured logging for connection verification.

The structured logging for connection verification looks good.

However, ensure that the structured logging functionality is verified.

Verification successful

Structured logging for connection verification is correctly implemented.

The structured logging for connection verification using report_failure is present in the codebase as expected.

metadata-ingestion/src/datahub/ingestion/source/mode.py: Lines 314-315

Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that the structured logging for connection verification works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for connection verification.
rg --type python 'report_failure' --context 5
Length of output: 76

Script:
#!/bin/bash
# Description: Verify that the structured logging for connection verification works correctly.

# Test: Search for the structured logging usage in Python files. Expect: structured logging for connection verification.
rg 'report_failure' --glob '*.py' --context 5
Length of output: 93868

531-532: Add structured logging for failed retrieval of spaces for the workspace.

The structured logging for failed retrieval of spaces for the workspace looks good.

However, ensure that the structured logging functionality is verified.

Verification successful

Structured logging for failed retrieval of spaces for the workspace is correctly implemented.

The structured logging functionality for failed retrieval of spaces for the workspace is verified and appears to be implemented correctly.

The report_failure function is used for structured logging in the specified file and other parts of the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of spaces for the workspace works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of spaces for the workspace.
rg --type python 'report_failure' --context 5
Length of output: 76

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of spaces for the workspace works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of spaces for the workspace.
rg 'report_failure' --context 5
Length of output: 93854

374-375: Add structured logging for missing report tokens and IDs.

The structured logging for missing report tokens and IDs looks good.

However, ensure that the structured logging functionality is verified.

Also applies to: 381-382

Verification successful

Add structured logging for missing report tokens and IDs.

The structured logging for missing report tokens and IDs is implemented correctly in the codebase. The report_warning method is utilized as expected.

metadata-ingestion/src/datahub/ingestion/source/mode.py:

Line 374: self.report.report_warning(type="Missing Report Token", message=f"Report token is missing for {report_info.get('id', '')}")

Line 381: self.report.report_warning(type="Missing Report ID", message=f"Report id is missing for {report_info.get('token', '')}")

Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that the structured logging for missing report tokens and IDs works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for missing report tokens and IDs.
rg --type python 'report_warning' --context 5
Length of output: 76

Script:
#!/bin/bash
# Description: Verify that the structured logging for missing report tokens and IDs works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for missing report tokens and IDs.
rg 'report_warning' --glob '*.py' --context 5
Length of output: 256673
679-680: Add structured logging for failed retrieval of data sources.

The structured logging for failed retrieval of data sources looks good.

However, ensure that the structured logging functionality is verified.

Verification successful

Structured logging for failed retrieval of data sources verified successfully.

The structured logging for failed retrieval of data sources is implemented correctly across the codebase, including the target file.

metadata-ingestion/src/datahub/ingestion/source/mode.py:

Line 679: self.report.report_failure(type="Unable to retrieve Data Sources", message=f"Unable to retrieve data sources. Reason: {str(http_error)}")
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of data sources works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of data sources.
rg --type python 'report_failure' --context 5
Length of output: 76

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of data sources works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of data sources.
rg --type py 'report_failure' --context 5
Length of output: 93864
491-493: Add structured logging for failed retrieval of Mode creator.

The structured logging for failed retrieval of Mode creator looks good.

However, ensure that the structured logging functionality is verified.

Verification successful

Add structured logging for failed retrieval of Mode creator.

The structured logging for failed retrieval of Mode creator looks good.

File: metadata-ingestion/src/datahub/ingestion/source/mode.py

Lines: 491-493
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of Mode creator works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of Mode creator.
rg --type python 'report_warning' --context 5
Length of output: 76

Script:
#!/bin/bash
# Description: Verify that the structured logging for failed retrieval of Mode creator works correctly.

# Test: Search for the structured logging usage in the codebase. Expect: structured logging for failed retrieval of Mode creator.
rg --type py 'report_warning' --context 5
Length of output: 256669
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (4)

287-289: LGTM! Structured logging for warnings.

The changes to report warnings using structured logs are approved.

597-599: LGTM! Structured logging for unrecognized chart types.

The changes to report warnings for unrecognized chart types using structured logs are approved.

Also applies to: 606-608

1257-1259: LGTM! Structured logging for errors and warnings.

The changes to report errors and warnings using structured logs are approved.

Also applies to: 1269-1271

1549-1551: LGTM! Structured logging for warnings.

The changes to report warnings using structured logs are approved.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (12)

422-424: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when loading included files.

525-529: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when resolving includes.

532-534: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when resolving wildcard includes.

573-575: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when loading included files.

686-690: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when loading view files.

708-712: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when loading view files.

1369-1371: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when parsing SQL queries.

1956-1958: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when no metadata is produced.

2083-2085: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when loading models.

2096-2098: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when loading connections.

2138-2140: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting failures when processing explores.

2220-2222: Good use of structured logging for error handling.

The changes correctly implement the new structured logging format for reporting errors when loading Looker views.

metadata-ingestion/src/datahub/ingestion/source/tableau.py (2)

803-805: Use consistent structured logging format.

The new structured logging format is correctly implemented for reporting failures.

2825-2827: Use consistent structured logging format.

The new structured logging format is correctly implemented for reporting failures.

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between b5bfe6c and 47445c8.

Files selected for processing (1)

metadata-ingestion/src/datahub/ingestion/api/source.py (4 hunks)

Additional comments not posted (7)

metadata-ingestion/src/datahub/ingestion/api/source.py (7)
66-70: Enum StructuredLogLevel is well-defined.

The enum class StructuredLogLevel is correctly defined with levels INFO, WARN, and ERROR.

104-123: Properties warnings, failures, and infos are well-defined.

The properties correctly aggregate the respective logs into a LossyList.

295-302: Method as_obj is well-defined.

The method converts the report object to a dictionary with materialized properties.

202-247: Methods report_failure and failure are well-defined.

The methods handle and log failures with structured logs.

Ensure the logging functionality is supported.

Verification successful

We need to re-verify the logging functionality by correctly specifying the file type for rg and checking the definition of logger.

Logging functionality is supported.

The logger is defined and used extensively throughout the codebase, confirming that the logging functionality is appropriately supported.

metadata-ingestion/src/datahub/ingestion/api/source.py includes the required logger.error usage.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.error`. Expect: The logging functionality to be supported.
rg --type python $'logger.error'
Length of output: 62

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.error`. Expect: The logging functionality to be supported.
rg --type py 'logger.error'

# Test: Search for the definition of `logger`. Expect: `logger` to be defined and properly configured.
rg --type py 'logger ='
Length of output: 30465
249-289: Methods report_info and info are well-defined.

The methods handle and log info messages with structured logs.

Ensure the logging functionality is supported.

Verification successful

Methods report_info and info are well-defined.

The methods handle and log info messages with structured logs. The logging functionality is supported as confirmed by the presence of logger.info in multiple files across the codebase.

metadata-ingestion/src/datahub/ingestion/api/source.py

metadata-ingestion/src/datahub/ingestion/source_config/operation_config.py

metadata-ingestion/src/datahub/ingestion/source_config/sql/snowflake.py

and many others.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.info`. Expect: The logging functionality to be supported.
rg --type python $'logger.info'
Length of output: 61

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.info`. Expect: The logging functionality to be supported.
rg --type py $'logger.info'
Length of output: 39297
72-79: Dataclass StructuredLog is well-defined.

The dataclass StructuredLog is correctly defined with appropriate fields for structured logging.

Ensure that the Report class supports the required functionality.

Verification successful

Dataclass StructuredLog is well-defined.

The dataclass StructuredLog is correctly defined with appropriate fields for structured logging. The Report class provides utility methods that support the required functionality for StructuredLog.

metadata-ingestion/src/datahub/ingestion/api/report.py: The Report class extends SupportsAsObj and includes methods for converting values to strings and pure Python objects.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the `Report` class supports the required functionality.

# Test: Search for the `Report` class definition. Expect: The class to support required functionality.
rg --type python $'class Report'
Length of output: 64

Script:
#!/bin/bash
# Description: Verify the `Report` class supports the required functionality.

# Test: Search for the `Report` class definition in Python files.
rg 'class Report' --glob '*.py'
Length of output: 536

Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.

# Extract the `Report` class definition and its methods.
ast-grep --lang python --pattern $'class Report($_) {
  $$$
}' metadata-ingestion/src/datahub/ingestion/api/report.py
Length of output: 567

Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.

# Extract the `Report` class definition and its methods.
ast-grep --lang python --pattern 'class Report: $$$' metadata-ingestion/src/datahub/ingestion/api/report.py
Length of output: 109

Script:
#!/bin/bash
# Description: Extract the definition and methods of the `Report` class in `metadata-ingestion/src/datahub/ingestion/api/report.py`.

# Search for the `Report` class definition and output surrounding lines for context.
rg 'class Report' -A 20 metadata-ingestion/src/datahub/ingestion/api/report.py
Length of output: 1409
155-201: Methods report_warning and warning are well-defined.

The methods handle and log warnings with structured logs.

Ensure the logging functionality is supported.

Verification successful

Methods report_warning and warning are well-defined.

The methods handle and log warnings with structured logs. The logging functionality is supported across the codebase.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.warning`. Expect: The logging functionality to be supported.
rg --type python $'logger.warning'
Length of output: 64

Script:
#!/bin/bash
# Description: Verify the logging functionality is supported.

# Test: Search for the usage of `logger.warning` in .py files.
rg 'logger.warning' --glob '*.py'
Length of output: 28807

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 2

Outside diff range and nitpick comments (4)

datahub-web-react/src/app/ingest/source/types.ts (1)

38-38: Ensure Consistency in Field Names

The items field in StructuredReport should be consistent with the naming conventions used in the rest of the interface.

Consider renaming items to logEntries for clarity and consistency.
datahub-web-react/src/app/ingest/source/utils.ts (3)
131-133: Use Constants for Regular Expressions

Consider defining the URL pattern as a constant outside the function for better readability and maintainability.
const URL_PATTERN = /^(?:http(s)?:\/\/)?[\w.-]+(?:\.[a-zA-Z0-9.-]{2,})+[\w\-._~:/?#[\]@!$&'()*+,;=.]+$/;

export const validateURL = (fieldName: string) => {
    return {
        validator(_, value) {
            const isURLValid = URL_PATTERN.test(value);
            if (!value || isURLValid) {
                return Promise.resolve();
            }
            return Promise.reject(new Error(`A valid ${fieldName} is required.`));
        },
    };
};
165-173: Improve Commenting and Documentation

The comments and documentation for mapItemObject and mapItemArray could be more detailed to explain the purpose and usage of these helper functions.

Consider adding more detailed comments and examples to improve readability and maintainability.

Also applies to: 178-181

Line range hint 341-341: Simplify Dictionary Key Check

Use key not in dict instead of key not in dict.keys() for better readability and performance.
if endpoint_k not in config.forced_examples:
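
As a small illustration of the SIM118 nitpick, the sketch below compares the two spellings of the membership check. The `forced_examples` contents and the `endpoint_k` value are made up for the example; only the shape of the check comes from the review.

```python
# Hypothetical stand-in for config.forced_examples (endpoint -> example values).
forced_examples = {"/pets": ["1"], "/users": ["42"]}
endpoint_k = "/orders"

# Original spelling: membership test against the keys view.
with_keys = endpoint_k not in forced_examples.keys()

# Spelling suggested by Ruff (SIM118): membership test against the dict itself.
direct = endpoint_k not in forced_examples

# Both expressions test the same thing: whether endpoint_k is a key.
same_result = with_keys == direct
```

Membership on a dict already checks its keys, so dropping `.keys()` does not change behavior.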

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 47445c8 and 023ad85.

Files selected for processing (13)

datahub-web-react/src/app/ingest/source/types.ts (2 hunks)
datahub-web-react/src/app/ingest/source/utils.ts (4 hunks)
metadata-ingestion/src/datahub/ingestion/api/source.py (4 hunks)
metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py (5 hunks)
metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
metadata-ingestion/src/datahub/ingestion/source/metabase.py (15 hunks)
metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
metadata-ingestion/src/datahub/ingestion/source/redash.py (2 hunks)
metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)

Files skipped from review due to trivial changes (1)

metadata-ingestion/src/datahub/ingestion/source/tableau.py

Files skipped from review as they are similar to previous changes (7)

metadata-ingestion/src/datahub/ingestion/source/dynamodb/dynamodb.py
metadata-ingestion/src/datahub/ingestion/source/looker/looker_common.py
metadata-ingestion/src/datahub/ingestion/source/looker/looker_source.py
metadata-ingestion/src/datahub/ingestion/source/metabase.py
metadata-ingestion/src/datahub/ingestion/source/mode.py
metadata-ingestion/src/datahub/ingestion/source/mongodb.py
metadata-ingestion/src/datahub/ingestion/source/redash.py

Additional context used

Ruff

metadata-ingestion/src/datahub/ingestion/source/openapi.py

341-341: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

Additional comments not posted (27)

datahub-web-react/src/app/ingest/source/types.ts (1)

24-26: Ensure Optional Fields are Properly Handled

The title field is marked as optional, but message and context are not. Ensure that all usages of StructuredReportLogEntry properly handle the case where title is undefined.

datahub-web-react/src/app/ingest/source/utils.ts (3)

16-16: Imports Look Good

The imports for StructuredReport, StructuredReportLogEntry, and StructuredReportItemLevel are correctly added.

145-145: Ensure Case-Insensitive Matching

Ensure that toLocaleUpperCase is appropriate for your use case. If you need case-insensitive matching, consider using toUpperCase for simplicity.

Line range hint 148-153: Functionality Looks Good

The createStructuredReport function correctly calculates the counts and returns the structured report object.

metadata-ingestion/src/datahub/ingestion/source/openapi.py (5)

280-280: Ensure Proper Warning Handling

The warning message splitting logic should be robust to handle unexpected formats.

Ensure that the warning message splitting logic correctly handles all expected formats.

302-304: Ensure Context is Properly Handled

The context field is newly introduced. Ensure that all usages correctly handle this field.

330-333: Ensure Consistent Warning Messages

Ensure that the warning messages are consistent and provide enough context for debugging.

362-364: Ensure Consistent Warning Messages

Ensure that the warning messages are consistent and provide enough context for debugging.

394-396: Ensure Consistent Warning Messages

Ensure that the warning messages are consistent and provide enough context for debugging.

metadata-ingestion/src/datahub/ingestion/api/source.py (13)

66-70: Enum Declaration Looks Good

The StructuredLogLevel enum is correctly declared with the appropriate log levels.

72-79: Dataclass Declaration Looks Good

The StructuredLog dataclass is correctly declared with the appropriate fields.
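
For readers without the diff open, a minimal sketch of what such an enum and dataclass could look like follows. The enum members INFO, WARN, and ERROR are named in this review; the string values, the exact field set, and the defaults are assumptions inferred from the keyword arguments used elsewhere in the thread, not the code in `source.py`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StructuredLogLevel(Enum):
    # Levels named in the review; the string values are assumed.
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class StructuredLog:
    # Hypothetical field set, inferred from title/message/context usage in this thread.
    level: StructuredLogLevel
    message: str
    title: Optional[str] = None
    context: List[str] = field(default_factory=list)


# Example entry, reusing a title and message quoted later in this review.
entry = StructuredLog(
    level=StructuredLogLevel.WARN,
    title="Malformed Table Name",
    message="Table name has more than 3 parts.",
    context=["Table Name: a.b.c.d"],
)
```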

95-103: LossyDict Initialization Looks Good

The initialization of _errors, _warnings, and _infos using LossyDict is correct.

104-110: Property Method Looks Good

The warnings property method correctly aggregates the warnings.

111-117: Property Method Looks Good

The failures property method correctly aggregates the failures.

118-123: Property Method Looks Good

The infos property method correctly aggregates the infos.

155-175: Method Documentation Looks Good

The documentation for the report_warning method is clear and detailed.

Also applies to: 176-181

192-201: Method Implementation Looks Good

The warning method correctly calls report_warning and logs the warning.
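
The relationship between `report_warning`, `warning`, and the `warnings` property can be sketched roughly as below. This is an illustrative stand-in, not the DataHub class: `SketchReport` is a hypothetical name, plain `dict` and `list` replace `LossyDict` and `LossyList`, and grouping entries by title and message is an assumption.

```python
import logging
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class SketchReport:
    """Illustrative stand-in for a source report; not the DataHub class."""

    def __init__(self) -> None:
        # Assumed grouping: entries with the same (title, message) share one row.
        self._warnings: Dict[Tuple[Optional[str], str], List[str]] = defaultdict(list)

    def report_warning(
        self,
        message: str,
        context: Optional[str] = None,
        title: Optional[str] = None,
    ) -> None:
        # Record the entry; contexts accumulate under the same title/message.
        contexts = self._warnings[(title, message)]
        if context is not None:
            contexts.append(context)

    def warning(
        self,
        message: str,
        context: Optional[str] = None,
        title: Optional[str] = None,
    ) -> None:
        # Record the structured entry, then emit a regular log line.
        self.report_warning(message, context=context, title=title)
        logger.warning("%s => %s", message, context)

    @property
    def warnings(self) -> List[dict]:
        # Materialize the grouped entries into a flat list for reporting.
        return [
            {"title": title, "message": message, "context": list(contexts)}
            for (title, message), contexts in self._warnings.items()
        ]


report = SketchReport()
report.warning("Report token is missing", context="id=1", title="Missing Report Token")
report.warning("Report token is missing", context="id=2", title="Missing Report Token")
```

With this grouping, the two calls above produce a single entry whose context list holds both ids, which keeps repeated warnings from flooding the report.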

202-222: Method Documentation Looks Good

The documentation for the report_failure method is clear and detailed.

Also applies to: 223-228

239-248: Method Implementation Looks Good

The failure method correctly calls report_failure and logs the error.

249-266: Method Documentation Looks Good

The documentation for the report_info method is clear and detailed.

Also applies to: 267-272

282-289: Method Implementation Looks Good

The info method correctly calls report_info and logs the info.

295-302: Method Implementation Looks Good

The as_obj method correctly materializes the properties for the report object.

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (5)

422-424: Structured logging improvements look good.

The changes to add structured logging for errors when loading included files provide more context and improve error reporting.

525-534: Structured logging improvements look good.

The changes to add structured logging for failures when resolving includes provide more context and improve error reporting.

686-690: Structured logging improvements look good.

The changes to add structured logging for failures when loading view files provide more context and improve error reporting.

708-712: Structured logging improvements look good.

The changes to add structured logging for failures when loading view files provide more context and improve error reporting.

1369-1371: Structured logging improvements look good.

The changes to add structured logging for failures when parsing SQL provide more context and improve error reporting.

coderabbitai · 2024-07-02T23:59:59Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

+    def report_bad_responses(self, status_code: int, type: str) -> None:
        if status_code == 400:
            self.report.report_warning(
-                key=key, reason="Unknown error for reaching endpoint"
+                title=type,
+                message="Bad request body when retrieving data from endpoint",
            )
        elif status_code == 403:
-            self.report.report_warning(key=key, reason="Not authorised to get endpoint")
+            self.report.report_warning(
+                title=type,
+                message="Not authorised to retrieve data from OpenAPI endpoint",
+            )
        elif status_code == 404:
            self.report.report_warning(
-                key=key,
-                reason="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                title=type,
+                message="Unable to find an example for endpoint. Please add it to the list of forced examples.",
            )
        elif status_code == 500:
            self.report.report_warning(
-                key=key, reason="Server error for reaching endpoint"
+                title=type, message="Server error for reaching endpoint"
            )
        elif status_code == 504:
-            self.report.report_warning(key=key, reason="Timeout for reaching endpoint")
+            self.report.report_warning(
+                title=type, message="Timeout for reaching endpoint"
+            )
        else:
            raise Exception(
-                f"Unable to retrieve endpoint, response code {status_code}, key {key}"
+                f"Unable to retrieve endpoint, response code {status_code}, key {type}"


Refactor Method for Consistency and Readability

The report_bad_responses method could be refactored to reduce redundancy and improve readability.

def report_bad_responses(self, status_code: int, type: str) -> None:
    messages = { }
    message = messages.get(status_code)
    if message:
        self.report.report_warning(title=type, message=message)
    else:
        raise Exception(f"Unable to retrieve endpoint, response code {status_code}, key {type}")
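
To make the suggested lookup-table refactor concrete, here is a standalone sketch. The status codes and messages are copied from the diff in this comment; the module-level `BAD_RESPONSE_MESSAGES` name and the `report_warning` callable parameter (used instead of `self.report`) are assumptions made so the example runs on its own.

```python
from typing import Callable, Dict

# Messages taken from the diff above; the table layout follows the reviewer's idea.
BAD_RESPONSE_MESSAGES: Dict[int, str] = {
    400: "Bad request body when retrieving data from endpoint",
    403: "Not authorised to retrieve data from OpenAPI endpoint",
    404: "Unable to find an example for endpoint. Please add it to the list of forced examples.",
    500: "Server error for reaching endpoint",
    504: "Timeout for reaching endpoint",
}


def report_bad_responses(
    report_warning: Callable[..., None], status_code: int, type: str
) -> None:
    # Look up the message for known status codes; unknown codes still raise.
    message = BAD_RESPONSE_MESSAGES.get(status_code)
    if message is None:
        raise Exception(
            f"Unable to retrieve endpoint, response code {status_code}, key {type}"
        )
    report_warning(title=type, message=message)


# Usage with a hypothetical recorder in place of self.report.report_warning.
recorded = []
report_bad_responses(lambda **kwargs: recorded.append(kwargs), 403, "/pets")
```

Keeping the messages in one table means a new status code needs only a new entry rather than another `elif` branch.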

coderabbitai · 2024-07-03T00:00:00Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

                    schema_metadata = set_metadata(dataset_name, fields2add)
                    dataset_snapshot.aspects.append(schema_metadata)

                    yield self.build_wu(dataset_snapshot, dataset_name)
                else:
-                    self.report_bad_responses(response.status_code, key=endpoint_k)
+                    self.report_bad_responses(response.status_code, type=endpoint_k)
            else:
                if endpoint_k not in config.forced_examples.keys():


Simplify Dictionary Key Check

Use key not in dict instead of key not in dict.keys().

if endpoint_k not in config.forced_examples:

Tools

Ruff

341-341: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

…b.com/acryldata/datahub-fork into jj--add-structured-logging-to-ingestion

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Outside diff range and nitpick comments (2)

metadata-ingestion/tests/integration/mode/test_mode.py (2)
Line range hint 66-66: Rename the test function to reflect Metabase ingestion.

The function name should be updated to test_metabase_ingest_success to reflect Metabase ingestion.
- def test_mode_ingest_success(pytestconfig, tmp_path):
+ def test_metabase_ingest_success(pytestconfig, tmp_path):
Line range hint 163-163: Rename the test function to reflect Metabase ingestion.

The function name should be updated to test_metabase_ingest_failure to reflect Metabase ingestion.
- def test_mode_ingest_failure(pytestconfig, tmp_path):
+ def test_metabase_ingest_failure(pytestconfig, tmp_path):

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d72da4a and d6650a2.

Files selected for processing (3)

metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
metadata-ingestion/tests/integration/metabase/test_metabase.py (2 hunks)
metadata-ingestion/tests/integration/mode/test_mode.py (1 hunks)

Files skipped from review as they are similar to previous changes (1)

metadata-ingestion/src/datahub/ingestion/source/mode.py

Additional comments not posted (2)

metadata-ingestion/tests/integration/metabase/test_metabase.py (2)

166-166: LGTM!

The function name and logic correctly reflect Metabase ingestion.

263-263: LGTM!

The function name and logic correctly reflect Metabase ingestion.

hsheth2

Needs some major cleanup, but approving to unblock

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

metadata-ingestion/src/datahub/ingestion/source/mode.py

metadata-ingestion/src/datahub/ingestion/source/mongodb.py

metadata-ingestion/src/datahub/ingestion/source/openapi.py

metadata-ingestion/src/datahub/ingestion/source/tableau.py

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 19

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d6650a2 and aabe311.

Files selected for processing (5)

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py (13 hunks)
metadata-ingestion/src/datahub/ingestion/source/mode.py (15 hunks)
metadata-ingestion/src/datahub/ingestion/source/mongodb.py (3 hunks)
metadata-ingestion/src/datahub/ingestion/source/openapi.py (6 hunks)
metadata-ingestion/src/datahub/ingestion/source/tableau.py (3 hunks)

Files skipped from review as they are similar to previous changes (3)

metadata-ingestion/src/datahub/ingestion/source/mode.py
metadata-ingestion/src/datahub/ingestion/source/mongodb.py
metadata-ingestion/src/datahub/ingestion/source/tableau.py

Additional context used

Ruff

metadata-ingestion/src/datahub/ingestion/source/openapi.py

347-347: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

Additional comments not posted (1)

metadata-ingestion/src/datahub/ingestion/source/openapi.py (1)

Line range hint 413-413: LGTM!

The get_report function is a simple getter and aligns with the new structured logging format.

coderabbitai · 2024-07-03T21:30:21Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+                title="Error Parsing SQL",
+                message="Failed to parse sql query, lineage will not be accurate.",
+                context=f"Table Name: {sql_table_name}, Query: {sql_query}",
+                exc=e,


Use consistent structured logging format.

The report_warning method should be used consistently with the new structured logging format.

- reporter.report_warning(
-     title="Error Parsing SQL",
-     message="Failed to parse sql query, lineage will not be accurate.",
-     context=f"Table Name: {sql_table_name}, Query: {sql_query}",
-     exc=e,
+ reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Error Parsing SQL",
+         message="Failed to parse sql query, lineage will not be accurate.",
+         context=f"Table Name: {sql_table_name}, Query: {sql_query}",
+         exc=e,
+     )
  )

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-03T21:30:21Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+                    title="Failed to Load Connection",
+                    message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
+                    context=f"Connection: {model.connection}",


Use consistent structured logging format.

The report_warning method should be used consistently with the new structured logging format.

- self.reporter.report_warning(
-     title="Failed to Load Connection",
-     message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
-     context=f"Connection: {model.connection}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Failed to Load Connection",
+         message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
+         context=f"Connection: {model.connection}",
+     )
  )

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- title="Failed to Load Connection",
- message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
- context=f"Connection: {model.connection}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Failed to Load Connection",
+         message="Failed to load connection. Check your API key permissions and/or connection_to_platform_map configuration.",
+         context=f"Connection: {model.connection}",
+     )
+ )

coderabbitai · 2024-07-03T21:30:21Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+                            title="Failed to process explores",
+                            message="Failed to process explore dictionary.",
+                            context=f"Explore Details: {explore_dict}",
+                            exc=e,


Use consistent structured logging format.

The report_warning method should be used consistently with the new structured logging format.

- self.reporter.report_warning(
-     title="Failed to process explores",
-     message="Failed to process explore dictionary.",
-     context=f"Explore Details: {explore_dict}",
-     exc=e,
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Failed to process explores",
+         message="Failed to process explore dictionary.",
+         context=f"Explore Details: {explore_dict}",
+         exc=e,
+     )
  )

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-03T21:30:21Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+            title="Malformed Table Name",
+            message="Table name has more than 3 parts.",
+            context=f"Table Name: {sql_table_name}",


Use consistent structured logging format.

The report_warning method should be used consistently with the new structured logging format.

- self.reporter.report_warning(
-     title="Malformed Table Name",
-     message="Table name has more than 3 parts.",
-     context=f"Table Name: {sql_table_name}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Malformed Table Name",
+         message="Table name has more than 3 parts.",
+         context=f"Table Name: {sql_table_name}",
+     )
  )

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- title="Malformed Table Name",
- message="Table name has more than 3 parts.",
- context=f"Table Name: {sql_table_name}",
+ self.reporter.report_warning(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.WARN,
+         title="Malformed Table Name",
+         message="Table name has more than 3 parts.",
+         context=f"Table Name: {sql_table_name}",
+     )
+ )

coderabbitai · 2024-07-03T21:30:21Z

metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py

+            self.reporter.report_failure(
+                message="Failed to parse view file",
+                context=f"Path: {path}",
+                exc=e,
+            )


Use consistent structured logging format.

The report_failure method should be used consistently with the new structured logging format.

- self.reporter.report_failure(
-     message="Failed to parse view file",
-     context=f"Path: {path}",
-     exc=e,
+ self.reporter.report_failure(
+     structured_log=StructuredLog(
+         level=StructuredLogLevel.ERROR,
+         message="Failed to parse view file",
+         context=f"Path: {path}",
+         exc=e,
+     )
  )

coderabbitai · 2024-07-03T21:30:22Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

+    def report_bad_responses(self, status_code: int, type: str) -> None:
        if status_code == 400:
            self.report.report_warning(
-                key=key, reason="Unknown error for reaching endpoint"
+                title="Failed to Extract Metadata",
+                message="Bad request body when retrieving data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
            )
        elif status_code == 403:
-            self.report.report_warning(key=key, reason="Not authorised to get endpoint")
+            self.report.report_warning(
+                title="Unauthorized to Extract Metadata",
+                message="Received unauthorized response when attempting to retrieve data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
+            )
        elif status_code == 404:
            self.report.report_warning(
-                key=key,
-                reason="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                title="Failed to Extract Metadata",
+                message="Unable to find an example for endpoint. Please add it to the list of forced examples.",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
            )
        elif status_code == 500:
            self.report.report_warning(
-                key=key, reason="Server error for reaching endpoint"
+                title="Failed to Extract Metadata",
+                message="Received unknown server error from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
            )
        elif status_code == 504:
-            self.report.report_warning(key=key, reason="Timeout for reaching endpoint")
+            self.report.report_warning(
+                title="Failed to Extract Metadata",
+                message="Timed out when attempting to retrieve data from OpenAPI endpoint",
+                context=f"Endpoint Type: {type}, Status Code: {status_code}",
+            )
        else:
            raise Exception(
-                f"Unable to retrieve endpoint, response code {status_code}, key {key}"
+                f"Unable to retrieve endpoint, response code {status_code}, key {type}"


Refactor Method for Consistency and Readability

The report_bad_responses method could be refactored to reduce redundancy and improve readability.

def report_bad_responses(self, status_code: int, type: str) -> None:
    messages = { }
    if status_code in messages:
        title, message = messages[status_code]
        self.report.report_warning(
            title=title,
            message=message,
            context=f"Endpoint Type: {type}, Status Code: {status_code}",
        )
    else:
        raise Exception(
            f"Unable to retrieve endpoint, response code {status_code}, key {type}"
        )
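
The messages mapping in the suggestion above appears empty in this thread. As a rough sketch of what a table-driven version could look like, the example below fills the mapping with the titles and messages from the if/elif branches in the diff. FakeReport is a hypothetical stand-in for the source's report object, and the function is written as a free function so it runs on its own; neither is part of the PR.

```python
from typing import Dict, List, Tuple

# Status code -> (title, message), copied from the branches in the diff above.
BAD_RESPONSE_MESSAGES: Dict[int, Tuple[str, str]] = {
    400: (
        "Failed to Extract Metadata",
        "Bad request body when retrieving data from OpenAPI endpoint",
    ),
    403: (
        "Unauthorized to Extract Metadata",
        "Received unauthorized response when attempting to retrieve data from OpenAPI endpoint",
    ),
    404: (
        "Failed to Extract Metadata",
        "Unable to find an example for endpoint. Please add it to the list of forced examples.",
    ),
    500: (
        "Failed to Extract Metadata",
        "Received unknown server error from OpenAPI endpoint",
    ),
    504: (
        "Failed to Extract Metadata",
        "Timed out when attempting to retrieve data from OpenAPI endpoint",
    ),
}


class FakeReport:
    """Hypothetical stand-in for the source report; it only records warnings."""

    def __init__(self) -> None:
        self.warnings: List[Dict[str, str]] = []

    def report_warning(self, title: str, message: str, context: str) -> None:
        self.warnings.append({"title": title, "message": message, "context": context})


def report_bad_responses(report: FakeReport, status_code: int, type: str) -> None:
    # `type` shadows the builtin; the name is kept to match the diff.
    if status_code in BAD_RESPONSE_MESSAGES:
        title, message = BAD_RESPONSE_MESSAGES[status_code]
        report.report_warning(
            title=title,
            message=message,
            context=f"Endpoint Type: {type}, Status Code: {status_code}",
        )
    else:
        raise Exception(
            f"Unable to retrieve endpoint, response code {status_code}, key {type}"
        )
```

With this layout, supporting another status code means adding one dictionary entry rather than another elif branch.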

coderabbitai · 2024-07-03T21:30:23Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

+                            self.report.info(
+                                message="No fields found from endpoint response.",
+                                context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",


Use Structured Logging for Infos

The info message should include a title for consistency with other structured logs.

- self.report.info(
-     message="No fields found from endpoint response.",
-     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.info(
+     title="Info",
+     message="No fields found from endpoint response.",
+     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )

coderabbitai · 2024-07-03T21:30:23Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

+                    title="Failed to Extract Endpoint Metadata",
+                    message=f"No example provided for {endpoint_dets['method']}",
+                    context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",


Use Structured Logging for Warnings

The warning message should include a title for consistency with other structured logs.

- self.report.report_warning(
-     message=f"No example provided for {endpoint_dets['method']}",
-     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.report_warning(
+     title="Failed to Extract Endpoint Metadata",
+     message=f"No example provided for {endpoint_dets['method']}",
+     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )

coderabbitai · 2024-07-03T21:30:23Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

@@ -271,7 +284,7 @@
            for w in warn_c:
                w_msg = w.message
                w_spl = w_msg.args[0].split(" --- ")  # type: ignore
-                self.report.report_warning(key=w_spl[1], reason=w_spl[0])
+                self.report.report_warning(message=w_spl[1], context=w_spl[0])


Use Structured Logging for Warnings

The warning message should include a title for consistency with other structured logs.

- self.report.report_warning(message=w_spl[1], context=w_spl[0])
+ self.report.report_warning(title="Warning", message=w_spl[1], context=w_spl[0])
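
To make the index order in this call concrete, the sketch below splits recorded warnings on " --- " and returns (message, context) pairs in the same order as the diff: the part after the separator becomes the message and the part before it becomes the context. It assumes warn_c comes from warnings.catch_warnings(record=True), which this thread does not show. The warning texts are made up, and the guard for a missing separator is an addition that is not in the diff.

```python
import warnings
from typing import List, Tuple


def split_recorded_warnings(
    recorded: List[warnings.WarningMessage],
) -> List[Tuple[str, str]]:
    """Return (message, context) pairs using the same index order as the diff."""
    pairs: List[Tuple[str, str]] = []
    for w in recorded:
        text = str(w.message.args[0])
        parts = text.split(" --- ")
        if len(parts) < 2:
            # Added safety check: without the separator, parts[1] would
            # raise IndexError, so fall back to an empty context.
            pairs.append((text, ""))
        else:
            pairs.append((parts[1], parts[0]))
    return pairs


with warnings.catch_warnings(record=True) as warn_c:
    warnings.simplefilter("always")
    warnings.warn("endpoint /pets --- no example found")  # hypothetical text
    warnings.warn("no separator here")  # hypothetical text

result = split_recorded_warnings(warn_c)
```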

coderabbitai · 2024-07-03T21:30:23Z

metadata-ingestion/src/datahub/ingestion/source/openapi.py

+                        self.report.info(
+                            message="No fields found from endpoint response.",
+                            context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+                        )


Use Structured Logging for Infos

The info message should include a title for consistency with other structured logs.

- self.report.info(
-     message="No fields found from endpoint response.",
-     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
- )
+ self.report.info(
+     title="Info",
+     message="No fields found from endpoint response.",
+     context=f"Endpoint Type: {endpoint_k}, Name: {dataset_name}",
+ )

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between aabe311 and 01d0263.

Files selected for processing (1)

datahub-web-react/src/app/ingest/source/utils.ts (4 hunks)

Additional comments not posted (7)

datahub-web-react/src/app/ingest/source/utils.ts (7)

16-16: Import statement is correct.

The import statement correctly imports StructuredReport, StructuredReportLogEntry, and StructuredReportItemLevel from the types module. These imports are necessary for the structured logging functionality.

143-143: Function correctly handles the new type.

The createStructuredReport function correctly handles StructuredReportLogEntry[] and accurately counts the number of errors, warnings, and infos.

160-160: Function and helper functions correctly handle legacy and new structured report formats.

The transformToStructuredReport function and its helper functions correctly map legacy and new structured report formats to StructuredReportLogEntry[]. The use of try-catch ensures that any errors during transformation are caught and logged.

224-224: Function correctly extracts and transforms the structured report.

The getStructuredReport function correctly extracts the serialized structured report, parses it into a JSON object, and transforms it using the transformToStructuredReport function.

Line range hint 239-239: Function correctly determines the ingestion source status.

The getIngestionSourceStatus function correctly determines the status based on the structured report and the presence of warnings. The logic to map SUCCESS to SUCCEEDED_WITH_WARNINGS is appropriate.

164-164: Helper function correctly maps legacy item objects.

The mapItemObject helper function correctly maps legacy item objects to StructuredReportLogEntry[]. The use of Object.entries and mapping to the new structure is appropriate.

174-174: Helper function correctly maps new item arrays.

The mapItemArray helper function correctly maps new item arrays to StructuredReportLogEntry[]. The function handles edge cases, such as items being strings, appropriately by returning null.
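
The mapping and counting logic described in these comments can be illustrated roughly as follows. The reviewed code is TypeScript in utils.ts; this sketch is in Python and is not a port of it. The legacy shape (a dict of message to context strings), the new shape (a list of dicts), every field name, and skipping string items (the review says the TypeScript helper returns null for them) are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def map_item_object(items: Dict[str, List[str]], level: str) -> List[Dict[str, Any]]:
    # Assumed legacy shape: {"some message": ["context a", "context b"]}.
    return [
        {"level": level, "message": key, "context": list(value)}
        for key, value in items.items()
    ]


def map_item_array(items: List[Any], level: str) -> List[Dict[str, Any]]:
    # Assumed new shape: a list of dicts. Bare strings cannot be mapped
    # to an entry, so they are skipped here.
    entries: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, str):
            continue
        entries.append(
            {
                "level": level,
                "title": item.get("title"),
                "message": item.get("message"),
                "context": item.get("context", []),
            }
        )
    return entries


def count_by_level(entries: List[Dict[str, Any]]) -> Counter:
    # Tally entries per level, as a structured report summary would.
    return Counter(entry["level"] for entry in entries)


legacy = map_item_object({"Table skipped": ["db.schema.t1"]}, "WARN")
new = map_item_array(["stray string", {"message": "Parse failed"}], "ERROR")
counts = count_by_level(legacy + new)
```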

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 01d0263 and 3ac4b42.

Files selected for processing (3)

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItem.tsx (2 hunks)
datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemContext.tsx (2 hunks)
datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (2 hunks)

Additional comments not posted (6)

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemContext.tsx (2)

6-6: Update import to use StructuredReportLogEntry.

The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.

29-29: Update Props interface to use StructuredReportLogEntry.

The Props interface has been updated to use StructuredReportLogEntry instead of StructuredReportItem. This change is consistent with the overall refactoring.

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (2)

5-5: Update import to use StructuredReportLogEntry.

The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.

14-14: Update Props interface to use StructuredReportLogEntry[].

The Props interface has been updated to use StructuredReportLogEntry[] instead of StructuredReportItemType[]. This change is consistent with the overall refactoring.

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItem.tsx (2)

8-8: Update import to use StructuredReportLogEntry.

The import statement has been updated to use StructuredReportLogEntry, which aligns with the new data structure.

54-54: Update Props interface to use StructuredReportLogEntry.

The Props interface has been updated to use StructuredReportLogEntry instead of StructuredReportItem. This change is consistent with the overall refactoring.

hsheth2 · 2024-07-04T00:16:58Z

@jjoyce0510 CI is still red

src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx(29,91): error TS2339: Property 'rawType' does not exist on type 'StructuredReportLogEntry'.

…estion' into jj--add-structured-logging-to-ingestion

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3ac4b42 and 8fc1b39.

Files selected for processing (1)

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx (3 hunks)

Files skipped from review as they are similar to previous changes (1)

datahub-web-react/src/app/ingest/source/executions/reporting/StructuredReportItemList.tsx

…ngs, and failures structured reporting to UI (#10828) Co-authored-by: John Joyce Co-authored-by: Harshal Sheth

* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820) * refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2) (datahub-project#10764) Co-authored-by: John Joyce <[email protected]> Co-authored-by: John Joyce <[email protected]> * fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819) * feat(ingest/transformer): tags to terms transformer (datahub-project#10758) Co-authored-by: Aseem Bansal <[email protected]> * fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752) Co-authored-by: Aseem Bansal <[email protected]> * feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822) * feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823) * feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824) * feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825) * add flag for includeSoftDeleted in scroll entities API (datahub-project#10831) * feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832) * feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826) * add scroll parameters to openapi v3 spec (datahub-project#10833) * fix(ingest): correct profile_day_of_week implementation (datahub-project#10818) * feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666) Co-authored-by: Harshal Sheth <[email protected]> * feat(cli): add more details to get cli (datahub-project#10815) * fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836) * fix(ingestion): fix datajob patcher (datahub-project#10827) * fix(smoke-test): add suffix in temp file creation (datahub-project#10841) * feat(ingest/glue): add helper method to permit user or group 
ownership (datahub-project#10784) * feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645) Co-authored-by: Hendrik Richert <[email protected]> * docs(patch): add patch documentation for how implementation works (datahub-project#10010) Co-authored-by: John Joyce <[email protected]> * fix(jar): add missing custom-plugin-jar task (datahub-project#10847) * fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391) Co-authored-by: John Joyce <[email protected]> * docs(): Update posts.md (datahub-project#9893) Co-authored-by: Hyejin Yoon <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * chore(ingest): update acryl-datahub-classify version (datahub-project#10844) * refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828) Co-authored-by: John Joyce <[email protected]> Co-authored-by: Harshal Sheth <[email protected]> * fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834) * fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849) * fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848) * fix(smoke-test): missing test for move domain (datahub-project#10837) * ci: update usernames to not considered for community (datahub-project#10851) * env: change defaults for data contract visibility (datahub-project#10854) * fix(ingest/tableau): quote special characters in external URL (datahub-project#10842) * fix(smoke-test): fix flakiness of auto complete test * ci(ingest): pin dask dependency for feast (datahub-project#10865) * fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542) * feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829) * chore(ingest): Mypy 1.10.1 pin 
(datahub-project#10867) * docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852) * docs: add new js snippet (datahub-project#10846) * refactor(ingestion): remove company domain for security reason (datahub-project#10839) * fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498) Co-authored-by: Yanik Häni <[email protected]> * fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874) * fix(manage-tokens): fix manage access token policy (datahub-project#10853) * Batch get entity endpoints (datahub-project#10880) * feat(system): support conditional write semantics (datahub-project#10868) * fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890) * feat(ingest/lookml): shallow clone repos (datahub-project#10888) * fix(ingest/looker): add missing dependency (datahub-project#10876) * fix(ingest): only populate audit stamps where accurate (datahub-project#10604) * fix(ingest/dbt): always encode tag urns (datahub-project#10799) * fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727) * fix(ingestion/looker): column name missing in explore (datahub-project#10892) * fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879) * feat(conditional-writes): misc updates and fixes (datahub-project#10901) * feat(ci): update outdated action (datahub-project#10899) * feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902) Co-authored-by: Gabe Lyons <[email protected]> * feat(ingest): add snowflake-queries source (datahub-project#10835) * fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling (datahub-project#10906) * docs: add new company to adoption 
list (datahub-project#10909) * refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870) Co-authored-by: John Joyce <[email protected]> Co-authored-by: Harshal Sheth <[email protected]> * feat(ui) Finalize support for all entity types on forms (datahub-project#10915) * Index ExecutionRequestResults status field (datahub-project#10811) * feat(ingest): grafana connector (datahub-project#10891) Co-authored-by: Shirshanka Das <[email protected]> Co-authored-by: Harshal Sheth <[email protected]> * fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916) * feat(dataset): add support for external url in Dataset (datahub-project#10877) * docs(saas-overview) added missing features to observe section (datahub-project#10913) Co-authored-by: John Joyce <[email protected]> * fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882) * fix(structured properties): allow application of structured properties without schema file (datahub-project#10918) * fix(data-contracts-web) handle other schedule types (datahub-project#10919) * fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866) Co-authored-by: Harshal Sheth <[email protected]> * Add feature flag for view defintions (datahub-project#10914) Co-authored-by: Ethan Cartwright <[email protected]> * feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884) * fix(airflow): add error handling around render_template() (datahub-project#10907) * feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830) * feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904) * fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845) * feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864) * feat(ingest/abs): Adding azure blob storage ingestion source 
(datahub-project#10813) * fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924) * fix(build): fix lint fix web react (datahub-project#10896) * fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912) * feat(ingest): report extractor failures more loudly (datahub-project#10908) * feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905) * fix(ingest): fix docs build (datahub-project#10926) * fix(ingest/snowflake): fix test connection (datahub-project#10927) * fix(ingest/lookml): add view load failures to cache (datahub-project#10923) * docs(slack) overhauled setup instructions and screenshots (datahub-project#10922) Co-authored-by: John Joyce <[email protected]> * fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903) * fix(entityservice): fix merging sideeffects (datahub-project#10937) * feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938) Co-authored-by: John Joyce <[email protected]> * chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925) Co-authored-by: John Joyce <[email protected]> Co-authored-by: John Joyce <[email protected]> * Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. 
(datahub-project#10939) * docs: add learning center to docs (datahub-project#10921) * doc: Update hubspot form id (datahub-project#10943) * chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941) * fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895) * fix(ingest/abs): split abs utils into multiple files (datahub-project#10945) * doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883) Co-authored-by: Harshal Sheth <[email protected]> * fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950) * fix(ingest/setup): feast and abs source setup (datahub-project#10951) * fix(connections) Harden adding /gms to connections in backend (datahub-project#10942) * feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952) * fix(docs): make graphql doc gen more automated (datahub-project#10953) * feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723) * fix(spark-lineage): default timeout for future responses (datahub-project#10947) * feat(datajob/flow): add environment filter using info aspects (datahub-project#10814) * fix(ui/ingest): correct privilege used to show tab (datahub-project#10483) Co-authored-by: Kunal-kankriya <[email protected]> * feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955) * add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956) * fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966) * fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965) * fix(airflow/build): Pinning mypy (datahub-project#10972) * Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. 
(datahub-project#10974) * fix(ingest/test): Fix for mssql integration tests (datahub-project#10978) * fix(entity-service) exist check correctly extracts status (datahub-project#10973) * fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982) * bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986) * fix(ui): Remove ant less imports (datahub-project#10988) * feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987) * feat(ingest/cli): init does not actually support environment variables (datahub-project#10989) * fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991) * feat(ingest/spark): Promote beta plugin (datahub-project#10881) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967) * feat(ingest): add `check server-config` command (datahub-project#10990) * feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466) Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig. This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig. I.e: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all cli functions to the server work when ssl certification is disabled. 
Fixes datahub-project#9705 * fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993) * fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771) * feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985) * feat(ingest): improve `ingest deploy` command (datahub-project#10944) * fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920) - allow excluding soft-deleted entities in relationship-queries - exclude soft-deleted members of groups * fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996) * doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984) Co-authored-by: John Joyce <[email protected]> Co-authored-by: RyanHolstien <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Pedro Silva <[email protected]> * fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006) * fix(ui/ingest): Support invalid cron jobs (datahub-project#10998) * fix(ingest): fix graph config loading (datahub-project#11002) Co-authored-by: Pedro Silva <[email protected]> * feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968) Co-authored-by: Harshal Sheth <[email protected]> * fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011) * feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999) * feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935) * fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858) * feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016) Co-authored-by: Chris Collins <[email protected]> * docs: 
standardize terminology to DataHub Cloud (datahub-project#11003) * fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013) * docs(slack) troubleshoot docs (datahub-project#11014) * feat(propagation): Add graphql API (datahub-project#11030) Co-authored-by: Chris Collins <[email protected]> * feat(propagation): Add models for Action feature settings (datahub-project#11029) * docs(custom properties): Remove duplicate from sidebar (datahub-project#11033) * feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997) Co-authored-by: John Joyce <[email protected]> Co-authored-by: John Joyce <[email protected]> * feat(propagation): Add Documentation Propagation Settings (datahub-project#11038) * fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040) * fix(ci): smoke test lint failures (datahub-project#11044) * docs: fix learning center color scheme & typo (datahub-project#11043) * feat: add cloud main page (datahub-project#11017) Co-authored-by: Jay <[email protected]> * feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662) Co-authored-by: John Joyce <[email protected]> * docs: fix typo (datahub-project#11046) * fix(lint): apply spotless (datahub-project#11050) * docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034) * feat(cli): Add run-id option to put sub-command (datahub-project#11023) Adds an option to assign run-id to a given put command execution. This is useful when transformers do not exist for a given ingestion payload, we can follow up with custom metadata and assign it to an ingestion pipeline. 
* fix(ingest): improve sql error reporting calls (datahub-project#11025) * fix(airflow): fix CI setup (datahub-project#11031) * feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039) * fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971) * (chore): Linting fix (datahub-project#11015) * chore(ci): update deprecated github actions (datahub-project#10977) * Fix ALB configuration example (datahub-project#10981) * chore(ingestion-base): bump base image packages (datahub-project#11053) * feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051) * fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008) Co-authored-by: Harshal Sheth <[email protected]> * fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910) * feat(ingest/tableau): add retry on timeout (datahub-project#10995) * change generate kafka connect properties from env (datahub-project#10545) Co-authored-by: david-leifker <[email protected]> * fix(ingest): fix oracle cronjob ingestion (datahub-project#11001) Co-authored-by: david-leifker <[email protected]> * chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062) * feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041) * build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878) Co-authored-by: david-leifker <[email protected]> * fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063) * docs(ingest): update developing-a-transformer.md (datahub-project#11019) * feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056) * feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009) Co-authored-by: Harshal Sheth <[email protected]> * docs(airflow): update min version for plugin v2 
(datahub-project#11065) * doc(ingestion/tableau): doc update for derived permission (datahub-project#11054) Co-authored-by: Pedro Silva <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Harshal Sheth <[email protected]> * fix(py): remove dep on types-pkg_resources (datahub-project#11076) * feat(ingest/mode): add option to exclude restricted (datahub-project#11081) * fix(ingest): set lastObserved in sdk when unset (datahub-project#11071) * doc(ingest): Update capabilities (datahub-project#11072) * chore(vulnerability): Log Injection (datahub-project#11090) * chore(vulnerability): Information exposure through a stack trace (datahub-project#11091) * chore(vulnerability): Comparison of narrow type with wide type in loop condition (datahub-project#11089) * chore(vulnerability): Insertion of sensitive information into log files (datahub-project#11088) * chore(vulnerability): Risky Cryptographic Algorithm (datahub-project#11059) * chore(vulnerability): Overly permissive regex range (datahub-project#11061) Co-authored-by: Harshal Sheth <[email protected]> * fix: update customer data (datahub-project#11075) * fix(models): fixing the datasetPartition models (datahub-project#11085) Co-authored-by: John Joyce <[email protected]> * fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error (datahub-project#11084) Co-authored-by: John Joyce <[email protected]> * feat(docs-site): hiding learn more from cloud page (datahub-project#11097) * fix(docs): Add correct usage of orFilters in search API docs (datahub-project#11082) Co-authored-by: Jay <[email protected]> * fix(ingest/mode): Regexp in mode name matcher didn't allow underscore (datahub-project#11098) * docs: Refactor customer stories section (datahub-project#10869) Co-authored-by: Jeff Merrick <[email protected]> * fix(release): fix full/slim suffix on tag (datahub-project#11087) * feat(config): 
support alternate hashing algorithm for doc id (datahub-project#10423) Co-authored-by: david-leifker <[email protected]> Co-authored-by: John Joyce <[email protected]> * fix(emitter): fix typo in get method of java kafka emitter (datahub-project#11007) * fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect (datahub-project#10898) Co-authored-by: Harshal Sheth <[email protected]> * chore: Update contributors list in PR labeler (datahub-project#11105) * feat(ingest): tweak stale entity removal messaging (datahub-project#11064) * fix(ingestion): enforce lastObserved timestamps in SystemMetadata (datahub-project#11104) * fix(ingest/powerbi): fix broken lineage between chart and dataset (datahub-project#11080) * feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view (datahub-project#11069) * docs: update graphql docs on forms & structured properties (datahub-project#11100) * test(search): search openAPI v3 test (datahub-project#11049) * fix(ingest/tableau): prevent empty site content urls (datahub-project#11057) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(entity-client): implement client batch interface (datahub-project#11106) * fix(snowflake): avoid reporting warnings/info for sys tables (datahub-project#11114) * fix(ingest): downgrade column type mapping warning to info (datahub-project#11115) * feat(api): add AuditStamp to the V3 API entity/aspect response (datahub-project#11118) * fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… (datahub-project#11111) * fix(entiy-client): handle null entityUrn case for restli (datahub-project#11122) * fix(sql-parser): prevent bad urns from alter table lineage (datahub-project#11092) * fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set (datahub-project#11121) * fix(graphql): add missing entities to EntityTypeMapper and 
EntityTypeUrnMapper (datahub-project#10366) * feat(ui): Changes to allow editable dataset name (datahub-project#10608) Co-authored-by: Jay Kadambi <[email protected]> * fix: remove saxo (datahub-project#11127) * feat(mcl-processor): Update mcl processor hooks (datahub-project#11134) * fix(openapi): fix openapi v2 endpoints & v3 documentation update * Revert "fix(openapi): fix openapi v2 endpoints & v3 documentation update" This reverts commit 573c1cb. * docs(policies): updates to policies documentation (datahub-project#11073) * fix(openapi): fix openapi v2 and v3 docs update (datahub-project#11139) * feat(auth): grant type and acr values custom oidc parameters support (datahub-project#11116) * fix(mutator): mutator hook fixes (datahub-project#11140) * feat(search): support sorting on multiple fields (datahub-project#10775) * feat(ingest): various logging improvements (datahub-project#11126) * fix(ingestion/lookml): fix for sql parsing error (datahub-project#11079) Co-authored-by: Harshal Sheth <[email protected]> * feat(docs-site) cloud page spacing and content polishes (datahub-project#11141) * feat(ui) Enable editing structured props on fields (datahub-project#11042) * feat(tests): add md5 and last computed to testResult model (datahub-project#11117) * test(openapi): openapi regression smoke tests (datahub-project#11143) * fix(airflow): fix tox tests + update docs (datahub-project#11125) * docs: add chime to adoption stories (datahub-project#11142) * fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 (datahub-project#11158) * fix(kafka-setup): add missing script to image (datahub-project#11190) * fix(config): fix hash algo config (datahub-project#11191) * test(smoke-test): updates to smoke-tests (datahub-project#11152) * fix(elasticsearch): refactor idHashAlgo setting (datahub-project#11193) * chore(kafka): kafka version bump (datahub-project#11211) * readd UsageStatsWorkUnit * fix merge problems * change logo --------- Co-authored-by: Chris 
Collins <[email protected]> Co-authored-by: John Joyce <[email protected]> Co-authored-by: John Joyce <[email protected]> Co-authored-by: John Joyce <[email protected]> Co-authored-by: dushayntAW <[email protected]> Co-authored-by: sagar-salvi-apptware <[email protected]> Co-authored-by: Aseem Bansal <[email protected]> Co-authored-by: Kevin Chun <[email protected]> Co-authored-by: jordanjeremy <[email protected]> Co-authored-by: skrydal <[email protected]> Co-authored-by: Harshal Sheth <[email protected]> Co-authored-by: david-leifker <[email protected]> Co-authored-by: sid-acryl <[email protected]> Co-authored-by: Julien Jehannet <[email protected]> Co-authored-by: Hendrik Richert <[email protected]> Co-authored-by: Hendrik Richert <[email protected]> Co-authored-by: RyanHolstien <[email protected]> Co-authored-by: Felix Lüdin <[email protected]> Co-authored-by: Pirry <[email protected]> Co-authored-by: Hyejin Yoon <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: cburroughs <[email protected]> Co-authored-by: ksrinath <[email protected]> Co-authored-by: Mayuri Nehate <[email protected]> Co-authored-by: Kunal-kankriya <[email protected]> Co-authored-by: Shirshanka Das <[email protected]> Co-authored-by: ipolding-cais <[email protected]> Co-authored-by: Tamas Nemeth <[email protected]> Co-authored-by: Shubham Jagtap <[email protected]> Co-authored-by: haeniya <[email protected]> Co-authored-by: Yanik Häni <[email protected]> Co-authored-by: Gabe Lyons <[email protected]> Co-authored-by: Gabe Lyons <[email protected]> Co-authored-by: 808OVADOZE <[email protected]> Co-authored-by: noggi <[email protected]> Co-authored-by: Nicholas Pena <[email protected]> Co-authored-by: Jay <[email protected]> Co-authored-by: ethan-cartwright <[email protected]> Co-authored-by: Ethan Cartwright <[email protected]> Co-authored-by: Nadav Gross <[email protected]> Co-authored-by: Patrick Franco Braz <[email 
protected]> Co-authored-by: pie1nthesky <[email protected]> Co-authored-by: Joel Pinto Mata (KPN-DSH-DEX team) <[email protected]> Co-authored-by: Ellie O'Neil <[email protected]> Co-authored-by: Ajoy Majumdar <[email protected]> Co-authored-by: deepgarg-visa <[email protected]> Co-authored-by: Tristan Heisler <[email protected]> Co-authored-by: Andrew Sikowitz <[email protected]> Co-authored-by: Davi Arnaut <[email protected]> Co-authored-by: Pedro Silva <[email protected]> Co-authored-by: amit-apptware <[email protected]> Co-authored-by: Sam Black <[email protected]> Co-authored-by: Raj Tekal <[email protected]> Co-authored-by: Steffen Grohsschmiedt <[email protected]> Co-authored-by: jaegwon.seo <[email protected]> Co-authored-by: Renan F. Lima <[email protected]> Co-authored-by: Matt Exchange <[email protected]> Co-authored-by: Jonny Dixon <[email protected]> Co-authored-by: Pedro Silva <[email protected]> Co-authored-by: Pinaki Bhattacharjee <[email protected]> Co-authored-by: Jeff Merrick <[email protected]> Co-authored-by: skrydal <[email protected]> Co-authored-by: AndreasHegerNuritas <[email protected]> Co-authored-by: jayasimhankv <[email protected]> Co-authored-by: Jay Kadambi <[email protected]> Co-authored-by: David Leifker <[email protected]>

John Joyce added 2 commits July 1, 2024 19:33

Adding structured log reporting to ingestion framework:

4a637b0

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

856731c

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Jul 2, 2024

coderabbitai bot reviewed Jul 2, 2024

vercel bot deployed to Preview July 2, 2024 02:53 View deployment

John Joyce added 4 commits July 2, 2024 11:29

Adding final reporting method support

5bf5d73

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

5461cff

Yeah

7d580e5

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

fd7357a

coderabbitai bot reviewed Jul 2, 2024

vercel bot deployed to Preview July 2, 2024 18:47 View deployment

John Joyce added 2 commits July 2, 2024 12:48

Adding refactoring

b8e5382

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

b5bfe6c

coderabbitai bot reviewed Jul 2, 2024

vercel bot deployed to Preview July 2, 2024 20:21 View deployment

John Joyce and others added 3 commits July 2, 2024 16:19

Adding title, making literalstring requirement

bd4b3ff

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

47445c8

type -> title

e897ede

coderabbitai bot reviewed Jul 2, 2024

Fix final occurrences of type

a667cf8

vercel bot deployed to Preview July 2, 2024 23:41 View deployment

John Joyce added 2 commits July 2, 2024 16:51

Adding prettier and supporting new log fields from ingest

da0739e

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

023ad85

coderabbitai bot reviewed Jul 3, 2024

hsheth2 added 2 commits July 2, 2024 17:03

add structured logs type

839389b

Merge branch 'jj--add-structured-logging-to-ingestion' of ssh://github.com/acryldata/datahub-fork into jj--add-structured-logging-to-ingestion

6504bd4

vercel bot deployed to Preview July 3, 2024 00:05 View deployment

John Joyce added 2 commits July 2, 2024 17:08

Test failures pause

9cd2035

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

41757e6

vercel bot deployed to Preview July 3, 2024 17:41 View deployment

John Joyce added 4 commits July 3, 2024 11:25

Fix mode tests

2dfef8d

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

6547a5a

Redshift to DataHub

245aa4b

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

d6650a2

coderabbitai bot reviewed Jul 3, 2024

vercel bot deployed to Preview July 3, 2024 18:51 View deployment

hsheth2 approved these changes Jul 3, 2024

John Joyce added 2 commits July 3, 2024 14:24

Addressing comments

a1adbe9

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

aabe311

coderabbitai bot reviewed Jul 3, 2024

John Joyce added 2 commits July 3, 2024 16:20

Adding source utils

0eaa0a9

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

01d0263

coderabbitai bot reviewed Jul 3, 2024

vercel bot deployed to Preview July 3, 2024 23:35 View deployment

John Joyce added 2 commits July 3, 2024 16:44

Fix the build

cadc4f6

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

3ac4b42

jjoyce0510 added the merge-pending-ci label (A PR that has passed review and should be merged once CI is green.) Jul 3, 2024

coderabbitai bot reviewed Jul 3, 2024

vercel bot deployed to Preview July 4, 2024 00:03 View deployment

John Joyce added 2 commits July 3, 2024 17:48

frontend lint

49d3774

Merge remote-tracking branch 'acryl/jj--add-structured-logging-to-ingestion' into jj--add-structured-logging-to-ingestion

8fc1b39

coderabbitai bot reviewed Jul 4, 2024

vercel bot deployed to Preview July 4, 2024 01:02 View deployment

hsheth2 merged commit fa3e381 into datahub-project:master Jul 4, 2024
59 checks passed

Masterchen09 mentioned this pull request Jul 30, 2024

feat(ingest): add ingestion source for SAP Analytics Cloud #10958

Merged

5 tasks

	assert "permission-error" in pipeline.source.get_report()._errors.keys()
	assert "permission-error" in pipeline.source.get_report()._errors

	assert "lineage-permission-error" in pipeline.source.get_report()._errors.keys()
	assert "lineage-permission-error" in pipeline.source.get_report()._errors

	assert "usage-permission-error" in pipeline.source.get_report()._errors.keys()
	assert "usage-permission-error" in pipeline.source.get_report()._errors
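
The assertion pairs above drop the `.keys()` call. As a minimal sketch, assuming `_errors` behaves like a plain mapping keyed by error name (the dictionary below is a hypothetical stand-in, not the real report object), both forms give the same answer, so the shorter one is the idiomatic choice:

```python
# Hypothetical stand-in for a report's error mapping; the keys mirror the
# names used in the test assertions, the values are made up for illustration.
errors = {
    "permission-error": ["example context"],
    "usage-permission-error": ["example context"],
}

# Membership via an explicit keys() view.
via_keys = "permission-error" in errors.keys()

# Membership directly on the mapping; equivalent for dict-like objects.
direct = "permission-error" in errors

# A key that was never reported is absent under both forms.
missing = "lineage-permission-error" in errors
```

The direct form also keeps working if the container only implements `__contains__` and does not expose `keys()`.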

	self.report.report_warning(message=w_spl[1], context=w_spl[0])
	self.report.report_warning(title="Warning", message=w_spl[1], context=w_spl[0])
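
The call above passes `title`, `message`, and `context` as separate keyword arguments. The following is a rough sketch of how a report could group entries under those three fields; `SketchLogEntry` and `SketchReport` are hypothetical names invented for illustration, and this is not DataHub's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchLogEntry:
    """One structured entry: a short title, a message, and context strings."""

    title: str
    message: str
    context: List[str] = field(default_factory=list)


@dataclass
class SketchReport:
    """Hypothetical stand-in for a source report (assumption, not the real class)."""

    warnings: Dict[str, SketchLogEntry] = field(default_factory=dict)

    def report_warning(
        self,
        message: str,
        context: Optional[str] = None,
        title: Optional[str] = None,
    ) -> None:
        # Group by title when one is given, otherwise by the message itself.
        key = title or message
        entry = self.warnings.setdefault(
            key, SketchLogEntry(title=key, message=message)
        )
        # Each call contributes its own context to the grouped entry.
        if context is not None:
            entry.context.append(context)


report = SketchReport()
w_spl = ["table_a", "Could not parse view definition"]  # made-up sample data
report.report_warning(title="Warning", message=w_spl[1], context=w_spl[0])
report.report_warning(title="Warning", message=w_spl[1], context="table_b")
```

Grouping by title means repeated warnings of the same kind collapse into one entry with several contexts, which is one plausible way to keep a UI summary short.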
