Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAE-145] Change the default values for create table event #62

Merged
merged 3 commits into from
Aug 12, 2021

Conversation

kenjihiraoka
Copy link
Contributor

@kenjihiraoka kenjihiraoka commented Aug 11, 2021

Why? 📖

We've been using the hive-metastore listener to track all DDLs running on the server that generates a payload sent to the Kafka topic in order for the Apache Atlas can read them and create the objects in your ecosystem.

Using the lib we faced a problem with the creation of tables based on Json schema, actually, the creation of the table worked as expected, but the listener could not get the json schema, given the following error:

2021-08-06T18:26:43,438 ERROR [pool-8-thread-2] metadata.Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe
java.lang.NullPointerException: null

In the first shot, that error looks like something related to the listener code, but after realizing plenty of tests with many kinds of changes in the listener, I could not find any idea of the reason for this specific error: a mere NullPointer 😞

After that, I tried to get the object created by the listener and did a comparison with the object saved in the hive-metastore database through the hive-metastore client, and I noticed that there were different objects for the same table, so I could see that some default values for TableBuilder and SerdeInfoBuilder were the reason for the specific error above.

What? 🔧

  • SerdeInfoBuilder

    • Check if the attribute parameters is None, if yes the attribute is equal to an empty dict.
  • TableBuilder

    • Check if the attribute parameters is None, if yes the attribute is equal to an empty dict.
    • Check if the attribute partition_keys is None, if yes the attribute is equal to an empty list.

Type of change 🗄️

  • New feature (non-breaking change which adds functionality)

How everything was tested? 📏

Ran the create table event with those hard-coded attributes and checked the hive-metastore log that got the json schema.

Checklist 📝

  • I have added labels to distinguish the type of pull request.
  • My code follows the style guidelines of this project (docstrings, type hinting and linter compliance);
  • I have performed a self-review of my own code;
  • I have made corresponding changes to the documentation;
  • I have added tests that prove my fix is effective or that my feature works;
  • I have made sure that new and existing unit tests pass locally with my changes;

@kenjihiraoka kenjihiraoka added the enhancement New feature or request label Aug 11, 2021
@kenjihiraoka kenjihiraoka self-assigned this Aug 11, 2021
@kenjihiraoka kenjihiraoka requested a review from a team as a code owner August 11, 2021 18:26
@@ -34,7 +34,7 @@ def __init__(
"""
self.name = name
self.serialization_lib = serialization_lib
self.parameters = parameters
self.parameters = parameters if parameters else {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would compare it directly to None to avoid overwriting falsy parameters check this SO reply.
Just for information, there is an equivalent form as the one proposed self.parameters = parameters or {} which some people consider more readable. But I still think it's worth using the {} if parameters is None else parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm afraid I didn't get what you suggested Breno. Because python considers None as False in boolean comparisons.
Also if we restrict the comparison to if parameter is None, we will miss other False values as False, '', 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would be better to not overwrite falsy values like False, '' or 0 because it would silently change the values.
If we want to restrict arguments like 0 and '' we can validate the type and raise an exception for the invalid ones but I don't like the idea of mixing this validation with the default argument behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using other false values for those attributes, and the thrift returns the following error:

image

For this case, we can apply the changes suggested by Breno and change only the attribute when they are None.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@kenjihiraoka kenjihiraoka merged commit 8babdc0 into main Aug 12, 2021
@kenjihiraoka kenjihiraoka deleted the kenjihiraoka/change-create-table-default-parameter branch August 12, 2021 17:20
@kenjihiraoka kenjihiraoka mentioned this pull request Aug 12, 2021
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants