Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish ambiguous column value of None #489

Conversation

heehehe
Copy link
Contributor

@heehehe heehehe commented Sep 23, 2023

Resolve #479

Overview

In RowsEvent, there are three cases in which the column value is None.

  1. when the value is NULL
  2. when the datatype is related to time (datetime, time, ...) and the value is not supported format
    (ex: '0000-00-00 00:00:00')
  3. when the datatype is set and the value is empty

Since we cannot distinguish these cases, we're going to provide detail information for None value.

Changes

  • In RowsEvent, we added __none_sources variable.
  • When we execute __read_values_name,
    we check detail information for None cases and save it to __none_sources.
  • When dump() executes, we check for none_sources and show them together with column name and value.

Example

  • SQL query
    CREATE TABLE test_table (col0 int, col1 varchar(10), col2 datetime);
    INSERT INTO test_table VALUES (1, 'abc', '2023-09-09 00:00:00');
    UPDATE test_table SET col1=NULL, col2='0000-00-00 00:00:00' WHERE col0=1;
  • dump result for UPDATE
      === UpdateRowsEvent ===
      Date: 2023-09-09T05:23:14
      Log position: 1360
      Event size: 37
      Read bytes: 13
      Table: test.test_table
      Affected columns: 3
      Changed rows: 1
      Affected columns: 3
      Values:
      --
      *col0:1=>1
    + *col1:abc=>None(null)
    + *col2:2023-09-09 00:00:00=>None(out of datetime range)

return self.__read_datetime()
ret = self.__read_datetime()
if ret is None:
self.__none_sources[name] = "out of datetime range"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be a constant this will allow people to code behavior based on that. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julien-duponchelle Thanks for your review!

Do you mean that setting constant like
OUTDATETIME = "out of datetime range"

and set __none_source like below?
self.__none_source[name] = OUTDATETIME

If this is right, I'm going to set constant by enum and set values with that :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julien-duponchelle
I added pymysqlreplication/constants/NONE_SOURCE.py
and changed row_event.py to handle values according to the variables in NONE_SORUCE.py.

Please let me know if there's anything need to change :)

Copy link
Collaborator

@sean-k1 sean-k1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks for hard work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: distinguish ambiguous column value of None
6 participants