Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force all usage of sql_last_value to be typed according to the settings #260

Merged
merged 3 commits into from
Feb 27, 2018

Conversation

guyboertje
Copy link
Contributor

@guyboertje guyboertje commented Feb 14, 2018

Added three more tests to verify the jdbc_default_timezone behaviour
when no tracking column is specified
when a numeric column is specified
when a timestamp column is specified

Fixes #140

@guyboertje guyboertje self-assigned this Feb 14, 2018
@guyboertje
Copy link
Contributor Author

Will rebase.

@jakelandis
Copy link
Contributor

Will rebase.

sorry, i sniped 4.3.4

Added three more tests to verify the jdbc_default_timezone behaviour
when no tracking column is specified
when a numeric column is specified
when a timestamp column is specified
@jakelandis
Copy link
Contributor

@guyboertje - yeah, 4.3.4 is published.

@jakelandis jakelandis self-assigned this Feb 20, 2018
@jakelandis
Copy link
Contributor

@guyboertje - I am trying to manually test a scenario that fails with out this change, then works with this change. Could you help me to better understand exactly how to test this ?

What I have done so far:

Using MySQL

DROP TABLE test;
CREATE TABLE test (num INTEGER, update_dt_tm DATETIME(6));
INSERT into test (num, update_dt_tm) VALUES (96, NOW(6));
INSERT into test (num, update_dt_tm) VALUES (97, NOW(6));
INSERT into test (num, update_dt_tm) VALUES (98, NOW(6));
INSERT into test (num, update_dt_tm) VALUES (99, NOW(6));
SELECT * from test;

and the following config:

input {
  jdbc {
    jdbc_driver_library => "/Users/jake/workspace/examples/sql/mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test2"
    jdbc_user => "XX"
    jdbc_password => "XX"
    schedule => "* * * * * *"
    statement => "SELECT * from test LIMIT 10"
    jdbc_default_timezone => "UTC"
#    record_last_run => true 
#    last_run_metadata_path => "/tmp/logstash_jdbc_last_run_1"
    }
}
output {
		stdout { codec => rubydebug}
}

Tried different TIMESTAMP and DATETIME types and various jdbc_default_timezones and looked for truncation of milliseconds in the output or in the last run file. I have read through #140, but don't think I am understanding the issue that this PR is addressing.

@guyboertje
Copy link
Contributor Author

guyboertje commented Feb 23, 2018

I will sent an email with a link to SFDC cases that describes the fault better.

The original code for timezone support was committed in Dec 2015 when Time.parse(sql_last_value.to_s) was used if the tracked column value was a DateTime (converted from sql.Timestamp by Sequel).

@yaauie very recently changed this in master to sql_last_value.to_time, this in itself does the preserve sub-seconds.

I did my investigations before the above change. However even with the above change in place Sequel does not perform the timezone adjustments correctly. This PR refactors to allow Sequel to do the TZ adjustments correctly while preserving Ry's intended fix.

I hacked the new test at line 327 from the PR into master.
Note: the test has ENV["TZ"] = "Etc/UTC".

The timestamps added to the DB reflects the Chicago timezone...
db[:test1_table].insert(:num => 22, :custom_time => "2015-01-01 20:00:00.811", :created_at => Time.now.utc) i.e. Jan 1, 20h00

This is what I see when I add print query.sql:
Between the queries below, before Sequel converts it, the value of custom_time of the last record is '2015-01-01 20:00:00.722000'. After it converts it it is '2015-01-02 02:00:00.722000'

With Ry's change reverted.

SELECT * from test1_table WHERE custom_time > '1970-01-01 00:00:00.000000' ORDER BY custom_time

SELECT * from test1_table WHERE custom_time > '2015-01-02 02:00:00.000000' ORDER BY custom_time

NOTE: sub-seconds NOT preserved AND timestamp is UTC (+600)

With Ry's change.

SELECT * from test1_table WHERE custom_time > '1970-01-01 00:00:00.000000' ORDER BY custom_time

SELECT * from test1_table WHERE custom_time > '2015-01-02 02:00:00.722000' ORDER BY custom_time

NOTE: sub-seconds IS preserved BUT timestamp is UTC (+600).

With the PR:

SELECT * from test1_table WHERE custom_time > '1999-12-31 18:00:00.987000' ORDER BY custom_time

SELECT * from test1_table WHERE custom_time > '2015-01-01 20:00:00.722000' ORDER BY custom_time

NOTE: sub-seconds IS preserved BUT timestamp is 6 hours earlier than UTC.

How to recreate with LS v5.5.2 (sub-seconds zeroed out):
Logstash runs in UTC TZ with jdbc_input settings:
"statement" => "SELECT * from test WHERE added_at > :sql_last_value ORDER BY added_at",
"use_column_value" => true,
"tracking_column" => "added_at",
"tracking_column_type" => "timestamp"
"jdbc_default_timezone" => "America/Chicago"

Add 1 record with a timestamp column (called say, added_at) set to 5:55 in Chicago - 5 minutes before noon UTC.
Run LS with:
You should see one event.
The sql_last_value substituted in the query should be UNIX epoch.
Persisted sql_last_value should be the added_at value in UTC "11:55".
Event added_at field should be converted to UTC - "11:55"

Add 1 record with added_at set to 6:05 - 5 minutes after noon UTC.
Run LS again.
You should see no events
The sql_last_value substituted in the query should be added_at value in UTC but no records exist for > "11:55".

Delete the persisted sql last value and truncate the test table
Switch to using a timezone 6 hours later than UTC say, "Asia/Almaty".
"jdbc_default_timezone" => "Asia/Almaty"

Add 1 record with added_at) set to "17:55"- 5 minutes before noon UTC.
Run LS with:
You should see one event.
The sql_last_value substituted in the query should be UNIX epoch.
Persisted sql_last_value should be the added_at value in UTC "11:55".
Event added_at field should be converted to UTC "11:55".

Add 1 record with added_at set to "18:55" - 5 minutes after noon UTC.
Run LS again.
You should see 2 events
The sql_last_value substituted in the query should be added_at value in UTC but 2 records exist for > "11:55".

Try LS 6.2.1 (sub-seconds preserved, but same results)

I think this no records vs duplicated records made it hard to track down and create a bug report. User can work around it by not specifying the jdbc_default_timezone, casting date/timestamps to string and using the date filter to adjust the time fields.

@guyboertje
Copy link
Contributor Author

The above should hold true when when not using a tracking column and sql_last_value is the time of run in UTC but the DB times are not UTC, i.e.:
"statement" => "SELECT * from test WHERE added_at > :sql_last_value ORDER BY added_at",
"tracking_column_type" => "timestamp"
"jdbc_default_timezone" => "America/Chicago"

@jakelandis
Copy link
Contributor

@guyboertje - thanks for the explanation, I think I get the changes here now.

I confirmed that the timezone issue is fixed, but w/r/t the millisecond precision, it seems that the persisted datetime/time does indeed have the milliseconds, but when the :sql_last_value is sent to the database does not.

For example the persisted timestamp is --- 2018-02-23 17:28:44.189915000 -06:00 , but the following is what is sent to the database: SELECT * from test WHERE update_dt_tm > '2018-02-23 17:28:43'

Is this just an artifact of sequel support and my testing (MySQL) ?


Also,
While trying to understand the changes here I wrote what could be a commit log describing the change (is this correct ?):

When :sql_last_value is sent the to database, it is converted to the JVM default timezone. This change honors the jdbc_default_timezone for use in the :sql_last_value timezone conversion.

Assuming America/Chicago as the default JVM timezone, but the configuration declares UTC as the database timezone jdbc_default_timezone => "UTC"
The last_run is correctly recorded as 2018-02-23 22:30:34.054592000 Z , but when the > :sql_last_value gets sent to the database, it incorrectly gets converted to the JVM default timezone ( > '2018-02-23 16:30:34' ).

This change fixes the conversion when sending the :sql_last_value to the database to honor the jdbc_default_timezone setting. Specifically, this leverages the fact that the Sequel library will handle the timezone conversions properly if passed a DateTime object, and won't don't any timezone conversions if passed a Time object.

This change also refactors parts of the code for better readability and adds more tests.

@jakelandis
Copy link
Contributor

Is this just an artifact of sequel support and my testing (MySQL) ?

Answering my own question ... I dug into sequel a bit and found this magic is needed to enable fractional seconds on mysql 5.6.5+

  sequel_opts => {
       fractional_seconds => true
     }

The ms now show up

 SELECT * from test WHERE update_dt_tm > '2018-02-24 00:26:06.226550'

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @guyboertje !

Changes look good. The only suggestion I have is to beef up the commit message. Suggested wording is on #260 (comment)

EDIT:
Forgot the magic letters. LGTM

@guyboertje guyboertje merged commit cd75a15 into logstash-plugins:master Feb 27, 2018
@guyboertje guyboertje deleted the fix/140 branch March 8, 2018 17:35
@yangzhiwen911
Copy link

yangzhiwen911 commented Nov 25, 2020

Is this just an artifact of sequel support and my testing (MySQL) ?

Answering my own question ... I dug into sequel a bit and found this magic is needed to enable fractional seconds on mysql 5.6.5+

  sequel_opts => {
       fractional_seconds => true
     }

The ms now show up

 SELECT * from test WHERE update_dt_tm > '2018-02-24 00:26:06.226550'

where did u add sequel_opts => {
fractional_seconds => true
} to filter or mutate ??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JDBC Input set jdbc_default_timezone loses milliseconds
3 participants