Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql_last_value got converted ino local timezone, making fresh db records lookup unfeasible #293

Open
jevho opened this issue Jul 24, 2018 · 4 comments
Assignees

Comments

@jevho
Copy link

jevho commented Jul 24, 2018

input {
  jdbc { 
    jdbc_driver_library => "/local/path/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb" 
    jdbc_user => ""
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "SELECT * FROM table WERE updatedAt > :sql_last_value ORDER BY updatedAt"
    clean_run => true
    use_column_value => true
    lowercase_column_names => false
    tracking_column => "updatedAt"
    tracking_column_type => "timestamp"
  }
}

sql_last_value here is being converted into local timezone. So the next time it queries db, the timestamp being used is wrong.

If the actual last fetched from db value of updatedAt in my example is 2018-07-20T00:00:00.000Z, the next query will get 2018-07-20T02:00:00.000Z in :sql_last_value instead, shifted according to my local timezone +02:00. So the whole procedure of fetching freshly updated db records will fail.

logstash 6.3.1
Ubuntu 16.04.4 LTS

@yaauie
Copy link
Contributor

yaauie commented Jul 30, 2018

It looks like you could be using the jdbc_default_timezone directive to inform the plugin of the external time zone.

@jevho
Copy link
Author

jevho commented Jul 31, 2018

This directive

will automatically convert your SQL timestamp fields to Logstash timestamps, in relative UTC time

sql_last_value will remain shifted next time it quires db.

@wych42
Copy link

wych42 commented Aug 20, 2018

Same issue here.

Logstash and MySQL both running at UTC+8 timezone, logstash is tracking updated column with latest row 2018-08-20T22:46:31.000Z, sync table to elasticsearch.

MySQL table schema was updated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

If jdbc_default_timezone was set to 'Asia/Shanghai'(UTC+8), the initial query and second query were:

ELECT * FROM table WHERE updated >= '1970-01-01 08:00:00'
SELECT * FROM table WHERE updated >= '2018-08-21 04:20:45'

If jdbc_default_timezone was set to 'Etc/UTC'(UTC+0),the initial query and second query were:

ELECT * FROM table WHERE updated >= '1970-01-01 00:00:00'
SELECT * FROM table WHERE updated >= '2018-08-21 04:20:45'

Wonder how those timezones be converted? Maybe related to #140 ?

Another thing was that, when jdbc_default_timezone was set to 'Asia/Shanghai'(UTC+8), one row with updated value 2018-08-20 17:46:31 in MySQL, was synced to elasticsearch with value 2018-08-20T22:46:31.000Z. Which is odd because either this should be 23:46:31(17+8) or 09:46:31(17-8), anything I missed?

@wych42
Copy link

wych42 commented Aug 21, 2018

Got workaround with:

  1. remove jdbc_default_timezone setting, using machine timezone.
  2. convert timestamp field to string(using DATE_FORMAT function) in statement sql.
  3. use date filter to match date string above and convert to utc time from my local timezone( set date filter's timezone config to Asia/Shanghai')
  4. remove use_column_value/ set to false.

Got what I expected:

  1. sync tables' last checkpoint was correct.
  2. time-related field updated to elasticsearch got correct value in UTC timezone(it seems all time in es was stored as UTC timezone)

config file looks like this:

input {
  jdbc {
    ...
    statement => "SELECT *, DATE_FORMAT(created, '%Y-%m-%d %T') as created_at, DATE_FORMAT(updated, '%Y-%m-%d %T') as updated_at FROM table_name WHERE updated >= :sql_last_value"
    use_column_value => false
    ...
  }
}
filter {
  ...
  date {
    match => [ "created_at" , "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Asia/Shanghai"
    target => "created"
    remove_field => [ "created_at" ]
  }
  date {
    match => [ "updated_at" , "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Asia/Shanghai"
    target => "updated"
    remove_field => [ "updated_at" ]
  }
}
output {
  ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants