Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question on Use of Use_Column_Value And Timestamps #273

Closed
MorrieAtElastic opened this issue Mar 22, 2018 · 5 comments
Closed

Question on Use of Use_Column_Value And Timestamps #273

MorrieAtElastic opened this issue Mar 22, 2018 · 5 comments
Assignees
Labels

Comments

@MorrieAtElastic
Copy link

Please post all product and debugging questions on our forum. Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.

For all general issues, please provide the following details for fast resolution:

  • Version: 5.6.4 Customer; 5.6.5 reproduced in house against Oracle V12

  • Operating System: RHEL

  • Config File (if you have sensitive info, please remove it):(included below)

  • Sample Data: n/a

  • Steps to Reproduce:

  • When using the JDBC Plugin to query Oracle databases, it seems that USE_COLUMN_VALUE must be set to true even when using timestamp fields as the tracking column. This is inconsistent and completely opposite to the documented discussion of USE_COLUMN_VALUE which states that it should be used only when the tracking column is not a timestamp column. If USE_COLUMN_VALUE is set to false, timestamp comparisons do not work correctly and documents which should be issued to elasticsearch are not. See https://www.elastic.co/guide/en/logstash/5.6/plugins-inputs-jdbc.html#plugins-inputs-jdbc-use_column_value for the documented definition of USE_COLUMN_VALUE ("set to true when tracking column is not a timestamp").

  • A customer discovered this, provided a test case which I have reproduced his results; the problem can be demonstrated without connecting to an Elasticsearch cluster, just by using the rubydebug output.

Test configuration file:

input {
    jdbc {
        jdbc_validate_connection => true
        jdbc_driver_library => "/u01/app/oracle/product/12.2/db_1/md/property_graph/lib/ojdbc7.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@//localhost:1521/orcl12c"
        jdbc_user => "system"
        jdbc_password => "oracle"
        statement => "SELECT CAST(add_months(sysdate, -1) AS TIMESTAMP) AS time_column, 'Some Value' AS some_data FROM dual WHERE CAST(add_months(sysdate, -1) AS TIMESTAMP) > :sql_last_value"
        use_column_value => true
        tracking_column => "time_column"
        tracking_column_type => "timestamp"
        schedule => "*/5 * * * * *"
        record_last_run =>  true
        tags => ["columnvalue"]
        last_run_metadata_path => "/tmp/logstash-oradb.lastrun_columnvalue.txt"
       }
    jdbc {
        jdbc_validate_connection => true
        jdbc_driver_library => "/u01/app/oracle/product/12.2/db_1/md/property_graph/lib/ojdbc7.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_connection_string => "jdbc:oracle:thin:@//localhost:1521/orcl12c"
        jdbc_user => "system"
        jdbc_password => "oracle"
        statement => "SELECT CAST(add_months(sysdate, -1) AS TIMESTAMP) AS n_time_column, 'Some Value' AS n_some_data FROM dual WHERE CAST(add_months(sysdate, -1) AS TIMESTAMP) > :sql_last_value"
        use_column_value => false
        tracking_column => "n_time_column"
        tracking_column_type => "timestamp"
        schedule => "*/5 * * * * *"
        record_last_run => true
        tags => ["nocolumnvalue"]
        last_run_metadata_path => "/tmp/logstash-oradb.lastrun_nocolumnvalue.txt"
       }
}

output {
    stdout { codec => rubydebug }
    #elasticsearch {}
}

To Reproduce

  • connect to any Oracle12 database by modifying "jdbc_connection_string" and "jdbc_driver_library" as required
  • run against any logstash V5.6 release:
./logstash -f ucvjdbc.conf

what you should observe is that documents having the "nocolumnvalue" tag (generated where use_column_value is false) will be issued once per run (wrong) while documents having the "columnvalue" tag (generated where use_column_value is true) will be output continuously (correct behavior).

@jakelandis
Copy link
Contributor

@MorrieAtElastic - I agree this is confusing, and I think this may be a documentation issue.

There are three states that may be recorded as the :sql_last_value

  1. The last timestamp from a DB column. Requires record_last_run = true, tracking_column, tracking_column_type = timestamp, and use_column_value = true
  2. The last numeric value from a DB column. Requires record_last_run = true, tracking_column, tracking_column_type = numeric, and use_column_value = true
  3. The last time the query ran. The default if nothing is set. Requires record_last_run = true (or not defined that is the default), and use_column_value = false (or not defined since that is the default)

The documentation for use_column_value reads

"Use an incremental column value rather than a timestamp"

Which is misleading since "timestamp" alludes to the tracking column type, but it really means the incremental DB column (numeric or timestamp) rather then timestamp of last time the query ran.

I think we should update the documentation for use_column_value to say:

When set to true uses the defined tracking_column for the :sql_last_value. When set to false use the last time the query executed for the :sql_last_value. Default is false.

@guyboertje - keep me honest on how these values work.

@MorrieAtElastic
Copy link
Author

MorrieAtElastic commented Mar 26, 2018

I - and the customer who identified this problem as well - agree: this needs to be better documented.

Thanks for the quick turnaround.

@guyboertje
Copy link
Contributor

tracking_column
Value type is string
There is no default value for this setting.
If tracking column value rather than timestamp, the column whose value is to be tracked

use_column_value
Value type is boolean
Default value is false
Use an incremental column value rather than a timestamp

I think the problem is the word timestamp in these two settings docs. If one mentally changes it to time of last run then it makes sense, i.e. timestamp is overloaded here in confusion with its use in...

tracking_column_type
Value can be any of: numeric, timestamp
Default value is "numeric"
Type of tracking column. Currently only "numeric" and "timestamp"

Either way I agree that the interplay between these three or four settings warrants its own section in the main docs.

@jakelandis
Copy link
Contributor

jakelandis commented Mar 28, 2018

Also it has been reported as a point of confusion for the tracking_column_type not needing to be set due to the overloading of the term.

We should fix the document holistically to disambiguate.

cc @karenzone

EDIT: corrected statement.

@karenzone
Copy link
Contributor

karenzone commented Apr 5, 2018

Reworked the topic (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html) to clarify use of USE_COLUMN_VALUE (#277).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants