Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't execute select count(*) in the beginning #305

Open
iirekm opened this issue Oct 23, 2018 · 12 comments
Open

Don't execute select count(*) in the beginning #305

iirekm opened this issue Oct 23, 2018 · 12 comments

Comments

@iirekm
Copy link

iirekm commented Oct 23, 2018

For complex SQL queries as in our case it's performance killer, everything takes twice as long.

  • Version: 6.4.2
  • Operating System:
  • Config File (if you have sensitive info, please remove it):
  • Sample Data:
  • Steps to Reproduce:
@Johnb21
Copy link

Johnb21 commented Oct 23, 2018

This functionality is making the jdbc input unusable for my data retrieval process because I continue to get back I/O errors due to how long this takes to run and my connections appear to be timing out during that time.

EDIT: I took the query for this count and tried to manually run it - i stopped it after nonstop running for 30 minutes. It never finished. Is there a reason this count step is needed? Is there no config / setting that we can set to disable it?

@jintianzhouyine
Copy link

Even if i don't configure the jdbc_paging_enabled parameter,it still worked.

@danhermann
Copy link

This also prevents the JDBC input from working with a number of datastores that provide JDBC drivers but are not fully SQL-compliant such as Apache Jena, etc.

@sgtmacdiesel
Copy link

Is there any update on this? I am having this same issue. This count(*) is causing a heavy query to run twice on our database. Is there any way to disable it?

@stayman
Copy link

stayman commented Aug 7, 2019

I'd like to ask the same question. 4.3.11 actually made the problem worse, since the count is now executing in all contexts, not just debug mode.

We're seeing consistent failures because of the runtime of the count query.

@wouterbac1
Copy link

Hi I am having a related (or perhaps the same) issue. I am trying to read from a Firebird database with Logstash using the following
input {
jdbc {
jdbc_driver_library => "c:/bug/jaybird-full-3.0.6.jar"
jdbc_driver_class => "org.firebirdsql.jdbc.FBDriver"
jdbc_connection_string => "jdbc:firebirdsql://127.0.0.1/D:/DBASE.FDB"
jdbc_user => SYSDBA
jdbc_password => pwd12442
jdbc_paging_enabled => false
tracking_column => "unix_ts_in_secs"
use_column_value => true
tracking_column_type => "numeric"
schedule => "*/5 * * * * *"
statement => "SELECT * FROM TABLE1"
}
}

When I analyze the debug information in my Windows terminal I note that somehow 'SELECT count(*) AS "COUNT" FROM is added in front of my statement and AS "T1" LIMIT 1 is added behind my statement. For example:

SELECT count(*) AS "COUNT" FROM (select * from TABLE1) AS "T1" LIMIT 1

Obviously the SQL error I get is that the Token is unknown. Does anyone have a solution for it?

@jaredwa
Copy link

jaredwa commented Nov 17, 2019

Not sure if anyone else had this issue - I did and it is frustrating that Logstash can't put a kill on this count - it would be easy for them to let the user just insert their own count - preempting the query and causing them NO HARM WHATSOEVER. But, the workaround I found was to limit the query in the statement by the ID, so that the count was inherently limited. So basically

statement => "SELECT *t FROM table WHERE id > :sql_last_value and id < 10000000 ORDER BY id ASC"

Of course, this means you have to babysit and up the numbers as the search passes it, but it's better than having it all just fail.

@filip-java
Copy link

filip-java commented Apr 9, 2020

I worked around it by using a prepared statement with no params. Since prepared statement does not support paging, the plugin does not issue the extra "count" query.

@sanosay
Copy link

sanosay commented Aug 12, 2020

Any update in respect to this?
It seems like it's been abandoned / ignored, while so many complain about it.
In large tables, this is indeed a show stopper.

@Johnb21
Copy link

Johnb21 commented Aug 26, 2020

I worked around it by using a prepared statement with no params. Since prepared statement does not support paging, the plugin does not issue the extra "count" query.

This only works when you have a query that is not expecting data to be used from another input.

My work around was being forced to write everything in Python. Not exactly a logstash solution

@webmasterlv
Copy link

I have the same issue right now. I have a table with 100M rows. The same table was indexed with Sphinx and it was a breeze. Adding a separate statement_count where you can write your own SQL query that produces count for the table indeed would be the best solution.

@Noah1989
Copy link

Noah1989 commented Mar 5, 2024

This has been fixed here: logstash-plugins/logstash-integration-jdbc#95

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests