sincedb file not created, files from bucket not deleted #236

Open
niekosau opened this issue Feb 15, 2022 · 2 comments
@niekosau

Logstash information:

  1. Logstash version (e.g. bin/logstash --version)
    logstash 8.0.0
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker)
    rpm repository from https://artifacts.elastic.co/packages/8.x/yum
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)
    systemd unit provided by package
  4. How was the Logstash Plugin installed
    shipped with logstash

OS version (uname -a if on a Unix-like system):
Rocky Linux 8.5 (4.18.0-348.7.1.el8_5.x86_64)
Description of the problem including expected versus actual behavior:
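Expected: with delete => true, processed files are removed from the bucket (on-premise radosgw) and the sincedb file is created and updated.
Actual: files are not removed from the bucket, and the sincedb file is neither created nor updated.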

Steps to reproduce:
Configuration:

input {
  s3 {
    access_key_id => "XXXXXXXXXX"
    secret_access_key => "xxxxxxxxxxxxxxxxx"
    bucket => "test-bucket"
    endpoint => "https://s3.domain.tld"
    delete => true
    sincedb_path  => "/var/lib/logstash/s3-sincedb.db"
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}
output {
  stdout {}
}

I tested older plugin versions; the last working version is 3.5.0.
I downgraded the plugin by running: bin/logstash-plugin install --version 3.5.0 logstash-input-s3


  1. Create a bucket and upload some files to it.
  2. Start Logstash with the minimal configuration shown above.
  3. The files are not removed after processing and the sincedb file is not created, so the same files are processed again on the next interval.
@niekosau niekosau added the bug label Feb 15, 2022
@dabelousov

@niekosau can you share your Logstash log for the s3 input? I may have the same problem.

@zeroad

zeroad commented Jun 13, 2022

I'm using Logstash 7.17.4 with plugin version 3.8.3.

I think the comparison logic is the culprit here. In my case it is comparing timestamps whose printed forms end with different time zone suffixes.

Before the comparison I added this debug line:

::File.open('/tmp/debug.log', 'a') { |file| file.write("object.last_modified: ", object.last_modified, ",  log.last_modified: ", log.last_modified, "\n") }

Which gives me:

object.last_modified: 2021-06-25 16:38:23 +0000,  log.last_modified: 2021-06-25 16:38:23 UTC

I'm not familiar with Ruby, but my guess is that when it compares the dates, it casts them to the string representations above, which are not the same.

As a first step, I fixed it on my side by comparing the Unix timestamps:

if object.last_modified.to_i == log.last_modified.to_i
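
For what it's worth, Ruby's Time#== compares instants rather than printed strings, so "+0000" versus "UTC" alone should not make the comparison fail. One possible explanation, which is only an assumption and not confirmed anywhere in this thread, is that one of the two values carries fractional seconds that to_s does not print. That would be consistent with the to_i comparison above working. A minimal sketch in plain Ruby, using the timestamp from the debug output and a hypothetical 250 ms difference:

```ruby
# Same instant, two representations (values taken from the debug output above).
with_offset = Time.new(2021, 6, 25, 16, 38, 23, "+00:00")
in_utc      = Time.utc(2021, 6, 25, 16, 38, 23)

puts with_offset            # 2021-06-25 16:38:23 +0000
puts in_utc                 # 2021-06-25 16:38:23 UTC
puts with_offset == in_utc  # true: Time#== compares instants, not strings

# Hypothetical: one side carries 250 ms that the other side lacks.
# The 7th argument of Time.utc is microseconds.
with_millis = Time.utc(2021, 6, 25, 16, 38, 23, 250_000)

puts with_millis                      # 2021-06-25 16:38:23 UTC (fraction not printed)
puts with_millis == in_utc            # false: the instants differ by 250 ms
puts with_millis.to_i == in_utc.to_i  # true: to_i drops the fractional second
```

Logging object.last_modified.to_f and log.last_modified.to_f in the debug line would show whether this is what happens with a given S3 endpoint.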

This still does not work 100% as expected. The logic applied here is to save the latest timestamp seen in S3 to the sincedb. If you do not delete your files from the bucket, every file whose timestamp equals the one stored in the sincedb is imported again on each run, which means the latest file is imported again and again.

To fix this, I always add one second to the timestamp written to the sincedb, by adding the following after this line:

since = since + 1

I'm not sure whether simply adding one second to the timestamp has any side effects, but it now works as expected for me.
The comparison logic itself should probably be fixed.
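
The re-import behaviour described in this comment can be illustrated with a small sketch. This is not the plugin's code: the `objects` listing and the `to_import` helper are hypothetical, and the inclusive comparison is an assumption made only to reproduce the described symptom.

```ruby
# Hypothetical bucket listing: object key => last-modified time.
objects = {
  "a.log" => Time.utc(2021, 6, 25, 16, 38, 21),
  "b.log" => Time.utc(2021, 6, 25, 16, 38, 23),
}

# Hypothetical helper: keys of the objects that would be imported for a
# given sincedb marker, with either an inclusive or a strict comparison.
def to_import(objects, since, inclusive:)
  objects.select { |_key, modified|
    inclusive ? modified >= since : modified > since
  }.keys
end

# The marker is the latest timestamp seen in the bucket.
since = objects.values.max

p to_import(objects, since, inclusive: true)      # ["b.log"]: newest file imported again
p to_import(objects, since + 1, inclusive: true)  # []: marker bumped by one second
p to_import(objects, since, inclusive: false)     # []: strict comparison instead
```

One possible side effect of the one-second bump, under the same assumptions: an object uploaded later within that same second would have a timestamp below the bumped marker and would be skipped.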

3 participants