
Temporary gzip files are not removed when deleted #190

Closed
pergus opened this issue Aug 17, 2018 · 2 comments · Fixed by #193


pergus commented Aug 17, 2018

Version

logstash-6.3.2 and logstash-7.0.0-alpha1-SNAPSHOT

Problem

Temporary gzip files are deleted before all file handles to them are closed, causing the filesystem to run out of space.
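
To see why this exhausts disk space, recall that on Unix-like filesystems unlinking a file only removes its directory entry; the blocks are reclaimed once the last open handle is closed. A minimal Ruby illustration of that behaviour, outside the plugin (the path is made up for the demo):

f = File.open('/tmp/logstash-demo.tmp', 'w')   # hypothetical demo file
f.write('x' * 10_000_000)                      # ~10 MB of data
f.flush
File.unlink(f.path)   # directory entry is gone; lsof now reports "(deleted)"
# df still counts the blocks until the last handle is released:
f.close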

To replicate the issue, export an Elasticsearch index to an S3 bucket with gzip encoding enabled, using the configuration below. It is assumed that the temporary_directory setting has its default value, /tmp/logstash.

input {
 elasticsearch {
   hosts    => ["elkserver"]
   index    => "${EXP_INDEX}-${EXP_YEAR}.${EXP_MONTH}"
   user     => "admin"
   password => "xxx"
 }
}

output {
 s3{
   access_key_id     => "xxx"
   secret_access_key => "xxx"
   region            => "eu-west-1"
   bucket            => "bucket"
   size_file         => 10485760
   rotation_strategy => "size"
   codec             => "json_lines"
   canned_acl        => "private"
   encoding          => "gzip"
   prefix            => "${EXP_INDEX}/year=${EXP_YEAR}/month=${EXP_MONTH}"
 }
}

While the export is running, use lsof to count the open files that are marked as deleted:

watch -n 5 "sudo lsof | grep deleted | grep java | wc -l"

Then monitor the filesystem usage with the following command:

watch -n 5 ./diskCheck.sh

Content of diskCheck.sh:

#!/bin/bash
df -H /tmp 
du -ha /tmp/logstash

Workarounds

I found two workarounds for the problem, but since I am not familiar with Ruby or the S3 output source code, I don't know whether these changes cause other problems.

Workaround 1

Edit the file temporary_file_factory.rb and change line 87
from

IOWrappedGzip.new(::File.open(::File.join(path, key), FILE_MODE))

to

IOWrappedGzip.new(::File.join(path, key))

Also, comment out or remove the following assignment on line 102:

@file_io = file_io

N.B.
I was not able to find any code that uses the file_io attribute, which is why I commented out line 102.
However, this might not be the best idea.
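
Taken together, these two edits mean the gzip wrapper receives a path instead of an already-open File, so Zlib::GzipWriter.open opens the only file handle itself. A rough, illustrative reconstruction of the affected pieces (inferred from the snippets above; the surrounding plugin code is omitted):

# temporary_file_factory.rb, around line 87: hand over the path, not an open IO
IOWrappedGzip.new(::File.join(path, key))

# IOWrappedGzip#initialize, around line 102: no extra @file_io handle is kept;
# the argument is now a path string and the GzipWriter owns the only handle
def initialize(file_io)
  @gzip_writer = Zlib::GzipWriter.open(file_io)
end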

Workaround 2

Edit the file temporary_file_factory.rb and change line 98
from

def_delegators :@gzip_writer, :write, :close

to

def_delegators :@gzip_writer, :write

Then add the following method to the IOWrappedGzip class.

def close
  @file_io.close     # close the original File handle that was passed in
  @gzip_writer.close # the writer then flushes and closes the IO it opened itself
end

yaauie commented Aug 23, 2018

🤔 GzipWriter#close is documented to also close the underlying IO, but if that's not happening, I can see why this would be problematic.

Taking a closer look, it appears we're creating the GzipWriter with GzipWriter#open, which is documented to take a filename, not an IO. By using GzipWriter#open we may be creating an additional IO.

If this is the case, a separate workaround may be to simply replace GzipWriter#open with GzipWriter#new, which is documented to take an IO. I think this will use the passed IO as its backing IO, and that the guarantees mentioned elsewhere about closing the IO would then apply.

diff --git a/lib/logstash/outputs/s3/temporary_file_factory.rb b/lib/logstash/outputs/s3/temporary_file_factory.rb
index 7695ac0..21a98ab 100644
--- a/lib/logstash/outputs/s3/temporary_file_factory.rb
+++ b/lib/logstash/outputs/s3/temporary_file_factory.rb
@@ -100,7 +100,7 @@ module LogStash
 
           def initialize(file_io)
             @file_io = file_io
-            @gzip_writer = Zlib::GzipWriter.open(file_io)
+            @gzip_writer = Zlib::GzipWriter.new(file_io)
           end
 
           def path

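For reference, the stdlib behaviour the patch relies on can be demonstrated outside the plugin. A minimal sketch (the path is made up for the example):

require 'zlib'

io = File.open('/tmp/example.txt.gz', 'wb')  # hypothetical file

gz = Zlib::GzipWriter.new(io)  # wrap the IO we already hold
gz.write('payload')
gz.close                       # writes the gzip footer and closes `io` as well
io.closed?                     # => true

# Zlib::GzipWriter.open, by contrast, expects a filename and opens its own
# File internally, so closing that writer never touches a handle opened
# earlier for the same path.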

pergus commented Aug 24, 2018

Excellent. Your patch works for me, and is much more elegant than my workarounds.

yaauie added a commit to yaauie/logstash-output-s3 that referenced this issue Sep 11, 2018
Use `Zlib::GzipWriter::new(IO)` instead of `Zlib::GzipWriter::open(Object)`, since
the latter effectively creates an additional file handle when passed an `IO`. This
ensures that when we send `Zlib::GzipWriter#close`, the original `IO` is closed in
turn.

Resolves: logstash-plugins#190