Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log filled with Caused by: java.io.IOException: No space left on device errors on data drive free space consumption #20354

Closed
b1naryth1ef opened this issue Sep 7, 2016 · 11 comments
Labels
:Core/Infra/Logging Log management and logging utilities feedback_needed

Comments

@b1naryth1ef
Copy link

Elasticsearch version: 2.3.2

Plugins installed: []

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_72-b15)

OS version: Ubuntu 14.04.5 LTS (3.19.0-59-generic)

Description of the problem including expected versus actual behavior:

Just had our data drive run out of space on a 1-node elastic instance which caused the logging partition to be filled by a 28GB log file (~300 million lines), filled solely with this repeating section:

Caused by: java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
        at java.nio.channels.Channels.writeFully(Channels.java:101)
        at java.nio.channels.Channels.access$000(Channels.java:61)
        at java.nio.channels.Channels$1.write(Channels.java:174)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:278)
        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
        at org.apache.lucene.store.OutputStreamIndexOutput.writeByte(OutputStreamIndexOutput.java:47)
        at org.apache.lucene.store.RateLimitedIndexOutput.writeByte(RateLimitedIndexOutput.java:66)
        at org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:176)
        at org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
        at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:163)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:605)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
        ... 3 more
        Suppressed: java.io.IOException: No space left on device
                at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
                at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
                at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
                at sun.nio.ch.IOUtil.write(IOUtil.java:65)
                at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
                at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
                at java.nio.channels.Channels.writeFully(Channels.java:101)
                at java.nio.channels.Channels.access$000(Channels.java:61)
                at java.nio.channels.Channels$1.write(Channels.java:174)
                at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:278)
                at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
                at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:68)
                at org.apache.lucene.store.RateLimitedIndexOutput.close(RateLimitedIndexOutput.java:49)
                at org.apache.lucene.util.IOUtils.close(IOUtils.java:97)
                at org.apache.lucene.util.IOUtils.close(IOUtils.java:84)
                at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.close(CompressingStoredFieldsWriter.java:138)
                at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:178)
                ... 6 more
                Suppressed: java.io.IOException: No space left on d

We've had this instances data drive fill up before without this issue being raised.

Steps to reproduce:

  1. Let elasticsearch fill up a drive
  2. Cry
@b1naryth1ef b1naryth1ef changed the title Elasticsearch fills log with Caused by: java.io.IOException: No space left on device errors Log filled with Caused by: java.io.IOException: No space left on device errors on data drive free space consumption Sep 7, 2016
@clintongormley clintongormley added feedback_needed :Core/Infra/Logging Log management and logging utilities labels Sep 7, 2016
@clintongormley
Copy link
Contributor

@b1naryth1ef What do you think we should do instead?

@b1naryth1ef
Copy link
Author

Generally I would expect any stable, reliable production database would trigger a few log lines indicating whats happening, and what state the database is entering.

@damageboy
Copy link

I expect it to be able to come up enough for me to be able to post a delete command

Or is that too much to ask of a database?

How am I supposed to be able to remove the "bad" indices if a filled up the drive causes a DATABASE to refuse to start itself?

In what world is this sane behaviour?

Can you describe a production database which touts around a major version of 5.x that refuses to start with the following beauty of a log, and is not considered as a total joke?

[2017-02-13T18:24:30,487][INFO ][o.e.n.Node               ] [elk-1] initializing ...
[2017-02-13T18:24:30,573][ERROR][o.e.b.Bootstrap          ] Exception
java.lang.IllegalStateException: Failed to created node environment
        at org.elasticsearch.node.Node.<init>(Node.java:255) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]
Caused by: java.io.IOException: No space left on device
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
        at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
        at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
        at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
        at org.apache.lucene.store.OutputStreamIndexOutput.getChecksum(OutputStreamIndexOutput.java:80) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b845
5f23ddc1364124 - jim - 2017-01-17 15:57:29]
        at org.apache.lucene.codecs.CodecUtil.writeCRC(CodecUtil.java:542) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-0
1-17 15:57:29]
        at org.apache.lucene.codecs.CodecUtil.writeFooter(CodecUtil.java:390) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 201
7-01-17 15:57:29]
        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:140) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
        ... 11 more
        Suppressed: java.io.IOException: No space left on device
                at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
                at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
                at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
                at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
                at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
                at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
                at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
                at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
                at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
                at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
                at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:68) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8
455f23ddc1364124 - jim - 2017-01-17 15:57:29]
                at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:141) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
                at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]
                Suppressed: java.io.IOException: No space left on device
                        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:?]
                        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[?:?]
                        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:?]
                        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:?]
                        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[?:?]
                        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[?:1.8.0_121]
                        at java.nio.channels.Channels.writeFully(Channels.java:101) ~[?:1.8.0_121]
                        at java.nio.channels.Channels.access$000(Channels.java:61) ~[?:1.8.0_121]
                        at java.nio.channels.Channels$1.write(Channels.java:174) ~[?:1.8.0_121]
                        at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73) ~[?:1.8.0_121]
                        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
                        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
                        at java.io.FilterOutputStream.close(FilterOutputStream.java:158) ~[?:1.8.0_121]
                        at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:70) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
                        at org.elasticsearch.gateway.MetaDataStateFormat.write(MetaDataStateFormat.java:141) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.env.NodeEnvironment.loadOrCreateNodeMetaData(NodeEnvironment.java:407) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.node.Node.<init>(Node.java:252) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.node.Node.<init>(Node.java:232) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:241) ~[elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.2.0.jar:5.2.0]
                        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.2.0.jar:5.2.0]

I'm done with this crap.
Thanks for the fish.

@clintongormley
Copy link
Contributor

No useful suggestions presented - closing

@b1naryth1ef
Copy link
Author

@clintongormley what exactly are you looking for in the form of suggestions? Might I offer the following solutions:

  • Don't spam logs when the data disk is full
  • Actually maybe just don't ever spam logs

@nullpixel
Copy link

Wait what. In what world of production databases do people do this shit

@khionu
Copy link

khionu commented Mar 31, 2017

I think the underlying issue is that, on an exception that clear is saying "stop trying to use disk space"..... You're still doing something that is attempting to use Disk Space. While I would assume that that is a pretty obvious evaluation, I would further more assume that the course of action is also pretty obvious: this is not an exception that might go away the next time the function is ran, something is critically wrong in the environment, and this occurrence should be treated as such.

I won't pretend to be an expert in enterprise-grade software, but I would assume that critically failures in needed parts of the runtime environment would warrant a crash. Can't exactly keep hopping when the ground has been taken from under you, so why bother?

@rjernst
Copy link
Member

rjernst commented Mar 31, 2017

The disk that logs are written to is not necessarily the same that data is written to (which is where the exception above comes from). Are you suggesting we should not try to log/output anything when there is a critical exception causing the system to exit? If you have a concrete suggestion for cleaner behavior, please open a new issue detailing the feature request.

@nullpixel
Copy link

You could just make the logs not spammy.

Kill the process, and just simply say that it crashed because of no disk space

@khionu
Copy link

khionu commented Apr 1, 2017

The disk that logs are written to is not necessarily the same that data is written to (which is where the exception above comes from).

If the log writing was the source of the OOS Exception, b1nzy wouldn't have a 28GB log file filled with the exception spam.

I would do as NullPixel said, and I said in my second paragraph. If the program is encountering an issue that clearly means it's failing to operate properly, meaning, cannot be recovered while the process is live, you don't continue to operate.

I understand that enterprise-grade software needs to be reliable, but this is the result of a problem outside the scope of the software, that cannot be recovered from while the program is running. What else is there to do? Besides spamming the logs?

@nullpixel
Copy link

Got sick of this being left, opened #24299

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Logging Log management and logging utilities feedback_needed
Projects
None yet
Development

No branches or pull requests

6 participants