
Output: Backup for bad chunk #1856

Closed
repeatedly opened this issue Feb 14, 2018 · 5 comments · Fixed by #1952
Labels
feature request *Deprecated Label* Use enhancement label in general v1

Comments

@repeatedly
Member

Fluentd's output plugins sometimes hit an unrecoverable error during chunk flush:

  • the chunk contains records that are wrong for the output configuration
  • the output plugin has a bug triggered by a specific record
  • a broken chunk was generated by a hardware problem
  • the destination is set up incorrectly

Currently, we use the retry limit and secondary output to handle these chunks, but this has several problems:

  • a bad chunk occupies flush threads until the retry limit is reached
  • environments without a retry limit can't rescue bad chunks at all

So we should handle bad chunks explicitly, for both stability and performance.
The idea: if an output plugin raises an unrecoverable error during chunk flush, the chunk is routed to a backup directory.

  • The unrecoverable errors are TypeError, NoMethodError, ArgumentError, etc.
  • Plugins can raise UnrecoverableError for plugin-specific errors
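The routing rule above can be sketched in plain Ruby. This is a minimal, self-contained sketch: `UnrecoverableError` here is a stand-in for fluentd's `Fluent::UnrecoverableError`, and `flush_chunk`/`unrecoverable?` are illustrative names, not fluentd's actual internals.

```ruby
# Error classes treated as unrecoverable, per the list above.
UNRECOVERABLE = [TypeError, NoMethodError, ArgumentError]

# Stand-in for Fluent::UnrecoverableError (the real class ships with fluentd).
class UnrecoverableError < StandardError; end

def unrecoverable?(error)
  error.is_a?(UnrecoverableError) || UNRECOVERABLE.any? { |klass| error.is_a?(klass) }
end

# Try to flush a chunk; route it to backup on unrecoverable errors,
# keep it for retry on everything else.
def flush_chunk(chunk_id)
  yield
  "flushed:#{chunk_id}"
rescue => e
  if unrecoverable?(e)
    "backup:#{chunk_id}"   # move the chunk to the backup directory, no retry
  else
    "retry:#{chunk_id}"    # transient error: keep the chunk and retry later
  end
end

puts flush_chunk("c1") { raise TypeError, "wrong record type" }  # prints "backup:c1"
puts flush_chunk("c2") { raise IOError, "network down" }         # prints "retry:c2"
```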

In addition, the <system> directive provides a backup_dir parameter. The default is /tmp/fluentd.
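A configuration sketch of the proposed parameter (the path below is illustrative; per this proposal the default is /tmp/fluentd):

```
<system>
  backup_dir /var/log/fluentd/backup
</system>
```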

@mururu
Member

mururu commented Mar 1, 2018

Is secondary ignored when UnrecoverableError is thrown? That behaviour looks good for the "broken chunk" case, but not for the "wrong setup for destination" case.
I think that skipping the output plugin's retries is enough when UnrecoverableError is thrown, because the secondary output plugin can also raise UnrecoverableError if it can't handle the chunk.

@repeatedly
Member Author

repeatedly commented Apr 3, 2018

> not good for "wrong setup for destination" case.

Yes. The backup feature is for bad chunks, not for wrong setup. If a plugin re-raises an error for a wrong setup, the chunk is routed to the backup directory. I think a wrong setup should be detected during configuration or start, e.g. the S3 plugin checks the API key at start.

> I think that skipping retry of the output plugin is enough when UnrecoverableError is thrown.

There are 2 cases:

  • the secondary plugin is the same as the primary. In this case, secondary processing should be skipped, because the same error would happen inside the secondary.
  • the secondary plugin is different from the primary. In this case, the secondary should handle the bad chunk first.

To change the secondary behaviour, adding an option is one idea: force_secondary or a similar parameter.
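The two cases above can be sketched as a small decision function (illustrative Ruby, not fluentd's actual code; the function name and the string plugin types are hypothetical):

```ruby
# Decide where a chunk goes after an unrecoverable error:
# skip the secondary only when it is the same plugin type as the primary,
# since the same plugin would fail on the chunk in the same way.
def route_bad_chunk(primary_type, secondary_type)
  if secondary_type.nil? || secondary_type == primary_type
    :backup      # no distinct secondary: route straight to the backup directory
  else
    :secondary   # a different secondary gets a chance to handle the chunk first
  end
end

puts route_bad_chunk("s3", "s3")    # prints "backup"
puts route_bad_chunk("s3", "file")  # prints "secondary"
```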

@repeatedly
Member Author

Patch is here: #1952

@artbeglaryan

Hello @repeatedly. Maybe this is the wrong place, but I didn't find any information in existing issues or in the documentation. I have my own output plugin, which does some specific work with buffered data over the network. Sometimes there is a network issue and my plugin fails to flush data after reaching retry_wait, retry_max_interval, and retry_max_times, and puts the buffer chunk into the secondary output plugin, which is file, so I don't lose any data. I have those chunks in my backup directory. Is there any way (I really couldn't find it documented or answered anywhere) to process those chunks later with the same plugin? By hand, by some external command, or maybe there is an existing plugin that can process that data again?

@repeatedly
Member Author

@artbeglaryan Currently, writing a script is the better approach. Here is an example:

https://groups.google.com/d/msg/fluentd/6Pn4XDOPxoU/CiYFkJXXfAEJ

I will add this to the documentation.
