Multiple file input causes data loss #32

Closed
jordansissel opened this issue May 17, 2015 · 2 comments

Comments

@jordansissel
Contributor

(This issue was originally filed by @arnauldvm at elastic/logstash#2135)


I'm losing lots of records when using a setup like this:

input { file {
  path => 'D:/some/path/access/*.log'
  sincedb_path => "target/.sincedb.xxxxxxxx.log"
  start_position => "beginning"
 (...)
} }
(...)
output { elasticsearch {
  host => localhost
  index => "logstash-xxxxxxxx-%{+YYYY.MM.dd}"
} }

Results:

| index name | actual (wc file.log) | actually indexed in elasticsearch |
|---|---|---|
| logstash-xxxxxxxx-2014.11.17 | 27493 | 27493 |
| logstash-xxxxxxxx-2014.11.18 | 18428 | 18308 |
| logstash-xxxxxxxx-2014.11.19 | 18871 | 695 |
| logstash-xxxxxxxx-2014.11.20 | 15517 | 15399 |
| logstash-xxxxxxxx-2014.11.21 | 23700 | 7897 |
| logstash-xxxxxxxx-2014.11.22 | 1442 | 1442 |
| logstash-xxxxxxxx-2014.11.23 | 1440 | 1380 |
| logstash-xxxxxxxx-2014.11.24 | 10570 | 9042 |
| logstash-xxxxxxxx.failed-1970.01.01 | - | 2 |
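
To make the size of the gap explicit, here is a minimal sketch that computes the shortfall per index from the figures in the table above. The names `reported` and `shortfall` are made up for illustration, and the `failed` index is left out because it has no source line count to compare against.

```python
# Figures copied from the table above:
# index date -> (lines in source file, events indexed in Elasticsearch).
reported = {
    "2014.11.17": (27493, 27493),
    "2014.11.18": (18428, 18308),
    "2014.11.19": (18871, 695),
    "2014.11.20": (15517, 15399),
    "2014.11.21": (23700, 7897),
    "2014.11.22": (1442, 1442),
    "2014.11.23": (1440, 1380),
    "2014.11.24": (10570, 9042),
}


def shortfall(counts):
    """Return {index: (missing_events, percent_missing)} for each index."""
    result = {}
    for index, (expected, indexed) in counts.items():
        missing = expected - indexed
        result[index] = (missing, round(100.0 * missing / expected, 1))
    return result


losses = shortfall(reported)
total_expected = sum(expected for expected, _ in reported.values())
total_indexed = sum(indexed for _, indexed in reported.values())

for index, (missing, percent) in sorted(losses.items()):
    print(f"{index}: {missing} missing ({percent}%)")
print(f"total: {total_expected - total_indexed} of {total_expected} missing")
```

The loss is concentrated in two days (2014.11.19 and 2014.11.21); the other indices are complete or missing only a small number of events.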

Moreover, some log lines are truncated, leading to parsing errors (cf. the ...failed... index).

If I do:

$ cat /D/some/path/access/*.log > /D/some/path/all_access.log

and adapt the logstash config accordingly, nothing is lost.

On the other hand, keeping the multiple-file input but writing to a file output instead of elasticsearch also loses no data!?
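
For anyone trying to reproduce this, the "actual" column can be recomputed per file with something like the sketch below, so the per-file totals can be compared against what ends up in each index. This is only an illustration: `count_lines` is a made-up helper name, and the glob pattern is the one from the config above, which will differ on other machines.

```python
import glob


def count_lines(pattern):
    """Return {path: line_count} for every file matching the glob pattern.

    Files are read in binary mode so odd encodings cannot break the count.
    Unlike `wc -l`, an unterminated final line is counted as a line.
    """
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            counts[path] = sum(1 for _ in handle)
    return counts


if __name__ == "__main__":
    # Pattern taken from the config above; adjust to the local layout.
    per_file = count_lines("D:/some/path/access/*.log")
    for path, lines in per_file.items():
        print(f"{lines:8d}  {path}")
    print(f"{sum(per_file.values()):8d}  total")
```

If the total matches the concatenated file but not the sum of the Elasticsearch document counts, the loss is happening somewhere between the file input and the output rather than in the source files.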

@jordansissel
Contributor Author

I see two problems reported here:

  1. File input -> Elasticsearch output is missing some number of events in Elasticsearch
  2. Some lines are truncated.

For #1, I think this was a known issue where we weren't flushing the last chunk of events to Elasticsearch in a timely fashion, but that issue (and its fix) doesn't match your logstash-xxxxxxxx-2014.11.19 example, which is missing more than 18,000 events.

For #2, I don't have any theories at this time. I'd need to see your full logstash config (unedited, though sanitizing for safety/privacy is OK) to know more.

@guyboertje
Contributor

Closing - not enough info and many code changes have occurred since.
