Logstash fails to restart after changing queue.page_capacity value #7581

Closed
shoggeh opened this issue Jun 30, 2017 · 3 comments · Fixed by #8628

shoggeh commented Jun 30, 2017

  • Version: 5.4.3
  • Operating System: Linux
  • Config File: queue.type: persisted
  • Steps to Reproduce:

TL;DR: changing queue.page_capacity may leave Logstash unable to start. Whether it does depends on whether any events are stored in the on-disk queue at the time. If you roll a queue.page_capacity change across a Logstash cluster, you may end up breaking a number of nodes. Reverting the change will fix the broken nodes, but it might break other ones.

Per @colinsurprenant's request, this is a follow-up to #7538 (the rest of this comment is copy-pasted).

  • I modified queue.page_capacity on a Logstash instance that was processing data,
  • I used non-zero values (usually 256mb, 512mb and 1024mb) to test it.

The situation was then as follows:

  • EMPTY QUEUE: If nothing was stored in the on-disk PQ at the moment of the page_capacity change, it was POSSIBLE to restart Logstash without any issue. By "nothing" I mean the _node/stats monitoring endpoint showing 0 under pipeline.queue.events.

Example change from 256mb to 512mb:

PRIOR to change:

    "queue" : {
      "events" : 0,
      "type" : "persisted",
      "capacity" : {
        "page_capacity_in_bytes" : 262144000,

<< actual change >>

[2017-06-28T16:21:57,852][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
    "queue" : {
      "events" : 0,
      "type" : "persisted",
      "capacity" : {
        "page_capacity_in_bytes" : 536870912,

However, the size of the page file remained at its former value:

-rw-r--r-- 1 logstash logstash 250M Jun 28 16:20 /var/lib/logstash/queue/main/page.505

END RESULT: Logstash resumed all of the operations properly.
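
As a quick sanity check on the figures above, the byte counts can be converted to MiB. This is plain arithmetic, not a Logstash API; `to_mib` is a helper name made up for this sketch.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def to_mib(size_in_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return size_in_bytes / MIB


# Values copied from the _node/stats output quoted in this issue.
old_capacity = 262144000
new_capacity = 536870912

print(to_mib(old_capacity))  # 250.0
print(to_mib(new_capacity))  # 512.0
```

The 250M reported by `ls` for page.505 matches the old page_capacity_in_bytes exactly, which suggests the existing page file kept its original size while the reported capacity changed.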

  • SOMETHING IN QUEUE: However, if anything at all was stored in the PQ, subsequent Logstash restarts failed with an error, and Logstash didn't start until I either:
    a) reverted queue.page_capacity to its former value, or
    b) manually removed the contents of /var/lib/logstash/queue and let Logstash recreate it

PRIOR TO CHANGE:

    "queue" : {
      "events" : 22237,
      "type" : "persisted",
      "capacity" : {
        "page_capacity_in_bytes" : 262144000,

<< actual change >>

[2017-06-28T16:44:16,848][ERROR][logstash.agent           ] Cannot create pipeline {:reason=>"Page file size 262144000 different to configured page capacity 536870912 for /var/lib/logstash/queue/main/page.505"}

END RESULT: Logstash not starting.
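
To see in advance which nodes would hit this error, one could compare the size of the existing page files with the capacity about to be configured. The following is a rough sketch only: `find_mismatched_pages` is a hypothetical helper, and it assumes (based on the error message above) that a page file's on-disk size is what gets compared with the configured capacity. It does not reproduce Logstash's own check.

```python
from pathlib import Path


def find_mismatched_pages(queue_dir: str, page_capacity: int) -> list[tuple[str, int]]:
    """Return (path, size) for every page.* file whose size differs from page_capacity.

    Assumption: the size on disk is the value that the startup check
    compares with the configured capacity, as the error message suggests.
    """
    mismatched = []
    for page in sorted(Path(queue_dir).glob("page.*")):
        size = page.stat().st_size
        if size != page_capacity:
            mismatched.append((str(page), size))
    return mismatched
```

For the case above, calling `find_mismatched_pages("/var/lib/logstash/queue/main", 536870912)` would be expected to list page.505 with its size of 262144000.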

If it's not possible to stop enforcing the page_capacity size on already created pages, then maybe it's worth mentioning in the docs that the queue needs to be drained before the value can be changed?
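
Until that is resolved, a pre-flight check along these lines could confirm that the queue is drained before page_capacity is touched. This is a minimal sketch: the function names are made up, the URL is an assumption (port 9600 is taken from the log line above), and the `pipeline.queue.events` layout is the one described in this issue for 5.4.3, so other versions may nest it differently.

```python
import json
from urllib.request import urlopen


def queue_event_count(stats: dict) -> int:
    """Read pipeline.queue.events from a parsed _node/stats response."""
    return stats["pipeline"]["queue"]["events"]


def safe_to_change_page_capacity(stats: dict) -> bool:
    """True only when the persisted queue reports zero events."""
    return queue_event_count(stats) == 0


def fetch_stats(url: str = "http://localhost:9600/_node/stats") -> dict:
    """Fetch and parse the stats document (the default URL is an assumption)."""
    with urlopen(url) as response:
        return json.load(response)
```

A rolling change could then call `safe_to_change_page_capacity(fetch_stats())` on each node and skip, or wait on, any node that still has events queued.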

@colinsurprenant colinsurprenant self-assigned this Nov 8, 2017
@colinsurprenant
Contributor

I think we should just accept a page size change, which will apply only to newly created pages; all existing pages will eventually get purged. I don't see the need to impose a purge process. I will go ahead and see if we can submit a simple change to that effect.
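
The behaviour proposed here can be illustrated with a toy model. This is not the code from the eventual PR; `Page`, `Queue`, and their methods are hypothetical names used only to show the idea that existing pages keep the capacity they were created with while new pages pick up the configured value.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    number: int
    capacity: int  # fixed at the moment the page is created


@dataclass
class Queue:
    page_capacity: int  # currently configured value, used for new pages only
    pages: list = field(default_factory=list)

    def open_existing(self, pages):
        """Accept existing pages as they are, whatever their capacity."""
        self.pages = list(pages)

    def new_page(self) -> Page:
        """Create the next page using the currently configured capacity."""
        number = self.pages[-1].number + 1 if self.pages else 0
        page = Page(number, self.page_capacity)
        self.pages.append(page)
        return page

    def purge_head(self) -> Page:
        """Drop the oldest page, as would happen once it is fully acked."""
        return self.pages.pop(0)


queue = Queue(page_capacity=536870912)
queue.open_existing([Page(505, 262144000)])  # old page, old size
queue.new_page()                             # new page, new size
```

In this model the old-size page simply ages out through `purge_head`, so no explicit drain or purge step is needed when the setting changes.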

@colinsurprenant
Contributor

PR in #8628

colinsurprenant added a commit to colinsurprenant/logstash that referenced this issue Nov 13, 2017
tests for capacity chage and page and queue level

remove dead STRICT_CAPACITY and remove unused @param comment
colinsurprenant added a commit that referenced this issue Nov 13, 2017
tests for capacity chage and page and queue level

remove dead STRICT_CAPACITY and remove unused @param comment
@colinsurprenant
Contributor

Fixed by #8628 and will be included in the 6.1.0 release.

insukcho pushed a commit to insukcho/logstash that referenced this issue Feb 1, 2018
tests for capacity chage and page and queue level

remove dead STRICT_CAPACITY and remove unused @param comment