Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Okta Module Crashing Frequently #18530

Closed
SpencerLN opened this issue May 14, 2020 · 6 comments
Closed

[Filebeat] Okta Module Crashing Frequently #18530

SpencerLN opened this issue May 14, 2020 · 6 comments
Labels

Comments

@SpencerLN
Copy link
Contributor

After enabling the Okta module in Filebeat 7.7.0 we have been observing frequent crashes with the error message panic: non-positive interval for NewTicker

Debug logs:

2020-05-14T15:52:45.184Z	DEBUG	[httpjson]	httpjson/input.go:242	Rate Limit: Wait until 2020-05-14 15:52:47 +0000 UTC for the rate limit to reset.	{"url": "https://instance.okta.com/api/v1/logs"}
2020-05-14T15:52:45.600Z	DEBUG	[elasticsearch]	elasticsearch/client.go:217	PublishEvents: 50 events have been published to elasticsearch in 66.640319ms.
2020-05-14T15:52:45.600Z	DEBUG	[publisher]	memqueue/ackloop.go:160	ackloop: receive ack [236: 0, 50]
2020-05-14T15:52:45.601Z	DEBUG	[publisher]	memqueue/eventloop.go:535	broker ACK events: count=50, start-seq=11701, end-seq=11750
2020-05-14T15:52:45.601Z	DEBUG	[publisher]	memqueue/ackloop.go:128	ackloop: return ack to broker loop:50
2020-05-14T15:52:45.601Z	DEBUG	[acker]	beater/acker.go:69	stateless ack	{"count": 50}
2020-05-14T15:52:45.601Z	DEBUG	[publisher]	memqueue/ackloop.go:131	ackloop:  done send ack
2020-05-14T15:52:45.697Z	DEBUG	[elasticsearch]	elasticsearch/client.go:217	PublishEvents: 50 events have been published to elasticsearch in 96.090322ms.
2020-05-14T15:52:45.699Z	DEBUG	[publisher]	memqueue/ackloop.go:160	ackloop: receive ack [237: 0, 50]
2020-05-14T15:52:45.699Z	DEBUG	[publisher]	memqueue/eventloop.go:535	broker ACK events: count=50, start-seq=11751, end-seq=11800
2020-05-14T15:52:45.699Z	DEBUG	[publisher]	memqueue/ackloop.go:128	ackloop: return ack to broker loop:50
2020-05-14T15:52:45.699Z	DEBUG	[publisher]	memqueue/ackloop.go:131	ackloop:  done send ack
2020-05-14T15:52:45.699Z	DEBUG	[acker]	beater/acker.go:69	stateless ack	{"count": 50}
2020-05-14T15:52:45.799Z	DEBUG	[elasticsearch]	elasticsearch/client.go:217	PublishEvents: 50 events have been published to elasticsearch in 101.995502ms.
2020-05-14T15:52:45.799Z	DEBUG	[publisher]	memqueue/ackloop.go:160	ackloop: receive ack [238: 0, 50]
2020-05-14T15:52:45.799Z	DEBUG	[publisher]	memqueue/eventloop.go:535	broker ACK events: count=50, start-seq=11801, end-seq=11850
2020-05-14T15:52:45.799Z	DEBUG	[publisher]	memqueue/ackloop.go:128	ackloop: return ack to broker loop:50
2020-05-14T15:52:45.800Z	DEBUG	[publisher]	memqueue/ackloop.go:131	ackloop:  done send ack
2020-05-14T15:52:45.800Z	DEBUG	[acker]	beater/acker.go:69	stateless ack	{"count": 50}
2020-05-14T15:52:45.913Z	DEBUG	[elasticsearch]	elasticsearch/client.go:217	PublishEvents: 50 events have been published to elasticsearch in 113.897908ms.
2020-05-14T15:52:45.913Z	DEBUG	[publisher]	memqueue/ackloop.go:160	ackloop: receive ack [239: 0, 50]
2020-05-14T15:52:45.913Z	DEBUG	[publisher]	memqueue/eventloop.go:535	broker ACK events: count=50, start-seq=11851, end-seq=11900
2020-05-14T15:52:45.913Z	DEBUG	[publisher]	memqueue/ackloop.go:128	ackloop: return ack to broker loop:50
2020-05-14T15:52:45.913Z	DEBUG	[publisher]	memqueue/ackloop.go:131	ackloop:  done send ack
2020-05-14T15:52:45.914Z	DEBUG	[acker]	beater/acker.go:69	stateless ack	{"count": 50}
2020-05-14T15:52:47.000Z	DEBUG	[httpjson]	httpjson/input.go:250	Rate Limit: time is up.	{"url": "https://instance.okta.com/api/v1/logs"}
2020-05-14T15:52:47.000Z	INFO	[httpjson]	httpjson/input.go:366	Continuing with pagination to URL: https://instance.okta.com/api/v1/logs?after=1588875667507_1	{"url": "https://instance.okta.com/api/v1/logs"}
2020-05-14T15:52:47.194Z	DEBUG	[httpjson]	httpjson/input.go:302	HTTP request failed	{"url": "https://instance.okta.com/api/v1/logs", "http.response.status_code": 429, "http.response.body": "{\"errorCode\":\"E0000047\",\"errorSummary\":\"API call exceeded rate limit due to too many requests.\",\"errorLink\":\"E0000047\",\"errorId\":\"oae26ps49lmR7m6OmE6oSGLyA\",\"errorCauses\":[]}"}
2020-05-14T15:52:47.194Z	DEBUG	[httpjson]	httpjson/input.go:242	Rate Limit: Wait until 2020-05-14 15:52:47 +0000 UTC for the rate limit to reset.	{"url": "https://instance.okta.com/api/v1/logs"}
2020-05-14T15:52:47.194Z	INFO	[httpjson]	runtime/panic.go:679	httpjson input worker has stopped.	{"url": "https://instance.okta.com/api/v1/logs"}
panic: non-positive interval for NewTicker
goroutine 272 [running]:
time.NewTicker(0xfffffffff469c21f, 0xed64f604f)
	/usr/local/go/src/time/tick.go:23 +0x147
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).applyRateLimit(0xc0006460f0, 0x3e13a20, 0xc000654140, 0xc001940060, 0xc0004c5560, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:243 +0x1a2
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).processHTTPRequest(0xc0006460f0, 0x3e13a20, 0xc000654140, 0xc0006425a0, 0xc0005b9e08, 0xdf8475800, 0x3da8ae0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:304 +0x337
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).run(0xc0006460f0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:425 +0x38f
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1.1(0xc0006460f0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:122 +0x1d6
created by github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:117 +0x63

Config:

   filebeat.yml: |-
    filebeat.modules:
    - module: okta
      system:
        enabled: true
        # API key to access Okta
        var.api_key: "${OKTA_API_KEY}"

        # URL of the Okta REST API
        var.url: ${OKTA_API_URL}

        var.authentication_scheme: "SSWS"
    setup.template.enabled: false

    output.elasticsearch:
        hosts: 'https://${ELASTICSEARCH_ADDRESS}'
        username: ${ELASTICSEARCH_USERNAME}
        password: ${ELASTICSEARCH_PASSWORD}
        ssl:
          enabled: true
          supported_protocols: TLSv1.2
    logging.level: debug

For confirmed bugs, please report:

  • Version: 7.7.0
  • Operating System: GKE CoreOS
@SpencerLN SpencerLN added the Filebeat Filebeat label May 14, 2020
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 14, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 14, 2020
@SpencerLN SpencerLN added the bug label May 14, 2020
alakahakai pushed a commit that referenced this issue May 14, 2020
* Add check for time because time.Until does not correctly handle a time that is earlier than time.Now().

* Update getRateLimit to handle a rate limit time is earlier than now.

* Add some additional safe guard to check Ticker

* Add descriptions for getRateLimit function and test functions.

* Update CHANGELOG.next.asciidoc
alakahakai pushed a commit that referenced this issue May 15, 2020
alakahakai pushed a commit that referenced this issue May 15, 2020
* Add check for time because time.Until does not correctly handle a time that is earlier than time.Now().

* Update getRateLimit to handle a rate limit time is earlier than now.

* Add some additional safe guard to check Ticker

* Add descriptions for getRateLimit function and test functions.

* Update CHANGELOG.next.asciidoc

(cherry picked from commit db62862)
v1v added a commit to v1v/beats that referenced this issue May 15, 2020
…w-oss

* upstream/master: (27 commits)
  Disable host fields for "cloud", panw, cef modules (elastic#18223)
  [docs] Rename monitoring collection from legacy internal collection to legacy collection (elastic#18504)
  Introduce auto detection of format (elastic#18095)
  Add additional fields to address issue elastic#18465 for googlecloud audit log (elastic#18472)
  Fix libbeat import path in seccomp policy template (elastic#18418)
  Address Okta input issue elastic#18530 (elastic#18534)
  [Ingest Manager] Avoid Chown on windows (elastic#18512)
  Fix Cisco ASA/FTD msgs that use a host name as NAT address (elastic#18376)
  [CI] Optimise stash/unstash performance (elastic#18473)
  Libbeat: Remove global loggers from libbeat/metric and libbeat/cloudid (elastic#18500)
  Fix PANW bad mapping of client/source and server/dest packets and bytes (elastic#18525)
  Add a file lock to the data directory on startup to prevent multiple agents. (elastic#18483)
  Followup to 12606 (elastic#18316)
  changed input from syslog to tcp/udp due to unsupported RFC (elastic#18447)
  Improve ECS field mappings in Sysmon module. (elastic#18381)
  [Elastic Agent] Cleaner output of inspect command  (elastic#18405)
  [Elastic Agent] Pick up version from libbeat (elastic#18350)
  Update communitybeats.asciidoc (elastic#18470)
  [Metricbeat] Change visualization interval from 15m to >=15m (elastic#18466)
  docs: Fix typo in kerberos docs (elastic#18503)
  ...
@SpencerLN
Copy link
Contributor Author

It appears that this issue is still occurring after upgrade to 7.8.0:

2020-06-25T20:01:47.000Z	INFO	[httpjson]	httpjson/input.go:366	Continuing with pagination to URL: https://instance.okta.com/api/v1/logs?after=1592620709654_1	{"url": "https://instance.okta.com/api/v1/logs"}
2020-06-25T20:01:47.039Z	DEBUG	[input]	input/input.go:141	Run input
2020-06-25T20:01:47.175Z	DEBUG	[httpjson]	httpjson/input.go:302	HTTP request failed	{"url": "https://instance.okta.com/api/v1/logs", "http.response.status_code": 429, "http.response.body": "{\"errorCode\":\"E0000047\",\"errorSummary\":\"API call exceeded rate limit due to too many requests.\",\"errorLink\":\"E0000047\",\"errorId\":\"oaeWBRMYbglSkiPyH-qA_nmow\",\"errorCauses\":[]}"}
2020-06-25T20:01:47.175Z	DEBUG	[httpjson]	httpjson/input.go:242	Rate Limit: Wait until 2020-06-25 20:01:47 +0000 UTC for the rate limit to reset.	{"url": "https://instance.okta.com/api/v1/logs"}
2020-06-25T20:01:47.175Z	INFO	[httpjson]	runtime/panic.go:679	httpjson input worker has stopped.	{"url": "https://instance.okta.com/api/v1/logs"}
panic: non-positive interval for NewTicker

goroutine 260 [running]:
time.NewTicker(0xfffffffff586a07f, 0xed686f9ab)
	/usr/local/go/src/time/tick.go:23 +0x147
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).applyRateLimit(0xc0006140f0, 0x3f6d120, 0xc0006082c0, 0xc0018af200, 0xc00047ba40, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:243 +0x1a2
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).processHTTPRequest(0xc0006140f0, 0x3f6d120, 0xc0006082c0, 0xc0005eb170, 0xc001ed9e08, 0xdf8475800, 0x3eff400)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:304 +0x337
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).run(0xc0006140f0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:425 +0x38f
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1.1(0xc0006140f0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:122 +0x1d6
created by github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:117 +0x63

@vavra5
Copy link

vavra5 commented Jun 26, 2020

I'm experiencing the same issue. My configuration is same as provided above. I'm using docker container filebeat:7.8.0.

2020-06-26T15:04:33.000Z	INFO	[httpjson]	httpjson/input.go:366	Continuing with pagination to URL: https://my.okta.com/api/v1/logs?after=1592593036919_1	{"url": "https://my.okta.com/api/v1/logs"}
2020-06-26T15:04:33.094Z	INFO	[httpjson]	runtime/panic.go:679	httpjson input worker has stopped.	{"url": "https://my.okta.com/api/v1/logs"}
panic: non-positive interval for NewTicker
goroutine 310 [running]:
time.NewTicker(0xfffffffffa5cf692, 0xed6880581)
	/usr/local/go/src/time/tick.go:23 +0x147
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).applyRateLimit(0xc00049c3c0, 0x3f6d120, 0xc000866c40, 0xc001239e30, 0xc000782d20, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:243 +0x1a2
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).processHTTPRequest(0xc00049c3c0, 0x3f6d120, 0xc000866c40, 0xc00086bb00, 0xc001b61e08, 0xdf8475800, 0x3eff400)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:304 +0x337
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).run(0xc00049c3c0, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:425 +0x38f
github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1.1(0xc00049c3c0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:122 +0x1d6
created by github.com/elastic/beats/v7/x-pack/filebeat/input/httpjson.(*HttpjsonInput).Run.func1
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/httpjson/input.go:117 +0x63

alakahakai pushed a commit that referenced this issue Jun 29, 2020
* Add check for time because time.Until does not correctly handle a time that is earlier than time.Now().

* Update getRateLimit to handle a rate limit time is earlier than now.

* Add some additional safe guard to check Ticker

* Add descriptions for getRateLimit function and test functions.

* Update CHANGELOG.next.asciidoc

(cherry picked from commit db62862)
@SpencerLN
Copy link
Contributor Author

@alakahakai it looks like this is being corrected by handling a specific rate-limit condition. Is there any chance of handling this by adding some sort of generic retry mechanism that makes the whole input more resilient to transient errors? Right now the input is prone to crashing when any sort of unexpected behavior occurs.

@SpencerLN
Copy link
Contributor Author

@alakahakai after upgrading to 7.8.1 we saw a significant improvement and the module is no longer crashing due to rate limiting constantly, but there appear to still be conditions that cause a crash. As mentioned above, it would be ideal to implement some sort of generic retry logic for retryable error conditions.

Our most recent instance, after running for ~12 hours:

2020-07-28T08:44:24.477Z	ERROR	[httpjson]	httpjson/input.go:123	http request was unsuccessful with a status code 504	{"url": "https://instance.okta.com/api/v1/logs"}
2020-07-28T08:44:24.477Z	INFO	[httpjson]	httpjson/input.go:124	httpjson input worker has stopped.	{"url": "https://instance.okta.com/api/v1/logs"}

@andrewkroh
Copy link
Member

Two fixes for the Okta module will be released in v7.9.0.

Then looking forward to v7.10.0 we plan to make the module be able to persist the last read event timestamp and be able to resume reading from the point on restart. Part of that work is in happening in #20226.

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants