Return 503 while server is in wait-for-integration loop. #5949

Closed
Mpdreamz opened this issue Aug 16, 2021 · 3 comments · Fixed by #6130

@Mpdreamz
Member

Currently the server delays binding HTTP until it knows the downstream dependencies are set up.
To be precise, it does this when apm-server.data_streams.wait_for_integration is set to true, which is also the default when apm-server.data_streams.enable is set.

This has a potential knock-on effect on agents that try to connect and then time out their batch operations.

If the server bound immediately but could tell agents that it's not ready yet by returning a 503, this could be remedied.

@graphaelli
Member

graphaelli commented Aug 18, 2021

I think apm-server should queue events until the queue is full while waiting for the integration to be installed, just like it would if Elasticsearch was unavailable.

@Mpdreamz
Member Author

I'm on the fence about queueing. ES being unavailable is something that could potentially resolve itself.

Here we know ES/Kibana is available but not configured correctly, and fixing that requires manual configuration.

In cases where the stack is updated before the server, the time to resolve the failure is low, and a typical upgrade should have already installed the integration before upgrading the standalone server.
In cases where the server is updated before the stack, the time to resolve is high, and accepting events during the upgrade only to fail later feels trappy.

@axw
Member

axw commented Sep 8, 2021

We can't know if the integration is installed or not if Elasticsearch is unavailable. I'd prefer to have fewer special cases, so I'm going to go with the enqueuing approach for now. I'm about to put up a PR which just blocks all Elasticsearch client requests (aside from the index template checks) until the integration package is known to be installed. The result of this is that the server will behave as if Elasticsearch is unavailable.
