[POC] Fleet agent concurrency #70495
@@ -13,6 +13,11 @@ import {
   PluginInitializerContext,
   SavedObjectsServiceStart,
   HttpServiceSetup,
+  KibanaRequest,
+  LifecycleResponseFactory,
+  OnPreAuthToolkit,
+  OnPreResponseToolkit,
+  OnPreResponseInfo,
 } from 'kibana/server';
 import { LicensingPluginSetup, ILicense } from '../../licensing/server';
 import {
@@ -45,7 +50,7 @@ import {
   registerSettingsRoutes,
   registerAppRoutes,
 } from './routes';
-import { IngestManagerConfigType, NewDatasource } from '../common';
+import { IngestManagerConfigType, NewDatasource, AGENT_ROUTE_TAG } from '../common';
 import {
   appContextService,
   licenseService,
@@ -152,6 +157,35 @@ export class IngestManagerPlugin
   }

   public async setup(core: CoreSetup, deps: IngestManagerSetupDeps) {
+    const maxConcurrentRequests = 1;

Review comment: All this logic was put here out of laziness; it should really be placed in another file...
+    let concurrentRequests = 0;
+    const isAgentRequest = (request: KibanaRequest) => {
+      const tags = request.route.options.tags;
+      return tags.includes(AGENT_ROUTE_TAG);
+    };
+    core.http.registerOnPreAuth(
+      (request: KibanaRequest, response: LifecycleResponseFactory, toolkit: OnPreAuthToolkit) => {
+        if (!isAgentRequest(request)) {
+          return toolkit.next();
+        }
+
+        if (concurrentRequests >= maxConcurrentRequests) {
+          return response.customError({ body: 'Too Many Agents', statusCode: 429 });
+        }
+
+        concurrentRequests += 1;
+        return toolkit.next();
+      }
+    );
+    core.http.registerOnPreResponse(
+      (request: KibanaRequest, preResponse: OnPreResponseInfo, toolkit: OnPreResponseToolkit) => {
+        if (isAgentRequest(request) && preResponse.statusCode !== 429) {
Review comment: This assumes that only the pre-auth handler could cause a response with status code 429, which isn't absolutely certain. It's possible that one of the HTTP route handlers is replying with a 429 itself, or propagating a 429 from Elasticsearch, which would mess up the counter.

Review comment: Can we wrap …?

Review comment: Something along these lines (not tested):

import { RequestHandlerWrapper } from 'src/core/server';
class Lock {
  constructor(private readonly maxConnections: number = 1) {}
  private counter = 0;
  increase() {
    this.counter += 1;
  }
  decrease() {
    this.counter -= 1;
  }
  canHandle() {
    return this.counter < this.maxConnections;
  }
}

const lock = new Lock();

export const concurrencyLimit: RequestHandlerWrapper = (handler) => {
  return async (context, request, response) => {
    if (!lock.canHandle()) {
      return response.customError({ body: 'Too Many Agents', statusCode: 429 });
    }
    try {
      lock.increase();
      return await handler(context, request, response);
    } finally {
      lock.decrease();
    }
  };
};

concurrencyLimit(postAgentEnrollHandler);
concurrencyLimit(postAgentCheckinHandler);

Review comment: This doesn't let us reject traffic without validating API Keys as described here, right?

Review comment: Yeah... I hadn't seen the comment. Now I'm surprised that the auth has such a huge penalty.
Review comment: @kobelb, for what use-cases should we consider this a critical option? Auth & spaces make a huge number of requests to ES on every page load.

Review comment: Fleet's usage of Kibana APIs is different from our traditional usage. The Elastic Agent will be using Kibana APIs to enroll itself and retrieve its configuration. As such, we're potentially dealing with thousands of Elastic Agents hitting these APIs... Whether or not this is "critical" is debatable and largely dependent on what the ingest-management team is seeing during their load-testing, but skipping auth reduces the load on Kibana when this circuit breaker is hit. @roncohen, despite the current implementation being imperfect and potentially misinterpreting the …
Review comment: I don't think it will; Kibana will still have to make a query to Elasticsearch to return the server-side session document to authenticate the user.
Review comment: Any call to an Elasticsearch API will incur some performance penalty. However, there are differences in the caching strategy for API Keys vs username/password. Regardless of the type of credentials, performing any unnecessary work before the circuit breaker has a chance to short-circuit the operation is inefficient.
Review comment: +1. I'd love to adjust the concurrency value (what should the value be?) and merge this into master ASAP so we can test in cloud as we did with long polling.
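To sanity-check the wrapper idea from the thread above outside of Kibana, here is a stripped-down sketch with the handler signature reduced to a zero-argument async function. The `FakeResponse`/`Handler` types are simplified stand-ins, not Kibana's real `RequestHandler` shapes, and the lock is passed explicitly rather than held in module scope:

```typescript
// Simplified stand-ins for Kibana's handler signature (assumed shapes).
type FakeResponse = { status: number; body: string };
type Handler = () => Promise<FakeResponse>;

class Lock {
  private counter = 0;
  constructor(private readonly maxConnections: number = 1) {}
  increase() {
    this.counter += 1;
  }
  decrease() {
    this.counter -= 1;
  }
  canHandle() {
    return this.counter < this.maxConnections;
  }
}

// Wrap a handler so at most `maxConnections` invocations run concurrently.
// The slot is acquired before awaiting the handler and released in
// `finally`, so it is returned even when the handler throws.
function concurrencyLimit(handler: Handler, lock: Lock): Handler {
  return async () => {
    if (!lock.canHandle()) {
      return { status: 429, body: 'Too Many Agents' };
    }
    try {
      lock.increase();
      return await handler();
    } finally {
      lock.decrease();
    }
  };
}
```

With `maxConnections = 1`, a second call made while the first is still in flight resolves immediately to a 429, while sequential calls all succeed. Note the `return await`: without the `await`, the `finally` block would release the slot as soon as the handler's promise was created, not when it settled.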
+          concurrentRequests -= 1;
+        }
+
+        return toolkit.next();
+      }
+    );
     this.httpSetup = core.http;
     this.licensing$ = deps.licensing.license$;
     if (deps.security) {
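The two interceptors in the diff can be exercised without standing up Kibana by modeling them as plain functions. In the sketch below, `FakeRequest`, `PreAuthResult`, and the `'fleet:agent'` tag value are hypothetical stand-ins for the `kibana/server` types and the real `AGENT_ROUTE_TAG` constant:

```typescript
// Hypothetical stand-ins for the kibana/server lifecycle types.
interface FakeRequest {
  tags: string[];
}
type PreAuthResult =
  | { type: 'next' }
  | { type: 'error'; statusCode: number; body: string };

const AGENT_ROUTE_TAG = 'fleet:agent'; // assumed tag value, for illustration only

function createLimiter(maxConcurrentRequests: number) {
  let concurrentRequests = 0;
  const isAgentRequest = (request: FakeRequest) =>
    request.tags.includes(AGENT_ROUTE_TAG);

  return {
    // Mirrors the registerOnPreAuth handler: admit non-agent traffic
    // unconditionally, reject agent traffic once the limit is reached.
    onPreAuth(request: FakeRequest): PreAuthResult {
      if (!isAgentRequest(request)) return { type: 'next' };
      if (concurrentRequests >= maxConcurrentRequests) {
        return { type: 'error', statusCode: 429, body: 'Too Many Agents' };
      }
      concurrentRequests += 1;
      return { type: 'next' };
    },
    // Mirrors the registerOnPreResponse handler: release the slot, except
    // for 429 responses, which never acquired one.
    onPreResponse(request: FakeRequest, statusCode: number): void {
      if (isAgentRequest(request) && statusCode !== 429) {
        concurrentRequests -= 1;
      }
    },
  };
}
```

This also makes the reviewers' concern concrete: if a route handler itself replies 429 for an admitted request, `onPreResponse` skips the decrement and the slot leaks.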
Review comment: The onRequest lifecycle event occurs prior to "route lookup", which prevented the use of request.route.options.tags within the pre-auth handler to determine whether or not the route is fleet-agent specific. Changing this from onRequest to onPreAuth fixes this specific issue, but potentially introduces others. @restrry, this was originally using onRequest as part of #36690; are you aware of any reason why this can't be changed to onPreAuth instead?
Review comment: onRequest was required to support Spaces rewriting the URL. See kibana/x-pack/plugins/spaces/server/lib/request_interceptors/on_request_interceptor.ts, line 47 in 40a456d.
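Why the Spaces rewrite must run before route lookup can be shown with a toy pipeline. This is a conceptual sketch only, not the Spaces interceptor: the route table, the `/s/{space}` prefix handling, and the `handle` function are all illustrative inventions.

```typescript
// Toy router: route lookup matches the literal path, so a space prefix
// like /s/marketing must be stripped *before* lookup or no route matches.
const routes = new Map<string, () => string>([
  ['/api/status', () => 'status ok'],
]);

// Pre-routing step: strip a /s/{space} prefix, remembering the space id.
function stripSpacePrefix(path: string): { path: string; spaceId: string } {
  const match = path.match(/^\/s\/([^/]+)(\/.*)$/);
  return match
    ? { path: match[2], spaceId: match[1] }
    : { path, spaceId: 'default' };
}

function handle(rawPath: string, rewriteBeforeLookup: boolean): string {
  const { path } = rewriteBeforeLookup
    ? stripSpacePrefix(rawPath)
    : { path: rawPath };
  const handler = routes.get(path);
  return handler ? handler() : '404';
}
```

With the rewrite applied before lookup, `/s/marketing/api/status` resolves to the `/api/status` route; without it, lookup sees the prefixed path and misses. This is the ordering constraint that pinned the original interceptor to onRequest.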
Review comment: Gotcha, thanks! Could we theoretically change http.registerOnPreAuth to use onPreAuth, and introduce a http.registerOnPreRouting which uses onRequest?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it shouldn't be that hard. I want to make sure it's necessary to extend API for the current implementation. Let me know and the platform team can provide an interceptor implementation
Review comment: @kobelb I put up PR #70775.