[POC] Fleet agent concurrency #70495

Closed
wants to merge 3 commits into from
2 changes: 1 addition & 1 deletion src/core/server/http/http_server.ts
@@ -282,7 +282,7 @@ export class HttpServer {
this.log.warn(`registerOnPreAuth called after stop`);
}

- this.server.ext('onRequest', adoptToHapiOnPreAuthFormat(fn, this.log));
+ this.server.ext('onPreAuth', adoptToHapiOnPreAuthFormat(fn, this.log));
Contributor Author

The onRequest lifecycle event occurs prior to "route lookup", which prevented the use of request.route.options.tags within the pre-auth handler to determine whether or not the route is fleet-agent specific. Changing this from onRequest to onPreAuth fixes this specific issue, but potentially introduces others.

@restrry this was originally using onRequest as part of #36690, are you aware of any reason why this can't be changed to onPreAuth instead?

Contributor

onRequest was required to support Spaces rewriting the URL.
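
The trade-off in the two comments above can be sketched with a toy request pipeline. This is a simplified model, not hapi's or Kibana's implementation: `TinyServer`, the `/s/<space>` prefix handling, and the `/api/agents/checkin` path are assumptions made for illustration. It shows that a hook running before route lookup can still rewrite the URL but cannot see route tags, while a hook running after lookup can.

```typescript
// Toy model of the lifecycle order: onRequest -> route lookup -> onPreAuth.
// All names below are hypothetical and exist only for this sketch.
type Route = { path: string; tags: string[] };
type Req = { url: string; route?: Route };
type Hook = (req: Req) => void;

const AGENT_ROUTE_TAG = 'fleet:agent-route';

class TinyServer {
  private readonly onRequest: Hook[] = [];
  private readonly onPreAuth: Hook[] = [];
  constructor(private readonly routes: Route[]) {}

  ext(point: 'onRequest' | 'onPreAuth', hook: Hook): void {
    (point === 'onRequest' ? this.onRequest : this.onPreAuth).push(hook);
  }

  handle(url: string): Req {
    const req: Req = { url };
    // 1. onRequest: no route matched yet, so the URL may still be rewritten.
    this.onRequest.forEach((hook) => hook(req));
    // 2. Route lookup uses the (possibly rewritten) URL.
    req.route = this.routes.find((r) => r.path === req.url);
    // 3. onPreAuth: the matched route and its tags are now visible.
    this.onPreAuth.forEach((hook) => hook(req));
    return req;
  }
}

const seen: { onRequest?: boolean; onPreAuth?: boolean } = {};
const server = new TinyServer([{ path: '/api/agents/checkin', tags: [AGENT_ROUTE_TAG] }]);

server.ext('onRequest', (req) => {
  seen.onRequest = req.route?.tags.includes(AGENT_ROUTE_TAG) ?? false;
  // Assumed stand-in for the Spaces rewrite: strip a leading /s/<space> prefix.
  req.url = req.url.replace(/^\/s\/[^/]+/, '');
});
server.ext('onPreAuth', (req) => {
  seen.onPreAuth = req.route?.tags.includes(AGENT_ROUTE_TAG) ?? false;
});

const handled = server.handle('/s/my-space/api/agents/checkin');
```

In this model the rewrite only works from the early hook and the tag check only works from the later one, which is why a single registration point cannot serve both needs.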

Contributor Author

Gotcha, thanks! Could we theoretically change http.registerOnPreAuth to use onPreAuth, and introduce a http.registerOnPreRouting which uses onRequest?

Contributor

@mshustov mshustov Jul 3, 2020

Gotcha, thanks! Could we theoretically change http.registerOnPreAuth to use onPreAuth, and introduce a http.registerOnPreRouting which uses onRequest?

Yeah, it shouldn't be that hard. I want to make sure it's necessary to extend the API for the current implementation. Let me know and the platform team can provide an interceptor implementation.

Contributor

@kobelb I put up PR #70775

}

private registerOnPreResponse(fn: OnPreResponseHandler) {
2 changes: 2 additions & 0 deletions x-pack/plugins/ingest_manager/common/constants/agent.ts
@@ -16,3 +16,5 @@ export const AGENT_POLLING_THRESHOLD_MS = 30000;
export const AGENT_POLLING_INTERVAL = 1000;
export const AGENT_UPDATE_LAST_CHECKIN_INTERVAL_MS = 30000;
export const AGENT_UPDATE_ACTIONS_INTERVAL_MS = 5000;

export const AGENT_ROUTE_TAG = 'fleet:agent-route';
2 changes: 2 additions & 0 deletions x-pack/plugins/ingest_manager/server/constants/index.ts
@@ -26,6 +26,8 @@ export {
SETUP_API_ROUTE,
SETTINGS_API_ROUTES,
APP_API_ROUTES,
// Route Tags
AGENT_ROUTE_TAG,
// Saved object types
AGENT_SAVED_OBJECT_TYPE,
AGENT_EVENT_SAVED_OBJECT_TYPE,
36 changes: 35 additions & 1 deletion x-pack/plugins/ingest_manager/server/plugin.ts
@@ -13,6 +13,11 @@ import {
PluginInitializerContext,
SavedObjectsServiceStart,
HttpServiceSetup,
KibanaRequest,
LifecycleResponseFactory,
OnPreAuthToolkit,
OnPreResponseToolkit,
OnPreResponseInfo,
} from 'kibana/server';
import { LicensingPluginSetup, ILicense } from '../../licensing/server';
import {
@@ -45,7 +50,7 @@ import {
registerSettingsRoutes,
registerAppRoutes,
} from './routes';
- import { IngestManagerConfigType, NewDatasource } from '../common';
+ import { IngestManagerConfigType, NewDatasource, AGENT_ROUTE_TAG } from '../common';
import {
appContextService,
licenseService,
@@ -152,6 +157,35 @@ export class IngestManagerPlugin
}

public async setup(core: CoreSetup, deps: IngestManagerSetupDeps) {
const maxConcurrentRequests = 1;
Contributor Author

All this logic was put here out of laziness, it should really be placed in another file...

let concurrentRequests = 0;
const isAgentRequest = (request: KibanaRequest) => {
const tags = request.route.options.tags;
return tags.includes(AGENT_ROUTE_TAG);
};
core.http.registerOnPreAuth(
(request: KibanaRequest, response: LifecycleResponseFactory, toolkit: OnPreAuthToolkit) => {
if (!isAgentRequest(request)) {
return toolkit.next();
}

if (concurrentRequests >= maxConcurrentRequests) {
return response.customError({ body: 'Too Many Agents', statusCode: 429 });
}

concurrentRequests += 1;
return toolkit.next();
}
);
core.http.registerOnPreResponse(
(request: KibanaRequest, preResponse: OnPreResponseInfo, toolkit: OnPreResponseToolkit) => {
if (isAgentRequest(request) && preResponse.statusCode !== 429) {
Contributor Author

This assumes that only the pre-auth handler could cause a response with status code 429, which isn't absolutely certain. It's possible that one of the HTTP route handlers is replying with a 429 itself, or propagating a 429 from Elasticsearch, which would mess up the counter.

The request object is different from the pre-auth to the pre-response, so I wasn't able to use a Set or WeakSet to track whether or not this was a 429 that we returned from within the pre-auth handler... Any other ideas?
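
One possible direction can be sketched under a loud assumption: that each request carries a stable identifier that survives from the pre-auth step to the pre-response step. The `id` field below is hypothetical; nothing in this thread confirms that Kibana exposes one at these points. With such an ID, a plain Set of strings can replace the counter, and a slot is released only for requests that actually took one, regardless of the response status code.

```typescript
// Hypothetical request shape: `id` is an assumed stable identifier.
type Req = { id: string; isAgent: boolean };

class AdmissionTracker {
  private readonly admitted = new Set<string>();
  constructor(private readonly max: number) {}

  // Pre-auth step: returns false when the caller should reply with a 429.
  tryAdmit(req: Req): boolean {
    if (!req.isAgent) return true;
    if (this.admitted.size >= this.max) return false;
    this.admitted.add(req.id);
    return true;
  }

  // Pre-response step: a no-op for requests that never took a slot.
  release(req: Req): void {
    this.admitted.delete(req.id);
  }

  get inFlight(): number {
    return this.admitted.size;
  }
}

const tracker = new AdmissionTracker(1);
const admittedA = tracker.tryAdmit({ id: 'a', isAgent: true });
const admittedB = tracker.tryAdmit({ id: 'b', isAgent: true });

// New object instances are used on purpose: only the ID has to match.
tracker.release({ id: 'b', isAgent: true }); // rejected earlier, holds no slot
const afterRejectedRelease = tracker.inFlight;

// Request 'a' may finish with any status, including a 429 from its own handler.
tracker.release({ id: 'a', isAgent: true });
const afterAdmittedRelease = tracker.inFlight;
```

Because membership in the Set is what decides the release, a 429 produced by a route handler or propagated from Elasticsearch no longer leaks a slot.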

Contributor

Can the preAuth handler make a value accessible to the preResponse handler?

We're moving towards using errors like FleetTooManyAgentsError, so if there's any way to make new FleetTooManyAgentsError('Too Many Agents') available in the response handler, it could check instanceof FleetTooManyAgentsError.

Contributor

@mshustov mshustov Jul 2, 2020

Can we wrap Ingest manager specific route handlers in a function instead of adding global interceptors? https://github.com/elastic/kibana/pull/70495/files#diff-280f58825bd033ffbe7792f8423f6122R88
I'm also not sure that we want to provide access to response data in interceptors.

Contributor

@mshustov mshustov Jul 2, 2020

something along lines (not tested):

import { RequestHandlerWrapper } from 'src/core/server';
class Lock {
  constructor(private readonly maxConnections: number = 1) {}
  private counter = 0;
  increase() {
    this.counter += 1;
  }
  decrease() {
    this.counter -= 1;
  }
  canHandle() {
    return this.counter < this.maxConnections;
  }
}
const lock = new Lock();
export const concurrencyLimit: RequestHandlerWrapper = (handler) => {
  return async (context, request, response) => {
    if (!lock.canHandle()) {
      return response.customError({ body: 'Too Many Agents', statusCode: 429 });
    }
    try {
      lock.increase();
      // await so that `finally` releases the slot only after the handler settles
      return await handler(context, request, response);
    } finally {
      lock.decrease();
    }
  };
};

concurrencyLimit(postAgentEnrollHandler);
concurrencyLimit(postAgentCheckinHandler);
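
One detail of the try/finally pattern in the wrapper above is worth a sketch: inside an async function, returning the handler's promise without awaiting it lets `finally` run before the handler has settled, so the slot would be released while the request is still in flight. The standalone illustration below uses hypothetical names (`pendingHandler`, `releasedEarly`, `heldUntilSettled`) that are not part of the PR.

```typescript
// Counts how many slots each variant still holds.
const slots = { early: 0, awaited: 0 };

// Stand-in for a route handler that is still running: its promise never settles.
const pendingHandler = (): Promise<string> => new Promise<string>(() => {});

async function releasedEarly(): Promise<string> {
  slots.early += 1;
  try {
    // The promise is returned without waiting, so `finally` runs right away.
    return pendingHandler();
  } finally {
    slots.early -= 1;
  }
}

async function heldUntilSettled(): Promise<string> {
  slots.awaited += 1;
  try {
    // Execution suspends here; `finally` runs only once the handler settles.
    return await pendingHandler();
  } finally {
    slots.awaited -= 1;
  }
}

// Start both "requests" without waiting for them to finish.
void releasedEarly();
void heldUntilSettled();

// Snapshot taken while both handlers are still in flight.
const snapshot = { ...slots };
```

While both handlers are still pending, the variant without `await` has already given its slot back, which would let a concurrency limit admit more requests than intended.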

Contributor

@roncohen roncohen Jul 2, 2020

this doesn't let us reject traffic without validating API Keys as described here, right?

Contributor

@mshustov mshustov Jul 2, 2020

Yeah... I hadn't seen the comment. Now I'm surprised that the auth has such a huge penalty.

As @kobelb mentioned, it's critical that we can reject incoming traffic as cheaply as possible. It translates to saved $$ for our customers, making us able to offer a more competitive solution.

@kobelb for which use cases should we consider this a critical option? Auth & spaces make a huge number of requests to ES on every page load.
Would server session help to reduce the number of requests? Does this problem exist for API keys only?

Contributor Author

Fleet's usage of Kibana APIs is different from our traditional usage. Elastic Agents will be using Kibana APIs to enroll themselves and retrieve their configuration. As such, we're potentially dealing with thousands of Elastic Agents hitting these APIs... Whether or not this is "critical" is debatable and largely dependent on what the ingest-management team is seeing during their load-testing, but skipping auth reduces the load on Kibana when this circuit breaker is hit.

@roncohen despite the current implementation being imperfect and potentially misinterpreting the 429, can we perform load-testing with the circuit breaking being done before authentication and after authentication to determine what type of impact this has on Fleet's scalability?

Would server session help to reduce the number of requests?

I don't think it will, Kibana will still have to make a query to Elasticsearch to return the server-side session document to authenticate the user.

Does this problem exist for API keys only?

Any call to an Elasticsearch API will incur some performance penalty. However, there are differences in the caching strategy for API Keys vs username/password. Regardless of the type of credentials, performing any unnecessary work before the circuit breaker has a chance to short-circuit the operation is inefficient.
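
The cost argument above can be made concrete with a small simulation. It is a sketch under stated assumptions, not a measurement: all requests are assumed to arrive in one burst while earlier ones are still in flight, `authenticate` merely counts calls that would each be a round trip to Elasticsearch, and the numbers 1000 and 100 are arbitrary.

```typescript
type Outcome = { authCalls: number; rejected: number };

// Simulates a burst of `incoming` agent requests against a concurrency `limit`.
function simulate(incoming: number, limit: number, checkBeforeAuth: boolean): Outcome {
  let authCalls = 0;
  let rejected = 0;
  let inFlight = 0;
  // Stand-in for credential validation; only the call count matters here.
  const authenticate = (): void => {
    authCalls += 1;
  };

  for (let i = 0; i < incoming; i++) {
    if (checkBeforeAuth) {
      // Circuit breaker first: rejected requests never reach authentication.
      if (inFlight >= limit) {
        rejected += 1;
        continue;
      }
      inFlight += 1;
      authenticate();
    } else {
      // Authentication first: every request pays for it, even if rejected next.
      authenticate();
      if (inFlight >= limit) {
        rejected += 1;
        continue;
      }
      inFlight += 1;
    }
  }
  return { authCalls, rejected };
}

const breakerBeforeAuth = simulate(1000, 100, true);
const breakerAfterAuth = simulate(1000, 100, false);
```

Both orderings reject the same 900 requests in this model; the difference is that checking the limit first performs 100 authentication calls instead of 1000. Whether that difference matters in practice is what the proposed load tests would show.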

Contributor

@roncohen despite the current implementation being imperfect and potentially misinterpreting the 429, can we perform load-testing with the circuit breaking being done before authentication and after authentication to determine what type of impact this has on Fleet's scalability?

+1 I'd love to adjust the concurrency value (what should the value be) and merge this into master ASAP so we can test in cloud as we did with long polling

concurrentRequests -= 1;
}

return toolkit.next();
}
);
this.httpSetup = core.http;
this.licensing$ = deps.licensing.license$;
if (deps.security) {
8 changes: 4 additions & 4 deletions x-pack/plugins/ingest_manager/server/routes/agent/index.ts
@@ -10,7 +10,7 @@
*/

import { IRouter } from 'src/core/server';
- import { PLUGIN_ID, AGENT_API_ROUTES } from '../../constants';
+ import { PLUGIN_ID, AGENT_API_ROUTES, AGENT_ROUTE_TAG } from '../../constants';
import {
GetAgentsRequestSchema,
GetOneAgentRequestSchema,
@@ -85,7 +85,7 @@ export const registerRoutes = (router: IRouter) => {
{
path: AGENT_API_ROUTES.CHECKIN_PATTERN,
validate: PostAgentCheckinRequestSchema,
- options: { tags: [] },
+ options: { tags: [AGENT_ROUTE_TAG] },
},
postAgentCheckinHandler
);
@@ -95,7 +95,7 @@ export const registerRoutes = (router: IRouter) => {
{
path: AGENT_API_ROUTES.ENROLL_PATTERN,
validate: PostAgentEnrollRequestSchema,
- options: { tags: [] },
+ options: { tags: [AGENT_ROUTE_TAG] },
},
postAgentEnrollHandler
);
@@ -105,7 +105,7 @@ export const registerRoutes = (router: IRouter) => {
{
path: AGENT_API_ROUTES.ACKS_PATTERN,
validate: PostAgentAcksRequestSchema,
- options: { tags: [] },
+ options: { tags: [AGENT_ROUTE_TAG] },
},
postAgentAcksHandlerBuilder({
acknowledgeAgentActions: AgentService.acknowledgeAgentActions,