Skip to content

Commit

Permalink
@apollo/gateway: Make ApolloGateway.stop() reliable and required
Browse files Browse the repository at this point in the history
The (not particularly documented) ApolloGateway.stop() method didn't reliably
stop the gateway from polling. All it did was cancel a timeout. But if it is
called while pollServices is in the middle of running, it would do nothing, and
then pollServices would carry on and set another timeout.

Better semantics would be for stop() to reliably stop polling: allow the current
poll to complete, ensure that there will be no more polls, and then (async)
return. This PR implements those semantics, by implementing an explicit state
machine for ApolloGateway's polling.

One reason that these bugs were able to survive is that calling stop() is often
unnecessary. In apollographql/apollo-server#3105 we
chose to `unref` the polling timeout to allow the Node process to exit if it's
the only thing left on the event loop, instead of encouraging users of
`ApolloGateway` to be responsible and call `stop()` themselves. While this may
be reasonable when the gateway's lifecycle is merely a schema polling timer, we
may have future changes to the gateway where proper lifecycle handling is more
important. So this PR also moves away from the world where it's fine to not
bother to explicitly stop the gateway.

That said, in the common case, we don't want users to have to write gateway
stopping code. It makes more sense for stopping the `ApolloGateway` to be the
responsibility of `ApolloServer.stop()`, just as `ApolloServer.stop()` can
trigger events in plugins. So in the recently-released Apollo Server v2.20,
`ApolloServer.stop()` calls `ApolloGateway.stop()`. This should mean that in
most cases, the missing `unref` shouldn't keep the process running, as long as
you've run `ApolloServer.stop()` (and if you don't, it's likely that other parts
of the server are keeping your process running).

What if you're still running an old version of Apollo Server? For a bit of
backwards compatibility, `ApolloGateway` detects if it appears to be connected
to Apollo Server older than v2.18. If so, it continues to do the old
unref().  If you're using v2.18 or v2.19, the upgrade to v2.20 should be pretty
painless (it's mostly just minor bugfixes). If you're using ApolloGateway on its
own for some reason, and this change causes your processes to hang on shutdown,
adding a `stop()` call should be pretty straightforward. (If we learn that this
change really is devastating, we can always go back to an unconditional
timer.unref() later.)

Fixes #4428.
  • Loading branch information
glasser committed Feb 9, 2021
1 parent f75fbce commit d82e916
Show file tree
Hide file tree
Showing 4 changed files with 230 additions and 48 deletions.
9 changes: 8 additions & 1 deletion gateway-js/src/__tests__/gateway/lifecycle-hooks.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ describe('lifecycle hooks', () => {

expect(experimental_updateServiceDefinitions).toBeCalled();
expect(gateway.schema!.getType('Furniture')).toBeDefined();
await gateway.stop();
});

it('calls experimental_didFailComposition with a bad config', async () => {
Expand Down Expand Up @@ -181,6 +182,8 @@ describe('lifecycle hooks', () => {
// second call should have previous info in the second arg
expect(secondCall[1]!.schema).toBeDefined();
expect(secondCall[1]!.compositionMetadata!.schemaHash).toEqual('hash1');

await gateway.stop();
});

it('uses default service definition updater', async () => {
Expand All @@ -196,6 +199,8 @@ describe('lifecycle hooks', () => {
// updater, it has to use the default. If there's a valid schema, then
// the loader had to have been called.
expect(schema.getType('User')).toBeDefined();

await gateway.stop();
});

it('warns when polling on the default fetcher', async () => {
Expand Down Expand Up @@ -229,11 +234,12 @@ describe('lifecycle hooks', () => {
const schemaChangeCallback = jest.fn(() => resolve());

gateway.onSchemaChange(schemaChangeCallback);
gateway.load();
await gateway.load();

await schemaChangeBlocker;

expect(schemaChangeCallback).toBeCalledTimes(1);
await gateway.stop();
});

it('calls experimental_didResolveQueryPlan when executor is called', async () => {
Expand Down Expand Up @@ -261,5 +267,6 @@ describe('lifecycle hooks', () => {
});

expect(experimental_didResolveQueryPlan).toBeCalled();
await gateway.stop();
});
});
13 changes: 10 additions & 3 deletions gateway-js/src/__tests__/integration/configuration.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -50,8 +50,15 @@ beforeEach(() => {
});

describe('gateway configuration warnings', () => {
let gateway: ApolloGateway | null = null;
afterEach(async () => {
if (gateway) {
await gateway.stop();
gateway = null;
}
});
it('warns when both csdl and studio configuration are provided', async () => {
const gateway = new ApolloGateway({
gateway = new ApolloGateway({
csdl: getTestingCsdl(),
logger,
});
Expand All @@ -69,7 +76,7 @@ describe('gateway configuration warnings', () => {
it('conflicting configurations are warned about when present', async () => {
mockSDLQuerySuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceList: [{ name: 'accounts', url: service.url }],
logger,
});
Expand All @@ -92,7 +99,7 @@ describe('gateway configuration warnings', () => {
mockImplementingServicesSuccess(service);
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
logger,
});

Expand Down
23 changes: 14 additions & 9 deletions gateway-js/src/__tests__/integration/networkRequests.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ const updatedService: MockService = {

let fetcher: typeof fetch;
let logger: Logger;
let gateway: ApolloGateway | null = null;

beforeEach(() => {
if (!nock.isActive()) nock.activate();
Expand All @@ -92,16 +93,20 @@ beforeEach(() => {
};
});

afterEach(() => {
afterEach(async () => {
expect(nock.isDone()).toBeTruthy();
nock.cleanAll();
nock.restore();
if (gateway) {
await gateway.stop();
gateway = null;
}
});

it('Queries remote endpoints for their SDLs', async () => {
mockSDLQuerySuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceList: [{ name: 'accounts', url: service.url }],
logger,
});
Expand All @@ -116,7 +121,7 @@ it('Extracts service definitions from remote storage', async () => {
mockImplementingServicesSuccess(service);
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({ logger });
gateway = new ApolloGateway({ logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -163,7 +168,7 @@ it.skip('Rollsback to a previous schema when triggered', async () => {
.mockImplementationOnce(() => secondResolve())
.mockImplementationOnce(() => thirdResolve());

const gateway = new ApolloGateway({ logger });
gateway = new ApolloGateway({ logger });
// @ts-ignore for testing purposes, a short pollInterval is ideal so we'll override here
gateway.experimental_pollInterval = 100;

Expand Down Expand Up @@ -204,7 +209,7 @@ it(`Retries GCS (up to ${GCS_RETRY_COUNT} times) on failure for each request and
failNTimes(GCS_RETRY_COUNT, () => mockRawPartialSchema(service));
mockRawPartialSchemaSuccess(service);

const gateway = new ApolloGateway({ fetcher, logger });
gateway = new ApolloGateway({ fetcher, logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -255,7 +260,7 @@ describe('Downstream service health checks', () => {
mockSDLQuerySuccess(service);
mockServiceHealthCheckSuccess(service);

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
logger,
serviceList: [{ name: 'accounts', url: service.url }],
serviceHealthCheck: true,
Expand Down Expand Up @@ -293,7 +298,7 @@ describe('Downstream service health checks', () => {

mockServiceHealthCheckSuccess(service);

const gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
gateway = new ApolloGateway({ serviceHealthCheck: true, logger });

await gateway.load({
apollo: { keyHash: apiKeyHash, graphId, graphVariant: 'current' },
Expand Down Expand Up @@ -352,7 +357,7 @@ describe('Downstream service health checks', () => {
.mockImplementationOnce(() => resolve1())
.mockImplementationOnce(() => resolve2());

const gateway = new ApolloGateway({
gateway = new ApolloGateway({
serviceHealthCheck: true,
logger,
});
Expand Down Expand Up @@ -396,7 +401,7 @@ describe('Downstream service health checks', () => {
let resolve: () => void;
const schemaChangeBlocker = new Promise((res) => (resolve = res));

const gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
gateway = new ApolloGateway({ serviceHealthCheck: true, logger });
// @ts-ignore for testing purposes, a short pollInterval is ideal so we'll override here
gateway.experimental_pollInterval = 100;

Expand Down
Loading

0 comments on commit d82e916

Please sign in to comment.