
SQS long poll stays active after node process exits #2350

Closed
rickharrison opened this issue Nov 13, 2018 · 17 comments
Labels
closing-soon: This issue will automatically close in 4 days unless further comments are made.
guidance: Question that needs advice or information.

Comments

@rickharrison

After exiting a node process (with control + c, for example), it seems the long polling request is not cleaned up.

For example, with this code:

// QueueUrl omitted here; assume it is set via client-level params
sqs.receiveMessage({ WaitTimeSeconds: 20 }, (err, data) => {
  console.log(data);
});

sqs.sendMessage({ MessageBody: 'foo' }, function () {});

The first time this node process is run it outputs the message. However, if you hit control + c and run the script again right away, nothing is output and then in the SQS web console it shows 1 message as in flight. Is there a way to properly clean up receiveMessage on exit? I tried calling abort but that did not seem to work.

@rickharrison
Author

rickharrison commented Nov 13, 2018

Here is a more elaborate script demonstrating the issue

const aws = require('aws-sdk');
const onExit = require('signal-exit');

const client = new aws.SQS({
  apiVersion: '2016-07-19',
  region: 'us-west-2',
  accessKeyId: '...',
  secretAccessKey: '...',
  params: {
    QueueUrl: '...'
  }
});

let pollRequest;
let stopped = false;

function pollQueue () {
  if (stopped) {
    return;
  }

  pollRequest = client.receiveMessage({ WaitTimeSeconds: 20 }, function (err, data) {
    if (stopped) {
      return;
    }

    console.log('Received Message');

    if (data && data.Messages) {
      data.Messages.forEach(function (message) {
        console.log(message.Body);
      })
    }

    if (!stopped) {
      pollQueue();
    }
  });
}

pollQueue();

setTimeout(function () {
  client.sendMessage({ MessageBody: 'foobar' }, function (err, data) {
    console.log(`Sent: ${data.MessageId}`);
  });
}, 1000);

onExit(function () {
  console.log('aborting');

  stopped = true;
  pollRequest.abort();
});

On every script start you should expect to see a message sent and then immediately received. However, on subsequent starts, sometimes no message is output whatsoever.

@jstewmon
Contributor

It's your responsibility to call ChangeMessageVisibility if you want the message to be immediately visible again.
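For reference, a minimal sketch of that suggestion, assuming an aws-sdk v2 `AWS.SQS` client: setting the message's visibility timeout to 0 makes it immediately receivable again. `queueUrl` and the helper name `releaseMessage` are placeholders, not part of the SDK.

```javascript
// Sketch: return a received message to the queue immediately by
// zeroing its visibility timeout. `sqs` is an AWS.SQS client
// (aws-sdk v2); `queueUrl` is a placeholder for your queue's URL.
function releaseMessage(sqs, queueUrl, message, callback) {
  sqs.changeMessageVisibility({
    QueueUrl: queueUrl,
    ReceiptHandle: message.ReceiptHandle,
    VisibilityTimeout: 0 // visible to other consumers right away
  }, callback);
}
```

Calling this from a shutdown handler only helps for messages the process has actually received; it cannot release a message the service claimed for a connection that no longer exists.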

@rickharrison
Author

I understand that. What I am saying is this:

  1. You start this node process. 1 second later a message is generated and received successfully.
  2. Hit control + c on this node process.
  3. Start the node process again. 1 second later a message is generated, but is not received by this process and the AWS console shows it as "in flight". My hypothesis is that the node connection from the process that was control+c'd receives it somehow.

@jstewmon
Contributor

Sounds like there is another process polling the queue. How do you know the process you're starting has exclusive access to the queue?

@rickharrison
Author

rickharrison commented Nov 13, 2018

Because I am doing this in development and no other processes are running. Also, because I am using a WaitTimeSeconds value of 20, whenever I wait 20 seconds after hitting control + c, it always works. Likewise, if I change WaitTimeSeconds to 1, it always works after hitting control + c.

I don't think that abort is working as expected and somehow a connection is lingering when the process has exited. I've tried this with just aws-sdk and many other SQS libraries built on top of the aws sdk.

@jstewmon
Contributor

I suspect this may be a limitation of long-polling - if you abort the connection, the remote server may not realize the connection has been closed until it tries to write to it. So the SQS service may try to deliver the first message to a receiver which no longer exists.

If you change setTimeout to setInterval, do you only miss the first message? If so, it would seem that SQS may claim that message for the previous receiver, fail to deliver it and await the visibility timeout to deliver it to the new receiver.
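The experiment above could be sketched like this, assuming the `client` from the earlier script; `startSending` is a hypothetical helper, and numbering the messages makes it obvious which ones a fresh poller misses.

```javascript
// Sketch of the setInterval experiment: send a numbered message every
// `intervalMs` so you can see exactly which ones the poller misses.
// `client` is assumed to be an AWS.SQS instance (aws-sdk v2).
function startSending(client, intervalMs) {
  let n = 0;
  return setInterval(() => {
    n += 1;
    client.sendMessage({ MessageBody: `message ${n}` }, (err, data) => {
      if (err) return console.error('send failed', err);
      console.log(`Sent ${n}: ${data.MessageId}`);
    });
  }, intervalMs);
}
```

If only `message 1` goes missing after a restart, that points at the previous receiver's claimed delivery rather than a general polling problem.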

@srchase srchase added the guidance Question that needs advice or information. label Nov 13, 2018
@srchase
Contributor

srchase commented Nov 13, 2018

@rickharrison

@jstewmon is correct here. SQS is expecting your client to stay connected for the duration set by WaitTimeSeconds. As long polling can be used to receive multiple messages at a time, the visibility of those messages is immediately toggled, which is why they show as in-flight in the console.

@rickharrison
Author

rickharrison commented Nov 13, 2018

Alright, so what is the recommended way to integrate with an SQS queue where Maximum Receives is set to 1? At some point you will deploy new code and a long poll connection will need to be severed. It seems that those messages will just be lost forever? Or is there another way to ensure that you receive the message?

To rephrase, is there any way to properly disconnect from a long poll during an event such as a new code deploy? How can you safely stop a node process and not lose messages?

@jstewmon
Contributor

Unless you have a redrive policy that moves the messages to a dead-letter queue after one failed attempt, the message will be received by the new process once the visibility timeout expires for any messages that the service attempted to deliver to the previous process.

You could also await the receiveMessage response instead of aborting the request in your exit handler to gracefully stop the process when it receives a signal.
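A minimal sketch of that idea, assuming the `.promise()` style of aws-sdk v2: on a signal, just flip a flag and let the in-flight receive finish instead of aborting it. `handleMessage`, `pollUntilSignaled`, and `state` are hypothetical names, not SDK API.

```javascript
// Sketch of a graceful shutdown: on SIGINT/SIGTERM just flip a flag.
// The in-flight receiveMessage is awaited (not aborted), so any
// messages it returns are still handled before the loop exits.
async function pollUntilSignaled(client, handleMessage, state = {}) {
  for (const sig of ['SIGINT', 'SIGTERM']) {
    process.once(sig, () => { state.stopped = true; });
  }
  while (!state.stopped) {
    const data = await client.receiveMessage({ WaitTimeSeconds: 20 }).promise();
    for (const message of data.Messages || []) {
      await handleMessage(message);
    }
  }
}
```

The trade-off is that shutdown can take up to WaitTimeSeconds, since the final long poll is allowed to run to completion.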

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 13, 2018
@srchase
Contributor

srchase commented Nov 14, 2018

@jstewmon Thanks for the suggestions!

Once the visibility timeout expires, messages will get moved back into the queue. They will not be removed until you explicitly delete them with DeleteMessage.

The SQS Developer Guide has good references on long polling: SQS Long Polling, Setting Up Long Polling and Tutorial: Configuring Long Polling for an Amazon SQS Queue.

@rickharrison
Author

rickharrison commented Nov 14, 2018

Yea, I have read all those and currently am processing millions of messages a month in SQS using long polling.

So, am I correct in assuming that there is no reliable way to use the JavaScript SDK with a queue that has Maximum Receives set to 1 in the redrive policy? When restarting the node process for something such as a new code deployment, you are then guaranteed to miss messages in a high-throughput environment.

[screenshot attachment, 2018-11-13]

@jstewmon Is it possible to stop a node process from exiting? I guess you could attach a success handler to the request to exit?

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 14, 2018
@jstewmon
Contributor

I don't think signal-exit does what you want here; its exit handler has to be synchronous. You can just handle whichever signals should trigger a graceful shutdown:

'use strict'

const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT']
async function runTillSignaled(fn, ...args) {
    let run = true
    signals.forEach(sig => {
        process.on(sig, () => {
            console.error(`got sig ${sig}`)
            run = false
        })
    })
    while (run) {
        await fn(...args)
    }
}

runTillSignaled(
    async (before, after) => {
        console.log(before)
        await new Promise(resolve => setTimeout(resolve, 1000))
        console.log(after)
    },
    'before timer',
    'after timer'
)

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 15, 2018
@rickharrison
Author

In a scenario where my process gets shut down during a deploy, it seems that setting Maximum Receives to 1 will always result in lost messages. I guess I will just always have to set queues to a Maximum Receives of at least 2, unless @srchase says otherwise.

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase
Contributor

srchase commented Nov 21, 2018

@rickharrison

I apologize, but I'm missing where these messages are being dropped. When your process shuts down, any in-flight messages will be put back in the queue once the visibility timeout expires. Maximum Receives being set to 1 or 2 won't change that.

@rickharrison
Author

Here are the steps of the problem:

  1. Have a process running with long poll active
  2. Shutdown that process
  3. Start up the process again for long polling
  4. A message can still get sent to the first process that was shut down
  5. That message is not actually handled by that process and after the visibility timeout is sent back as an available message
  6. Because Maximum Receives is set to 1, that message is moved straight to the dead-letter queue instead of being delivered again.
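Given the steps above, the practical mitigation discussed in this thread is to keep maxReceiveCount at 2 or more, so one delivery claimed by a just-terminated process does not dead-letter the message. A sketch of setting that via the queue's RedrivePolicy attribute, assuming an aws-sdk v2 client; the queue URL and DLQ ARN are placeholders.

```javascript
// Sketch: configure the redrive policy so a single failed delivery
// (e.g. a receive claimed by a process that was just shut down) does
// not send the message to the dead-letter queue.
function setRedrivePolicy(sqs, queueUrl, deadLetterArn, callback) {
  sqs.setQueueAttributes({
    QueueUrl: queueUrl,
    Attributes: {
      RedrivePolicy: JSON.stringify({
        deadLetterTargetArn: deadLetterArn,
        maxReceiveCount: '2' // survive one missed delivery
      })
    }
  }, callback);
}
```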

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase
Contributor

srchase commented Nov 22, 2018

@rickharrison

Thanks for clarifying.

With the deadletter queue, and the Maximum Receives set to 1, you will get the same behavior for long polling with any of the SDKs. I don't see anything to indicate that there is an issue with the workings of the JavaScript SDK itself.

If you want to followup for further guidance on utilizing SQS and right queue structure for your use case, you're welcome to open a case with AWS Premium Support or post at AWS Developer Forums.

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 22, 2018
@lock

lock bot commented Sep 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 29, 2019