
SQS long poll stays active after node process exits #2350

Closed
rickharrison opened this issue Nov 13, 2018 · 17 comments
Labels
closing-soon: This issue will automatically close in 4 days unless further comments are made.
guidance: Question that needs advice or information.

Comments

@rickharrison

After exiting a node process (with control + c, for example), it seems the long polling request is not cleaned up.

For example, with this code:

// QueueUrl omitted here; assume it is set via client-level params
sqs.receiveMessage({ WaitTimeSeconds: 20 }, (err, data) => {
  console.log(data);
});

sqs.sendMessage({ MessageBody: 'foo' }, function () {});

The first time this node process is run it outputs the message. However, if you hit control + c and run the script again right away, nothing is output and then in the SQS web console it shows 1 message as in flight. Is there a way to properly clean up receiveMessage on exit? I tried calling abort but that did not seem to work.

@rickharrison
Author

rickharrison commented Nov 13, 2018

Here is a more elaborate script demonstrating the issue

const aws = require('aws-sdk');
const onExit = require('signal-exit');

const client = new aws.SQS({
  apiVersion: '2016-07-19',
  region: 'us-west-2',
  accessKeyId: '...',
  secretAccessKey: '...',
  params: {
    QueueUrl: '...'
  }
});

let pollRequest;
let stopped = false;

function pollQueue () {
  if (stopped) {
    return;
  }

  pollRequest = client.receiveMessage({ WaitTimeSeconds: 20 }, function (err, data) {
    if (stopped) {
      return;
    }

    console.log('Received Message');

    if (data && data.Messages) {
      data.Messages.forEach(function (message) {
        console.log(message.Body);
      })
    }

    if (!stopped) {
      pollQueue();
    }
  });
}

pollQueue();

setTimeout(function () {
  client.sendMessage({ MessageBody: 'foobar' }, function (err, data) {
    console.log(`Sent: ${data.MessageId}`);
  });
}, 1000);

onExit(function () {
  console.log('aborting');

  stopped = true;
  pollRequest.abort();
});

On every script start you should expect to see a message sent and then immediately received. However, on subsequent starts, sometimes no message is output whatsoever.

@jstewmon
Contributor

It's your responsibility to call ChangeMessageVisibility if you want the message to be immediately visible again.
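For reference, a minimal sketch of that suggestion, assuming an aws-sdk v2 `AWS.SQS` client: setting the message's visibility timeout to 0 makes it immediately receivable again. `queueUrl` and the helper name `releaseMessage` are placeholders, not part of the SDK.

```javascript
// Sketch: return a received message to the queue immediately by
// zeroing its visibility timeout. `sqs` is an AWS.SQS client
// (aws-sdk v2); `queueUrl` is a placeholder for your queue's URL.
function releaseMessage(sqs, queueUrl, message, callback) {
  sqs.changeMessageVisibility({
    QueueUrl: queueUrl,
    ReceiptHandle: message.ReceiptHandle,
    VisibilityTimeout: 0 // visible to other consumers right away
  }, callback);
}
```

Calling this from a shutdown handler only helps for messages the process has actually received; it cannot release a message the service claimed for a connection that no longer exists.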

@rickharrison
Author

I understand that. What I am saying is this:

  1. You start this node process. 1 second later a message is generated and received successfully.
  2. Hit control + c on this node process.
  3. Start the node process again. 1 second later a message is generated, but is not received by this process and the AWS console shows it as "in flight". My hypothesis is that the node connection from the process that was control+c'd receives it somehow.

@jstewmon
Contributor

Sounds like there is another process polling the queue. How do you know the process you're starting has exclusive access to the queue?

@rickharrison
Author

rickharrison commented Nov 13, 2018

Because I am doing this in development and no other processes are running. Also, because I am using a WaitTimeSeconds value of 20, whenever I wait 20 seconds after hitting control + c, it always works. Likewise, if I change WaitTimeSeconds to 1, it always works after hitting control + c.

I don't think that abort is working as expected and somehow a connection is lingering when the process has exited. I've tried this with just aws-sdk and many other SQS libraries built on top of the aws sdk.

@jstewmon
Contributor

I suspect this may be a limitation of long-polling - if you abort the connection, the remote server may not realize the connection has been closed until it tries to write to it. So the SQS service may try to deliver the first message to a receiver which no longer exists.

If you change setTimeout to setInterval, do you only miss the first message? If so, it would seem that SQS may claim that message for the previous receiver, fail to deliver it and await the visibility timeout to deliver it to the new receiver.
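The experiment above could be sketched like this, assuming the `client` from the earlier script; `startSending` is a hypothetical helper, and numbering the messages makes it obvious which ones a fresh poller misses.

```javascript
// Sketch of the setInterval experiment: send a numbered message every
// `intervalMs` so you can see exactly which ones the poller misses.
// `client` is assumed to be an AWS.SQS instance (aws-sdk v2).
function startSending(client, intervalMs) {
  let n = 0;
  return setInterval(() => {
    n += 1;
    client.sendMessage({ MessageBody: `message ${n}` }, (err, data) => {
      if (err) return console.error('send failed', err);
      console.log(`Sent ${n}: ${data.MessageId}`);
    });
  }, intervalMs);
}
```

If only `message 1` goes missing after a restart, that points at the previous receiver's claimed delivery rather than a general polling problem.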

@srchase srchase added the guidance Question that needs advice or information. label Nov 13, 2018
@srchase
Contributor

srchase commented Nov 13, 2018

@rickharrison

@jstewmon is correct here. SQS is expecting your client to stay connected for the duration set by WaitTimeSeconds. As long polling can be used to receive multiple messages at a time, the visibility of those messages is immediately toggled, which is why they show as in-flight in the console.

@rickharrison
Author

rickharrison commented Nov 13, 2018

Alright, so what is the recommended way to integrate with an SQS queue where Maximum Receives is set to 1? At some point you will deploy new code and a long poll connection will need to be severed. It seems that those messages will just be lost forever? Or is there another way to ensure that you receive the message?

To rephrase, is there any way to properly disconnect from a long poll during an event such as a new code deploy? How can you safely stop a node process and not lose messages?

@jstewmon
Contributor

Unless you have a redrive policy that moves the messages to a dead-letter queue after one failed attempt, the message will be received by the new process once the visibility timeout expires for any messages that the service attempted to deliver to the previous process.

You could also await the receiveMessage response instead of aborting the request in your exit handler to gracefully stop the process when it receives a signal.
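A minimal sketch of that idea, assuming the `.promise()` style of aws-sdk v2: on a signal, just flip a flag and let the in-flight receive finish instead of aborting it. `handleMessage`, `pollUntilSignaled`, and `state` are hypothetical names, not SDK API.

```javascript
// Sketch of a graceful shutdown: on SIGINT/SIGTERM just flip a flag.
// The in-flight receiveMessage is awaited (not aborted), so any
// messages it returns are still handled before the loop exits.
async function pollUntilSignaled(client, handleMessage, state = {}) {
  for (const sig of ['SIGINT', 'SIGTERM']) {
    process.once(sig, () => { state.stopped = true; });
  }
  while (!state.stopped) {
    const data = await client.receiveMessage({ WaitTimeSeconds: 20 }).promise();
    for (const message of data.Messages || []) {
      await handleMessage(message);
    }
  }
}
```

The trade-off is that shutdown can take up to WaitTimeSeconds, since the final long poll is allowed to run to completion.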

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 13, 2018
@srchase
Contributor

srchase commented Nov 14, 2018

@jstewmon Thanks for the suggestions!

Once the visibility timeout expires, messages will get moved back into the queue. They will not be removed until you explicitly delete them with DeleteMessage.

The SQS Developer Guide has good references on long polling: SQS Long Polling, Setting Up Long Polling and Tutorial: Configuring Long Polling for an Amazon SQS Queue.

@rickharrison
Author

rickharrison commented Nov 14, 2018

Yea, I have read all those and currently am processing millions of messages a month in SQS using long polling.

So, am I correct in assuming that there is no reliable way to use the JavaScript SDK with a queue that has Maximum Receives set to 1 in the redrive policy? When restarting the node process for something such as a new code deployment, you are then guaranteed to miss messages in a high-throughput environment.

[screenshot attachment, 2018-11-13]

@jstewmon Is it possible to stop a node process from exiting? I guess you could attach a success handler to the request to exit?

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 14, 2018
@jstewmon
Contributor

I don't think signal-exit does what you want here; its exit handler has to be synchronous. You can just handle whichever signals should trigger a graceful shutdown:

'use strict'

const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT']
async function runTillSignaled(fn, ...args) {
    let run = true
    signals.forEach(sig => {
        process.on(sig, () => {
            console.error(`got sig ${sig}`)
            run = false
        })
    })
    while (run) {
        await fn(...args)
    }
}

runTillSignaled(
    async (before, after) => {
        console.log(before)
        await new Promise(resolve => setTimeout(resolve, 1000))
        console.log(after)
    },
    'before timer',
    'after timer'
)

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 15, 2018
@rickharrison
Author

In a scenario where my process gets shut down during a deploy, it seems that setting Maximum Receives to 1 will always result in lost messages. I guess I will just always have to set queues to a Maximum Receives of at least 2, unless @srchase says otherwise.

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase
Contributor

srchase commented Nov 21, 2018

@rickharrison

I apologize, but I'm missing where these messages are being dropped. When your process shuts down, any in-flight messages will be put back in the queue once the visibility timeout expires. Maximum Receives being set to 1 or 2 won't change that.

@rickharrison
Author

Here are the steps of the problem:

  1. Have a process running with long poll active
  2. Shutdown that process
  3. Start up the process again for long polling
  4. A message can still get sent to the first process that was shut down
  5. That message is not actually handled by that process and after the visibility timeout is sent back as an available message
  6. Because Maximum Receives is set to 1, that message is moved straight to the dead-letter queue instead of being delivered again.
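Given the steps above, the practical mitigation discussed in this thread is to keep maxReceiveCount at 2 or more, so one delivery claimed by a just-terminated process does not dead-letter the message. A sketch of setting that via the queue's RedrivePolicy attribute, assuming an aws-sdk v2 client; the queue URL and DLQ ARN are placeholders.

```javascript
// Sketch: configure the redrive policy so a single failed delivery
// (e.g. a receive claimed by a process that was just shut down) does
// not send the message to the dead-letter queue.
function setRedrivePolicy(sqs, queueUrl, deadLetterArn, callback) {
  sqs.setQueueAttributes({
    QueueUrl: queueUrl,
    Attributes: {
      RedrivePolicy: JSON.stringify({
        deadLetterTargetArn: deadLetterArn,
        maxReceiveCount: '2' // survive one missed delivery
      })
    }
  }, callback);
}
```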

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 21, 2018
@srchase
Contributor

srchase commented Nov 22, 2018

@rickharrison

Thanks for clarifying.

With the deadletter queue, and the Maximum Receives set to 1, you will get the same behavior for long polling with any of the SDKs. I don't see anything to indicate that there is an issue with the workings of the JavaScript SDK itself.

If you want to followup for further guidance on utilizing SQS and right queue structure for your use case, you're welcome to open a case with AWS Premium Support or post at AWS Developer Forums.

@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Nov 22, 2018
@lock

lock bot commented Sep 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 29, 2019