Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawning process for every request #313

Closed
kessler opened this issue Mar 15, 2017 · 9 comments
Closed

Spawning process for every request #313

kessler opened this issue Mar 15, 2017 · 9 comments

Comments

@kessler
Copy link

kessler commented Mar 15, 2017

This issue is about this code section:

 getClientUserAgentSeeded: function(seed, cb) {
    exec('uname -a', function(err, uname) {
      var userAgent = {};
      for (var field in seed) {
        userAgent[field] = encodeURIComponent(seed[field]);
      }

      // URI-encode in case there are unusual characters in the system's uname.
      userAgent.uname = encodeURIComponent(uname) || 'UNKNOWN';

      cb(JSON.stringify(userAgent));
    });
  },

from: https://github.com/stripe/stripe-node/blob/master/lib/stripe.js#L164

This code which is run for EVERY request, spawns a process!

  1. First and foremost sniffing out details of the operating system and sending it to yourself without explicit notice is just wrong! this is a real breach of confidence.
  2. In high volume situations on stressed machines this can cause errors and memory leaks.
  3. Silencing errors is a very bad habit
@brandur-stripe
Copy link
Contributor

Hi there,

This code which is run for EVERY request, spawns a process!

Check the code here. The results of the shell out get cached on a global and re-used.

First and foremost sniffing out details of the operating system and sending it to yourself without explicit notice is just wrong! this is a real breach of confidence.

I'm sorry to hear that you think so :/ If it helps, I can assure you that this information isn't collected for any kind of malicious intent. The majority of the time it's only used in aggregate so that we can get a feel for the types of requests that are coming in and to get an idea of what kinds of platforms we need to support. Otherwise, it's used occasionally to help troubleshoot particular users who are having trouble integrating, but not for much else.

In high volume situations on stressed machines this can cause errors and memory leaks.

As mentioned above, this call is cached, but even if it weren't it's worth keeping in mind that shelling out is slow, but still ~two orders of magnitude faster than the network call that's about to happen. Also, you'd hopefully only run into a memory leak if there was a bug in the package; the memory used to spawn the child process will be appropriately reclaimed.

Silencing errors is a very bad habit

Yes, this should probably be logged or something. I think the rational was that this type of problem isn't desirable, but also not worth stopping the program over, so we blow by it.

@kessler
Copy link
Author

kessler commented Mar 15, 2017

Thanks for the swift response.

You're right, I missed that if clause there. However we did run into a situation in production where stripe spawned lots of these processes, it was only resolved after we commented out the exec directive - I will try to recreate a distilled version of this, so it can be shared.

In addition, I really think you should notify your users that you are sending uname results with requests. This can contain private IP address and other sensitive information, they might not want to give out.

Another thing, if the exec operation is expected to fail without serious consequences it should also be wrapped in try/catch as some of the errors of exec can be thrown rather than return in the callback

@brandur-stripe
Copy link
Contributor

You're right, I missed that if clause there. However we did run into a situation in production where stripe spawned lots of these processes, it was only resolved after we commented out the exec directive - I will try to recreate a distilled version of this, so it can be shared.

Oh crazy! Did you happen to notice whether they were zombies? It's possible that something isn't working as expected on our exec invocation. I checked the docs and it seems to look okay, but I guess you never know. Any luck with the repro case?

In addition, I really think you should notify your users that you are sending uname results with requests. This can contain private IP address and other sensitive information, they might not want to give out.

Cool. We should optimize in general for the much more common case, which is that there really isn't any sensitive information in uname -a (at least as far as I can think of). Even if an IP was embedded in the hostname, could it ever be considered harmful to send it out? We've already got your more valuable public IP based on you connecting to our servers.

The only trouble is that there really isn't one place that users are likely to see the notice, but I'll think about it.

Another thing, if the exec operation is expected to fail without serious consequences it should also be wrapped in try/catch as some of the errors of exec can be thrown rather than return in the callback

I was just reading the docs for exec and couldn't find any reference to thrown exceptions. Could you provide a little more detail here?

@kessler
Copy link
Author

kessler commented Mar 29, 2017

@brahn-stripe

  • Haven't gotten to write the distilled version of the problem yet.
  • Regarding the user notice, I think the README.md is a good place to do it.
  • About the exception, it is not documented, but here is an example of such exception: Repeatedly getting Error: spawn ENOMEM remy/nodemon#545 I think this still applies in latest versions of node

@brandur-stripe
Copy link
Contributor

Thanks for continuing to look.

About the exception, it is not documented, but here is an example of such exception: remy/nodemon#545 I think this still applies in latest versions of node

Ah, interesting. I guess the only thing is that in the case of being out of memory, a thrown exception might not be that bad. Even if your program is able to tolerate it, you're almost certainly just going to be seeing more problems right away until something finally cracks. I'd certainly take a look at a pull for that, but I don't think it's a huge priority to patch.

@kessler
Copy link
Author

kessler commented Apr 14, 2017

I agree it's probably not a priority, however, I followed the code path down to uv_spawn (and stopped there for now) and I'm not sure it's the only possible exception.

Thing is that if an exception is thrown then the data won't get cached, causing the process to be spawned again and again

@deontologician-stripe
Copy link

Closing due to age

@richardscarrott
Copy link

Hi @brandonl-stripe, I've just been profiling our node server and noticed a lot of CPU activity in the stripe SDK which I think is related to the fact a new process is spawned for every request.

It only looks to be problematic when running from cold as the second run shows much less activity. Here's the code:

app.get('/stripe-test/', async (req, res, next) => {
    try {
      // Get the Stripe payment intent ids from the DB:
      const docs = await db
        .collection('orders')
        .find(
          { 'payment.gateway': 'STRIPE' },
          { projection: { 'payment.reference': 1 } }
        )
        .limit(50)
        .toArray();
      const paymentIntentIds = docs.map(doc => doc.payment.reference);

      // NOTE: There's no batch endpoint for paymentIntents so having to perform 50 requests in parallel:
      const paymentIntents = await Promise.all(
        paymentIntentIds.map(id => {
          return stripe.paymentIntents.retrieve(id, {
            expand:[
              'payment_method',
              'charges.data.balance_transaction',
            ]
          });
        })
      );
      res.json(paymentIntents);
    } catch (ex) {
      res.status(500).send(ex);
    }
});

From cold:
Screenshot 2021-07-27 at 18 11 51

2nd run:
Screenshot 2021-07-27 at 18 12 57

Additionally, I noticed its way less noisy if I fetch one on its own, and then the remaining 49, e.g.

     // ...
      const firstPaymentIntent = await stripe.paymentIntents.retrieve(paymentIntentIds[0]); // I guess this pre-caches the 'client user agent'?
      const paymentIntents = await Promise.all(
        paymentIntentIds.slice(1).map(id => {
          return stripe.paymentIntents.retrieve(id, {
            expand: [
              'payment_method',
              'charges.data.balance_transaction',
            ]
          });
        })
      );
      res.json([firstPaymentIntent, ...paymentIntents]);
      // ...
});

tbh, I've no idea why it's sparking up a new process in the first place but I'm guessing it didn't intend to do so for every parallel request?

I'm testing with an older version of the SDK (7.5.3) -- I wonder if this is a known issue / has it been fixed in a newer version?

@richardscarrott
Copy link

richardscarrott commented Jul 27, 2021

Scanning through the code it looks to still be an issue in the latest version

getClientUserAgent(cb) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants