
Confused: connect or not to connect? #1151

Closed
davidgatti opened this issue Oct 14, 2016 · 26 comments

@davidgatti

Hello,

In the project that I'm working on, instead of using Sequelize I decided to go old school, since the project is small. This means I'm spending lots of time in this repo 👻 😋, and I'm confused about how to talk to the DB.

On the main README.md I read:

//call done() to release the client back to the pool

So my brain assumes that not only do I have to call connect() for each request, but I also have to release the connection afterwards. Sounds fine to me.

But then I go here: https://github.com/brianc/node-postgres/wiki/Query, and suddenly there is no connect(), and no release to be seen in the top part of the document.

In my code I made queries using both options and they work, but since I don't see the difference, I don't know which one to choose :)

How should I talk to the DB? 😎 Too many options, and not enough explanation of the differences between all these approaches.

What I'm looking for:

  • to write as little code as possible
  • to have code that can handle 10,000 requests a minute, for example.
  • to reuse the open connection as much as possible.

Which approach will give the best results?

@vitaly-t
Contributor

vitaly-t commented Oct 14, 2016

@brianc looking at ya 😸 and you know what I mean... 😄

@davidgatti
Author

@vitaly-t I'm assuming better docs have been in the making for a long time now? 😜

If you guys are interested, I can help with that. I was a Developer Relations Manager at two companies, meaning I know how to write docs for developers. Plus, the feedback I always got was very positive.

You can check my articles on my GitHub Profile: https://github.com/davidgatti - to see if what I'm saying is true :)

@abenhamdine
Contributor

abenhamdine commented Oct 16, 2016

The usual way to connect to pg and talk to the database is:

  • get a client with the pool's connect method
  • send a query with client.query

So, as you can read in the README, the standard way should be:

var pool = new pg.Pool(config);

pool.connect(function(err, client, done) {
  if(err) {
    return console.error('connection error', err);
  }
  client.query('SELECT $1::int AS number', ['1'], function(err, result) {
    //call `done()` to release the client back to the pool
    done();

    if(err) {
      return console.error('error running query', err);
    }
    console.log(result.rows[0].number);
    //output: 1
  });
});

With this approach, you don't have to reuse the client yourself (but you can if you want), because this is already managed by the pool:
you release the client back to the pool, and get another one as soon as you need it. Because it's a pool, the connection time is essentially immediate.

@davidgatti
Author

Right, but this also works without .connect(), so my question is: what happens if I don't use it, and more importantly, don't take advantage of done()?

Right now I don't see the downside. I would like to understand the difference so I can make a more educated decision.

@joskuijpers
Contributor

Using .connect on a pool, you will retrieve an existing or new connection. Due to the async behaviour of Node and everything around it, the pool does not know when the client (connection) you got becomes available again, so you have to release it back into the pool.
If you don't, you will run out of connections, as they are never properly returned. They might time out after 30 seconds (or longer, not sure).

If you are handling multiple requests per second you might run into trouble when not releasing properly.

Also, you don't need to release the connection after every single query; you can do it at any time. So you can get a connection when a new HTTP request arrives, and when the request ends (finishes, times out) you can close/release the connection.
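
A rough sketch of that idea (hedged: connection settings here are assumed to come from the usual PG* environment variables): check out one client, run more than one query on it, and only then release it.

var pg = require('pg');
var pool = new pg.Pool(); // connection settings taken from the PG* environment variables

pool.connect(function (err, client, done) {
  if (err) { return console.error('error acquiring client', err); }

  // reuse the same checked-out client for several queries...
  client.query('SELECT 1 AS a', function (err, first) {
    if (err) { done(); return console.error('query error', err); }

    client.query('SELECT 2 AS b', function (err, second) {
      // ...and release it back to the pool only when you are done with it
      done();
      if (err) { return console.error('query error', err); }
      console.log(first.rows[0].a, second.rows[0].b); // 1 2
    });
  });
});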

I hope this answers your question(s).

@Globik

Globik commented Oct 19, 2016

Best practice says one needs to use connect()? Also confused.
So it looks like:

router.get('/', function (req, res) {
    // pool.connect(...)?
    // or, without connect, just call
    // pool.query(...) right away?
})

@Tigerino

I'm also having trouble understanding the differences. Currently I'm simply calling Pool#query... which I guess is not the best option ;)

So I believe we should do something like this?

app.use(function (req, res, next) {
    function afterResponse() {
        res.removeListener("finish", afterResponse);
        res.removeListener("close", afterResponse);

        // close connection here
       done();
    }

    res.on("finish", afterResponse);
    res.on("close", afterResponse);

    // connect here and store the client somewhere
    pool.connect(...)
    next();
});

This would hook in before and after requests.

@Globik

Globik commented Oct 20, 2016

@Tigerino looks like a middleware. But those removeListeners look strange to me.
As I understand it, the pool.query function already has all the needed stuff built in. Have a look at the code: there is connect and done handling based on pg.Client.

@Globik

Globik commented Oct 20, 2016

Note: from the pg-pool module's index.js (the Pool.prototype.query implementation):

Pool.prototype.query = function (text, values, cb) {
  if (typeof values === 'function') {
    cb = values
    values = undefined
  }

  return new this.Promise(function (resolve, reject) {
    this.connect(function (err, client, done) {
      if (err) {
        if (cb) {
          cb(err)
        }
        return reject(err)
      }
      client.query(text, values, function (err, res) {
        done(err)
        err ? reject(err) : resolve(res)
        if (cb) {
          cb(err, res)
        }
      })
    })
  }.bind(this))
}
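
In other words, a plain pool.query() call already does the connect/done dance for you internally. A small usage sketch (not part of the quoted source; connection settings are assumed to come from the PG* environment variables):

var pg = require('pg');
var pool = new pg.Pool();

// callback style
pool.query('SELECT $1::int AS number', [1], function (err, result) {
  if (err) { return console.error('query error', err); }
  console.log(result.rows[0].number); // 1
});

// or promise style, since the method above also returns a promise
pool.query('SELECT $1::int AS number', [2])
  .then(function (result) { console.log(result.rows[0].number); }) // 2
  .catch(function (err) { console.error('query error', err); });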

@Tigerino

This would call connect for each query, which I guess is not desirable. Imagine a request comes in and you do something like 10 database calls within the lifetime of the request before finally responding. That would end up in 10 connect/release cycles. It really depends on how much database work you are doing for each request. Still, connecting/releasing once per request is the best option I guess, but it might be harder to implement... depending on your current stack, of course.

In my example above, I would probably use connect when the request comes in. When the promise is resolved and the client is available, I would store the client on the req object and call next (so the call will be routed to my controller). This would make the client available only to this request - concurrent requests would have their own req object and thus their own database client. When the finish handler is called, we would call the done function in order to return the connection to the pool.

This approach also has the advantage that you keep the database connection handling out of the actual controller implementation. The major advantage, however, is that you guarantee that when your controller logic is executed, all database queries have a connection available and will therefore be executed without any problems. This means a request is processed in an atomic way: if there are no database connections available, the request will not be served until resources become available again, but if there are resources available, the request will be processed with the granted resources... it will not stall somewhere in the middle.

Imagine we used Pool#query instead, so connect is used for each query, and each request runs about 10 queries. When the system is under heavy load, a request may come in and we start to process it. When we then call Pool#query for our 5th query, the database pool might not have any free connections left, so we have to wait until a connection becomes available... and this will probably happen to pretty much every request.

@davidgatti
Author

@tgriesser thank you for your time and explanation. I'll try to explain what I'm doing to see if this approach is a good one. Of course it is working now, but it is still unclear to me what this Pool() does.

Let's dive deeper

OK, so the majority of things on the internet use a socket connection for communication. A database is no exception. I'm no DB expert, but I'll assume that we have two options:

  1. To establish a connection, we open a Socket connection to the DB, send some data, wait for the response, and close the Socket connection.
  2. We open a Socket connection, send the data, wait for the response, and we don't close the connection, we just reuse this socket connection.

The theory is that opening a socket connection takes some extra time in ms, and if you have lots of connections coming in, the extra time will add up. The benefit of this approach is that you need less memory to store the info about the socket connections.

The second option is that we open one socket connection and use it as long as possible. But then I'm assuming you have to queue the requests, and if you have lots of them, the queue will get longer and it will take more time to get a response from the DB.

Then

If this is the right assumption, then connect() and done() are opening and closing socket connections, whereas Pool() will keep the socket connection open but queue the SQL queries?

@joskuijpers
Contributor

joskuijpers commented Oct 21, 2016

@davidgatti

Not exactly.

Without pool:

  • You open a connection (new Client())
  • You use the connection
  • You close the connection

With a pool:

  • You request a connection (pool.connect())
    • The pool looks for inactive connections. If it has one, it returns that connection
    • If no inactive connection is found, it creates a new connection (using new Client())
  • You use the connection
  • You return the connection to the pool, so it is an 'inactive' connection to be used by the next pool.connect()
  • The pool might close the connection if it is inactive for too long

With the no-pool approach it is easier to reason about where your connections are and how many you have.
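
Roughly, in code (a hedged sketch of both styles; connection settings are again assumed to come from the environment):

var pg = require('pg');

// Without a pool: you own the connection's whole lifecycle
var single = new pg.Client();
single.connect(function (err) {
  if (err) { return console.error('connection error', err); }
  single.query('SELECT NOW() AS now', function (err, result) {
    if (err) { console.error('query error', err); }
    else { console.log(result.rows[0].now); }
    single.end(); // close the physical connection yourself
  });
});

// With a pool: you borrow a connection and hand it back
var pool = new pg.Pool({ max: 10 });
pool.connect(function (err, client, done) {
  if (err) { return console.error('error acquiring client', err); }
  client.query('SELECT NOW() AS now', function (err, result) {
    done(); // return the client to the pool instead of closing it
    if (err) { return console.error('query error', err); }
    console.log(result.rows[0].now);
  });
});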

Regarding queries:

  • You execute one query asynchronously.
    • The library sees that no query is active, so it executes the query.
  • You then execute another query
    • The previous query has not finished yet (you attempt to run the queries in parallel)
    • The library queues the query for the connection
    • Once the previous query is finished it automatically executes the next query in the queue.
    • This ^ is because the pg protocol only supports one query at a time per connection

So no, pooling and query queueing are not related.

Anyway, it is important to either close the connection (when using no pool) or return it to the pool (which might then close it); otherwise you have outstanding connections that you can't close anymore because you lost the reference.

For your specific approach in your original post:

  • Add middleware that opens a connection using pool.connect(), stores the client on the request object, and adds a handler on the response that is triggered when the response is finished. That handler should release the connection back to the pool (see the sketch after this list).
  • Add this middleware on any route where you want to use a postgres client.
  • Configure your pool correctly for your situation. (If you want to handle 10,000 rps you might want to scale horizontally, though.)
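
A possible shape for that middleware (only a sketch under the assumptions above; attachClient and req.db are made-up names for illustration):

var express = require('express');
var pg = require('pg');

var app = express();
var pool = new pg.Pool({ max: 10 }); // other settings from the environment

// Hypothetical middleware: check out a client per request, release it when the response ends
function attachClient(req, res, next) {
  pool.connect(function (err, client, done) {
    if (err) { return next(err); }

    req.db = client; // made-up property; route handlers read the client from here

    function release() {
      res.removeListener('finish', release);
      res.removeListener('close', release);
      done(); // hand the client back to the pool exactly once
    }
    res.on('finish', release);
    res.on('close', release);

    next();
  });
}

// use it only on the routes that actually need postgres
app.get('/things', attachClient, function (req, res, next) {
  req.db.query('SELECT 1 AS one', function (err, result) {
    if (err) { return next(err); }
    res.json(result.rows);
  });
});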

@davidgatti
Author

davidgatti commented Oct 21, 2016

@joskuijpers fantastic, now it is getting clearer and clearer. I would like to share what I did, to get your opinion and see if my approach is a good one. I have an ExpressJS app, so I created a file that connects to the DB like this:

let pg = require('pg');
let url = require('url');

//
//  1.  The default configuration for a local machine
//
let config = {
    user: 'davidgatti',
    password: '',
    database: 'oauth2',
    host: 'localhost',
    port: 5432,
    max: 10,                    // max number of clients in the pool
    idleTimeoutMillis: 30000,   // how long a client is allowed to remain idle
    ssl: false
 };

//
//  2.  Check if we are on a remote server which holds all the info on to how
//      to connect to a remote DB
//
if(process.env.DATABASE_URL)
{
    //
    //  1. Extract the URL parameters
    //
    let params = url.parse(process.env.DATABASE_URL);

    //
    //  2.  Split the authentication part, where we have the user name and pass
    //
    let auth = params.auth.split(':');

    //
    //  3.  Create the new configuration
    //
    config = {
        user: auth[0],
        password: auth[1],
        database: params.pathname.split('/')[1],
        host: params.hostname,
        port: params.port,
        max: 10,
        idleTimeoutMillis: 30000,
        ssl: true
    }
}

//
//  ->  Export the object
//
module.exports = new pg.Pool(config);

Then I import the file in the route where I want to use the DB connection:

  • like this: let db = require("../database/connection");
  • then I execute the query like this: db.query(querySelect, function (err, result) {}); (put together in the sketch after this list)
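
Put together, a route file using that exported pool might look like this (a sketch only; the route path, table, and SQL are made up):

let express = require('express');
let router = express.Router();

//
//  Import the pool that was exported above
//
let db = require('../database/connection');

router.get('/users', function (req, res) {
    let querySelect = 'SELECT id, email FROM users LIMIT 10'; // example query only

    db.query(querySelect, function (err, result) {
        if (err) {
            return res.status(500).json({ error: 'database error' });
        }

        res.json(result.rows);
    });
});

module.exports = router;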

In the configuration I specify a maximum of 10 connections, and the idle timeout for each connection.

Does this make sense?

@joskuijpers
Contributor

Now you are essentially using pool.query, not pool.connect. Both work, though I prefer connect.

@davidgatti
Author

Right, but the effect will be the one described, where I'll create a new connection if there is none available, and reuse one if a connection is free. And overall I'll have at most 10 open connections.

Correct?

@Globik

Globik commented Oct 21, 2016

I'm using the Koa.js framework.
Some experimenting with node-pg and its Pool mechanism. Works OK. Here is a little working example:
https://gist.github.com/Globik/5c6d2758404e88a0adef11144db2be5f

@joskuijpers
Contributor

@davidgatti that sounds correct to me

@joskuijpers
Contributor

That seems correct, as far as I know.

I do think, however, that when a new connection can't be acquired, it might throw an error instead of waiting for one to free up, but I am not sure. (You can try this out by calling .connect 11 times.)
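
A little experiment along those lines (a sketch only; it just checks out max + 1 clients and logs when each connect() comes back, without assuming what the pool does with the extra one):

var pg = require('pg');
var pool = new pg.Pool({ max: 10 });

for (var i = 1; i <= 11; i++) {
  (function (n) {
    pool.connect(function (err, client, done) {
      if (err) { return console.error('connect #' + n + ' failed:', err.message); }
      console.log('connect #' + n + ' got a client');
      // hold each acquired client for a while before releasing it,
      // so the 11th connect() finds nothing free at first
      setTimeout(done, n === 1 ? 3000 : 6000);
    });
  })(i);
}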

@davidgatti
Author

Sounds cool. I'll play with this :)

@davidshare

Guys, I am using client.connect() and I keep getting this error:

(node:2576) UnhandledPromiseRejectionWarning: Error: Client has already been connected. You cannot reuse a client.
at Client.connect (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\pg\lib\client.js:59:17)
at getQuestionById (C:/Users/Davidshare/Documents/code dojo/js/StackOverflow-lite/server/controllers/questionscontroller.js:61:12)
at Layer.handle [as handle_request] (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\layer.js:95:5)
at next (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\route.js:137:13)
at validateQuestionId (C:/Users/Davidshare/Documents/code dojo/js/StackOverflow-lite/server/middleware/questionValidator.js:85:12)
at Layer.handle [as handle_request] (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\layer.js:95:5)
at next (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\route.js:137:13)
at Route.dispatch (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\route.js:112:3)
at Layer.handle [as handle_request] (C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\layer.js:95:5)
at C:\Users\Davidshare\Documents\code dojo\js\StackOverflow-lite\node_modules\express\lib\router\index.js:281:22
(node:2576) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:2576) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

@Globik

Globik commented Aug 27, 2018

You should wrap your connect call in an error-catching mechanism. And you should close the connection after performing your query against the database.
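
A hedged sketch of that pattern for the getQuestionById case above (the function name echoes the stack trace, but the body, table, and column names here are only illustrative):

const { Client } = require('pg');

async function getQuestionById(id) {
  // a fresh Client per call: a client that has already connected cannot be connect()ed again
  const client = new Client(); // connection settings from the PG* environment variables
  try {
    await client.connect();
    const result = await client.query('SELECT * FROM questions WHERE id = $1', [id]);
    return result.rows[0];
  } catch (err) {
    console.error('database error', err);
    throw err;
  } finally {
    await client.end(); // always close the connection, even when the query fails
  }
}

In most apps, though, a pool shared across requests (as discussed earlier in this thread) is the more usual choice than a brand-new Client per call.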

@johncmunson

I too was quite confused when reading through the docs and am still unsure if my setup is following best practices. I'm using Express and went with a fairly basic setup where I create a new Pool instance and then inside my express routes I utilize pool.query()...

const pool = new Pool(connection)

app.post('/signup', async (req, res) => {
  const { email, password } = req.body
  const hash = bcrypt.hashSync(password, salt)
  await pool.query('INSERT INTO account (email, password) VALUES ($1, $2)', [email, hash])
  return res.status(200).json({ message: 'Account created' })
})

I'm not utilizing Client at all, and nowhere am I calling pool.connect() or done(). Am I doing something terribly wrong? Or is my setup just fine?

@Globik

Globik commented Feb 5, 2019

You are doing it right. Also, don't forget about try {} catch (e).
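
Applied to the signup route above, that might look like this (a sketch that reuses the pool, bcrypt, and salt from the snippet above; the error response shape is up to you):

app.post('/signup', async (req, res) => {
  try {
    const { email, password } = req.body
    const hash = bcrypt.hashSync(password, salt)
    await pool.query('INSERT INTO account (email, password) VALUES ($1, $2)', [email, hash])
    return res.status(200).json({ message: 'Account created' })
  } catch (err) {
    console.error(err)
    return res.status(500).json({ message: 'Account could not be created' })
  }
})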

@Globik

Globik commented Feb 5, 2019

By the way, you can place your pool variable into a context middleware in Express.

@johncmunson

@Globik thanks so much. Sorry, this is off topic for the thread at this point, but I'm new to Express and not quite sure what you mean by context middleware. Are you by chance referring to this package?

@Globik

Globik commented Feb 7, 2019

@johncmunson no, I mean this:

app.use(function (req, res, next) {
    res.locals.your_database_instance = pool;
    next();
});

And then in your router call it for queries:

let db = res.locals.your_database_instance;
try {
    db.query(blablabla...)
} catch (error) {}
