WIP: Guess pgresult size allocated by libpq #23

larskanis · 2018-06-13T20:22:52Z

As published here
https://discuss.samsaffron.com/t/rubys-external-malloc-problem/431
memory allocated by libpq is not considered by Ruby's GC.

This patch tries to guess the memory size based on the field sized of a sample set of rows. It also mimics parts of libpq's allocation mechanism.

It makes use of rb_gc_adjust_memory_usage() introduced in Ruby-2.4:
https://bugs.ruby-lang.org/issues/12690
This ensures, that Ruby's GC is triggered early enough to free PG::Result objects.

Also use TypedData to allow retrieval of PG::Result sizes by

  ObjectSpace.memsize_of(result)

It's WIP because of missing tests and missing fine tuning of size approximation.

cc @ged , @SamSaffron

SamSaffron · 2018-06-13T21:05:07Z

Nice,

Would you know what the impact here is performance wise on the pathological case which is an ultra wide table (say 40 columns) single row, for the added accounting?

Also perhaps let's open a discussion on the pg mailing list about adding something to libpq here for pulling out the number cause it seems it would be as simple as number of buffers * buffer size?

larskanis · 2018-06-14T06:14:50Z

Absolutely - we should open a discussion about a libpq memsize function. Possibly there's also a smarter way for guessing, than I used. This patch is an interim workaround only. A quick test showed around 10% performance decrease. I used: https://gist.github.com/SamSaffron/409805f6c8447d344e04ad68505ec43f Interestingly roughly the same for single rows as for larger result sets with 1000 rows. I guess although only 20 rows are scanned, it causes a lot of memory cache misses.

SamSaffron · 2018-06-14T06:22:22Z

I wonder, can we make this opt-out then, I think trying for safety out of the box is fine, but AR and places that are careful to .clear should not be penalized here.

larskanis · 2018-06-14T10:40:43Z

Sure, there should be a switch to enable/disable such a performance regression. Since it worked all the years now, I think maybe opt-in.

SamSaffron · 2018-06-14T11:52:13Z

Agree maybe opt in till we get a patch in libpq to get the size cheaply

…

On Thu, 14 Jun 2018 at 8:40 pm, Lars Kanis ***@***.***> wrote: Sure, there should be a switch to enable/disable such a performance regression. Since it worked all the years now, I think maybe opt-in. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#23 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAAUXTmNa394dTc8Xm-9kFBE-BYP6Uxoks5t8j2ugaJpZM4Um38T> .

As published here https://discuss.samsaffron.com/t/rubys-external-malloc-problem/431 memory allocated by libpq is not considered by Ruby's GC. This patch tries to guess the memory size based on the field sizes of a small sample set of fields. The sample fields are chose by a simple heuristic. The size calculation mimics parts of libpq's allocation mechanism. It makes use of rb_gc_adjust_memory_usage() introduced in Ruby-2.4: https://bugs.ruby-lang.org/issues/12690 This ensures, that Ruby's GC is triggered early enough to free PG::Result objects. Also use TypedData to allow retrieval of PG::Result sizes by ObjectSpace.memsize_of(result)

larskanis · 2018-06-19T20:31:57Z

I did some more fine tuning and benchmarking. The result memsize calculation is now opt-out, because the performance impact is below precision of measurement. true means memsize enabled, false memsize disabled.

Calculating -------------------------------------
         raw_id true    291.497  (± 2.1%) i/s -      1.484k in   5.093319s
        raw_all true     26.881  (± 3.7%) i/s -    136.000  in   5.062055s
      raw_all_1 true    580.691  (± 4.3%) i/s -      2.950k in   5.088647s
   raw_id_clear true    293.475  (± 1.4%) i/s -      1.484k in   5.057401s
  raw_all_clear true     26.934  (± 0.0%) i/s -    136.000  in   5.050548s
raw_all_1_clear true    554.946  (±13.0%) i/s -      2.688k in   5.034526s
        raw_id false    292.696  (± 1.4%) i/s -      1.479k in   5.054145s
       raw_all false     27.222  (± 0.0%) i/s -    138.000  in   5.070949s
     raw_all_1 false    577.804  (± 2.1%) i/s -      2.928k in   5.069737s
  raw_id_clear false    297.485  (± 1.7%) i/s -      1.512k in   5.084062s
 raw_all_clear false     26.607  (± 3.8%) i/s -    134.000  in   5.053072s
raw_all_1_clear false   563.031  (±12.1%) i/s -      2.750k in   5.069865s

I would merge this as is now, unless there are any concerns.

SamSaffron · 2018-06-19T23:35:52Z

very interesting you can not measure this anymore. curious can you confirm it is working as expected and perhaps just measure by hand with clock_gettime(CLOCK_MONOTONIC, &ts); before and after method call to see what the actual nonsec cost is?

Not against merging this in, but I am curious on the actual cost here.

larskanis · 2018-06-21T20:11:11Z

@SamSaffron Yes it's working as expected. It can be seen by the MemoryProfiler or by your exploit. It doesn't bloat memory any longer.

I measured the time for calculating the memory size per clock_gettime(CLOCK_MONOTONIC) and got the following results on my Ryzen system:

rows x cols	memsize	query time
1x45	551 nsec	125 usec
1000x45	1132 nsec	4200 usec
1000x1	561 nsec	416 usec

The measurement itself takes around 70 nsecs, so this should be substracted.

SamSaffron · 2018-06-21T22:12:30Z

Nice! I think at this cost it is cheap enough to have default on. I do still feel like we should work with libpq here so I will post something to pghackers and see what they say.

Perhaps we can merge this (and maybe Jeremy's patches) and cut a release? There is so much goodness that was added to the gem in the last 5-6 months.

larskanis force-pushed the account-result-size branch 3 times, most recently from 35b64d9 to 7138926 Compare June 18, 2018 20:02

larskanis force-pushed the account-result-size branch from 7138926 to c0cad36 Compare June 19, 2018 20:14

larskanis merged commit c0cad36 into ged:master Jun 23, 2018

larskanis deleted the account-result-size branch August 17, 2021 09:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Guess pgresult size allocated by libpq #23

WIP: Guess pgresult size allocated by libpq #23

larskanis commented Jun 13, 2018

SamSaffron commented Jun 13, 2018

larskanis commented Jun 14, 2018 via email

SamSaffron commented Jun 14, 2018

larskanis commented Jun 14, 2018

SamSaffron commented Jun 14, 2018 via email

larskanis commented Jun 19, 2018

SamSaffron commented Jun 19, 2018

larskanis commented Jun 21, 2018

SamSaffron commented Jun 21, 2018

WIP: Guess pgresult size allocated by libpq #23

WIP: Guess pgresult size allocated by libpq #23

Conversation

larskanis commented Jun 13, 2018

SamSaffron commented Jun 13, 2018

larskanis commented Jun 14, 2018 via email

SamSaffron commented Jun 14, 2018

larskanis commented Jun 14, 2018

SamSaffron commented Jun 14, 2018 via email

larskanis commented Jun 19, 2018

SamSaffron commented Jun 19, 2018

larskanis commented Jun 21, 2018

SamSaffron commented Jun 21, 2018