Search code, repositories, users, issues, pull requests...

          }
          n.mon.ReleaseMemory(int64(len(n.prefixSeen)))
          if err := n.mon.ReserveMemory(int64(len(prefix))); err != nil {

so prefixes are not tallied in allocated? Then the semantics of that field are pretty ad-hoc, I'd say. See if you like the suggestion in https://reviewable.io/reviews/cockroachdb/cockroach/8691#-KPtA9wVusepXvc-Pe86

sql/executor.go, line 134 [r1] (raw file):

type ResultColumns []ResultColumn

// NumColumns implements the parser.ColumnHeader interface.

nit: assign a dummy ResultColumns to that interface to assert that it implements it

  // Close terminates the planNode execution and suggests its
  // resources can be released.

what's this "suggests" phrasing? :)

sql/planner.go, line 228 [r1] (raw file):

  }
  defer func() {
      plan.Close()

I'm confused. How does that this work with the Executor also doing the same?

      },
      sort: &sortNode{
          // Don't use the planner context for logging: this sort node is

you're adding a distinction between using a context for logging and using it for other stuff, but we're still just passing a single context to the sortNode, so I think the distinction is not realized?

// MemoryUsageMonitor defines and object that can track and limit
// memory usage by row data during the execution of SQL queries.
type MemoryUsageMonitor struct {

I think this struct deserves a comment about how it's intended to be used. There's gonna be one of these per query? It's not thread safe (that might be a problem for some of the distsql nodes)? Can they be reused after Stop()?

// MemoryUsageMonitor defines and object that can track and limit
// memory usage by row data during the execution of SQL queries.
type MemoryUsageMonitor struct {

May I suggest a slightly different API for this guy? I think Reserve/Release are too low level and it's unfortunate that, because of this, the client needs to also keep track of what has been allocated.
How about either we make Reserve return a token that's passed back to Release, or we have a way for a client to open a "span" and to Reserves within that span (possibly multiple times within the same span), and we replace Release with an operation that destroys a span releases everything that had been allocated in that context. E.g. DistinctNode would create a span for each "prefix" and keep allocating "suffixes" in it. To make it efficient, an allocation count could be maintained in the span object (and in parallel the total would be maintained in the Monitor.)

sql/mon/mem_usage.go, line 39 [r1] (raw file):

  curAllocated int64

  // totalAllocated tracks the cumulated amount of memory allocated.

say that all these are in bytes, or even add it to the names

  // ctx carries the logging context.
  ctx context.Context

ummm so we should generally try to not store contexts in structs. Instead, operations that need one (or, ideally, pretty much every single function) should take a context from the caller, representing the operation that the caller is in the process of doing at that moment.
Exceptions are structs that do stuff in the background, which are initialized with a "base context" that they then refine further.

sql/mon/mem_usage.go, line 60 [r1] (raw file):

  // query can become.
  if mm.curAllocated > mm.maxAllocatedBudget-x {
      return fmt.Errorf("memory budget exceeded: %d requested, %s already allocated",

errors.Errorf is the new hottness
also nit: it's TODO(knz): above and everywhere

func (mm *MemoryUsageMonitor) ReleaseMemory(x int64) {
  if mm.curAllocated < x {
      log.Errorf(mm.ctx, "no memory to release, current %d, free %d", mm.curAllocated, x)

log.Fatal with debug.PrintStack() ?
or just panic, I don't think the context in the log message would be particularly useful.

  if log.V(1) && mm.curAllocated != 0 {
      log.Infof(mm.ctx, "starts with %d bytes left over (%p)", mm.curAllocated, mm)

panic? Say that Stop wasn't called (cause then that would've panicked)

sql/mon/mem_usage.go, line 131 [r1] (raw file):

var noteworthySQLQuerySize = envutil.EnvOrDefaultInt64("COCKROACH_NOTEWORTHY_SQL_QUERY_SIZE", 10000)

func getSmallTrace() string {

please document.

Is this really needed? Can you use debug.PrintStack() and friends instead?
Or do we even want this in the logs? Maybe we can use the span from a context instead.

sql/mon/mem_usage_test.go, line 30 [r1] (raw file):

  defer leaktest.AfterTest(t)()

  log.SetGlobalVerbosity(2)

I don't see why this is needed. The test doesn't seem to look at the logs? (and if it did, maybe we can find another option :) )
At the very least add a comment, and restore the old verbosity in a defer.

sql/parser/row_container.go, line 17 [r1] (raw file):

// Author: Raphael 'kena' Poss ([email protected])

package parser

does this need to be in parser? Seems to me like it's not related to... parsing.

sql/parser/row_container.go, line 28 [r1] (raw file):

// initialize a RowContainer. This is really sql.ResultColumns,
// defined here via an interface to avoid a circular dependency.
type ColumnHeader interface {

how about ColumnList ?

sql/parser/row_container.go, line 33 [r1] (raw file):

}

// RowContainer is a managed container for rows of DTuples.

I think the word "managed" doesn't mean anything, and generally it'd be good if this comment was expanded a little since I guess this type is gonna become pretty important.

sql/parser/row_container.go, line 34 [r1] (raw file):

// RowContainer is a managed container for rows of DTuples.
type RowContainer struct {

I think there's a structure that at least superficially serves the same purpose of storing a bunch of nodes, maybe with some preallocations, in distsql. Maybe these can be unified? At least make sure you know about that guy.

sql/parser/row_container.go, line 46 [r1] (raw file):

  // allocated is the current number of bytes allocated in this
  // RowContainer.
  allocated int64

put the unit in the name? allocatedBytes

sql/parser/row_container.go, line 50 [r1] (raw file):

// NewRowContainer allocates a new row container.
func NewRowContainer(mon *mon.MemoryUsageMonitor, h ColumnHeader, capacity int) *RowContainer {

please document the capacity param, and how it doesn't affect the MemoryMonitor requirements.

sql/parser/row_container.go, line 68 [r1] (raw file):

      }
  }
  res.fixedRowSize += int64(unsafe.Sizeof(make(DTuple, 0, nCols)))

ha? Despite the name, this doesn't take into account variable vs non-variable cols, or actually the type of the columns at all.
Maybe it's just the comment on this member that needs to change.

sql/parser/row_container.go, line 74 [r1] (raw file):

// Close releases the memory associated with the RowContainer.
func (c *RowContainer) Close() {

are RowContainers intended to be reused? In any case, please document that Close needs to be called on one in the struct doc.

sql/parser/row_container.go, line 76 [r1] (raw file):

func (c *RowContainer) Close() {
  if c == nil {
      return

This seems weird to me. Is it particularly convenient to support nil containers?

sql/parser/row_container.go, line 98 [r1] (raw file):

func (c *RowContainer) AddRow(row DTuple) error {
  rowSize := c.rowSize(row)
  if err := c.mon.ReserveMemory(rowSize); err != nil {

maybe we can interact with the mon less often by having the Container reserve an amount of memory, and the deduct from that until it's exhausted and then reserve some more. That would also make it more correct, since not you're reserving a capacity slice without accounting for it until it gets filled.

sql/parser/row_container.go, line 125 [r1] (raw file):

// sql.sortNode.  A pointer is returned to avoid an allocation when
// passing the DTuple as an interface{} to heap.Push().
func (c *RowContainer) PseudoPop() *DTuple {

but you can't AddRow while someone's holding a reference to a PseudoPopped row, right? Add a comment somewhere. I guess ResetLen fixes this.
Also, how does returning a pointer help with the allocation?

sql/parser/row_container.go, line 138 [r1] (raw file):

}

// Replace substitute one row for another. This does enquire to the

"inquire". We're not traveling on dual carriageways over here.

sql/pgwire/v3.go, line 712 [r1] (raw file):

}

func (c *v3Conn) sendResponse(results sql.ResultList, formatCodes []formatCode, sendDescription bool, limit int) error {

comment that this destroys the results

Comments from Reviewable

RaduBerinde · 2016-08-24T13:11:43Z

Nice stuff! I still have to go through most of the change in detail but I like the core idea of the row container.

We should think in very general terms about how we will be able to use the same infrastructure for distsql. One issue is that the main "data unit" there is not the Datum but the EncDatum. Really the problem is that we don't have an efficient way to keep a lot of small datums (EncDatum is good for large variable-sized types but it's way too much overhead for a lot of small fixed-width values. Very hard to come up with something good without some kind of union type though).

Review status: 0 of 44 files reviewed at latest revision, 32 unresolved discussions, some commit checks failed.

sql/parser/row_container.go, line 28 [r1] (raw file):

// initialize a RowContainer. This is really sql.ResultColumns,
// defined here via an interface to avoid a circular dependency.
type ColumnHeader interface {

Do we think we will have multiple implementations for this interface?

sql/parser/row_container.go, line 39 [r1] (raw file):

  // fixedRowSize is the sum of widths of fixed-width columns.
  fixedRowSize int64

it's the columns that are fixed not the row, maybe fixedColsSize

sql/parser/row_container.go, line 50 [r1] (raw file):

// NewRowContainer allocates a new row container.
func NewRowContainer(mon *mon.MemoryUsageMonitor, h ColumnHeader, capacity int) *RowContainer {

capacity is not very telling, maybe numPreallocRows

sql/parser/row_container.go, line 98 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

maybe we can interact with the mon less often by having the Container reserve an amount of memory, and the deduct from that until it's exhausted and then reserve some more. That would also make it more correct, since not you're reserving a capacity slice without accounting for it until it gets filled.

I like this idea

Comments from Reviewable

knz · 2016-08-25T16:35:20Z

I'll talk with Bram for reservations and with you both for various optimizations. In the meantime PTAL.

Review status: 0 of 42 files reviewed at latest revision, 32 unresolved discussions, some commit checks pending.

cli/start.go, line 109 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: this change seems superfluous to me

Indeed.

server/admin.go, line 178 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

thanks, Golang

I just made this slightly less ugly, PTAL.

Previously, andreimatei (Andrei Matei) wrote…

so prefixes are not tallied in allocated? Then the semantics of that field are pretty ad-hoc, I'd say. See if you like the suggestion in https://reviewable.io/reviews/cockroachdb/cockroach/8691#-KPtA9wVusepXvc-Pe86

You only need to keep track of the combined allocated size of things that are grouped together in a container (e.g. the suffixes here) so that you don't need to call Release once per item in the container. Since the prefix is not in a collection it's not necessary. I'll answer your suggestion separately in another comment below.

sql/executor.go, line 134 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: assign a dummy ResultColumns to that interface to assert that it implements it

Done.

Previously, andreimatei (Andrei Matei) wrote…

what's this "suggests" phrasing? :)

Clarified.

sql/planner.go, line 228 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm confused. How does that this work with the Executor also doing the same?

These are different uses. `planner.query()/queryRow()` is used by internal things like `admin.go`; whereas the logic in execStmt is for external clients.

Previously, andreimatei (Andrei Matei) wrote…

you're adding a distinction between using a context for logging and using it for other stuff, but we're still just passing a single context to the sortNode, so I think the distinction is not realized?

I do not understand what you are saying. Please clarify.

Previously, andreimatei (Andrei Matei) wrote…

I think this struct deserves a comment about how it's intended to be used. There's gonna be one of these per query? It's not thread safe (that might be a problem for some of the distsql nodes)? Can they be reused after Stop()?

I clarified this some more. This is intended to be used from a single goroutine indeed so there is no thread safety needed. More intelligence will be added in the subsequent PR to coordinate multiple instances of this without having to synchronize at every allocation.

Previously, andreimatei (Andrei Matei) wrote…

May I suggest a slightly different API for this guy? I think Reserve/Release are too low level and it's unfortunate that, because of this, the client needs to also keep track of what has been allocated.
How about either we make Reserve return a token that's passed back to Release, or we have a way for a client to open a "span" and to Reserves within that span (possibly multiple times within the same span), and we replace Release with an operation that destroys a span releases everything that had been allocated in that context. E.g. DistinctNode would create a span for each "prefix" and keep allocating "suffixes" in it. To make it efficient, an allocation count could be maintained in the span object (and in parallel the total would be maintained in the Monitor.)

I do not see the advantage. Memory-wise it will amount to the same or worse, because each span struct will need at least one counter. The amount of work will be the same. I understand your idea, but perhaps you could detail what the advantage would be?

sql/mon/mem_usage.go, line 39 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

say that all these are in bytes, or even add it to the names

I have added a comment to that effect above.

Previously, andreimatei (Andrei Matei) wrote…

ummm so we should generally try to not store contexts in structs. Instead, operations that need one (or, ideally, pretty much every single function) should take a context from the caller, representing the operation that the caller is in the process of doing at that moment.
Exceptions are structs that do stuff in the background, which are initialized with a "base context" that they then refine further.

Since MemoryUsageMonitor currently carries state over from one statement to the next, and I'd need the context from a previous statement at the start of a new one to explain where this state came from (why there was memory left over), I really need to keep track of the old context here. The proper solution would be to have a context with a span for the entire transaction; then a method on the Context type to change the current tag (used in executor to track each subsequent statement in the transaction). Until these APIs exist I think this implementation here is the cheapest that can buy us proper context tracking.

I have added a TODO for you to look into this later.

sql/mon/mem_usage.go, line 60 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

errors.Errorf is the new hottness
also nit: it's TODO(knz): above and everywhere

Done.

Previously, andreimatei (Andrei Matei) wrote…

log.Fatal with debug.PrintStack() ?
or just panic, I don't think the context in the log message would be particularly useful.

Again I do not understand what you are saying. Please clarify which code would be better. The context here carries the current SQL statement as tag and the current code enables the user to see which statement caused the panic. What you suggest does not do this.

Previously, andreimatei (Andrei Matei) wrote…

panic? Say that Stop wasn't called (cause then that would've panicked)

No here this is correct behavior. There can be memory left over between statements because of e.g. write intents.

sql/mon/mem_usage.go, line 131 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please document.

Is this really needed? Can you use debug.PrintStack() and friends instead?
Or do we even want this in the logs? Maybe we can use the span from a context instead.

I have added a comment. Until there are different contexts with stacked tags in every function (an idea which I incidentally currently dislike with extreme prejudice) your proposal would reduce the amount of useful information output to the logs to investigate memory allocations.

sql/mon/mem_usage_test.go, line 30 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't see why this is needed. The test doesn't seem to look at the logs? (and if it did, maybe we can find another option :) )
At the very least add a comment, and restore the old verbosity in a defer.

I have added a comment to clarify. Why reset the verbosity? It's reset anyway between tests since there's a new server, isn't it?

sql/parser/row_container.go, line 17 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

does this need to be in parser? Seems to me like it's not related to... parsing.

As I have argued before the current name "parser" for this package is a misnomer and it should really be called "language" or "lang" or something.

Nevertheless I moved to sql which simplifies.

sql/parser/row_container.go, line 28 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how about ColumnList ?

I killed this type entirely.

sql/parser/row_container.go, line 28 [r1] (raw file):

Previously, RaduBerinde wrote…

Do we think we will have multiple implementations for this interface?

I killed this type entirely.

sql/parser/row_container.go, line 33 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think the word "managed" doesn't mean anything, and generally it'd be good if this comment was expanded a little since I guess this type is gonna become pretty important.

Done.

sql/parser/row_container.go, line 34 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think there's a structure that at least superficially serves the same purpose of storing a bunch of nodes, maybe with some preallocations, in distsql. Maybe these can be unified? At least make sure you know about that guy.

Go doesn't really like to unify structs that work in slightly different ways. If Go(d) wanted us to do that, Go(d) would give us generics. ;)

But thanks for the hint.

sql/parser/row_container.go, line 39 [r1] (raw file):

Previously, RaduBerinde wrote…

it's the columns that are fixed not the row, maybe fixedColsSize

Done.

sql/parser/row_container.go, line 46 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

put the unit in the name? allocatedBytes

Done.

sql/parser/row_container.go, line 50 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please document the capacity param, and how it doesn't affect the MemoryMonitor requirements.

Done.

sql/parser/row_container.go, line 50 [r1] (raw file):

Previously, RaduBerinde wrote…

capacity is not very telling, maybe numPreallocRows

Done.

sql/parser/row_container.go, line 68 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ha? Despite the name, this doesn't take into account variable vs non-variable cols, or actually the type of the columns at all.
Maybe it's just the comment on this member that needs to change.

It does, see the loop above.

sql/parser/row_container.go, line 74 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

are RowContainers intended to be reused? In any case, please document that Close needs to be called on one in the struct doc.

I do not see how they can be reused -- our current planNode hierarchy nowhere supports reuse.

sql/parser/row_container.go, line 76 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

This seems weird to me. Is it particularly convenient to support nil containers?

Due to the way the SQL code does optimizations the same valueNode can be referenced from different branches of the same planNode hierarchy which would result in Close() being called multiple times. Since the object that contains a RowContainer reference also sets its pointer to nil (so it can be GCed), the test here is necessary.

sql/parser/row_container.go, line 98 [r1] (raw file):

Previously, RaduBerinde wrote…

I like this idea

I like your idea too but that's not the right place to implement it. The current mon object is exactly the entity responsible to perform the optimization you suggest; another object implementing the policy (and which will be shared across threads and thus require synchronization) will then provision each monitor with chunks like you suggest.

sql/parser/row_container.go, line 125 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

but you can't AddRow while someone's holding a reference to a PseudoPopped row, right? Add a comment somewhere. I guess ResetLen fixes this.
Also, how does returning a pointer help with the allocation?

This is exactly the code that was used in `sortTopKStrategy` and `values.go` previously (cc @nathanvanbenschoten). The explanation can be found in 13960c2.

sql/parser/row_container.go, line 138 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

"inquire". We're not traveling on dual carriageways over here.

See http://onelook.com/?w=enquire - both spelling are correct. I don't know where you found your definition.

sql/pgwire/v3.go, line 712 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment that this destroys the results

Done.

Comments from Reviewable

knz · 2016-08-25T16:36:17Z

TFYR btw ❤️

andreimatei · 2016-08-25T23:47:54Z

but I'm trying again with suggestions about changing the API of the monitor to something that's makes more explicit the lifetime of each allocation.
https://reviewable.io/reviews/cockroachdb/cockroach/8691#-KPtQvz6aW3pq220mLDb

Review status: 0 of 42 files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.

server/admin.go, line 178 [r1] (raw file):

Previously, knz (kena) wrote…

I just made this slightly less ugly, PTAL.

I don't know if it's less ugly :) whatever

Previously, knz (kena) wrote…

You only need to keep track of the combined allocated size of things that are grouped together in a container (e.g. the suffixes here) so that you don't need to call Release once per item in the container. Since the prefix is not in a collection it's not necessary. I'll answer your suggestion separately in another comment below.

the conversation overlaps with the other thread, but specifically here I was complaining that the field called `allocated` that not represent everything that has been allocated by this struct.

Also, is the last prefix allocated ever released?

Previously, knz (kena) wrote…

Clarified.

but still, can't you just say "releases resources"? I understand GC latency, but that's hidden away.

sql/planner.go, line 228 [r1] (raw file):

Previously, knz (kena) wrote…

These are different uses. planner.query()/queryRow() is used by internal things like admin.go; whereas the logic in execStmt is for external clients.

bleah

sql/row_container.go, line 68 [r1] (raw file):

Previously, knz (kena) wrote…

It does, see the loop above.

ok

sql/row_container.go, line 76 [r1] (raw file):

Previously, knz (kena) wrote…

Due to the way the SQL code does optimizations the same valueNode can be referenced from different branches of the same planNode hierarchy which would result in Close() being called multiple times. Since the object that contains a RowContainer reference also sets its pointer to nil (so it can be GCed), the test here is necessary.

so can't that one funky user (the `ValueNode`) do the `nil` check before calling this?

sql/row_container.go, line 125 [r1] (raw file):

Previously, knz (kena) wrote…

This is exactly the code that was used in sortTopKStrategy and values.go previously (cc @nathanvanbenschoten). The explanation can be found in 13960c2.

ok

sql/row_container.go, line 138 [r1] (raw file):

Previously, knz (kena) wrote…

See http://onelook.com/?w=enquire - both spelling are correct. I don't know where you found your definition.

http://lmgtfy.com/?q=enquire :)

Previously, knz (kena) wrote…

I do not understand what you are saying. Please clarify.

So the `sortNode` wants a context. You're saying that the `sortNode`shouldn't use the planner's ctx _for logging_. Should it then use it for something else? Cause if so, we should be passing it. But we're not passing it. So the `sortNode` probably shouldn't use the planner's context for anything. So then your clarification is technically incorrect. At least that's my reasoning seeing these lines of code.

Previously, knz (kena) wrote…

I do not see the advantage. Memory-wise it will amount to the same or worse, because each span struct will need at least one counter. The amount of work will be the same. I understand your idea, but perhaps you could detail what the advantage would be?

yeah, each span would need a counter, but otherwise each client of the monitor needs a counter. That'd be the advantage - clients no longer need to remember what they've allocated. They just need to remember the spans in which one or more allocations happened. This would also make these allocations contexts more explicit (think RAII) - for example, in the `Distinct` node, there'd be two spans at any one point - one span for the many suffix, and one for the (single) current prefix. As opposed to what you have now - one field called `allocated` and another allocation that's not explicitly tracked.

Previously, knz (kena) wrote…

Since MemoryUsageMonitor currently carries state over from one statement to the next, and I'd need the context from a previous statement at the start of a new one to explain where this state came from (why there was memory left over), I really need to keep track of the old context here. The proper solution would be to have a context with a span for the entire transaction; then a method on the Context type to change the current tag (used in executor to track each subsequent statement in the transaction). Until these APIs exist I think this implementation here is the cheapest that can buy us proper context tracking.

I have added a TODO for you to look into this later.

how about making `StartMonitor` take a statement, and then your monitor can keep track of whatever it wants without holding on to a context?

Previously, knz (kena) wrote…

Again I do not understand what you are saying. Please clarify which code would be better. The context here carries the current SQL statement as tag and the current code enables the user to see which statement caused the panic. What you suggest does not do this.

well SQL statements are not tags, at least not currently, so I don't think the context in the message would help much. But I'm not opposed to generally using contexts too when we die. But the main point was that I don't think `log.Errorf(); panic()` is ever the right thing to do. I think we want either `panic()` or `log.Fatalf()` (which can include a stack string if you want).

Previously, knz (kena) wrote…

No here this is correct behavior. There can be memory left over between statements because of e.g. write intents.

hmm then is `StartMonitor` the right name for this method? How about `StartStatementTracking()`? But it kinda tickles me that this `Monitor` needs to understand about statements and write intents. Shouldn't it be more oblivious/general? If we move to an API based on spans (or maybe a better name would be `allocationContexts`), maybe the executor could set up such an `allocationContext` for a whole txn, and pass that down to whatever deep-ass layer is gonna actually count write intents (btw that's not done yet, is it?), besides passing the `Monitor` to the planner? This way you'd make the lifetimes of the contexts explicit - one is per SQL txn, others are per statement.

sql/mon/mem_usage_test.go, line 30 [r1] (raw file):

Previously, knz (kena) wrote…

I have added a comment to clarify. Why reset the verbosity? It's reset anyway between tests since there's a new server, isn't it?

ummm I call "we can do better" on this one. Test your stack function explicitly. You're changing a global here (it has nothing to do with a Server, it's per-process), that's gonna be a problem if we move to running tests in parallel. And it needs resetting so that other (sequential) tests running in the same process don't inherit it (it's not one process per test).

Maybe we generally should have a big test that exercises a cluster under V(100), but it should be a general thing, not related to this monitor.

Comments from Reviewable

knz · 2016-08-27T21:57:11Z

PTAL.

So thanks to Andrei's previous remarks I saw the light and refactored the entire thing to use span objects as interface between client components and the monitor. This makes the lifecycle of allocated objects more explicit.

Also I had to think more about how state is preserved in the monitor. I came to the conclusion/realization that a monitor is session-bound, not statement-bound. So I changed this too; the monitor is started and stopped with the lifecycle of the entire session.

Once this was done I could then successfully instrument the pgwire code to get prepared statements and portals also tracked by the monitor. Please check I didn't screw up anything there.

Then finally, before this goes in I'd like another round of advice as to how to best gather and log the allocation history in a session. To debug this code I have made a first attempt (currently commented out in the code with a "FIXME: andrei?") which logs all events to a bytes.Buffer and prints the contents of that buffer to the log at the end. What I would like instead is something like the SQL tracing: have a command-line flag or env var that enables the gathering of the allocation trace. However I don't think I understand what I should use here. Should I use a Context? a trace.Trace? Let's have a chat soon so I can implement the best thing.

Review status: 0 of 42 files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.

Previously, andreimatei (Andrei Matei) wrote…

the conversation overlaps with the other thread, but specifically here I was complaining that the field called allocated that not represent everything that has been allocated by this struct.

Also, is the last prefix allocated ever released?

Done.

Previously, andreimatei (Andrei Matei) wrote…

but still, can't you just say "releases resources"? I understand GC latency, but that's hidden away.

I don't understand your comment. What would be the better explanation?

sql/row_container.go, line 76 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

so can't that one funky user (the ValueNode) do the nil check before calling this?

I checked and there are more than one place where this is used. I don't mind having the check one level higher, but does it make a difference really?

Previously, andreimatei (Andrei Matei) wrote…

So the sortNode wants a context. You're saying that the sortNodeshouldn't use the planner's ctx for logging. Should it then use it for something else? Cause if so, we should be passing it. But we're not passing it. So the sortNode probably shouldn't use the planner's context for anything. So then your clarification is technically incorrect.
At least that's my reasoning seeing these lines of code.

I see. I have rewritten the comment entirely to clarify. Let me know if that works for you.

Previously, andreimatei (Andrei Matei) wrote…

yeah, each span would need a counter, but otherwise each client of the monitor needs a counter. That'd be the advantage - clients no longer need to remember what they've allocated. They just need to remember the spans in which one or more allocations happened. This would also make these allocations contexts more explicit (think RAII) - for example, in the Distinct node, there'd be two spans at any one point - one span for the many suffix, and one for the (single) current prefix. As opposed to what you have now - one field called allocated and another allocation that's not explicitly tracked.

Done.

Previously, andreimatei (Andrei Matei) wrote…

how about making StartMonitor take a statement, and then your monitor can keep track of whatever it wants without holding on to a context?

I did it a different way. See comment at start of review reaction.

Previously, andreimatei (Andrei Matei) wrote…

well SQL statements are not tags, at least not currently, so I don't think the context in the message would help much. But I'm not opposed to generally using contexts too when we die.
But the main point was that I don't think log.Errorf(); panic() is ever the right thing to do. I think we want either panic() or log.Fatalf() (which can include a stack string if you want).

Done.

Previously, andreimatei (Andrei Matei) wrote…

hmm then is StartMonitor the right name for this method? How about StartStatementTracking()?
But it kinda tickles me that this Monitor needs to understand about statements and write intents. Shouldn't it be more oblivious/general? If we move to an API based on spans (or maybe a better name would be allocationContexts), maybe the executor could set up such an allocationContext for a whole txn, and pass that down to whatever deep-ass layer is gonna actually count write intents (btw that's not done yet, is it?), besides passing the Monitor to the planner? This way you'd make the lifetimes of the contexts explicit - one is per SQL txn, others are per statement.

Done.

sql/mon/mem_usage_test.go, line 30 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ummm I call "we can do better" on this one. Test your stack function explicitly.
You're changing a global here (it has nothing to do with a Server, it's per-process), that's gonna be a problem if we move to running tests in parallel. And it needs resetting so that other (sequential) tests running in the same process don't inherit it (it's not one process per test).

Maybe we generally should have a big test that exercises a cluster under V(100), but it should be a general thing, not related to this monitor.

Done.

Comments from Reviewable

knz · 2016-08-29T20:01:06Z

sql/session_mem_usage.go

+package sql
+
+import "github.com/cockroachdb/cockroach/sql/mon"
+


@andreimatei the comment below is for you. I think I got the best I could as long as #7775 is not fixed.

RaduBerinde · 2016-09-06T16:31:56Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

  ctx context.Context, acc *MemoryAccount, oldSize, newSize int64,
) error {
  if err := mm.reserveMemory(ctx, newSize); err != nil {

We should panic if we don't have oldSize bytes allocated already.

Comments from Reviewable

knz · 2016-09-06T16:34:46Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

sql/mon/account.go, line 36 [r5] (raw file):

Previously, RaduBerinde wrote…

If GrowAccount fails, shouldn't we Close the account? (might not matter now, will matter for tracking). That would be the main benefit of this wrapper IMO.

Also [nit] would use something more suggestive than init, maybe initialReservation or initialAllocation

I changed the argument name.

Whomever will address #9122 will discover that you cannot require a Close() for a zero-valued account without a major refactoring of the entire planNode hierarchy (would need explicit Close() calls in each planNode constructor for children nodes upon errors in the constructor). I fear that the motivation for the OpenAccount API is thereby compromised and that registration is likely to be implemented in GrowAccount instead by means of if acc, ok := ... {} else { register(...) }.

Previously, RaduBerinde wrote…

I don't understand why we do this instead of growing by newSize-oldSize?

Because this reflects the actual memory usage: the new stuff is allocated in memory before the reference to the old stuff is overwritten, so they have to fit side by side for the allocation to be accepted.

Previously, RaduBerinde wrote…

We should panic if we don't have oldSize bytes allocated already.

releaseMemory takes care of this already,

Comments from Reviewable

RaduBerinde · 2016-09-06T16:39:48Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

Previously, knz (kena) wrote…

Because this reflects the actual memory usage: the new stuff is allocated in memory before the reference to the old stuff is overwritten, so they have to fit side by side for the allocation to be accepted.

If we want to reflect that, this is a half measure.. Another caller could get that `newSize-oldSize` memory before this caller actually gets a chance to copy stuff, so we don't really guarantee anything.

If we really wanted to account for that transient usage, it would have to be a pair of calls, one to grow before the realloc/copy, one to shrink after.

Previously, knz (kena) wrote…

releaseMemory takes care of this already,

No, because we first reserve `newSize` which presumably is bigger than `oldSize`..

Comments from Reviewable

RaduBerinde · 2016-09-06T16:44:48Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

sql/mon/account.go, line 36 [r5] (raw file):

Previously, knz (kena) wrote…

I changed the argument name.

Whomever will address #9122 will discover that you cannot require a Close() for a zero-valued account without a major refactoring of the entire planNode hierarchy (would need explicit Close() calls in each planNode constructor for children nodes upon errors in the constructor). I fear that the motivation for the OpenAccount API is thereby compromised and that registration is likely to be implemented in GrowAccount instead by means of if acc, ok := ... {} else { register(...) }.

Ok, I see.

I still think OpenAccount is useful, as we probably want to associate each account with the caller (file:line) that opens the account, not whoever happens to do the first Grow. So we would store the caller in the account (or in some external datastructure, e.g. map[Account*]callerInfo), even if we do the registration at the first Grow.

Comments from Reviewable

knz · 2016-09-06T16:52:01Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

sql/mon/account.go, line 36 [r5] (raw file):

Previously, RaduBerinde wrote…

Ok, I see.

I still think OpenAccount is useful, as we probably want to associate each account with the caller (file:line) that opens the account, not whoever happens to do the first Grow. So we would store the caller in the account (or in some external datastructure, e.g. map[Account*]callerInfo), even if we do the registration at the first Grow.

Yep that seems reasonable.

Previously, RaduBerinde wrote…

If we want to reflect that, this is a half measure.. Another caller could get that newSize-oldSize memory before this caller actually gets a chance to copy stuff, so we don't really guarantee anything.

If we really wanted to account for that transient usage, it would have to be a pair of calls, one to grow before the realloc/copy, one to shrink after.

Why would I want that? At the point where this function is called, both objects already exist in memory. The API serves to register a situation that already exists. Remember this is not an allocation API but rather a declaration API.

Previously, RaduBerinde wrote…

No, because we first reserve newSize which presumably is bigger than oldSize..

I'm not sure what you are trying to achieve here. If newSize is larger that means that a later call to releaseMemory will fail instead of this one. With the allocation trace it will be trivial to figure what went wrong. Why add a debug conditional on the critical path when there is already enough information collected in other places to troubleshoot in case of error? (Note: these checks are there to detect programming errors in tests, should never be encountered in production if testing is thorough)

Comments from Reviewable

RaduBerinde · 2016-09-06T17:01:32Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

Previously, knz (kena) wrote…

I'm not sure what you are trying to achieve here.
If newSize is larger that means that a later call to releaseMemory will fail instead of this one.
With the allocation trace it will be trivial to figure what went wrong.
Why add a debug conditional on the critical path when there is already enough information collected in other places to troubleshoot in case of error?
(Note: these checks are there to detect programming errors in tests, should never be encountered in production if testing is thorough)

Well the function assumes that `oldSize` is already allocated. I figured it's nice to verify the assumptions, but it's not a big deal.

Previously, knz (kena) wrote…

Why would I want that? At the point where this function is called, both objects already exist in memory. The API serves to register a situation that already exists. Remember this is not an allocation API but rather a declaration API.

Then I really don't see the point of first reserving `newSize`. If we already allocated the memory, and it's ok to use `newSize-oldSize` more bytes, what's the point of failing the operation (and forcing the caller to free the new object) rather than allowing it to proceed (and deallocate the old object)?

Comments from Reviewable

knz · 2016-09-06T17:20:40Z

Review status: 25 of 51 files reviewed at latest revision, 25 unresolved discussions, all commit checks successful.

Previously, RaduBerinde wrote…

Then I really don't see the point of first reserving newSize. If we already allocated the memory, and it's ok to use newSize-oldSize more bytes, what's the point of failing the operation (and forcing the caller to free the new object) rather than allowing it to proceed (and deallocate the old object)?

Ok now I understand what you mean. I changed the code accordingly.