sql/ir: teach the generator about field packing and primitive types #17758

knz · 2017-08-19T00:54:03Z

Note to reviewers: the patch appears large, but most of it is auto-generated code. Don't be intimidated!

There are four complementary parts in this patch:

- the generator now predefines the common Go primitive types for
  every input definition file (bool, string, int64, etc.)
- the code generator is extended to take a few options on the command
  line that control its behavior. The main() function is refactored to
  make it more readable.
- the code generator is taught about field packing, i.e. storing
  multiple small numeric values in a variable of a larger size. See a copy
  of the explanation below.
- the Makefile is modified to use multiple configurations to generate
  test IR environments. This is used to exercise different
  combinations of the code generation parameters, to ensure that all
  of them produce valid code.

A copy of the explanatory comment, which outlines how memory slots
are allocated for IR struct types, follows.

“The interesting part of code generation is fitting structs into nodes.

Each [IR] node consists of slots, i.e. spaces in memory where to put
values. The goal of slot allocation is to decide which struct field goes
to which slot.

Overview

There are four kinds of slots: numeric, string, reference and extra.
The numeric, string and reference slots are called "dedicated".
Dedicated slots come in a fixed, finite number!
For example, in the default configuration, there are 2 numeric slots,
1 string slot and 2 reference slots.

In general, we prefer a dedicated slot. When dedicated slots are
exhausted for a particular type (e.g. when encountering the 3rd
numeric field in a struct in the default configuration), we spill
to the extra slots. Extra slots expand on demand without limit.

We support two modes: packed and unpacked.

Understanding unpacked mode can serve as foundation to better
understand packed mode.

How unpacked mode works

In that mode, each numeric field uses one numeric slot; each
reference to a struct uses one reference slot; and each reference
to a sum uses both a numeric slot (for the tag) and a reference
slot (for the value). Every other type uses an extra
slot. When dedicated slots are exhausted, an extra slot is also
used. For example:

type BinExprValue struct {
	Left  Expr
	Op    BinOp
	Right Expr
}
type BinExpr struct { *node }

//// Packing with 3 numeric slots:
func (x BinExprValue) R(a Allocator) BinExpr {
	node := a.new()
	node.nums[0] = numvalslot(x.Left.tag)
	node.nums[1] = numvalslot(x.Op)
	node.nums[2] = numvalslot(x.Right.tag)
	node.refs[0] = x.Left.ref
	node.refs[1] = x.Right.ref
	return BinExpr{node}
}
func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.node.nums[0]), x.node.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.node.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{ExprTag(x.node.nums[2]), x.node.refs[1]} }

//// Packing with just 2 numeric slots, like in the default configuration:
type extraBinExpr struct {
	Right__Tag ExprTag
}
func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = numvalslot(x.Left.tag)
	ref.nums[1] = numvalslot(x.Op)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	extra := &extraBinExpr{}
	extra.Right__Tag = x.Right.tag
	ref.extra = extra
	return BinExpr{ref}
}
func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.ref.nums[0]), x.ref.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.ref.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{x.ref.extra.(*extraBinExpr).Right__Tag, x.ref.refs[1]} }
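
The unpacked allocation rule described above can be sketched as a small standalone program. This is only an illustration of the rule (one dedicated slot per field part, spill to extra when a pool is exhausted), not the generator's actual code; the names `kind`, `field`, `limits` and `allocate` are hypothetical.

```go
package main

import "fmt"

// kind classifies a field for slot allocation. These names are
// hypothetical and only mirror the prose above.
type kind int

const (
	numeric kind = iota
	str
	structRef
	sumRef
	other
)

type field struct {
	name string
	kind kind
}

// limits holds the number of dedicated slots of each type.
type limits struct{ nums, strs, refs int }

// allocate returns, for each field part, the slot it lands in.
func allocate(fields []field, lim limits) map[string]string {
	out := map[string]string{}
	nums, strs, refs := 0, 0, 0
	take := func(name, pool string, used *int, limit int) {
		if *used < limit {
			out[name] = fmt.Sprintf("%s[%d]", pool, *used)
			*used++
			return
		}
		out[name] = "extra" // dedicated slots exhausted: spill
	}
	for _, f := range fields {
		switch f.kind {
		case numeric:
			take(f.name, "nums", &nums, lim.nums)
		case str:
			take(f.name, "strs", &strs, lim.strs)
		case structRef:
			take(f.name, "refs", &refs, lim.refs)
		case sumRef:
			// A sum needs a numeric slot for the tag and a
			// reference slot for the value.
			take(f.name+".tag", "nums", &nums, lim.nums)
			take(f.name+".ref", "refs", &refs, lim.refs)
		default:
			out[f.name] = "extra"
		}
	}
	return out
}

func main() {
	binExpr := []field{{"Left", sumRef}, {"Op", numeric}, {"Right", sumRef}}
	got := allocate(binExpr, limits{nums: 2, strs: 1, refs: 2})
	for _, k := range []string{"Left.tag", "Left.ref", "Op", "Right.tag", "Right.ref"} {
		fmt.Printf("%-9s -> %s\n", k, got[k])
	}
}
```

With the default limits (2 numeric, 1 string, 2 reference slots), `Right.tag` is the third numeric value and spills to the extra slot, which matches the second example above. Raising `nums` to 3 keeps it in `nums[2]`.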

How packed mode works

The general idea of packing dedicated slots until they are
exhausted, and then spilling to extra slots, remains. What is
different is that the algorithm now tries to fit multiple numeric
fields in the same numeric slot, to conserve memory. The algorithm
starts with the largest fields first, to reduce fragmentation. This
incidentally implies that the fields are not stored in declaration
order.
For example:

//// Observe how all 3 numeric values are now packed in a single slot!
func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Left__Tag_ValueMask  << BinExpr_Slot_Left__Tag_BitOffset))  | (numvalslot(x.Left.tag)  << BinExpr_Slot_Left__Tag_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Op_ValueMask         << BinExpr_Slot_Op_BitOffset))         | (numvalslot(x.Op)        << BinExpr_Slot_Op_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Right__Tag_ValueMask << BinExpr_Slot_Right__Tag_BitOffset)) | (numvalslot(x.Right.tag) << BinExpr_Slot_Right__Tag_BitOffset)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	return BinExpr{ref}
}

//// Note: the size in bits for sum types is computed automatically
//// depending on the number of variants.
const BinExpr_Slot_Left__Tag_BitOffset = 0
const BinExpr_Slot_Left__Tag_ValueMask = 0x3
const BinExpr_Slot_Op_BitOffset = 2
const BinExpr_Slot_Op_ValueMask = 0x3
const BinExpr_Slot_Right__Tag_BitOffset = 4
const BinExpr_Slot_Right__Tag_ValueMask = 0x3

func (x BinExpr) Left() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Left__Tag_BitOffset) & BinExpr_Slot_Left__Tag_ValueMask), x.ref.refs[0]}
}
func (x BinExpr) Op() BinOp {
	return BinOp((x.ref.nums[0] >> BinExpr_Slot_Op_BitOffset) & BinExpr_Slot_Op_ValueMask)
}
func (x BinExpr) Right() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Right__Tag_BitOffset) & BinExpr_Slot_Right__Tag_ValueMask), x.ref.refs[1]}
}
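
The shift-and-mask pattern used by the generated accessors above can be factored into two helpers for experimentation. This is a sketch, not generated code: `bitField`, `set`, `get` and `tagBits` are hypothetical names, `set` additionally truncates the value to the mask (the generated code shown above does not), and the way `tagBits` derives a width from the number of variants is an assumption about how the size might be computed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitField describes where a value lives inside a 64-bit slot.
type bitField struct {
	offset uint   // position of the least significant bit
	mask   uint64 // unshifted mask, e.g. 0x3 for a 2-bit field
}

// set returns slot with the field replaced by v (truncated to the mask).
func (f bitField) set(slot, v uint64) uint64 {
	return (slot &^ (f.mask << f.offset)) | ((v & f.mask) << f.offset)
}

// get extracts the field from slot.
func (f bitField) get(slot uint64) uint64 {
	return (slot >> f.offset) & f.mask
}

// tagBits returns the number of bits needed to distinguish n variants
// numbered 0..n-1. Assumption: tags are dense and start at zero.
func tagBits(n int) uint {
	if n <= 1 {
		return 0
	}
	return uint(bits.Len(uint(n - 1)))
}

func main() {
	// Same layout as the constants above: three 2-bit fields.
	left := bitField{offset: 0, mask: 0x3}
	op := bitField{offset: 2, mask: 0x3}
	right := bitField{offset: 4, mask: 0x3}

	var slot uint64
	slot = left.set(slot, 1)
	slot = op.set(slot, 2)
	slot = right.set(slot, 3)

	fmt.Printf("slot = %#x\n", slot) // slot = 0x39
	fmt.Println(left.get(slot), op.get(slot), right.get(slot))
	fmt.Println(tagBits(3), tagBits(5))
}
```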

(see irgen/codegen/codegen.go for the rest of the code)
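
For readers who want to experiment before opening codegen.go, here is a minimal sketch of a largest-first packing pass. The description above only states that the largest fields are placed first; the first-fit placement, the 64-bit slot width, and all names (`numField`, `placement`, `pack`) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

const slotBits = 64 // assumed width of a numeric slot

type numField struct {
	name string
	bits uint
}

type placement struct {
	slot   int  // index of the numeric slot, or -1 for the extra slot
	offset uint // bit offset inside the slot
}

// pack assigns fields to numSlots numeric slots, largest field first,
// using first-fit. Fields that fit nowhere spill to the extra slot.
func pack(fields []numField, numSlots int) map[string]placement {
	sorted := append([]numField(nil), fields...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].bits > sorted[j].bits
	})
	used := make([]uint, numSlots)
	out := make(map[string]placement, len(fields))
	for _, f := range sorted {
		p := placement{slot: -1}
		for s := range used {
			if used[s]+f.bits <= slotBits {
				p = placement{slot: s, offset: used[s]}
				used[s] += f.bits
				break
			}
		}
		out[f.name] = p
	}
	return out
}

func main() {
	fields := []numField{{"a", 20}, {"b", 20}, {"c", 44}, {"d", 44}}
	got := pack(fields, 2)
	for _, f := range fields {
		fmt.Printf("%s: slot %d, offset %d\n", f.name, got[f.name].slot, got[f.name].offset)
	}
}
```

In this input, placing fields in declaration order with first-fit would put `a` and `b` together in slot 0 and leave no room for `d`; sorting by size first lets all four fields fit in two slots, at the cost of no longer following declaration order.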

cockroach-teamcity · 2017-08-19T00:54:08Z

This change is

knz · 2017-08-19T01:00:19Z

There are three groups of files that really need to be reviewed:

- irgen/main.go: the top-level program. @justinj @m-schneider want to have a go?
- Makefile: how things are built. Perhaps @benesch you have an opinion?
- irgen/codegen/codegen.go: the packing algorithm. @jordanlewis you told me you were curious, so you could perhaps have a look? I'd be curious about what @nvanbenschoten thinks too.

I'm naming specific people to avoid inaction due to the bystander effect. Please work with me here: David E is not among us any more, and we need to get each other up to speed with this code somehow. This review should be a smooth learning curve.

knz · 2017-08-19T01:02:34Z

cc @petermattis you may be interested in the packing technique here.

Incidental note related to other discussions: I am also working on using these IR data structures to describe logical SQL plans. This will be applied for plan caching, and, I foresee, rule-based rewrites. There the packing techniques are very useful to ensure that we can have more plans cached in memory within the same memory budget.

petermattis · 2017-08-21T13:06:07Z

Do you really need both safe and unsafe modes for float values? I think you can use Float64bits and Float64frombits which internally use unsafe, but are blessed by the runtime.
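
As a minimal sketch of that suggestion, a float64 can be stored in and loaded from an integer slot with the standard `math` conversions. The helper names `storeFloat` and `loadFloat` are hypothetical; only `math.Float64bits` and `math.Float64frombits` come from the standard library.

```go
package main

import (
	"fmt"
	"math"
)

// storeFloat converts a float64 to the raw 64-bit pattern that can be
// kept in an integer slot.
func storeFloat(f float64) uint64 { return math.Float64bits(f) }

// loadFloat converts the raw bit pattern back to a float64.
func loadFloat(slot uint64) float64 { return math.Float64frombits(slot) }

func main() {
	slot := storeFloat(3.25)
	fmt.Println(loadFloat(slot) == 3.25) // true
}
```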

In practice, we'll always be using the packed mode, right?

Review status: 0 of 40 files reviewed at latest revision, 1 unresolved discussion.

pkg/sql/ir/tests/defcfg/prims/base/base.ir.go, line 56 at r1 (raw file):

// R() method, which is type safe.
func (a Allocator) new() *node {
	nodes := *a.nodes

You need to make a.nodes a pointer to a slice in order to be able to mutate the slice. I think it would be more natural and the same number of allocations for these methods to be defined on *Allocator and then nodes can be a regular slice (not a pointer to one):

type Allocator struct {
  nodes []node
}

func (a *Allocator) new() *node {
  if len(a.nodes) == 0 {
    a.nodes = make([]node, 256)
  }
  x := &a.nodes[0]
  a.nodes = a.nodes[1:]
  return x
}

Comments from Reviewable

CLAassistant · 2017-08-21T13:10:26Z

All committers have signed the CLA.

knz · 2017-09-04T14:22:40Z

Regarding the comment on unsafe / floats: unfortunately what the Go stdlib has to offer is not sufficient, because it does not enable packing two float32's inside a single uint64 without using (again) the unsafe package. So I'll keep the code for now.

Review status: 0 of 40 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.

pkg/sql/ir/tests/defcfg/prims/base/base.ir.go, line 56 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You need to make a.nodes a pointer to a slice in order to be able to mutate the slice. I think it would be more natural and the same number of allocations for these methods to be defined on *Allocator and then nodes can be a regular slice (not a pointer to one):
type Allocator struct {
  nodes []node
}

func (a *Allocator) new() *node {
  if len(a.nodes) == 0 {
    a.nodes = make([]node, 256)
  }
  x := &a.nodes[0]
  a.nodes = a.nodes[1:]
  return x
}

Done.

petermattis · 2017-09-05T13:52:42Z

> Regarding the comment on unsafe / floats: unfortunately what the Go stdlib has to offer is not sufficient, because it does not enable packing two float32's inside a single uint64 without using (again) the unsafe package. So I'll keep the code for now.

https://play.golang.org/p/2p4rGXzKQJ

Also, even if the support in the stdlib isn't sufficient, I'm not seeing why you'd want to ever use the safe mode.

Review status: 0 of 40 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/sql/ir/tests/defcfg/prims/base/base.ir.go, line 57 at r3 (raw file):

	a := MakeAllocator()
	return &a
}

You could get rid of both MakeAllocator and NewAllocator by making the zero-value of Allocator useful. The only purpose of MakeAllocator right now is to init nodes to 16 entries. That could be accomplished on the first call to new by an additional field indicating if it was the first call. Not sure if this is worthwhile, just pointing it out.
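
A sketch of what a useful zero value could look like, assuming a simplified `node` type. The `started` field and the chunk-size constants are hypothetical; only the initial size of 16 comes from the comment above.

```go
package main

import "fmt"

// node is a simplified stand-in for the generated node type.
type node struct{ nums [2]uint64 }

// Allocator hands out nodes from pre-allocated chunks. Its zero value
// is ready to use, so no constructor is needed.
type Allocator struct {
	nodes   []node
	started bool // set once the first chunk has been allocated
}

const (
	firstChunk = 16  // initial size mentioned in the review
	nextChunk  = 256 // hypothetical size for later chunks
)

// new returns a pointer to a fresh node, allocating a chunk if needed.
func (a *Allocator) new() *node {
	if len(a.nodes) == 0 {
		size := nextChunk
		if !a.started {
			size = firstChunk
			a.started = true
		}
		a.nodes = make([]node, size)
	}
	n := &a.nodes[0]
	a.nodes = a.nodes[1:]
	return n
}

func main() {
	var a Allocator // zero value, no MakeAllocator or NewAllocator
	x, y := a.new(), a.new()
	x.nums[0] = 7
	fmt.Println(x != y, y.nums[0], len(a.nodes)) // true 0 14
}
```

In this sketch the first-call check sits inside the refill branch, so the common path still performs only the `len(a.nodes) == 0` test.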

knz · 2017-09-06T14:19:30Z

> https://play.golang.org/p/2p4rGXzKQJ

Okay, I'll do that, just because float32 is rather uncommon. However, to shave the last hair on this yak, in terms of generated code this is somewhat inferior. Your solution makes the compiler emit a 64-bit memory load followed by a shift immediately dependent on the load; mine makes the compiler emit an add (to offset the address) followed by a 32-bit load, where the result of the load may only be used much later. That's lighter on the cache and easier on the pipeline.

> Also, even if the support in the stdlib isn't sufficient, I'm not seeing why you'd want to ever use the safe mode.

Because Go's manual is somewhat scary about the restrictions on performing pointer arithmetic in the uintptr type, and if there's ever a hard-to-find bug in the future, anyone working on this will want the ability to exclude any doubt about the memory semantics from this code.

> 	a := MakeAllocator()
> 	return &a
> }
>
> You could get rid of both MakeAllocator and NewAllocator by making the zero-value of Allocator useful. The only purpose of MakeAllocator right now is to init nodes to 16 entries. That could be accomplished on the first call to new by an additional field indicating if it was the first call. Not sure if this is worthwhile, just pointing it out.

Yeah this looks like a good idea. Will look into it.

…

-- Raphael 'kena' Poss

knz · 2017-09-08T22:20:28Z

Okay I removed the unsafe mode by using the math float/int conversions.

Regarding NewAllocator/MakeAllocator I wasn't too happy about adding a conditional on the hot path, so I'm leaving that out for now.

@petermattis

knz · 2018-01-25T21:49:31Z

Superseded by #19135.

knz requested review from a team August 19, 2017 00:54

knz requested review from a team and removed request for a team August 19, 2017 01:07

knz force-pushed the 20170819-ir-types branch 3 times, most recently from d456e95 to 137d54e Compare August 21, 2017 13:05

knz force-pushed the 20170819-ir-types branch from 137d54e to c626425 Compare September 4, 2017 14:21

knz force-pushed the 20170819-ir-types branch from c626425 to 84f438a Compare September 4, 2017 15:49

knz mentioned this pull request Sep 4, 2017

sql/ir: auto-generate an S-expression parser #18198

Closed

knz force-pushed the 20170819-ir-types branch from 84f438a to 5715634 Compare September 8, 2017 22:19

knz force-pushed the 20170819-ir-types branch 2 times, most recently from 9b7cf03 to 7b12dd5 Compare September 9, 2017 15:05

knz added 3 commits September 9, 2017 21:14

sql/ir: use a special character to highlight metavars in templates

b2a7574

ir: pass the node allocator by reference

5cba20a

Suggested by @petermattis.

knz force-pushed the 20170819-ir-types branch from 7b12dd5 to 298beba Compare September 9, 2017 19:27

knz closed this Jan 25, 2018

knz deleted the 20170819-ir-types branch April 27, 2018 18:38
