sql/ir: teach the generator about field packing and primitive types

There are four complementary parts in this patch:

- the generator now predefines the common Go primitive types for every input definition file (bool, string, int64, etc.)
- the code generator is extended to take a few options on the command line that control its behavior. The main() function is refactored to make it more readable.
- the code generator is taught about field packing, i.e. storing multiple small numeric values in a variable of a larger size. See a copy of the explanation below.
- the Makefile is modified to use multiple configurations to generate test IR environments. This is used to exercise different combinations of the code generation parameters, to ensure that all of them produce valid code.

A copy of the explanatory comment that outlines the allocation of memory slots for IR struct types follows.

The interesting part of code generation is fitting structs into nodes.

Each [IR] node consists of slots, i.e. spaces in memory where to put values. The goal of slot allocation is to decide which struct field goes to which slot.

There are four kinds of slots: numeric, string, reference and extra. The numeric, string and reference slots are called "dedicated". Dedicated slots come in a finite amount! For example, in the default configuration, there are 2 numeric slots, 1 string slot and 2 reference slots.

In general, we prefer a dedicated slot. When dedicated slots are exhausted for a particular type (e.g. when encountering the 3rd numeric field in a struct in the default configuration), we spill to the extra slots. Extra slots expand on demand without limit.

We support two modes: packed and unpacked.

Understanding unpacked mode can serve as a foundation to better understand packed mode. In unpacked mode, each numeric field uses one numeric slot; each reference to a struct uses one reference slot; and each reference to a sum uses both a numeric slot (for the tag) and a reference slot (for the value). Every other type uses an extra slot. When dedicated slots are exhausted, an extra slot is also used.

For example:

```go
type BinExprValue struct {
	Left  Expr
	Op    BinOp
	Right Expr
}

type BinExpr struct{ *node }

//// Packing with 3 numeric slots:

func (x BinExprValue) R(a Allocator) BinExpr {
	node := a.new()
	node.nums[0] = numvalslot(x.Left.tag)
	node.nums[1] = numvalslot(x.Op)
	node.nums[2] = numvalslot(x.Right.tag)
	node.refs[0] = x.Left.ref
	node.refs[1] = x.Right.ref
	return BinExpr{node}
}

func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.node.nums[0]), x.node.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.node.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{ExprTag(x.node.nums[2]), x.node.refs[1]} }

//// Packing with just 2 numeric slots, like in the default configuration:

type extraBinExpr struct {
	Right__Tag ExprTag
}

func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = numvalslot(x.Left.tag)
	ref.nums[1] = numvalslot(x.Op)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	// The third numeric value (the tag of Right) spills to the extra slot.
	extra := &extraBinExpr{}
	extra.Right__Tag = x.Right.tag
	ref.extra = extra
	return BinExpr{ref}
}

func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.ref.nums[0]), x.ref.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.ref.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{x.ref.extra.(*extraBinExpr).Right__Tag, x.ref.refs[1]} }
```

In packed mode, the general idea of packing dedicated slots until they are exhausted, and then spilling to extra slots, remains. What is different is that the algorithm now tries to fit multiple numeric fields in the same numeric slot, to conserve memory.

The algorithm starts with the largest fields first, to reduce fragmentation. This incidentally implies that the fields are not necessarily stored in declaration order.

For example:

```go
//// Observe how all 3 numeric values are now packed in a single slot!

func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Left__Tag_ValueMask << BinExpr_Slot_Left__Tag_BitOffset)) | (numvalslot(x.Left.tag) << BinExpr_Slot_Left__Tag_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Op_ValueMask << BinExpr_Slot_Op_BitOffset)) | (numvalslot(x.Op) << BinExpr_Slot_Op_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Right__Tag_ValueMask << BinExpr_Slot_Right__Tag_BitOffset)) | (numvalslot(x.Right.tag) << BinExpr_Slot_Right__Tag_BitOffset)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	return BinExpr{ref}
}

//// Note: the size in bits for sum types is computed automatically
//// depending on the number of variants.

const BinExpr_Slot_Left__Tag_BitOffset = 0
const BinExpr_Slot_Left__Tag_ValueMask = 0x3
const BinExpr_Slot_Op_BitOffset = 2
const BinExpr_Slot_Op_ValueMask = 0x3
const BinExpr_Slot_Right__Tag_BitOffset = 4
const BinExpr_Slot_Right__Tag_ValueMask = 0x3

func (x BinExpr) Left() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Left__Tag_BitOffset) & BinExpr_Slot_Left__Tag_ValueMask), x.ref.refs[0]}
}

func (x BinExpr) Op() BinOp {
	return BinOp((x.ref.nums[0] >> BinExpr_Slot_Op_BitOffset) & BinExpr_Slot_Op_ValueMask)
}

func (x BinExpr) Right() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Right__Tag_BitOffset) & BinExpr_Slot_Right__Tag_ValueMask), x.ref.refs[1]}
}
```

(see `irgen/codegen/codegen.go` for the rest of the code)
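
As a rough illustration of the packed-mode allocation described above, here is a standalone sketch. It is not the code from `irgen/codegen`. The names `field`, `placement`, `pack`, `bitsForVariants`, `set` and `get` are hypothetical, the first-fit placement is an assumption about how "largest fields first" might be realized, and the variant counts (3 for `Expr`, 4 for `BinOp`) are made up so that each field needs 2 bits, matching the `0x3` masks in the example.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a hypothetical description of one numeric struct field.
type field struct {
	name string
	bits uint // width of the stored value, in bits
}

// placement records where a field landed (hypothetical type).
type placement struct {
	name   string
	slot   int    // index of the numeric slot, or -1 if spilled to extra
	offset uint   // bit offset inside the slot
	mask   uint64 // value mask, before shifting
}

// bitsForVariants returns how many bits are needed to store a tag that
// can take n distinct values (assumed minimum of 1 bit).
func bitsForVariants(n int) uint {
	bits := uint(1)
	for (1 << bits) < n {
		bits++
	}
	return bits
}

// pack assigns fields to numSlots slots of slotBits bits each. It visits
// the largest fields first and places each in the first slot with room.
// Fields that fit nowhere keep slot == -1, i.e. they would be spilled.
func pack(fields []field, numSlots int, slotBits uint) []placement {
	sorted := append([]field(nil), fields...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].bits > sorted[j].bits
	})
	used := make([]uint, numSlots) // bits already taken in each slot
	var out []placement
	for _, f := range sorted {
		p := placement{name: f.name, slot: -1, mask: 1<<f.bits - 1}
		for s := range used {
			if used[s]+f.bits <= slotBits {
				p.slot, p.offset = s, used[s]
				used[s] += f.bits
				break
			}
		}
		out = append(out, p)
	}
	return out
}

// set clears the field's bits in slot and stores v there.
func set(slot uint64, p placement, v uint64) uint64 {
	return (slot &^ (p.mask << p.offset)) | ((v & p.mask) << p.offset)
}

// get extracts the field's value from slot.
func get(slot uint64, p placement) uint64 {
	return (slot >> p.offset) & p.mask
}

func main() {
	fields := []field{
		{"Left__Tag", bitsForVariants(3)}, // assumed: Expr has 3 variants
		{"Op", bitsForVariants(4)},        // assumed: BinOp has 4 values
		{"Right__Tag", bitsForVariants(3)},
	}
	values := map[string]uint64{"Left__Tag": 2, "Op": 3, "Right__Tag": 1}

	var nums [2]uint64 // 2 numeric slots, as in the default configuration
	ps := pack(fields, len(nums), 64)
	for _, p := range ps {
		if p.slot >= 0 { // spilled fields are ignored in this sketch
			nums[p.slot] = set(nums[p.slot], p, values[p.name])
		}
	}
	for _, p := range ps {
		fmt.Printf("%s: slot=%d offset=%d mask=%#x\n", p.name, p.slot, p.offset, p.mask)
	}
}
```

With these assumed widths, all three fields land in slot 0 at bit offsets 0, 2 and 4, which lines up with the `BitOffset` and `ValueMask` constants shown in the packed example. The stable sort keeps declaration order among fields of equal width; fields of different widths would be reordered.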
cockroachdb · Sep 9, 2017 · 298beba
1 parent 5cba20a
commit 298beba
Showing 45 changed files with 12,511 additions and 166 deletions.
diff --git a/pkg/sql/ir/Makefile b/pkg/sql/ir/Makefile
@@ -4,9 +4,12 @@
 SHELL = /usr/bin/env bash
 
 TEMPLATES = base/base.tmpl.go base/sexpr.tmpl.go
-DEFS = example
-TARGETS = $(foreach E,$(DEFS),$(foreach T,$(TEMPLATES),$(E)/$(T:.tmpl.go=.ir.go)))
-all: $(TARGETS)
+DEFS = example prims
+CONFIGS = defcfg nopack expanded expandedpack onlyextra smallslots
+TARGETS = $(foreach C,$(CONFIGS),$(foreach E,$(DEFS),$(foreach T,$(TEMPLATES),tests/$(C)/$(E)/$(T:.tmpl.go=.ir.go))))
+TEST_TARGETS = $(foreach C,$(CONFIGS),tests/$(C)/$(C)_cfg_test.go)
+
+all: $(TARGETS) $(TEST_TARGETS)
 
 clean:
 	rm -f $(TARGETS)
@@ -17,23 +20,39 @@ irgen/irgen: $(shell find irgen -name \*.go)
 
 .SUFFIXES: .ir.go .tmpl.go
 
-# example/base/base.ir.go -> example
-ir_def_base = $(firstword $(subst /, ,$(1)))
+%_cfg_test.go: tests/irgen_test.go Makefile
+	mkdir -p $(notdir $*)
+	(echo "// Code generated by make. DO NOT EDIT."; \
+	 echo "// GENERATED FILE DO NOT EDIT"; \
+	 sed -e "s,ir/tests/defcfg,ir/tests/$(notdir $*),g;s,package main,package $(notdir $*),g" < $<) > $@.tmp
+	mv -f $@.tmp $@
+
+# tests/cfg/example/base/base.ir.go -> example
+ir_def_base = $(word 3,$(subst /, ,$(1)))
+
+# tests/cfg/example/base/base.ir.go -> cfg
+cfg_base = $(word 2,$(subst /, ,$(1)))
 
-# example/base/base.ir.go -> base/base.go
-template_path = $(subst .ir,.tmpl,$(subst $(call ir_def_base,$(1))/,,$(1)))
+# tests/cfg/example/base/base.ir.go -> base/base.tmpl.go
+template_path = $(subst .ir,.tmpl,$(subst tests/$(call cfg_base,$(1))/$(call ir_def_base,$(1))/,,$(1)))
 
 # The special phrase ".SECONDEXPANSION" allows one to express rule
 # dependencies as a function of the name of the target.
 .SECONDEXPANSION:
 
 # The dependencies in the following rules are exactly, in this order:
 # - the name of the generator program ./irgen,
+# - the configuration file,
 # - the source template,
 # - the input definition file
 # Those three items are then taken together to construct
 # a valid command line with $^
-%.ir.go: irgen/irgen $$(call template_path,$$@) $$(call ir_def_base,$$@).def
-	$^ > $@.tmp
+%.ir.go: irgen/irgen tests/configs/$$(call cfg_base,$$@) $$(call template_path,$$@) tests/$$(call ir_def_base,$$@).def
+	mkdir -p $(dir $@)
+	run() { \
+	  set -x; \
+	  $$1 `cat $$2|grep -v '^#'` $$3 $$4; \
+	}; run $^ > $@.tmp
 	mv -f $@.tmp $@
 	gofmt -s -w $@
+	goimports -w $@
diff --git a/pkg/sql/ir/base/base.tmpl.go b/pkg/sql/ir/base/base.tmpl.go
@@ -14,18 +14,27 @@
 
 package base
 
-import "fmt"
+import (
+	"fmt"
+)
 
 // node is a generic ADT node type, with slots for references to other nodes and
 // enumeration values. Values that don't fit and values of other types go in the
 // extra field.
 type node struct {
-	refs  [ºnumRefsPerNode]*node
-	enums [ºnumEnumsPerNode]enum
+	refs [ºnumRefsPerNode]*node
+	nums [ºnumNumsPerNode]numvalslot
+	strs [ºnumStrsPerNode]string
 	extra
 }
 
-type enum int32
+// enum is the type to define the tag part for working copies of IR
+// node with sum and enum types.
+type enum uint32
+
+// numvalslot is the type used to store integer values in persistent IR nodes.
+// must be larger than or as large as enum.
+type numvalslot ºnumValSlotType
 
 type extra interface {
 	extraRefs() []*node
@@ -95,6 +104,18 @@ func (x ºEnum) String() string {
 // ºStruct is the type of a reference to an immutable record.
 type ºStruct struct{ ref *node }
 
+// @for slot
+
+const ºStruct_Slot_ºslotName_Type = ºslotType
+const ºStruct_Slot_ºslotName_Num = ºslotNum
+const ºStruct_Slot_ºslotName_BitSize = ºslotBitSize
+const ºStruct_Slot_ºslotName_BitOffset = ºslotBitOffset
+const ºStruct_Slot_ºslotName_ByteSize = ºslotByteSize
+const ºStruct_Slot_ºslotName_ByteOffset = ºslotByteOffset
+const ºStruct_Slot_ºslotName_ValueMask = ºslotValueMask
+
+// @done slot
+
 // ºStructValue is the logical type of a record. Immutable records are stored in
 // nodes.
 type ºStructValue struct {
@@ -140,7 +161,7 @@ func (x ºStruct) ºItem() ºtype { return ºgetField(x.ref) }
 func (x ºStruct) V() ºStructValue {
 	return ºStructValue{
 		// @for item
-		ºgetField(x.ref),
+		ºItem: ºgetField(x.ref),
 		// @done item
 	}
 }

diff --git a/pkg/sql/ir/base/scaffolding.go b/pkg/sql/ir/base/scaffolding.go
@@ -23,7 +23,8 @@ import "bytes"
 // These constants are replaced in template code.
 const (
 	ºnumRefsPerNode = iota
-	ºnumEnumsPerNode
+	ºnumNumsPerNode
+	ºnumStrsPerNode
 	ºtag
 )
 
@@ -39,3 +40,16 @@ func (t ºtype) FormatSExpr(buf *bytes.Buffer) {}
 
 // FormatSExprºType is a stub for Sexpr formatters for primitive types.
 func FormatSExprºType(buf *bytes.Buffer, x ºtype) {}
+
+// ºnumValSlotType is overridden during code generation.
+type ºnumValSlotType uint64
+
+const (
+	ºslotType       = 0
+	ºslotNum        = 0
+	ºslotBitSize    = 0
+	ºslotBitOffset  = 0
+	ºslotByteSize   = 0
+	ºslotByteOffset = 0
+	ºslotValueMask  = 0
+)
diff --git a/pkg/sql/ir/base/sexpr.tmpl.go b/pkg/sql/ir/base/sexpr.tmpl.go
@@ -16,16 +16,25 @@ package base
 
 import (
 	"bytes"
-	"strconv"
+	"fmt"
 )
 
 type SexprFormatter interface {
 	FormatSExpr(buf *bytes.Buffer)
 }
 
-func FormatSExprInt64(buf *bytes.Buffer, x int64) {
-	buf.WriteString(strconv.FormatInt(x, 10))
-}
+func FormatSExprBool(buf *bytes.Buffer, x bool)       { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt64(buf *bytes.Buffer, x int64)     { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt32(buf *bytes.Buffer, x int32)     { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt16(buf *bytes.Buffer, x int16)     { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt8(buf *bytes.Buffer, x int8)       { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint64(buf *bytes.Buffer, x uint64)   { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint32(buf *bytes.Buffer, x uint32)   { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint16(buf *bytes.Buffer, x uint16)   { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint8(buf *bytes.Buffer, x uint8)     { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprFloat32(buf *bytes.Buffer, x float32) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprFloat64(buf *bytes.Buffer, x float64) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprString(buf *bytes.Buffer, x string)   { fmt.Fprintf(buf, "%q", x) }
 
 // @for enum