sql/ir: teach the generator about field packing and primitive types
There are four complementary parts in this patch:

- the generator now predefines the common Go primitive types for
  every input definition file (bool, string, int64, etc.)

- the code generator is extended to take a few options on the command
  line that control its behavior. The main() function is refactored to
  make it more readable.

- the code generator is taught about field packing, i.e. storing
  multiple small numeric values in a variable of a larger size. See a copy
  of the explanation below.

- the Makefile is modified to use multiple configurations to generate
  test IR environments. This is used to exercise different
  combinations of the code generation parameters, to ensure that all
  of them produce valid code.

A copy of the explanatory comment, which outlines the allocation of
memory slots for IR struct types, follows.

“The interesting part of code generation is fitting structs into nodes.

Each [IR] node consists of slots, i.e. locations in memory that hold
values. The goal of slot allocation is to decide which struct field goes
into which slot.

There are four kinds of slots: numeric, string, reference and extra.
The numeric, string and reference slots are called "dedicated".
The number of dedicated slots per node is fixed.
For example, in the default configuration, there are 2 numeric slots,
1 string slot and 2 reference slots.

In general, we prefer a dedicated slot. When dedicated slots are
exhausted for a particular type (e.g. when encountering the 3rd
numeric field in a struct in the default configuration), we spill
to the extra slots. Extra slots expand on demand without limit.

We support two modes: packed and unpacked.

Understanding unpacked mode first makes packed mode easier to
follow.

In unpacked mode, each numeric field uses one numeric slot; each
reference to a struct uses one reference slot; and each reference
to a sum uses both a numeric slot (for the tag) and a reference
slot (for the value). Every other type uses an extra
slot. A field whose dedicated slots are exhausted also goes to
an extra slot. For example:

```go
type BinExprValue struct {
	Left  Expr
	Op    BinOp
	Right Expr
}
type BinExpr struct{ ref *node }

//// Unpacked mode with 3 numeric slots:
func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = numvalslot(x.Left.tag)
	ref.nums[1] = numvalslot(x.Op)
	ref.nums[2] = numvalslot(x.Right.tag)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	return BinExpr{ref}
}
func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.ref.nums[0]), x.ref.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.ref.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{ExprTag(x.ref.nums[2]), x.ref.refs[1]} }

//// Unpacked mode with just 2 numeric slots, like in the default configuration:
type extraBinExpr struct {
	Right__Tag ExprTag
}
func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = numvalslot(x.Left.tag)
	ref.nums[1] = numvalslot(x.Op)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	ref.extra = &extraBinExpr{Right__Tag: x.Right.tag}
	return BinExpr{ref}
}
func (x BinExpr) Left() Expr  { return Expr{ExprTag(x.ref.nums[0]), x.ref.refs[0]} }
func (x BinExpr) Op() BinOp   { return BinOp(x.ref.nums[1]) }
func (x BinExpr) Right() Expr { return Expr{x.ref.extra.(*extraBinExpr).Right__Tag, x.ref.refs[1]} }
```
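
The unpacked rules above can be sketched as a small stand-alone allocator. This is only an illustration, not the generator's code: the names `kind`, `field`, `limits` and `allocate` are hypothetical, and the slot labels it returns are plain strings chosen for readability.

```go
package main

import "fmt"

// kind classifies a field for slot allocation (hypothetical names).
type kind int

const (
	numeric   kind = iota // wants a numeric slot
	str                   // wants a string slot
	structRef             // reference to a struct: wants a reference slot
	sumRef                // reference to a sum: numeric slot (tag) + reference slot (value)
	other                 // anything else: always stored in extra
)

type field struct {
	name string
	kind kind
}

// limits is the number of dedicated slots of each kind in one node.
type limits struct{ nums, strs, refs int }

// allocate maps every field part to a dedicated slot while one is free,
// and spills to an ever-growing list of extra slots otherwise.
func allocate(fields []field, lim limits) map[string]string {
	out := map[string]string{}
	var nums, strs, refs, extra int
	spill := func(part string) {
		out[part] = fmt.Sprintf("extra[%d]", extra)
		extra++
	}
	take := func(part string, used *int, limit int, prefix string) {
		if *used >= limit {
			spill(part)
			return
		}
		out[part] = fmt.Sprintf("%s[%d]", prefix, *used)
		*used++
	}
	for _, f := range fields {
		switch f.kind {
		case numeric:
			take(f.name, &nums, lim.nums, "nums")
		case str:
			take(f.name, &strs, lim.strs, "strs")
		case structRef:
			take(f.name, &refs, lim.refs, "refs")
		case sumRef:
			take(f.name+"__Tag", &nums, lim.nums, "nums")
			take(f.name, &refs, lim.refs, "refs")
		default:
			spill(f.name)
		}
	}
	return out
}

func main() {
	binExpr := []field{{"Left", sumRef}, {"Op", numeric}, {"Right", sumRef}}
	// Default configuration from the text: 2 numeric, 1 string, 2 reference slots.
	got := allocate(binExpr, limits{nums: 2, strs: 1, refs: 2})
	for _, part := range []string{"Left__Tag", "Left", "Op", "Right__Tag", "Right"} {
		fmt.Printf("%-10s -> %s\n", part, got[part])
	}
}
```

With the default limits, the third numeric value (`Right__Tag`) is the one that spills, which matches the `extraBinExpr` struct in the example above.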

In packed mode, the general idea of filling dedicated slots until
they are exhausted, and then spilling to extra slots, remains. What is
different is that the algorithm now tries to fit multiple numeric
fields in the same numeric slot, to conserve memory. The algorithm
places the largest fields first, to reduce fragmentation. As a
side effect, fields are not necessarily stored in declaration
order.
For example:

```go
//// Observe how all 3 numeric values are now packed in a single slot!
func (x BinExprValue) R(a Allocator) BinExpr {
	ref := a.new()
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Left__Tag_ValueMask  << BinExpr_Slot_Left__Tag_BitOffset))  | (numvalslot(x.Left.tag)  << BinExpr_Slot_Left__Tag_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Op_ValueMask         << BinExpr_Slot_Op_BitOffset))         | (numvalslot(x.Op)        << BinExpr_Slot_Op_BitOffset)
	ref.nums[0] = (ref.nums[0] &^ (BinExpr_Slot_Right__Tag_ValueMask << BinExpr_Slot_Right__Tag_BitOffset)) | (numvalslot(x.Right.tag) << BinExpr_Slot_Right__Tag_BitOffset)
	ref.refs[0] = x.Left.ref
	ref.refs[1] = x.Right.ref
	return BinExpr{ref}
}

//// Note: the size in bits for sum types is computed automatically
//// depending on the number of variants.
const BinExpr_Slot_Left__Tag_BitOffset = 0
const BinExpr_Slot_Left__Tag_ValueMask = 0x3
const BinExpr_Slot_Op_BitOffset = 2
const BinExpr_Slot_Op_ValueMask = 0x3
const BinExpr_Slot_Right__Tag_BitOffset = 4
const BinExpr_Slot_Right__Tag_ValueMask = 0x3

func (x BinExpr) Left() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Left__Tag_BitOffset) & BinExpr_Slot_Left__Tag_ValueMask), x.ref.refs[0]}
}
func (x BinExpr) Op() BinOp {
	return BinOp((x.ref.nums[0] >> BinExpr_Slot_Op_BitOffset) & BinExpr_Slot_Op_ValueMask)
}
func (x BinExpr) Right() Expr {
	return Expr{ExprTag((x.ref.nums[0] >> BinExpr_Slot_Right__Tag_BitOffset) & BinExpr_Slot_Right__Tag_ValueMask), x.ref.refs[1]}
}
```
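
The placement step of packed mode can be sketched as follows. This is a simplified illustration under several assumptions, not the code in `irgen/codegen/codegen.go`: it assumes 64-bit numeric slots, a first-fit placement after sorting by decreasing size, and a tag width derived as the bit length of the largest variant index. The names `tagBits`, `numField`, `placement` and `pack` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

const slotBits = 64 // assumed width of one numeric slot

// tagBits returns how many bits are needed to distinguish n variants
// (assumption: variants are numbered 0..n-1, and at least 1 bit is used).
func tagBits(n int) int {
	if n <= 2 {
		return 1
	}
	return bits.Len(uint(n - 1))
}

type numField struct {
	name string
	bits int
}

// placement says where a numeric field lives.
type placement struct {
	slot   int    // numeric slot index, or -1 when spilled to extra
	offset int    // bit offset inside the slot
	mask   uint64 // value mask, before shifting
}

// pack places the largest fields first and puts each field into the
// first numeric slot that still has enough free bits.
func pack(fields []numField, numSlots int) map[string]placement {
	sorted := append([]numField(nil), fields...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].bits > sorted[j].bits })

	used := make([]int, numSlots) // bits already taken in each slot
	out := make(map[string]placement, len(sorted))
	for _, f := range sorted {
		p := placement{slot: -1}
		for s := range used {
			if used[s]+f.bits <= slotBits {
				p = placement{slot: s, offset: used[s], mask: (uint64(1) << uint(f.bits)) - 1}
				used[s] += f.bits
				break
			}
		}
		out[f.name] = p
	}
	return out
}

func main() {
	// Assumption: Expr and BinOp each have 4 variants, giving 2-bit fields.
	fields := []numField{
		{"Left__Tag", tagBits(4)},
		{"Op", tagBits(4)},
		{"Right__Tag", tagBits(4)},
	}
	p := pack(fields, 2)
	for _, f := range fields {
		fmt.Printf("%-10s slot=%d offset=%d mask=%#x\n", f.name, p[f.name].slot, p[f.name].offset, p[f.name].mask)
	}
}
```

Under these assumptions the three fields land in slot 0 at bit offsets 0, 2 and 4 with mask `0x3`, the same layout as the `BinExpr_Slot_*` constants above. A real generator may break ties or choose slots differently.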

(see `irgen/codegen/codegen.go` for the rest of the code)
knz committed Sep 9, 2017
1 parent 5cba20a commit 298beba
Showing 45 changed files with 12,511 additions and 166 deletions.
37 changes: 28 additions & 9 deletions pkg/sql/ir/Makefile
@@ -4,9 +4,12 @@
 SHELL = /usr/bin/env bash

 TEMPLATES = base/base.tmpl.go base/sexpr.tmpl.go
-DEFS = example
-TARGETS = $(foreach E,$(DEFS),$(foreach T,$(TEMPLATES),$(E)/$(T:.tmpl.go=.ir.go)))
-all: $(TARGETS)
+DEFS = example prims
+CONFIGS = defcfg nopack expanded expandedpack onlyextra smallslots
+TARGETS = $(foreach C,$(CONFIGS),$(foreach E,$(DEFS),$(foreach T,$(TEMPLATES),tests/$(C)/$(E)/$(T:.tmpl.go=.ir.go))))
+TEST_TARGETS = $(foreach C,$(CONFIGS),tests/$(C)/$(C)_cfg_test.go)
+
+all: $(TARGETS) $(TEST_TARGETS)

 clean:
 	rm -f $(TARGETS)
@@ -17,23 +20,39 @@ irgen/irgen: $(shell find irgen -name \*.go)

 .SUFFIXES: .ir.go .tmpl.go

-# example/base/base.ir.go -> example
-ir_def_base = $(firstword $(subst /, ,$(1)))
+%_cfg_test.go: tests/irgen_test.go Makefile
+	mkdir -p $(notdir $*)
+	(echo "// Code generated by make. DO NOT EDIT."; \
+	echo "// GENERATED FILE DO NOT EDIT"; \
+	sed -e "s,ir/tests/defcfg,ir/tests/$(notdir $*),g;s,package main,package $(notdir $*),g" < $<) > $@.tmp
+	mv -f $@.tmp $@
+
+# tests/cfg/example/base/base.ir.go -> example
+ir_def_base = $(word 3,$(subst /, ,$(1)))
+
+# tests/cfg/example/base/base.ir.go -> cfg
+cfg_base = $(word 2,$(subst /, ,$(1)))
+
-# example/base/base.ir.go -> base/base.go
-template_path = $(subst .ir,.tmpl,$(subst $(call ir_def_base,$(1))/,,$(1)))
+# tests/cfg/example/base/base.ir.go -> base/base.tmpl.go
+template_path = $(subst .ir,.tmpl,$(subst tests/$(call cfg_base,$(1))/$(call ir_def_base,$(1))/,,$(1)))

 # The special phrase ".SECONDEXPANSION" allows one to express rule
 # dependencies as a function of the name of the target.
 .SECONDEXPANSION:

 # The dependencies in the following rules are exactly, in this order:
 # - the name of the generator program ./irgen,
+# - the configuration file,
 # - the source template,
 # - the input definition file
 # Those three items are then taken together to construct
 # a valid command line with $^
-%.ir.go: irgen/irgen $$(call template_path,$$@) $$(call ir_def_base,$$@).def
-	$^ > $@.tmp
+%.ir.go: irgen/irgen tests/configs/$$(call cfg_base,$$@) $$(call template_path,$$@) tests/$$(call ir_def_base,$$@).def
+	mkdir -p $(dir $@)
+	run() { \
+	set -x; \
+	$$1 `cat $$2|grep -v '^#'` $$3 $$4; \
+	}; run $^ > $@.tmp
 	mv -f $@.tmp $@
 	gofmt -s -w $@
 	goimports -w $@
31 changes: 26 additions & 5 deletions pkg/sql/ir/base/base.tmpl.go
@@ -14,18 +14,27 @@

 package base

-import "fmt"
+import (
+	"fmt"
+)

 // node is a generic ADT node type, with slots for references to other nodes and
 // enumeration values. Values that don't fit and values of other types go in the
 // extra field.
 type node struct {
-	refs  [ºnumRefsPerNode]*node
-	enums [ºnumEnumsPerNode]enum
+	refs [ºnumRefsPerNode]*node
+	nums [ºnumNumsPerNode]numvalslot
+	strs [ºnumStrsPerNode]string
 	extra
 }

-type enum int32
+// enum is the type to define the tag part for working copies of IR
+// node with sum and enum types.
+type enum uint32
+
+// numvalslot is the type used to store integer values in persistent IR nodes.
+// must be larger than or as large as enum.
+type numvalslot ºnumValSlotType

 type extra interface {
 	extraRefs() []*node
@@ -95,6 +104,18 @@ func (x ºEnum) String() string {
 // ºStruct is the type of a reference to an immutable record.
 type ºStruct struct{ ref *node }

+// @for slot
+
+const ºStruct_Slot_ºslotName_Type = ºslotType
+const ºStruct_Slot_ºslotName_Num = ºslotNum
+const ºStruct_Slot_ºslotName_BitSize = ºslotBitSize
+const ºStruct_Slot_ºslotName_BitOffset = ºslotBitOffset
+const ºStruct_Slot_ºslotName_ByteSize = ºslotByteSize
+const ºStruct_Slot_ºslotName_ByteOffset = ºslotByteOffset
+const ºStruct_Slot_ºslotName_ValueMask = ºslotValueMask
+
+// @done slot
+
 // ºStructValue is the logical type of a record. Immutable records are stored in
 // nodes.
 type ºStructValue struct {
@@ -140,7 +161,7 @@ func (x ºStruct) ºItem() ºtype { return ºgetField(x.ref) }
 func (x ºStruct) V() ºStructValue {
 	return ºStructValue{
 		// @for item
-		ºgetField(x.ref),
+		ºItem: ºgetField(x.ref),
 		// @done item
 	}
 }
16 changes: 15 additions & 1 deletion pkg/sql/ir/base/scaffolding.go
@@ -23,7 +23,8 @@ import "bytes"
 // These constants are replaced in template code.
 const (
 	ºnumRefsPerNode = iota
-	ºnumEnumsPerNode
+	ºnumNumsPerNode
+	ºnumStrsPerNode
 	ºtag
 )

@@ -39,3 +40,16 @@ func (t ºtype) FormatSExpr(buf *bytes.Buffer) {}

 // FormatSExprºType is a stub for Sexpr formatters for primitive types.
 func FormatSExprºType(buf *bytes.Buffer, x ºtype) {}
+
+// ºnumValSlotType is overridden during code generation.
+type ºnumValSlotType uint64
+
+const (
+	ºslotType = 0
+	ºslotNum = 0
+	ºslotBitSize = 0
+	ºslotBitOffset = 0
+	ºslotByteSize = 0
+	ºslotByteOffset = 0
+	ºslotValueMask = 0
+)
17 changes: 13 additions & 4 deletions pkg/sql/ir/base/sexpr.tmpl.go
@@ -16,16 +16,25 @@ package base

 import (
 	"bytes"
-	"strconv"
+	"fmt"
 )

 type SexprFormatter interface {
 	FormatSExpr(buf *bytes.Buffer)
 }

-func FormatSExprInt64(buf *bytes.Buffer, x int64) {
-	buf.WriteString(strconv.FormatInt(x, 10))
-}
+func FormatSExprBool(buf *bytes.Buffer, x bool) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt64(buf *bytes.Buffer, x int64) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt32(buf *bytes.Buffer, x int32) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt16(buf *bytes.Buffer, x int16) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprInt8(buf *bytes.Buffer, x int8) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint64(buf *bytes.Buffer, x uint64) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint32(buf *bytes.Buffer, x uint32) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint16(buf *bytes.Buffer, x uint16) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprUint8(buf *bytes.Buffer, x uint8) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprFloat32(buf *bytes.Buffer, x float32) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprFloat64(buf *bytes.Buffer, x float64) { fmt.Fprintf(buf, "%v", x) }
+func FormatSExprString(buf *bytes.Buffer, x string) { fmt.Fprintf(buf, "%q", x) }

 // @for enum
