Add intern.Table.Query (#385)
Suppose we have `m map[intern.ID]T` and `s string`. We want to query if
this string is in the map. We could write `m[table.Intern(s)]`, but if
`s` is not already interned, this is quite wasteful! Interning a
never-before-seen string costs three map hits, plus the hit to `m`.
However, if `s` is not interned, it cannot possibly be in `m`, so we can
cut the cost of every lookup to one hit in `Query` and one in `m[id]`.

In other words, this lets us write this function:

```go
func get(tab *intern.Table, m map[intern.ID]T, s string) (T, bool) {
  id, ok := tab.Query(s)
  if !ok { return T{}, false }
  
  v, ok := m[id]
  return v, ok
}
```
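For a runnable illustration of the same pattern, here is a minimal sketch. The `ID` and `Table` types below are simplified stand-ins written for this example (no char6 inlining, no string cloning), not the real `internal/intern` package, and the generic `get` is an assumption about how a caller might wrap the lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// ID and Table are simplified stand-ins for the real intern package. They
// exist only to make this example self-contained.
type ID int32

type Table struct {
	mu    sync.RWMutex
	index map[string]ID
	table []string
}

// Query reports the ID of s if it has already been interned. It never
// modifies the table.
func (t *Table) Query(s string) (ID, bool) {
	t.mu.RLock()
	id, ok := t.index[s]
	t.mu.RUnlock()
	return id, ok
}

// Intern returns the ID for s, assigning a fresh one if s is new.
func (t *Table) Intern(s string) ID {
	if id, ok := t.Query(s); ok {
		return id
	}

	t.mu.Lock()
	defer t.mu.Unlock()
	// Re-check under the write lock: another goroutine may have interned s
	// between the Query above and acquiring the lock.
	if id, ok := t.index[s]; ok {
		return id
	}
	if t.index == nil {
		t.index = make(map[string]ID)
	}
	t.table = append(t.table, s)
	id := ID(len(t.table)) // The first ID has value 1.
	t.index[s] = id
	return id
}

// get looks up s in m without interning s as a side effect.
func get[T any](tab *Table, m map[ID]T, s string) (T, bool) {
	id, ok := tab.Query(s)
	if !ok {
		var zero T
		return zero, false
	}
	v, ok := m[id]
	return v, ok
}

func main() {
	var tab Table
	m := map[ID]int{tab.Intern("message"): 1}

	fmt.Println(get(&tab, m, "message")) // prints "1 true"
	fmt.Println(get(&tab, m, "service")) // prints "0 false"
	fmt.Println(len(tab.table))          // prints "1": the miss interned nothing
}
```

The last line is the point of the exercise: a failed lookup leaves the table untouched, whereas `m[tab.Intern(s)]` would have grown it by one entry.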

char6-inlined IDs are treated as always being interned, which means that
for very small strings, we only have to hit one map: the lookup in `m[id]`.
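To show why an inlined ID needs no table lookup, here is a hypothetical sketch. `encodeChar6` itself is not part of this diff, so the alphabet, the five-character limit, and the use of non-positive values below are all assumptions made for illustration, and the function is named `encodeSmall` to avoid suggesting it matches the real encoding.

```go
package main

import (
	"fmt"
	"strings"
)

type ID int32

// alphabet is a hypothetical 63-symbol alphabet. Each character is stored as
// its index plus one, so the six-bit code 0 means "no character here".
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"

// encodeSmall packs up to five alphabet characters, six bits each, into a
// non-positive ID. It reports false if s is too long or contains a character
// outside the alphabet. This is an illustrative scheme, not encodeChar6.
func encodeSmall(s string) (ID, bool) {
	if len(s) > 5 {
		return 0, false
	}

	var v int32
	for i := 0; i < len(s); i++ {
		code := strings.IndexByte(alphabet, s[i])
		if code < 0 {
			return 0, false
		}
		v |= int32(code+1) << (6 * i)
	}

	// The empty string packs to 0; everything else becomes negative, which
	// keeps inlined IDs disjoint from positive table indices.
	return ID(-v), true
}

func main() {
	fmt.Println(encodeSmall(""))        // prints "0 true"
	fmt.Println(encodeSmall("a"))       // prints "-1 true"
	fmt.Println(encodeSmall("message")) // prints "0 false": too long to inline
}
```

Because the ID is computed from the string alone, a `Query` built on an encoding like this can answer `true` for short strings without taking a lock or touching the index.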
mcy authored Dec 10, 2024
1 parent 974ba35 commit 7a4632c
Showing 1 changed file with 28 additions and 10 deletions.
38 changes: 28 additions & 10 deletions internal/intern/intern.go
@@ -76,23 +76,41 @@ type Table struct {
 //
 // This function may be called by multiple goroutines concurrently.
 func (t *Table) Intern(s string) ID {
-	if char6, ok := encodeChar6(s); ok {
-		return char6
-	}
-
 	// Fast path for strings that have already been interned. In the common case
 	// all strings are interned, so we can take a read lock to avoid needing
 	// to trap to the scheduler on concurrent access (all calls to Intern() will
 	// still contend mu.readCount, because RLock atomically increments it).
+	if id, ok := t.Query(s); ok {
+		return id
+	}
+
+	// Outline the fallback for when we haven't interned, to promote inlining
+	// of Intern().
+	return t.internSlow(s)
+}
+
+// Query will query whether s has already been interned.
+//
+// If s has never been interned, returns false. This is useful for e.g. querying
+// an intern-keyed map using a string: a failed query indicates that the string
+// has never been seen before, so searching the map will be futile.
+//
+// If s is small enough to be inlined in an ID, it is treated as always being
+// interned.
+func (t *Table) Query(s string) (ID, bool) {
+	if char6, ok := encodeChar6(s); ok {
+		// This also handles s == "".
+		return char6, true
+	}
+
 	t.mu.RLock()
 	id, ok := t.index[s]
 	t.mu.RUnlock()
-	if ok {
-		// We never delete from this map, so if we see ok here, that cannot be
-		// changed by another goroutine.
-		return id
-	}

+	return id, ok
+}
+
+func (t *Table) internSlow(s string) ID {
 	// Intern tables are expected to be long-lived. Avoid holding onto a larger
 	// buffer that s is an internal pointer to by cloning it.
 	s = strings.Clone(s)
@@ -116,7 +134,7 @@ func (t *Table) Intern(s string) ID {
 	t.table = append(t.table, s)

 	// The first ID will have value 1. ID 0 is reserved for "".
-	id = ID(len(t.table))
+	id := ID(len(t.table))
 	if id < 0 {
 		panic(fmt.Sprintf("internal/intern: %d interning IDs exhausted", len(t.table)))
 	}