Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
97697: tsearch: add ts_rank functionality r=jordanlewis a=jordanlewis

Updates: #41288
Epic: CRDB-22357
All but the last commit are from #92966, #97677, and #97685

    This commit adds ts_rank, the family of builtins that allow ranking of
    text search results. The function takes a tsquery and a tsvector and
    returns a float that indicates how good the match is. The function can
    be modified by passing in a custom array of weights that matches the
    text search weights A, B, C, and D, and a bitmask that controls the
    ranking behavior in various detailed ways.

    See the excellent Postgres documentation here for details:
    https://www.postgresql.org/docs/current/textsearch-controls.html

    Release note (sql change): add the ts_rank function for ranking text
    search query results

98776: storage: unify storage/fs.FS and pebble/vfs.FS r=jbowens a=jbowens

The storage/fs.FS had largely the same interface as vfs.FS. The storage/fs.FS interface was intended as a temporary stepping stone to using pebble's vfs.FS interface throughout Cockroach for all filesystem access. This commit unifies the two.

Epic: None
Release note: None

99114: kvserver: fix and unskip TestCheckConsistencyInconsistent r=erikgrinaker a=pavelkalinnikov

This PR unskips `TestCheckConsistencyInconsistent` which was skipped for a reason that no longer holds.

It also fixes the race possible in `TestCheckConsistencyInconsistent`:
- Node 2 is corrupted.
- The second phase of `runConsistency` check times out on node 1, and returns early when only nodes 2 and 3 have created the storage checkpoint.
- Node 1 haven't created the checkpoint, but has started doing so.
- Node 2 "fatals" (mocked out in the test) shortly after the check is complete.
- Node 1 is still creating its checkpoint, but has probably created the directory by now.
- Hence, the test assumes that the checkpoint has been created, and proceeds to open it and compute the checksum of the range.

The test now waits for the moment when all the checkpoint are known to be fully populated.

Fixes #81819
Epic: none
Release note: none

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
  • Loading branch information
4 people committed Mar 22, 2023
4 parents 4be5775 + f1ce8cf + def09e0 + 6372b6e commit f592975
Show file tree
Hide file tree
Showing 36 changed files with 636 additions and 223 deletions.
10 changes: 9 additions & 1 deletion docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -905,7 +905,15 @@ available replica will error.</p>
<tr><td><a name="to_tsvector"></a><code>to_tsvector(text: <a href="string.html">string</a>) &rarr; tsvector</code></td><td><span class="funcdesc"><p>Converts text to a tsvector, normalizing words according to the default configuration. Position information is included in the result.</p>
</span></td><td>Stable</td></tr>
<tr><td><a name="ts_parse"></a><code>ts_parse(parser_name: <a href="string.html">string</a>, document: <a href="string.html">string</a>) &rarr; tuple{int AS tokid, string AS token}</code></td><td><span class="funcdesc"><p>ts_parse parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid showing the assigned token type and a token which is the text of the token.</p>
</span></td><td>Stable</td></tr></tbody>
</span></td><td>Stable</td></tr>
<tr><td><a name="ts_rank"></a><code>ts_rank(vector: tsvector, query: tsquery) &rarr; float4</code></td><td><span class="funcdesc"><p>Ranks vectors based on the frequency of their matching lexemes.</p>
</span></td><td>Immutable</td></tr>
<tr><td><a name="ts_rank"></a><code>ts_rank(vector: tsvector, query: tsquery, normalization: <a href="int.html">int</a>) &rarr; float4</code></td><td><span class="funcdesc"><p>Ranks vectors based on the frequency of their matching lexemes.</p>
</span></td><td>Immutable</td></tr>
<tr><td><a name="ts_rank"></a><code>ts_rank(weights: <a href="float.html">float</a>[], vector: tsvector, query: tsquery) &rarr; float4</code></td><td><span class="funcdesc"><p>Ranks vectors based on the frequency of their matching lexemes.</p>
</span></td><td>Immutable</td></tr>
<tr><td><a name="ts_rank"></a><code>ts_rank(weights: <a href="float.html">float</a>[], vector: tsvector, query: tsquery, normalization: <a href="int.html">int</a>) &rarr; float4</code></td><td><span class="funcdesc"><p>Ranks vectors based on the frequency of their matching lexemes.</p>
</span></td><td>Immutable</td></tr></tbody>
</table>

### Fuzzy String Matching functions
Expand Down
2 changes: 2 additions & 0 deletions pkg/ccl/cliccl/ear_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"crypto/rand"
"fmt"
"path/filepath"
"sort"
"strings"
"testing"

Expand Down Expand Up @@ -56,6 +57,7 @@ func TestDecrypt(t *testing.T) {
// Find a manifest file to check.
files, err := p.List(dir)
require.NoError(t, err)
sort.Strings(files)
var manifestPath string
for _, basename := range files {
if strings.HasPrefix(basename, "MANIFEST-") {
Expand Down
1 change: 1 addition & 0 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,7 @@ go_library(
"@com_github_cockroachdb_logtags//:logtags",
"@com_github_cockroachdb_pebble//:pebble",
"@com_github_cockroachdb_pebble//objstorage",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_cockroachdb_redact//:redact",
"@com_github_gogo_protobuf//proto",
"@com_github_google_btree//:btree",
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/client_replica_gc_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ package kvserver_test

import (
"context"
"os"
"path/filepath"
"testing"
"time"
Expand Down Expand Up @@ -112,7 +113,7 @@ func TestReplicaGCQueueDropReplicaDirect(t *testing.T) {
if dir == "" {
t.Fatal("no sideloaded directory")
}
if err := eng.MkdirAll(dir); err != nil {
if err := eng.MkdirAll(dir, os.ModePerm); err != nil {
t.Fatal(err)
}
if err := fs.WriteFile(eng, filepath.Join(dir, "i1000000.t100000"), []byte("foo")); err != nil {
Expand Down
45 changes: 23 additions & 22 deletions pkg/kv/kvserver/consistency_queue_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,6 @@ import (
"github.com/cockroachdb/cockroach/pkg/storage"
"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
Expand Down Expand Up @@ -243,14 +242,6 @@ func TestCheckConsistencyInconsistent(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

// TODO(pavelkalinnikov): not if we remove TestingSetRedactable below?
skip.UnderRaceWithIssue(t, 81819, "slow test, and TestingSetRedactable triggers race detector")

// This test prints a consistency checker diff, so it's
// good to make sure we're overly redacting said diff.
// TODO(pavelkalinnikov): remove this since we don't print diffs anymore?
defer log.TestingSetRedactable(true)()

// Test expects simple MVCC value encoding.
storage.DisableMetamorphicSimpleValueEncoding(t)

Expand Down Expand Up @@ -347,12 +338,12 @@ func TestCheckConsistencyInconsistent(t *testing.T) {
assert.Empty(t, onDiskCheckpointPaths(i))
}

// Write some arbitrary data only to store 1. Inconsistent key "e"!
store1 := tc.GetFirstStoreFromServer(t, 1)
// Write some arbitrary data only to store on n2. Inconsistent key "e"!
s2 := tc.GetFirstStoreFromServer(t, 1)
var val roachpb.Value
val.SetInt(42)
// Put an inconsistent key "e" to s2, and have s1 and s3 still agree.
require.NoError(t, storage.MVCCPut(context.Background(), store1.TODOEngine(), nil,
require.NoError(t, storage.MVCCPut(context.Background(), s2.TODOEngine(), nil,
roachpb.Key("e"), tc.Server(0).Clock().Now(), hlc.ClockTimestamp{}, val, nil))

// Run consistency check again, this time it should find something.
Expand All @@ -368,22 +359,32 @@ func TestCheckConsistencyInconsistent(t *testing.T) {
assert.Contains(t, resp.Result[0].Detail, `[minority]`)
assert.Contains(t, resp.Result[0].Detail, `stats`)

// Checkpoints should have been created on all stores.
hashes := make([][]byte, numStores)
// Make sure that all the stores started creating a checkpoint. The metric
// measures the number of checkpoint directories, but a directory can
// represent an incomplete checkpoint that is still being populated.
for i := 0; i < numStores; i++ {
cps := onDiskCheckpointPaths(i)
require.Len(t, cps, 1)
t.Logf("found a checkpoint at %s", cps[0])
// The checkpoint must have been finalized.
require.False(t, strings.HasSuffix(cps[0], "_pending"))

metric := tc.GetFirstStoreFromServer(t, i).Metrics().RdbCheckpoints
testutils.SucceedsSoon(t, func() error {
if got, want := metric.Value(), int64(1); got != want {
return errors.Errorf("%s is %d, want %d", metric.Name, got, want)
}
return nil
})
}
// As discussed in https://github.com/cockroachdb/cockroach/issues/81819, it
// is possible that the check completes while there are still checkpoints in
// flight. Waiting for the server termination makes sure that checkpoints are
// fully created.
tc.Stopper().Stop(context.Background())

// Checkpoints should have been created on all stores.
hashes := make([][]byte, numStores)
for i := 0; i < numStores; i++ {
cps := onDiskCheckpointPaths(i)
require.Len(t, cps, 1)
t.Logf("found a checkpoint at %s", cps[0])
// The checkpoint must have been finalized.
require.False(t, strings.HasSuffix(cps[0], "_pending"))

// Create a new store on top of checkpoint location inside existing in-mem
// VFS to verify its contents.
Expand Down Expand Up @@ -414,8 +415,8 @@ func TestCheckConsistencyInconsistent(t *testing.T) {
assert.Equal(t, hashes[0], hashes[2]) // s1 and s3 agree
assert.NotEqual(t, hashes[0], hashes[1]) // s2 diverged

// A death rattle should have been written on s2 (store index 1).
eng := store1.TODOEngine()
// A death rattle should have been written on s2.
eng := s2.TODOEngine()
f, err := eng.Open(base.PreventedStartupFile(eng.GetAuxiliaryDir()))
require.NoError(t, err)
b, err := io.ReadAll(f)
Expand Down
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/kvserverbase/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@ go_library(
"//pkg/roachpb",
"//pkg/settings",
"//pkg/settings/cluster",
"//pkg/storage/fs",
"//pkg/util/errorutil",
"//pkg/util/hlc",
"//pkg/util/humanizeutil",
Expand All @@ -30,6 +29,7 @@ go_library(
"//pkg/util/timeutil",
"//pkg/util/tracing/tracingpb",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_cockroachdb_redact//:redact",
"@org_golang_x_time//rate",
],
Expand Down
4 changes: 2 additions & 2 deletions pkg/kv/kvserver/kvserverbase/syncing_write.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ import (

"github.com/cockroachdb/cockroach/pkg/settings"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/storage/fs"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/pebble/vfs"
"golang.org/x/time/rate"
)

Expand Down Expand Up @@ -82,7 +82,7 @@ func WriteFileSyncing(
ctx context.Context,
filename string,
data []byte,
fs fs.FS,
fs vfs.FS,
perm os.FileMode,
settings *cluster.Settings,
limiter *rate.Limiter,
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/logstore/sideload_disk.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ package logstore
import (
"context"
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
Expand Down Expand Up @@ -75,7 +76,7 @@ func NewDiskSideloadStorage(
}

func (ss *DiskSideloadStorage) createDir() error {
err := ss.eng.MkdirAll(ss.dir)
err := ss.eng.MkdirAll(ss.dir, os.ModePerm)
ss.dirCreated = ss.dirCreated || err == nil
return err
}
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/replica_consistency.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"crypto/sha512"
"encoding/binary"
"fmt"
"os"
"sync"
"time"

Expand Down Expand Up @@ -741,7 +742,7 @@ func (r *Replica) computeChecksumPostApply(
// certain of completing the check. Since we're already in a goroutine
// that's about to end, just sleep for a few seconds and then terminate.
auxDir := r.store.TODOEngine().GetAuxiliaryDir()
_ = r.store.TODOEngine().MkdirAll(auxDir)
_ = r.store.TODOEngine().MkdirAll(auxDir, os.ModePerm)
path := base.PreventedStartupFile(auxDir)

const attentionFmt = `ATTENTION:
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/replica_corruption.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ package kvserver
import (
"context"
"fmt"
"os"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/kv/kvpb"
Expand Down Expand Up @@ -50,7 +51,7 @@ func (r *Replica) setCorruptRaftMuLocked(
r.mu.destroyStatus.Set(cErr, destroyReasonRemoved)

auxDir := r.store.TODOEngine().GetAuxiliaryDir()
_ = r.store.TODOEngine().MkdirAll(auxDir)
_ = r.store.TODOEngine().MkdirAll(auxDir, os.ModePerm)
path := base.PreventedStartupFile(auxDir)

preventStartupMsg := fmt.Sprintf(`ATTENTION:
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/replica_proposal.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ package kvserver

import (
"context"
"os"
"path/filepath"
"time"
"unsafe"
Expand Down Expand Up @@ -561,7 +562,7 @@ func addSSTablePreApply(

// TODO(tschottdorf): remove this once sideloaded storage guarantees its
// existence.
if err := eng.MkdirAll(filepath.Dir(ingestPath)); err != nil {
if err := eng.MkdirAll(filepath.Dir(ingestPath), os.ModePerm); err != nil {
panic(err)
}
if _, err := eng.Stat(ingestPath); err == nil {
Expand Down
8 changes: 5 additions & 3 deletions pkg/kv/kvserver/replica_sst_snapshot_storage.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ package kvserver
import (
"context"
"fmt"
"os"
"path/filepath"
"strconv"

Expand All @@ -24,6 +25,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/uuid"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/pebble/objstorage"
"github.com/cockroachdb/pebble/vfs"
"golang.org/x/time/rate"
)

Expand Down Expand Up @@ -113,7 +115,7 @@ func (s *SSTSnapshotStorageScratch) filename(id int) string {
}

func (s *SSTSnapshotStorageScratch) createDir() error {
err := s.storage.engine.MkdirAll(s.snapDir)
err := s.storage.engine.MkdirAll(s.snapDir, os.ModePerm)
s.dirCreated = s.dirCreated || err == nil
return err
}
Expand Down Expand Up @@ -182,7 +184,7 @@ func (s *SSTSnapshotStorageScratch) Close() error {
type SSTSnapshotStorageFile struct {
scratch *SSTSnapshotStorageScratch
created bool
file fs.File
file vfs.File
filename string
ctx context.Context
bytesPerSync int64
Expand All @@ -207,7 +209,7 @@ func (f *SSTSnapshotStorageFile) ensureFile() error {
}
var err error
if f.bytesPerSync > 0 {
f.file, err = f.scratch.storage.engine.CreateWithSync(f.filename, int(f.bytesPerSync))
f.file, err = fs.CreateWithSync(f.scratch.storage.engine, f.filename, int(f.bytesPerSync))
} else {
f.file, err = f.scratch.storage.engine.Create(f.filename)
}
Expand Down
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"context"
"fmt"
"math"
"os"
"path/filepath"
"runtime"
"sort"
Expand Down Expand Up @@ -3083,7 +3084,7 @@ func (s *Store) checkpointSpans(desc *roachpb.RangeDescriptor) []roachpb.Span {
// the provided key spans. If spans is empty, it includes the entire store.
func (s *Store) checkpoint(tag string, spans []roachpb.Span) (string, error) {
checkpointBase := s.checkpointsDir()
_ = s.TODOEngine().MkdirAll(checkpointBase)
_ = s.TODOEngine().MkdirAll(checkpointBase, os.ModePerm)
// Create the checkpoint in a "pending" directory first. If we fail midway, it
// should be clear that the directory contains an incomplete checkpoint.
pendingDir := filepath.Join(checkpointBase, tag+"_pending")
Expand Down
3 changes: 2 additions & 1 deletion pkg/sql/colcontainer/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ go_library(
"//pkg/util/mon",
"//pkg/util/uuid",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_golang_snappy//:snappy",
"@com_github_marusama_semaphore//:semaphore",
],
Expand All @@ -46,14 +47,14 @@ go_test(
"//pkg/sql/pgwire/pgcode",
"//pkg/sql/pgwire/pgerror",
"//pkg/sql/types",
"//pkg/storage/fs",
"//pkg/testutils/colcontainerutils",
"//pkg/testutils/skip",
"//pkg/util/humanizeutil",
"//pkg/util/leaktest",
"//pkg/util/log",
"//pkg/util/mon",
"//pkg/util/randutil",
"@com_github_cockroachdb_pebble//vfs",
"@com_github_marusama_semaphore//:semaphore",
"@com_github_stretchr_testify//require",
],
Expand Down
Loading

0 comments on commit f592975

Please sign in to comment.