Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
57181: workload/schemachange: add backups and ndjson logging r=ajwerner a=jayshrivastava

### roachtest: save database backup on schemachange random load failure 

Previously, there was no way to inspect the state of the database
when the schemachange random load workload failed. This change
updates roachtest to save a backup in the artifacts directory
so that the database state can be recovered for debugging or
investigation.

See #56118 

Release note: None

### roachtest: add transaction log to schemachange/random-load 
Release note: None

### workload/schemachange: add json logging infrastructure 

This change adds mechanisms for writing json transaction logs
to a file while the workload is running.

Workers now use atomicLog structs to write transactions atomically
to log files or stdout. Also, each worker stores log entries
inside its internal state to make coordinating log writes easier.

A preErrorHook function is introduced and implemented
to ensure that partially completed transactions can be written
to a log when the workload terminates due to an error. All log
entries are protected by mutexes so this hook can halt
other workers while it flushes their transaction logs. This
hook can only be called once per workload and is protected
by a sync.Once struct. If "tolerate-errors" is true, then
the worker will restart and release the locks it has on other
workers.

Release note: None

### workload/schemachange: refactor logging to use new infrastructure 

Closes: #57471
Release note: None

### workload/schemachange: implement addForeignKeyConstraint op

addForeignKeyConstraint adds a single, non-composite, foreign key relation
between two random columns.

Release note: None

### workload/schemachange: implement addUniqueConstraint op

The addUniqueConstraint op adds a unique constraint on a single
column.

Release note: None

58070: eventpb: make the Timestamp field an int64 r=itsbilal a=knz

First commit from  #58066.

While exploring how to use the data in practice, I noticed that the
timestamp was encoded as a string in the JSON output, and that was
making the events hard to parse and to use. There are many
more functions that can operate on a number.

Moreover, the integer representation is just more compact.

Release note (api change):  The 'Timestamp' field of structured
notable events is now numeric, and encodes a number of nanoseconds
since the unix epoch. (Note that this API has not yet been published
in a released version of CockroachDB. The release note exists only to
track the list of relevant items for the doc project.)

58082: workload/schemachange: update sequence number initialization and getTableColumns  error handling r=ajwerner a=jayshrivastava

### workload/schemachange: update sequence number initialization

Previously, it was possible for the seqNum of operationGeneratorParams
to be initialized to an incorrect value. When the workload would start,
it would check all table names, view names, and sequence names to
determine the starting value of the sequence number. This change
updates this behavior to ensure that enum names, index names,
schema names, and column names are checked as well. This ensures
that the sequence number starts at an appropriate value each
time the workload starts. This value is important for uniqueness
assumptions when generating names throughout the workload.

Previously, failing to check the names of enums, indexes, schemas,
and columns to initialize the global sequence number would cause
seg faults as outlined in the issue below.

Closes: #58076

Release note: None

### workload/schemachange: return error when a table has no columns

Previously, unexpected syntax errors would occur if the workload
attempted to create an index on a table with no columns.
For example, the workload would generate a statements such as
`CREATE INDEX index2 ON table1 ()`, which are syntactically
invalid. This change updates the function `getTableColumns` to
return an error if a table has no columns to use when creating
an index. The workload will try to generate another operation
when it sees this error.

Release note: None


Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
  • Loading branch information
3 people committed Dec 21, 2020
4 parents 638c03a + 0e59c4a + 3605e9c + 4e31a6a commit d4bc228
Show file tree
Hide file tree
Showing 14 changed files with 540 additions and 228 deletions.
110 changes: 55 additions & 55 deletions docs/generated/eventlog.md

Large diffs are not rendered by default.

43 changes: 41 additions & 2 deletions pkg/cmd/roachtest/schemachange_random_load.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"context"
gosql "database/sql"
"fmt"
"path/filepath"
"strings"
)

Expand Down Expand Up @@ -123,16 +124,26 @@ func runSchemaChangeRandomLoad(ctx context.Context, t *test, c *cluster, maxOps,
c.Start(ctx, t, roachNodes)
c.Run(ctx, loadNode, "./workload init schemachange")

storeDirectory, err := c.RunWithBuffer(ctx, c.l, c.Node(1), "echo", "-n", "{store-dir}")
if err != nil {
c.l.Printf("Failed to retrieve store directory from node 1: %v\n", err.Error())
}

runCmd := []string{
"./workload run schemachange --verbose=1",
"--tolerate-errors=false",
// Save the histograms so that they can be reported to https://roachperf.crdb.dev/.
" --histograms=" + perfArtifactsDir + "/stats.json",
fmt.Sprintf("--max-ops %d", maxOps),
fmt.Sprintf("--concurrency %d", concurrency),
fmt.Sprintf("--txn-log %s", filepath.Join(string(storeDirectory), "transactions.json")),
}
t.Status("running schemachange workload")
c.Run(ctx, loadNode, runCmd...)
err = c.RunE(ctx, loadNode, runCmd...)
if err != nil {
saveArtifacts(ctx, c, string(storeDirectory))
c.t.Fatal(err)
}

// Drop the database to test the correctness of DROP DATABASE CASCADE, which
// has been a source of schema change bugs (mostly orphaned descriptors) in
Expand All @@ -148,10 +159,38 @@ func runSchemaChangeRandomLoad(ctx context.Context, t *test, c *cluster, maxOps,
t.Status("performing validation after workload")
validate(db)
t.Status("dropping database")
_, err := db.ExecContext(ctx, `USE defaultdb; DROP DATABASE schemachange CASCADE;`)
_, err = db.ExecContext(ctx, `USE defaultdb; DROP DATABASE schemachange CASCADE;`)
if err != nil {
t.Fatal(err)
}
t.Status("performing validation after dropping database")
validate(db)
}

// saveArtifacts saves important test artifacts in the artifacts directory.
func saveArtifacts(ctx context.Context, c *cluster, storeDirectory string) {
db := c.Conn(ctx, 1)

// Save a backup file called schemachange to the store directory.
_, err := db.Exec("BACKUP DATABASE schemachange to 'nodelocal://1/schemachange'")
if err != nil {
c.l.Printf("Failed execute backup command on node 1: %v\n", err.Error())
}

remoteBackupFilePath := filepath.Join(storeDirectory, "extern", "schemachange")
localBackupFilePath := filepath.Join(c.t.ArtifactsDir(), "backup")
remoteTransactionsFilePath := filepath.Join(storeDirectory, "transactions.ndjson")
localTransactionsFilePath := filepath.Join(c.t.ArtifactsDir(), "transactions.ndjson")

// Copy the backup from the store directory to the artifacts directory.
err = c.Get(ctx, c.l, remoteBackupFilePath, localBackupFilePath, c.Node(1))
if err != nil {
c.l.Printf("Failed to copy backup file from node 1 to artifacts directory: %v\n", err.Error())
}

// Copy the txn log from the store directory to the artifacts directory.
err = c.Get(ctx, c.l, remoteTransactionsFilePath, localTransactionsFilePath, c.Node(1))
if err != nil {
c.l.Printf("Failed to copy txn log file from node 1 to artifacts directory: %v\n", err.Error())
}
}
2 changes: 1 addition & 1 deletion pkg/server/node.go
Original file line number Diff line number Diff line change
Expand Up @@ -797,7 +797,7 @@ func (n *Node) recordJoinEvent(ctx context.Context) {
nodeDetails = &ev.CommonNodeEventDetails
nodeDetails.LastUp = n.startedAt
}
event.CommonDetails().Timestamp = timeutil.Now()
event.CommonDetails().Timestamp = timeutil.Now().UnixNano()
nodeDetails.StartedAt = n.startedAt
nodeDetails.NodeID = int32(n.Descriptor.NodeID)

Expand Down
2 changes: 1 addition & 1 deletion pkg/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -2032,7 +2032,7 @@ func (s *Server) Decommission(
} else {
panic("unexpected target membership status")
}
event.CommonDetails().Timestamp = timeutil.Now()
event.CommonDetails().Timestamp = timeutil.Now().UnixNano()
nodeDetails.RequestingNodeID = int32(s.NodeID())

for _, nodeID := range nodeIDs {
Expand Down
11 changes: 5 additions & 6 deletions pkg/sql/event_log.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ package sql
import (
"context"
"encoding/json"
"time"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/kv"
Expand All @@ -23,6 +22,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/errors"
)

Expand All @@ -49,7 +49,7 @@ func logEventInternalForSchemaChanges(
mutationID descpb.MutationID,
event eventpb.EventPayload,
) error {
event.CommonDetails().Timestamp = txn.ReadTimestamp().GoTime()
event.CommonDetails().Timestamp = txn.ReadTimestamp().WallTime
scCommon, ok := event.(eventpb.EventWithCommonSchemaChangePayload)
if !ok {
return errors.AssertionFailedf("unknown event type: %T", event)
Expand Down Expand Up @@ -84,7 +84,7 @@ func logEventInternalForSQLStatements(
event eventpb.EventPayload,
) error {
// Inject the common fields into the payload provided by the caller.
event.CommonDetails().Timestamp = txn.ReadTimestamp().GoTime()
event.CommonDetails().Timestamp = txn.ReadTimestamp().WallTime
sqlCommon, ok := event.(eventpb.EventWithCommonSQLPayload)
if !ok {
return errors.AssertionFailedf("unknown event type: %T", event)
Expand Down Expand Up @@ -135,8 +135,7 @@ func InsertEventRecord(
info.CommonDetails().EventType = eventType

// The caller is responsible for the timestamp field.
var zeroTime time.Time
if info.CommonDetails().Timestamp == zeroTime {
if info.CommonDetails().Timestamp == 0 {
return errors.AssertionFailedf("programming error: timestamp field in event not populated: %T", info)
}

Expand All @@ -160,7 +159,7 @@ VALUES(
)
`
args := []interface{}{
info.CommonDetails().Timestamp,
timeutil.Unix(0, info.CommonDetails().Timestamp),
eventType,
targetID,
reportingID,
Expand Down
6 changes: 2 additions & 4 deletions pkg/util/log/event_log.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@ package log
import (
"context"
"encoding/json"
"time"

"github.com/cockroachdb/cockroach/pkg/util/log/eventpb"
"github.com/cockroachdb/cockroach/pkg/util/log/severity"
Expand All @@ -24,9 +23,8 @@ import (
func StructuredEvent(ctx context.Context, event eventpb.EventPayload) {
// Populate the missing common fields.
common := event.CommonDetails()
var zeroTime time.Time
if common.Timestamp == zeroTime {
common.Timestamp = timeutil.Now()
if common.Timestamp == 0 {
common.Timestamp = timeutil.Now().UnixNano()
}
if len(common.EventType) == 0 {
common.EventType = eventpb.GetEventTypeName(event)
Expand Down
1 change: 0 additions & 1 deletion pkg/util/log/eventpb/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,5 @@ go_library(
deps = [
"//pkg/util/log/logpb",
"//vendor/github.com/gogo/protobuf/proto",
"//vendor/github.com/gogo/protobuf/types",
],
)
95 changes: 38 additions & 57 deletions pkg/util/log/eventpb/events.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 3 additions & 2 deletions pkg/util/log/eventpb/events.proto
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,9 @@ import "google/protobuf/timestamp.proto";

// CommonEventDetails contains the fields common to all events.
message CommonEventDetails {
// The timestamp of the event.
google.protobuf.Timestamp timestamp = 1 [(gogoproto.nullable) = false, (gogoproto.stdtime) = true, (gogoproto.jsontag) = ",omitempty"];
// The timestamp of the event. Expressed as nanoseconds since
// the Unix epoch.
int64 timestamp = 1 [(gogoproto.jsontag) = ",omitempty"];
// The type of the event.
string event_type = 2 [(gogoproto.jsontag) = ",omitempty"];
}
Expand Down
1 change: 1 addition & 0 deletions pkg/workload/schemachange/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ go_library(
"operation_generator.go",
"optype_string.go",
"schemachange.go",
"txstatus_string.go",
"type_resolver.go",
],
importpath = "github.com/cockroachdb/cockroach/pkg/workload/schemachange",
Expand Down
2 changes: 1 addition & 1 deletion pkg/workload/schemachange/error_code_set.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ func (set errorCodeSet) contains(code pgcode.Code) bool {
return ok
}

func (set errorCodeSet) string() string {
func (set errorCodeSet) String() string {
var codes []string
for code := range set {
codes = append(codes, code.String())
Expand Down
Loading

0 comments on commit d4bc228

Please sign in to comment.