Merge #66640
66640: sql: introduce crdb_internal.index_usage_stats virtual table r=Azhng a=Azhng

This commit introduces the crdb_internal.index_usage_stats virtual
table, which is backed by the new clusterindexusagestats package. This
package implements a variant of the indexusagestats interface and
serves the data by issuing a cluster-wide RPC fanout.

Release note (sql change): introduce the crdb_internal.index_usage_statistics
virtual table to surface index usage statistics. The
sql.metrics.index_usage_stats.enabled cluster setting turns the subsystem
on or off; it defaults to true. The
sql.metrics.index_usage_stats.reset_interval setting controls how often
the collected statistics are reset; it defaults to 1 hour.

Addresses #64740

Followup to #66639

Co-authored-by: Azhng <[email protected]>
craig[bot] and Azhng committed Aug 5, 2021
2 parents ca77dc5 + 47459d6 commit d33af93
Showing 20 changed files with 1,723 additions and 1,589 deletions.
@@ -1725,5 +1725,5 @@ let $null_table_id
SELECT table_id FROM crdb_internal.tables WHERE schema_name = 'crdb_internal' AND name = 'tables'

# Validate that builtin errors if called on a table id
query error pq: crdb_internal\.reset_multi_region_zone_configs_for_database\(\): database "\[4294967249\]" does not exist
query error pq: crdb_internal\.reset_multi_region_zone_configs_for_database\(\): database "\[4294967248\]" does not exist
SELECT crdb_internal.reset_multi_region_zone_configs_for_database($null_table_id)
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial1
@@ -30,6 +30,7 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[cluster] retrieving SQL data for crdb_internal.partitions... writing output: debug/crdb_internal.partitions.txt... done
[cluster] retrieving SQL data for crdb_internal.zones... writing output: debug/crdb_internal.zones.txt... done
[cluster] retrieving SQL data for crdb_internal.invalid_objects... writing output: debug/crdb_internal.invalid_objects.txt... done
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics... writing output: debug/crdb_internal.index_usage_statistics.txt... done
[cluster] requesting nodes... received response... converting to JSON... writing binary output: debug/nodes.json... done
[cluster] requesting liveness... received response... converting to JSON... writing binary output: debug/liveness.json... done
[node 1] node status... converting to JSON... writing binary output: debug/nodes/1/status.json... done
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial1_excluded
@@ -30,6 +30,7 @@ debug zip /dev/null --concurrency=1 --exclude-nodes=2 --cpu-profile-duration=0
[cluster] retrieving SQL data for crdb_internal.partitions... writing output: debug/crdb_internal.partitions.txt... done
[cluster] retrieving SQL data for crdb_internal.zones... writing output: debug/crdb_internal.zones.txt... done
[cluster] retrieving SQL data for crdb_internal.invalid_objects... writing output: debug/crdb_internal.invalid_objects.txt... done
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics... writing output: debug/crdb_internal.index_usage_statistics.txt... done
[cluster] requesting nodes... received response... converting to JSON... writing binary output: debug/nodes.json... done
[cluster] requesting liveness... received response... converting to JSON... writing binary output: debug/liveness.json... done
[node 1] node status... converting to JSON... writing binary output: debug/nodes/1/status.json... done
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial2
@@ -30,6 +30,7 @@ debug zip --concurrency=1 --cpu-profile-duration=0 /dev/null
[cluster] retrieving SQL data for crdb_internal.partitions... writing output: debug/crdb_internal.partitions.txt... done
[cluster] retrieving SQL data for crdb_internal.zones... writing output: debug/crdb_internal.zones.txt... done
[cluster] retrieving SQL data for crdb_internal.invalid_objects... writing output: debug/crdb_internal.invalid_objects.txt... done
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics... writing output: debug/crdb_internal.index_usage_statistics.txt... done
[cluster] requesting nodes... received response... converting to JSON... writing binary output: debug/nodes.json... done
[cluster] requesting liveness... received response... converting to JSON... writing binary output: debug/liveness.json... done
[node 1] node status... converting to JSON... writing binary output: debug/nodes/1/status.json... done
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/testzip
@@ -30,6 +30,7 @@ debug zip --concurrency=1 --cpu-profile-duration=1s /dev/null
[cluster] retrieving SQL data for crdb_internal.partitions... writing output: debug/crdb_internal.partitions.txt... done
[cluster] retrieving SQL data for crdb_internal.zones... writing output: debug/crdb_internal.zones.txt... done
[cluster] retrieving SQL data for crdb_internal.invalid_objects... writing output: debug/crdb_internal.invalid_objects.txt... done
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics... writing output: debug/crdb_internal.index_usage_statistics.txt... done
[cluster] requesting nodes... received response... converting to JSON... writing binary output: debug/nodes.json... done
[cluster] requesting liveness... received response... converting to JSON... writing binary output: debug/liveness.json... done
[cluster] requesting CPU profiles
3 changes: 3 additions & 0 deletions pkg/cli/testdata/zip/testzip_concurrent
@@ -264,6 +264,9 @@ zip
[cluster] retrieving SQL data for crdb_internal.default_privileges...
[cluster] retrieving SQL data for crdb_internal.default_privileges: done
[cluster] retrieving SQL data for crdb_internal.default_privileges: writing output: debug/crdb_internal.default_privileges.txt...
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics...
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics: done
[cluster] retrieving SQL data for crdb_internal.index_usage_statistics: writing output: debug/crdb_internal.index_usage_statistics.txt...
[cluster] retrieving SQL data for crdb_internal.invalid_objects...
[cluster] retrieving SQL data for crdb_internal.invalid_objects: done
[cluster] retrieving SQL data for crdb_internal.invalid_objects: writing output: debug/crdb_internal.invalid_objects.txt...
1 change: 1 addition & 0 deletions pkg/cli/zip_cluster_wide.go
@@ -97,6 +97,7 @@ var debugZipTablesPerCluster = []string{
"crdb_internal.partitions",
"crdb_internal.zones",
"crdb_internal.invalid_objects",
"crdb_internal.index_usage_statistics",
}
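Each table named in debugZipTablesPerCluster appears in the zip test output as a debug/<table>.txt file, which is why appending crdb_internal.index_usage_statistics adds one line to every testdata/zip file in this commit. The sketch below reproduces only that naming rule; outputPath is a hypothetical helper inferred from the test output, not the function the debug zip code actually calls.

```go
package main

import (
	"fmt"
	"path"
)

// tablesPerCluster mirrors an excerpt of debugZipTablesPerCluster.
var tablesPerCluster = []string{
	"crdb_internal.zones",
	"crdb_internal.invalid_objects",
	"crdb_internal.index_usage_statistics",
}

// outputPath derives the file name seen in the zip test output
// (hypothetical helper; naming rule inferred from the testdata).
func outputPath(table string) string {
	return path.Join("debug", table+".txt")
}

func main() {
	for _, t := range tablesPerCluster {
		fmt.Println(outputPath(t))
	}
	// debug/crdb_internal.zones.txt
	// debug/crdb_internal.invalid_objects.txt
	// debug/crdb_internal.index_usage_statistics.txt
}
```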

// collectClusterData runs the data collection that only needs to
12 changes: 6 additions & 6 deletions pkg/server/index_usage_stats_test.go
@@ -220,24 +220,24 @@ func TestStatusAPIIndexUsage(t *testing.T) {
}, /* expectedKeys */ 4 /* expectedEventCnt*/, 5*time.Second /* timeout */)

// First node should have nothing.
stats := firstLocalStatsReader.Get(indexKeyA)
stats := firstLocalStatsReader.Get(indexKeyA.TableID, indexKeyA.IndexID)
require.Equal(t, roachpb.IndexUsageStatistics{}, stats, "expecting empty stats on node 1, but found %v", stats)

stats = firstLocalStatsReader.Get(indexKeyB)
stats = firstLocalStatsReader.Get(indexKeyB.TableID, indexKeyB.IndexID)
require.Equal(t, roachpb.IndexUsageStatistics{}, stats, "expecting empty stats on node 1, but found %v", stats)

// Third node should have nothing.
stats = thirdLocalStatsReader.Get(indexKeyA)
stats = thirdLocalStatsReader.Get(indexKeyA.TableID, indexKeyA.IndexID)
require.Equal(t, roachpb.IndexUsageStatistics{}, stats, "expecting empty stats on node 3, but found %v", stats)

stats = thirdLocalStatsReader.Get(indexKeyB)
stats = thirdLocalStatsReader.Get(indexKeyB.TableID, indexKeyB.IndexID)
require.Equal(t, roachpb.IndexUsageStatistics{}, stats, "expecting empty stats on node 3, but found %v", stats)

// Second server should have nonempty local storage.
stats = secondLocalStatsReader.Get(indexKeyA)
stats = secondLocalStatsReader.Get(indexKeyA.TableID, indexKeyA.IndexID)
compareStatsHelper(t, expectedStatsIndexA, stats, time.Minute)

stats = secondLocalStatsReader.Get(indexKeyB)
stats = secondLocalStatsReader.Get(indexKeyB.TableID, indexKeyB.IndexID)
compareStatsHelper(t, expectedStatsIndexB, stats, time.Minute)

// Test cluster-wide RPC.
1 change: 1 addition & 0 deletions pkg/sql/catalog/catconstants/constants.go
@@ -60,6 +60,7 @@ const (
CrdbInternalGossipLivenessTableID
CrdbInternalGossipNetworkTableID
CrdbInternalIndexColumnsTableID
CrdbInternalIndexUsageStatisticsTableID
CrdbInternalInflightTraceSpanTableID
CrdbInternalJobsTableID
CrdbInternalKVNodeStatusTableID
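Inserting the new constant in the middle of this block is consistent with the logic test at the top of this commit, which now expects the ID of crdb_internal.tables to be 4294967248 instead of 4294967249. A minimal sketch, assuming the block assigns IDs as math.MaxUint32 - iota (the start of the block is not shown in this diff) and using a shortened, hypothetical list of constants:

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical excerpt of the constant block before this change. Each
// constant is math.MaxUint32 minus its position in the block.
const (
	beforeIndexColumnsTableID uint32 = math.MaxUint32 - iota
	beforeJobsTableID
	beforeTablesTableID
)

// The same excerpt after inserting a new entry, as this diff does with
// CrdbInternalIndexUsageStatisticsTableID.
const (
	afterIndexColumnsTableID uint32 = math.MaxUint32 - iota
	afterIndexUsageStatisticsTableID
	afterJobsTableID
	afterTablesTableID
)

func main() {
	// Every constant declared after the insertion point moves down by one.
	fmt.Println(beforeTablesTableID, afterTablesTableID)               // 4294967293 4294967292
	fmt.Println(beforeIndexColumnsTableID == afterIndexColumnsTableID) // true
}
```

Because virtual table IDs are positional under this assumption, any test that hard-codes one of them has to be updated whenever a table is added earlier in the block.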
56 changes: 56 additions & 0 deletions pkg/sql/crdb_internal.go
@@ -46,6 +46,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/catalog/descpb"
"github.com/cockroachdb/cockroach/pkg/sql/catalog/schemaexpr"
"github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc"
"github.com/cockroachdb/cockroach/pkg/sql/idxusage"
"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgcode"
"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
"github.com/cockroachdb/cockroach/pkg/sql/privilege"
@@ -110,6 +111,7 @@ var crdbInternal = virtualSchema{
catconstants.CrdbInternalGossipLivenessTableID: crdbInternalGossipLivenessTable,
catconstants.CrdbInternalGossipNetworkTableID: crdbInternalGossipNetworkTable,
catconstants.CrdbInternalIndexColumnsTableID: crdbInternalIndexColumnsTable,
catconstants.CrdbInternalIndexUsageStatisticsTableID: crdbInternalIndexUsageStatistics,
catconstants.CrdbInternalInflightTraceSpanTableID: crdbInternalInflightTraceSpanTable,
catconstants.CrdbInternalJobsTableID: crdbInternalJobsTable,
catconstants.CrdbInternalKVNodeStatusTableID: crdbInternalKVNodeStatusTable,
@@ -4883,3 +4885,57 @@ CREATE TABLE crdb_internal.default_privileges (
})
},
}

var crdbInternalIndexUsageStatistics = virtualSchemaTable{
comment: `cluster-wide index usage statistics (in-memory, not durable). ` +
`Querying this table is an expensive operation since it creates a ` +
`cluster-wide RPC fanout.`,
schema: `
CREATE TABLE crdb_internal.index_usage_statistics (
table_id INT NOT NULL,
index_id INT NOT NULL,
total_reads INT NOT NULL,
last_read TIMESTAMPTZ
);`,
generator: func(ctx context.Context, p *planner, dbContext catalog.DatabaseDescriptor, stopper *stop.Stopper) (virtualTableGenerator, cleanupFunc, error) {
// Perform RPC Fanout.
stats, err :=
p.extendedEvalCtx.SQLStatusServer.IndexUsageStatistics(ctx, &serverpb.IndexUsageStatisticsRequest{})
if err != nil {
return nil, nil, err
}
indexStats := idxusage.NewLocalIndexUsageStatsFromExistingStats(&idxusage.Config{}, stats.Statistics)

row := make(tree.Datums, 4 /* number of columns for this virtual table */)
worker := func(pusher rowPusher) error {
return forEachTableDescAll(ctx, p, dbContext, hideVirtual,
func(db catalog.DatabaseDescriptor, _ string, table catalog.TableDescriptor) error {
tableID := table.GetID()
return catalog.ForEachIndex(table, catalog.IndexOpts{}, func(idx catalog.Index) error {
indexID := idx.GetID()
stats := indexStats.Get(roachpb.TableID(tableID), roachpb.IndexID(indexID))

lastScanTs := tree.DNull
if !stats.LastRead.IsZero() {
lastScanTs, err = tree.MakeDTimestampTZ(stats.LastRead, time.Nanosecond)
if err != nil {
return err
}
}

row = row[:0]

row = append(row,
tree.NewDInt(tree.DInt(tableID)), // tableID
tree.NewDInt(tree.DInt(indexID)), // indexID
tree.NewDInt(tree.DInt(stats.TotalReadCount)), // total_reads
lastScanTs, // last_read
)

return pusher.pushRow(row...)
})
})
}
return setupGenerator(ctx, worker, stopper)
},
}
85 changes: 64 additions & 21 deletions pkg/sql/idxusage/local_idx_usage_stats.go
@@ -130,6 +130,20 @@ func NewLocalIndexUsageStats(cfg *Config) *LocalIndexUsageStats {
return is
}

// NewLocalIndexUsageStatsFromExistingStats returns a new instance of
// LocalIndexUsageStats that is populated using the given
// []roachpb.CollectedIndexUsageStatistics. This constructor can be used to
// quickly aggregate the index usage statistics received from the RPC fanout,
// and it is more efficient than the regular insert path because it inserts
// without acquiring the RWMutex.
func NewLocalIndexUsageStatsFromExistingStats(
cfg *Config, stats []roachpb.CollectedIndexUsageStatistics,
) *LocalIndexUsageStats {
s := NewLocalIndexUsageStats(cfg)
s.batchInsertUnsafe(stats)
return s
}

// Start starts the background goroutine that is responsible for collecting
// index usage statistics.
func (s *LocalIndexUsageStats) Start(ctx context.Context, stopper *stop.Stopper) {
@@ -159,11 +173,13 @@ func (s *LocalIndexUsageStats) record(ctx context.Context, payload indexUse) {
}

// Get returns the index usage statistics for the given table ID and index ID.
func (s *LocalIndexUsageStats) Get(key roachpb.IndexUsageKey) roachpb.IndexUsageStatistics {
func (s *LocalIndexUsageStats) Get(
tableID roachpb.TableID, indexID roachpb.IndexID,
) roachpb.IndexUsageStatistics {
s.mu.RLock()
defer s.mu.RUnlock()

table, ok := s.mu.usageStats[key.TableID]
table, ok := s.mu.usageStats[tableID]
if !ok {
// We return a copy of the empty stats.
emptyStats := emptyIndexUsageStats
@@ -173,7 +189,7 @@ func (s *LocalIndexUsageStats) Get(key roachpb.IndexUsageKey) roachpb.IndexUsage
table.RLock()
defer table.RUnlock()

indexStats, ok := table.stats[key.IndexID]
indexStats, ok := table.stats[indexID]
if !ok {
emptyStats := emptyIndexUsageStats
return emptyStats
@@ -209,7 +225,7 @@ func (s *LocalIndexUsageStats) ForEach(options IteratorOptions, visitor StatsVis
s.mu.RUnlock()

for _, tableID := range tableIDLists {
tableIdxStats := s.getStatsForTableID(tableID, false /* createIfNotExists */)
tableIdxStats := s.getStatsForTableID(tableID, false /* createIfNotExists */, false /* unsafe */)

// This means the data is being cleared before we can fetch it. It's not an
// error, so we simply skip over it.
@@ -231,6 +247,20 @@ func (s *LocalIndexUsageStats) ForEach(options IteratorOptions, visitor StatsVis
return nil
}
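The ForEach hunk shows the read lock being released before the loop over tableIDLists, and the loop tolerating a nil result when data was cleared in between. A minimal sketch of that snapshot-then-revisit pattern, assuming the elided part of ForEach copies the table IDs while holding the read lock; registry, tableStats, and forEach are hypothetical simplifications:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type tableStats struct{ reads int }

// registry is a hypothetical, simplified stand-in for LocalIndexUsageStats.
type registry struct {
	mu     sync.RWMutex
	tables map[int]*tableStats
}

// forEach snapshots the table IDs under the read lock, releases the lock, and
// then re-fetches each entry. An entry cleared in the meantime comes back nil
// and is skipped rather than treated as an error. The visitor runs with no
// lock held, so it may itself modify the registry.
func (r *registry) forEach(visit func(id int, ts *tableStats)) {
	r.mu.RLock()
	ids := make([]int, 0, len(r.tables))
	for id := range r.tables {
		ids = append(ids, id)
	}
	r.mu.RUnlock()
	sort.Ints(ids) // sorted only to make the output deterministic

	for _, id := range ids {
		r.mu.RLock()
		ts := r.tables[id]
		r.mu.RUnlock()
		if ts == nil {
			continue // removed after the snapshot was taken
		}
		visit(id, ts)
	}
}

func main() {
	r := &registry{tables: map[int]*tableStats{1: {reads: 3}, 2: {reads: 5}}}
	r.forEach(func(id int, ts *tableStats) {
		fmt.Println(id, ts.reads)
		if id == 1 {
			// Simulate a concurrent clear removing table 2 mid-iteration.
			r.mu.Lock()
			delete(r.tables, 2)
			r.mu.Unlock()
		}
	})
	// Prints only "1 3": table 2 is skipped.
}
```

Holding the lock only briefly per lookup keeps a slow visitor from blocking writers, at the price of a visit that may not reflect a single consistent snapshot.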

// batchInsertUnsafe inserts otherStats into s without taking the write lock.
// This should only be called during initialization, when we can be sure there
// are no other users of s. This avoids the locking overhead when it's not
// necessary.
func (s *LocalIndexUsageStats) batchInsertUnsafe(
otherStats []roachpb.CollectedIndexUsageStatistics,
) {
for _, newStats := range otherStats {
tableIndexStats := s.getStatsForTableID(newStats.Key.TableID, true /* createIfNotExists */, true /* unsafe */)
stats := tableIndexStats.getStatsForIndexID(newStats.Key.IndexID, true /* createIfNotExists */, true /* unsafe */)
stats.Add(&newStats.Stats)
}
}
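batchInsertUnsafe and NewLocalIndexUsageStatsFromExistingStats rely on a standard Go rule: a value that no other goroutine can reference yet needs no locking. A minimal sketch under that assumption, with a hypothetical store type in place of LocalIndexUsageStats and a plain sum in place of the per-index merge that the real code delegates to Add:

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one (index, reads) pair as reported by some node.
type entry struct {
	indexID int
	reads   int
}

// store is a hypothetical stand-in for LocalIndexUsageStats.
type store struct {
	mu     sync.RWMutex
	counts map[int]int // guarded by mu once the store is shared
}

// newStoreFromBatch builds the store and fills it before returning. Until it
// returns, no other goroutine can hold a pointer to s, so the inserts can skip
// mu without racing. Duplicate keys (the same index reported by several
// nodes) are merged by summing.
func newStoreFromBatch(batch []entry) *store {
	s := &store{counts: make(map[int]int)}
	for _, e := range batch {
		s.counts[e.indexID] += e.reads
	}
	return s
}

// get is the normal, locked read path once the store is shared.
func (s *store) get(indexID int) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.counts[indexID]
}

func main() {
	s := newStoreFromBatch([]entry{{1, 3}, {1, 2}, {2, 5}})
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { // readers only start after construction has finished
			defer wg.Done()
			_ = s.get(1)
		}()
	}
	wg.Wait()
	fmt.Println(s.get(1), s.get(2)) // 5 5
}
```

The go statements that start the readers happen after construction completes, which is what makes the lock-free fill safe; the reasoning no longer holds if the constructor ever hands s to another goroutine before it finishes filling it.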

func (s *LocalIndexUsageStats) clear() {
s.mu.Lock()
defer s.mu.Unlock()
@@ -241,8 +271,8 @@ func (s *LocalIndexUsageStats) clear() {
}

func (s *LocalIndexUsageStats) insertIndexUsage(idxUse *indexUse) {
tableStats := s.getStatsForTableID(idxUse.key.TableID, true /* createIfNotExists */)
indexStats := tableStats.getStatsForIndexID(idxUse.key.IndexID, true /* createIfNotExists */)
tableStats := s.getStatsForTableID(idxUse.key.TableID, true /* createIfNotExists */, false /* unsafe */)
indexStats := tableStats.getStatsForIndexID(idxUse.key.IndexID, true /* createIfNotExists */, false /* unsafe */)
indexStats.Lock()
defer indexStats.Unlock()
switch idxUse.usageTyp {
@@ -259,15 +289,21 @@ func (s *LocalIndexUsageStats) insertIndexUsage(idxUse *indexUse) {
}
}

// getStatsForTableID returns the tableIndexStats for the given roachpb.TableID.
// If unsafe is set to true, then the lookup is performed without acquiring
// the internal RWMutex. This can be used when LocalIndexUsageStats is not
// being concurrently accessed.
func (s *LocalIndexUsageStats) getStatsForTableID(
id roachpb.TableID, createIfNotExists bool,
id roachpb.TableID, createIfNotExists bool, unsafe bool,
) *tableIndexStats {
if createIfNotExists {
s.mu.Lock()
defer s.mu.Unlock()
} else {
s.mu.RLock()
defer s.mu.RUnlock()
if !unsafe {
if createIfNotExists {
s.mu.Lock()
defer s.mu.Unlock()
} else {
s.mu.RLock()
defer s.mu.RUnlock()
}
}

if tableIndexStats, ok := s.mu.usageStats[id]; ok {
@@ -286,15 +322,22 @@ func (s *LocalIndexUsageStats) getStatsForTableID(
return nil
}
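getStatsForTableID picks its lock based on what the call may do: the exclusive lock when it may insert, the shared lock for pure lookups, and no lock when unsafe is set. The middle of the function is hidden in this diff, so the sketch below assumes the elided branch inserts a fresh entry when createIfNotExists is true; registry and tableStats are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

type tableStats struct{ reads int }

// registry is a hypothetical, simplified stand-in for LocalIndexUsageStats.
type registry struct {
	mu     sync.RWMutex
	tables map[int]*tableStats
}

// getStatsForTable mirrors the locking choice visible in getStatsForTableID:
// the exclusive lock when it may insert, the shared lock for pure lookups, and
// no lock at all when the caller guarantees exclusive access (unsafe).
// The deferred unlocks run when the function returns, not at block end.
func (r *registry) getStatsForTable(id int, createIfNotExists, unsafe bool) *tableStats {
	if !unsafe {
		if createIfNotExists {
			r.mu.Lock()
			defer r.mu.Unlock()
		} else {
			r.mu.RLock()
			defer r.mu.RUnlock()
		}
	}
	if ts, ok := r.tables[id]; ok {
		return ts
	}
	if createIfNotExists {
		// Assumed behavior of the elided branch: insert a fresh entry.
		ts := &tableStats{}
		r.tables[id] = ts
		return ts
	}
	return nil
}

func main() {
	r := &registry{tables: make(map[int]*tableStats)}
	fmt.Println(r.getStatsForTable(1, false, false) == nil) // true: lookups never insert

	var wg sync.WaitGroup
	results := make([]*tableStats, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = r.getStatsForTable(1, true, false)
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0] == results[7]) // true: lookup and insert are atomic under the write lock
}
```

Taking the exclusive lock up front whenever creation is allowed keeps the lookup and the insert atomic, at the cost of serializing writers; the unsafe flag lets batchInsertUnsafe skip locking entirely during construction.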

// getStatsForIndexID returns the indexStats for the given roachpb.IndexID.
// If unsafe is set to true, then the lookup is performed without acquiring
// the internal RWMutex. This can be used when tableIndexStats is not being
// concurrently accessed.
func (t *tableIndexStats) getStatsForIndexID(
id roachpb.IndexID, createIfNotExists bool,
id roachpb.IndexID, createIfNotExists bool, unsafe bool,
) *indexStats {
if createIfNotExists {
t.Lock()
defer t.Unlock()
} else {
t.RLock()
defer t.RUnlock()
if !unsafe {
if createIfNotExists {
t.Lock()
defer t.Unlock()
} else {
t.RLock()
defer t.RUnlock()
}

}

if stats, ok := t.stats[id]; ok {
@@ -329,7 +372,7 @@ func (t *tableIndexStats) iterateIndexStats(
}

for _, indexID := range indexIDs {
indexStats := t.getStatsForIndexID(indexID, false /* createIfNotExists */)
indexStats := t.getStatsForIndexID(indexID, false /* createIfNotExists */, false /* unsafe */)

// This means the data is being cleared before we can fetch it. It's not an
// error, so we simply skip over it.
2 changes: 1 addition & 1 deletion pkg/sql/idxusage/local_index_usage_stats_test.go
@@ -149,7 +149,7 @@ func TestIndexUsageStatisticsSubsystem(t *testing.T) {
t.Run("point lookup", func(t *testing.T) {
actualEntryCount := 0
for _, index := range indices {
stats := localIndexUsage.Get(index)
stats := localIndexUsage.Get(index.TableID, index.IndexID)
require.NotNil(t, stats)

actualEntryCount++
1 change: 1 addition & 0 deletions pkg/sql/logictest/testdata/logic_test/crdb_internal
@@ -39,6 +39,7 @@ crdb_internal gossip_liveness table NULL NULL NULL
crdb_internal gossip_network table NULL NULL NULL
crdb_internal gossip_nodes table NULL NULL NULL
crdb_internal index_columns table NULL NULL NULL
crdb_internal index_usage_statistics table NULL NULL NULL
crdb_internal interleaved table NULL NULL NULL
crdb_internal invalid_objects table NULL NULL NULL
crdb_internal jobs table NULL NULL NULL
1 change: 1 addition & 0 deletions pkg/sql/logictest/testdata/logic_test/crdb_internal_tenant
@@ -52,6 +52,7 @@ crdb_internal gossip_liveness table NULL NULL NULL
crdb_internal gossip_network table NULL NULL NULL
crdb_internal gossip_nodes table NULL NULL NULL
crdb_internal index_columns table NULL NULL NULL
crdb_internal index_usage_statistics table NULL NULL NULL
crdb_internal interleaved table NULL NULL NULL
crdb_internal invalid_objects table NULL NULL NULL
crdb_internal jobs table NULL NULL NULL
11 changes: 11 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/create_statements
@@ -427,6 +427,17 @@ CREATE TABLE crdb_internal.index_columns (
column_direction STRING NULL,
implicit BOOL NULL
) {} {}
CREATE TABLE crdb_internal.index_usage_statistics (
table_id INT8 NOT NULL,
index_id INT8 NOT NULL,
total_reads INT8 NOT NULL,
last_read TIMESTAMPTZ NULL
) CREATE TABLE crdb_internal.index_usage_statistics (
table_id INT8 NOT NULL,
index_id INT8 NOT NULL,
total_reads INT8 NOT NULL,
last_read TIMESTAMPTZ NULL
) {} {}
CREATE TABLE crdb_internal.interleaved (
database_name STRING NOT NULL,
schema_name STRING NOT NULL,
1 change: 1 addition & 0 deletions pkg/sql/logictest/testdata/logic_test/grant_table
@@ -51,6 +51,7 @@ test crdb_internal gossip_liveness public
test crdb_internal gossip_network public SELECT
test crdb_internal gossip_nodes public SELECT
test crdb_internal index_columns public SELECT
test crdb_internal index_usage_statistics public SELECT
test crdb_internal interleaved public SELECT
test crdb_internal invalid_objects public SELECT
test crdb_internal jobs public SELECT