-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql,log: productionize the event logging #57737
Conversation
@itsbilal this is the draft branch we talked about yesterday. What I am missing here is an approach that works for the 3 odd events, those that are node-level: node restart, join, decommission. How to best factor that code? |
436a268
to
9c71171
Compare
9c71171
to
55e0735
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me so far! I like the CommonDetails/CommonSQLDetails etc setup of the protobuf structs. I made one suggestion around handling the node events by having a more specialized method than InsertEventRecord
that fills in node-specific details. From my code observation so far, that seems to make sense to be, but let me know if it doesn't.
Reviewed 16 of 16 files at r1, 21 of 21 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/event_log.go, line 233 at r2 (raw file):
// InsertEventRecord inserts a single event into the event log as part of the // provided transaction. func (ev EventLogger) InsertEventRecord(
I'm guessing all uses of this method will soon be from just this file. Might be worth unexporting the method and having a separate one, LogNodeEvent
, that takes in a partially-filled NodeEvent, a node descriptor/ctx/etc, and fills in common details + common node details?
55e0735
to
95665be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good for a first round of review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/sql/event_log.go, line 233 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I'm guessing all uses of this method will soon be from just this file. Might be worth unexporting the method and having a separate one,
LogNodeEvent
, that takes in a partially-filled NodeEvent, a node descriptor/ctx/etc, and fills in common details + common node details?
Done.
279e1d0
to
d4186bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far! Just some minor comments. Looking forward to seeing the rest of the changes come in.
Reviewed 45 of 45 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/event_log.go, line 150 at r3 (raw file):
` args := []interface{}{ info.CommonDetails().Timestamp,
Might be a good idea returning an error here if this is unset?
pkg/util/log/eventpb/events.proto, line 21 at r3 (raw file):
// // Notes: // - the fields have jsontag ",..." to override the default
,omitempty
intead of ... ? I'm not sure if I understand this comment as-is.
d4186bc
to
3ece831
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/sql/event_log.go, line 150 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Might be a good idea returning an error here if this is unset?
Good idea. Done.
pkg/util/log/eventpb/events.proto, line 21 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
,omitempty
intead of ... ? I'm not sure if I understand this comment as-is.
Explained in comment.
3ece831
to
715a692
Compare
b66fd00
to
a708667
Compare
CockroachDB needs to report structured events for "important" (notable) events on the logical structure of a cluster, including changes to the SQL logical schema, node activity, privilege changes etc. Prior to this patch, these events were reported mainly into the table `system.eventlog`, with a partial copy of the payload into the debugging external log (`cockroach.log`). This solution was incomplete and unsatisfactory in many ways: - the event payloads were not documented. - the event payloads were not centrally defined, which was preventing the generation of automatic documentation. - the payload types were declared "inline" in the log calls, which made it easy for team members to inadvertently change the structure of the payload and make them backward-incompatible for users consuming this data externally. - the payload fields were inconsistently named across event types. - the metadata fields on the payloads were incompletely and inconsistently populated: - the SQL instance ID was missing in some cases. - the descriptor ID of affected descriptor was missing in some cases. - the same event type was mistakenly used for different events (e.g. "rename_database" for both RENAME DATABASE and CONVERT TO SCHEMA) - the same event type was abusingly over-used for multiple separate operations, e.g. a single event would be generated for a multi-table, multi-user GRANT or REVOKE operation. - events could be omitted entirely from external logs, e.g. if the system.eventlog table was not available during a node restart. - the copy in the external log was not parseable. Generally, the logging package was unaware of the internal structure of events and would “flatten” them. - no provision was available to partially redact events. From the logging system's perspective, the entire payload is sensitive. This commit changes the situation as follows: - it centralizes the payload definitions and standardizes them into a new package `eventspb`. - it groups events into categories and associates logging channels to each event category. - it enables automatic generation of documentation for events. - it ensures that field names are consistent across event payloads. - it ensures that event metadata is consistently populted. - it decomposes complex GRANT/REVOKE operations into individual events. - it automates the generation of a reference documentation for all event types. - it ensures that events are sent to external logs unconditionally. The following problems continue to exist and need to be resolved separately: - privilege changes that occur as a side effect of certain operations do not get events logged: cockroachdb#57573 cockroachdb#57576 cockroachdb#57739 cockroachdb#57741 - the name fields in certain DDL events is not properly qualified, which prevents the determination of the logical schema or database where the object was altered: cockroachdb#57734 cockroachdb#57735 cockroachdb#57738 cockroachdb#57740 Release note (sql change): The cluster event logging system has been modernized. In particular, the schema of the entries for the `info` column in `system.eventlog` has been stabilized. Release note (cli change): The events that were previously only stored in `system.eventlog` are now also directed unconditionally to an external logging channel using a JSON format. Refer to the configuration to see how to customize how events are directed to external sinks. Note that the exact external output format (and thus how to detect/parse the events from e.g. log files) is not yet stabilized and remains subject to change. Release note (doc change): The cluster event logging system has been standardized. A reference documentation is now available (auto-generated from source code); changes to non-reserved payloads will now be announced at least one release version in advance. The event types are organized into broad categories: SQL Logical Schema Changes, SQL Privilege Changes, SQL User Management, CLuster-level events and SQL Miscellaneous operations. Release note (backward-incompatible change): The payload fields for certain event types in `system.eventlog` have been changed and/or renamed. Note that the payloads in `system.eventlog` were an undocumented, reserved feature so no guarantee was made about cross-version compatibility to this point. The list of changes includes (but is not limited to): - `TargetID` has been renamed to `NodeID` for `node_join` - `TargetID` has been renamed to `TargetNodeID` for `node_decommissioning` / `node_decommissioned` / `node_recommissioned`. - `NewDatabaseName` has been renamed to `NewDatabaseParent` for `convert_to_schema`. - `grant_privilege` and `revoke_privilege` have been removed; they are replaced by `change_database_privilege`, `change_schema_privilege`, `change_type_privilege` and `change_table_privilege`. Each event only reports change for one user/role, so the `Grantees` field was renamed to `Grantee`. - Each `drop_role` event now pertains to a single user/role. Release note (sql change): The `targetID` and `reportingID` columns in `system.eventlog` are now deprecated. Their values, for relevant event types, can be found as fields inside the `info` column instead.
a708667
to
b6f03db
Compare
bors r=itsbilal |
Build succeeded: |
Informs #57629
CockroachDB needs to report structured events for
"important" (notable) events on the logical structure of a cluster,
including changes to the SQL logical schema, node activity, privilege
changes etc.
Prior to this patch, these events were reported mainly into the
table
system.eventlog
, with a partial copy of the payload into thedebugging external log (
cockroach.log
).This solution was incomplete and unsatisfactory in many ways:
the event payloads were not documented.
the event payloads were not centrally defined, which was preventing
the generation of automatic documentation.
the payload types were declared "inline" in the log calls, which
made it easy for team members to inadvertently change the structure of
the payload and make them backward-incompatible for users consuming
this data externally.
the payload fields were inconsistently named across event types.
the metadata fields on the payloads were incompletely and
inconsistently populated:
cases.
the same event type was mistakenly used for different events (e.g.
"rename_database" for both RENAME DATABASE and CONVERT TO SCHEMA)
the same event type was abusingly over-used for multiple separate
operations, e.g. a single event would be generated for a multi-table,
multi-user GRANT or REVOKE operation.
events could be omitted entirely from external logs, e.g.
if the system.eventlog table was not available during a node
restart.
the copy in the external log was not parseable. Generally, the
logging package was unaware of the internal structure of events
and would “flatten” them.
no provision was available to partially redact events. From the
logging system's perspective, the entire payload is sensitive.
This commit changes the situation as follows:
new package
eventspb
.channels to each event category.
events.
event types.
The following problems continue to exist and need to be resolved
separately:
privilege changes that occur as a side effect of certain operations
do not get events logged:
sql: missing event log entries for privilege changes in
crdb_internal.unsafe_xxx
#57573backupccl: missing event log entries for user/role and privilege changes incurred by backup/restore #57576
sql: missing event log entry for CREATE STATISTICS statement #57739
sql: missing event log entries for ALTER .. SET SCHEMA statements #57741
the name fields in certain DDL events is not properly qualified,
which prevents the determination of the logical schema or database
where the object was altered:
sql: event log entries for ALTER/DROP TYPE do not properly include the fully qualified type name #57734
sql: event log entries for DROP TABLE,INDEX CASCADE + ALTER TABLE do not properly include the fully qualified names for dropped views #57735
sql: event log entries for DDL for user-defined schemas do not include fully qualified names #57738
sql: event log entry for COMMENT ON INDEX,COLUMN does not properly qualify its TableName payload #57740
Release note (sql change): The cluster event logging system has been
modernized. In particular, the schema of the entries for the
info
column in
system.eventlog
has been stabilized.Release note (cli change): The events that were previously only stored
in
system.eventlog
are now also directed unconditionally to anexternal logging channel using a JSON format. Refer to the
configuration to see how to customize how events are directed to
external sinks. Note that the exact external output format (and thus
how to detect/parse the events from e.g. log files) is not yet
stabilized and remains subject to change.
Release note (doc change): The cluster event logging system has been
standardized. A reference documentation is now
available (auto-generated from source code); changes to non-reserved
payloads will now be announced at least one release version in
advance. The event types are organized into broad categories: SQL
Logical Schema Changes, SQL Privilege Changes, SQL User Management,
CLuster-level events and SQL Miscellaneous operations.
Release note (backward-incompatible change): The payload fields for
certain event types in
system.eventlog
have been changed and/orrenamed. Note that the payloads in
system.eventlog
were anundocumented, reserved feature so no guarantee was made about
cross-version compatibility to this point. The list of changes
includes (but is not limited to):
TargetID
has been renamed toNodeID
fornode_join
TargetID
has been renamed toTargetNodeID
fornode_decommissioning
/node_decommissioned
/node_recommissioned
.NewDatabaseName
has been renamed toNewDatabaseParent
forconvert_to_schema
.grant_privilege
andrevoke_privilege
have been removed;they are replaced by
change_database_privilege
,change_schema_privilege
,change_type_privilege
andchange_table_privilege
.Each event only reports change for one user/role, so the
Grantees
field was renamed toGrantee
.drop_role
event now pertains to a single user/role.Release note (sql change): The
targetID
andreportingID
columns insystem.eventlog
are now deprecated. Their values, for relevant eventtypes, can be found as fields inside the
info
column instead.