Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqlite explain graph #3064

Merged
merged 24 commits into from
May 31, 2024
Merged

Sqlite explain graph #3064

merged 24 commits into from
May 31, 2024

Conversation

tyrelr
Copy link
Contributor

@tyrelr tyrelr commented Feb 18, 2024

Improve diagnostic log message of sqlite explain processing to illustrate HOW the explain planner arrives to the output data types it finds. Then add a few simple fixes found with the improved diagnostic view.

Example command to get the dot format graphviz:

RUST_LOG="sqlx::explain=trace" DATABASE_URL=sqlite://tests/sqlite/sqlite.db cargo test --features sqlite,runtime-tokio,tls-rustls,chrono -- --nocapture update

Then the graphviz markup can be copy-pasted into the graphviz renderer/editor of choice. The process isn't elegant, but it works.

Warning: this only actually works when running a program which directly uses the 'describe' operation (such as sqlx's sqlite unit tests). Logger output is not available when compiling a library USING sqlx (the output is ignored - probably by rustc itself?).

Does your PR solve an issue?

Fixes test cases derived from #2939 & #1923 (which involves breaking changes)

@tyrelr

This comment was marked as resolved.

@abonander
Copy link
Collaborator

@tyrelr you should be able to convert this to a draft PR so we don't merge it by accident.

@tyrelr tyrelr marked this pull request as draft February 19, 2024 00:10
@tyrelr

This comment was marked as resolved.

@tyrelr

This comment was marked as outdated.

@tyrelr tyrelr marked this pull request as ready for review February 20, 2024 03:00
@tyrelr
Copy link
Contributor Author

tyrelr commented Feb 20, 2024

Performance is comparable after some code cleanup

group               02f196b4                               98bc9076
-----               --------                               --------
insert/fks          1.01    372.6±4.78µs        ? ?/sec    1.00    369.3±5.90µs        ? ?/sec
insert/returning    1.02    362.1±5.47µs        ? ?/sec    1.00    353.8±3.54µs        ? ?/sec
select/recursive    1.29      9.3±0.08ms        ? ?/sec    1.00      7.2±0.02ms        ? ?/sec
select/trivial      1.00    227.7±9.29µs        ? ?/sec    1.00   227.1±10.82µs        ? ?/sec

@abonander abonander added breaking db:sqlite Related to SQLite labels Mar 6, 2024
@abonander
Copy link
Collaborator

abonander commented Mar 6, 2024

This is really cool, but does the graph generation always run? Can it be made optional, maybe with a specific describe*() variant, or maybe checking if the sqlx::explain span is enabled at trace? Or maybe instead of logging the graphviz output, it could save it to a file with a path given by an environment variable?

Fixes test cases derived from #2939 & #1923 (which involves breaking changes)

So to clarify this is a breaking PR?

@tyrelr
Copy link
Contributor Author

tyrelr commented Mar 9, 2024

This is really cool, but does the graph generation always run?

The generation is tied to the logging level of "sqlx::explain". So both the capture of the data it needs, and the calculations/formatting of the output are behind log_level checks.

Unfortunately, based on my recollection it IS possible to 'enable logging' during compilation time... using RUST_LOG="sqlx::explain=trace" (or RUST_LOG="sqlx=trace", or RUST_LOG="trace") makes the code TRY to log, but I believe rustc discards STDERR/STDOUT from proc macro's, so enabling logging doesn't actually mean you get to SEE the logs...

Can it be made optional, maybe with a specific describe*() variant, or maybe checking if the sqlx::explain span is enabled at trace?

That makes sense, I've changed it to require trace instead of the statement log level.

Logging is a very poor location for this. The primary situation this graph will be useful is when a query result type doesn't match developer expectations, and their crate is now failing compilation. As-is, that means either stubbing out a bunch of code, or creating a new crate using exactly the same sqlx version, query, and database to call describe with trace logging enabled...

Or maybe instead of logging the graphviz output, it could save it to a file with a path given by an environment variable?

A single file-path might get weird, since a single compilation will hit hundreds of queries.

But long-term, I would like to store this 'query result type evidence' alongside the offline query results. Then enhance sqlx-cli with an 'inspect' command to show types and the evidence. Of course, I have NO IDEA if any other database types have any helpful evidence information they might want to provide. But that's a huge change, so just tweaking the logs was an attempt to avoid boiling the ocean.

So to clarify this is a breaking PR?

Yes, the PR as a whole is a breaking change. With a bit of effort, it could be split into 2 PR's to land the strictly-logging-related stuff without breaking changes, and the analysis fixes with breaking changes. But in my mind the features is so niche & relevant to query analysis correctness, that it made sense to just leave them tied together. Especially given the size of the change 'just to get diagnostics'

@abonander abonander merged commit f960d5b into launchbadge:main May 31, 2024
62 checks passed
jayy-lmao pushed a commit to jayy-lmao/sqlx that referenced this pull request Jun 6, 2024
* convert logger to output a query graph

* avoid duplicating branch paths to shrink output graph

* separate different branching paths

* include all branches which found unique states

* track the reason for ending each branches execution

* track the result type of each branch

* make edges rely on history index instead of program_id, to avoid errors when looping

* add state diff to query graph

* drop redundant table info

* rework graph to show state changes, rework logger to store state snapshots

* show state on the previous operation

* gather duplicate state changes into clusters to reduce repetition

* draw invisible connections between unknown instructions by program_i

* clean up dot format string escaping

* add test case from launchbadge#1960 (update returning all columns)

* add tests for launchbadge#2939 (update returning only the PK column)

* allow inserting into a table using only the index

* improve null handling of IfNull, fix output type of NewRowId

* add NoResult nodes for branches which don't log a result, as a sanity check

* add short-circuit to all logging operations

* remove duplicate logging checks, and make logging enabled/disabled consistently depend on sqlx::explain instead of sqlx for capture & sqlx::explain for output

* add failing test for awkwardly nested/filtered count subquery

* handle special case of return operation to fix failing test

* require trace log level instead of using whatever log level  statement logging was configured to use
jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* convert logger to output a query graph

* avoid duplicating branch paths to shrink output graph

* separate different branching paths

* include all branches which found unique states

* track the reason for ending each branches execution

* track the result type of each branch

* make edges rely on history index instead of program_id, to avoid errors when looping

* add state diff to query graph

* drop redundant table info

* rework graph to show state changes, rework logger to store state snapshots

* show state on the previous operation

* gather duplicate state changes into clusters to reduce repetition

* draw invisible connections between unknown instructions by program_i

* clean up dot format string escaping

* add test case from launchbadge#1960 (update returning all columns)

* add tests for launchbadge#2939 (update returning only the PK column)

* allow inserting into a table using only the index

* improve null handling of IfNull, fix output type of NewRowId

* add NoResult nodes for branches which don't log a result, as a sanity check

* add short-circuit to all logging operations

* remove duplicate logging checks, and make logging enabled/disabled consistently depend on sqlx::explain instead of sqlx for capture & sqlx::explain for output

* add failing test for awkwardly nested/filtered count subquery

* handle special case of return operation to fix failing test

* require trace log level instead of using whatever log level  statement logging was configured to use
jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* convert logger to output a query graph

* avoid duplicating branch paths to shrink output graph

* separate different branching paths

* include all branches which found unique states

* track the reason for ending each branches execution

* track the result type of each branch

* make edges rely on history index instead of program_id, to avoid errors when looping

* add state diff to query graph

* drop redundant table info

* rework graph to show state changes, rework logger to store state snapshots

* show state on the previous operation

* gather duplicate state changes into clusters to reduce repetition

* draw invisible connections between unknown instructions by program_i

* clean up dot format string escaping

* add test case from launchbadge#1960 (update returning all columns)

* add tests for launchbadge#2939 (update returning only the PK column)

* allow inserting into a table using only the index

* improve null handling of IfNull, fix output type of NewRowId

* add NoResult nodes for branches which don't log a result, as a sanity check

* add short-circuit to all logging operations

* remove duplicate logging checks, and make logging enabled/disabled consistently depend on sqlx::explain instead of sqlx for capture & sqlx::explain for output

* add failing test for awkwardly nested/filtered count subquery

* handle special case of return operation to fix failing test

* require trace log level instead of using whatever log level  statement logging was configured to use
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking db:sqlite Related to SQLite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants