Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution Context integration for Database write operations #7072

Merged
merged 34 commits into from
Jun 27, 2023

Conversation

radeusgd
Copy link
Member

@radeusgd radeusgd commented Jun 19, 2023

Pull Request Description

Closes #6887

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd self-assigned this Jun 19, 2023
@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch 2 times, most recently from e29fc61 to 90a21d4 Compare June 20, 2023 16:30
@radeusgd radeusgd marked this pull request as ready for review June 20, 2023 16:30
that the operation checks for errors like missing columns.

More expensive checks, like clashing keys are checked only on a sample of
1000 rows, so it is possible for the operation to encounter one of these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this optimization should be a separate flag; 'dry run' means 'no real side effects' rather than 'speed up expensive checks'. The two flags might often be used together, but perhaps they should be different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dry run is not really a flag, but specifies the behaviour when the Output context is disabled.

The only scenario we could do is dry run with checking all values or checking none at all (then the actual check will happen on the proper execution).

I don't think we should be adding additional flags for this - as this parameter would not at all apply when the context is enabled.

We can reconsider if we want to run the check at all. I think checking a subset is a compromise that will allow us to catch some errors while retaining the dry run behaviour and not sabotaging the performance. Do you think we should instead not check at all in dry run mode? Or check all entries?


I like checking all entries because it actually does a 'proper' dry run - it verifies the behaviour that will happen on actual execution. The only worry was that it may be too expensive.

In the in memory backend, this operation will be as expensive as any operations preparing the data before hand (roughly O(N) cost).

Problem is a bit bigger in the DB backend, as all operations are done 'lazily' by default - they just construct more and more complex SQL queries, but do not run them. Now this check would require to run the query. Still, the cost is comparable to the cost of attaching a Table visualization to any of the queries.

And then, there is the issue that while update_database_table does the check, but does not actually retain the data; select_into_database_table is meant to create a temporary dry run table. If we create it with all the data, it's not much different from the 'proper' run, only difference is table name and that it is a temporary table. Still processing all the data gives us the closest experience to the actual run, apart from side effects.


I like processing all data in the dry runs. I would consider trying it and going back to smaller samples if the performance is really unsatisfactory. @jdunkerley what do you think?

@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from 90a21d4 to fce9a4f Compare June 20, 2023 22:53
Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good.
A few comment style suggestions.
And some code style suggestions.


Some operations, like writing to tables, require their target to be a
trivial query.
is_trivial_query : Boolean ! Table_Not_Found
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something that the Context should answer.
Likewise, we can choose to insert with some columns removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why would it be the context? it's a property of the whole table, exactly because of (2) - I also want to check if the columns are intact and unmodified

For example, table.set "[X] + 2" "X" will have context completely unchanged, but the contents of the column X will all be shifted by 2. And so when inserting to this table, should we join along values of the original X or the updated X? It is ill-defined.

  1. And that's why - I don't think we should allow any column modifications, be it rename or removal. After such operations it's no longer the same table, and we are inserting to the original one, not the modified one.

e.g. what if I have table T with columns A, B and C and remove C. Then I append to this new table a table with columns A and B and have error_on_missing_columns=True. C is missing from the table I will actually append into, but not from the table that I set as my target. Do I error on this missing C or not? It's unclear.

So due to these examples, IMO it only makes sense to append to a 'trivial' table.

Comment on lines +219 to +245

? Side Effects

Note that the `read` method is running without restrictions when the
output context is disabled, but it can technically cause side effects,
if it is provided with a DML query. Usually it is preferred to use
`execute_update` for DML queries, or if they are supposed to return
results, the `read` should be wrapped in an execution context check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this needs work to be reworded - on such a primitive and core function this feel like will confuse end users but agree we need some message here. One for the doc review work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Just a note that it is a core but pretty advanced function. I guess once someone knows SQL they should be aware that executing an UPDATE RETURNING will not only read but will cause changes. What may not be as clear to even experienced SQL developers, is that such read in the IDE may be re-run many times when the workflow is being modified, and thus the side effect may be invoked multiple times as well (that's why we hide the effects behind the execution context, which we do not have here).

Maybe we should actually detect some keywords like UPDATE, CREATE, DROP, ALTER, INSERT, DELETE and warn here?

@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch 2 times, most recently from 26cbf3f to dfed355 Compare June 21, 2023 16:41
@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from dfed355 to 5b8fc83 Compare June 23, 2023 12:31
@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from 5b8fc83 to c6d5914 Compare June 23, 2023 16:01
build.sbt Outdated
"org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
"org.xerial" % "sqlite-jdbc" % sqliteVersion,
"org.postgresql" % "postgresql" % "42.4.0"
"org.graalvm.truffle" % "truffle-api" % graalVersion % "provided",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need GraalVM dependency for:

  • importing Value type - we need to return Value not Object, as Object will cause a polyglot conversion that loses Enso warnings,
  • accessing TruffleLogger to log that a maintenance operation failed.

Copy link
Member

@JaroslavTulach JaroslavTulach Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use truffle-api in standard libraries. Try sdk - that's the JAR which exposes Value.

Btw. truffle-api API transitively depends on sdk - e.g. bringing in more may seem to work, but it is not good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truffle-sdk does not seem to exist

[warn]  Note: Unresolved dependencies path:
[error] stack trace is suppressed; run 'last common-polyglot-core-utils / update' for the full output
[error] (common-polyglot-core-utils / update) sbt.librarymanagement.ResolveException: Error downloading org.graalvm.truffle:truffle-sdk:22.3.1
[error]   Not found
[error]   Not found
[error]   not found: C:\Users\progr\.ivy2\localorg.graalvm.truffle\truffle-sdk\22.3.1\ivys\ivy.xml
[error]   not found: https://repo1.maven.org/maven2/org/graalvm/truffle/truffle-sdk/22.3.1/truffle-sdk-22.3.1.pom

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truffle-api is an API for those who write interpreters. That is a different "level of Java" than the one used in standard libraries. Moreover I don't think the Java types are even accessible.

build.sbt Outdated
"org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
"org.xerial" % "sqlite-jdbc" % sqliteVersion,
"org.postgresql" % "postgresql" % "42.4.0"
"org.graalvm.truffle" % "truffle-api" % graalVersion % "provided",
Copy link
Member

@JaroslavTulach JaroslavTulach Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use truffle-api in standard libraries. Try sdk - that's the JAR which exposes Value.

Btw. truffle-api API transitively depends on sdk - e.g. bringing in more may seem to work, but it is not good idea.

@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from 9f58d7d to 046c91c Compare June 26, 2023 09:55
@radeusgd radeusgd requested a review from JaroslavTulach June 26, 2023 10:05
@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from bc62843 to 7bbc16f Compare June 26, 2023 12:47
"com.ibm.icu" % "icu4j" % icuVersion,
"org.graalvm.truffle" % "truffle-api" % graalVersion % "provided"
"com.ibm.icu" % "icu4j" % icuVersion,
"org.graalvm.sdk" % "graal-sdk" % graalVersion % "provided"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graal-sdk is the API to use in Java parts of standard libraries. It's javadoc is available here: https://www.graalvm.org/sdk/javadoc/

Btw. the Truffle javadoc also contains the org.graalvm classes, but that's because of transitive dependency.

@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from 2f06f55 to 516bba6 Compare June 27, 2023 09:39
radeusgd added 24 commits June 27, 2023 15:35
Add a check for transaction support.

remove outdated check

grow builder on seal to ensure all present

CR1: Connection

CR2: rephrasing docs

CR2: Dry_Run_Operation

clearer method names

CR4: code style

javafmt
improve ref counting

checkpoint

add a test for not overwriting pre-existing tables

OperationSynchronizer
…before some DB operations. Better keep track of allocated dry run tables and ensure that a dry run name does not collide with a pre-existing user table.
fetch much less data when checking table existence

fixes

how have I missed dry run for update????; also improve random table generator

fix tests and add a test case for upload

my bad, that was actually correct the first time...
…ibs"

This reverts commit 0c98ff81dd4d95ad38cfbfe5e8872b9c546cbeb7.
`"org.graalvm.sdk" % "graal-sdk" % graalVersion % "provided"` in helper Java libs
@radeusgd radeusgd force-pushed the wip/radeusgd/db-write-execution-contexts branch from 37598ba to fdafcab Compare June 27, 2023 13:35
@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Jun 27, 2023
@mergify mergify bot merged commit 2bac9cc into develop Jun 27, 2023
@mergify mergify bot deleted the wip/radeusgd/db-write-execution-contexts branch June 27, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add execution context control to the Database write operations.
4 participants