Execution Context integration for Database write operations #7072
Conversation
(force-pushed e29fc61 → 90a21d4)
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso (outdated, resolved)
that the operation checks for errors like missing columns.

More expensive checks, like clashing keys are checked only on a sample of
1000 rows, so it is possible for the operation to encounter one of these
I wonder if this optimization should be a separate flag; 'dry run' means 'no real side effects' rather than 'speed up expensive checks'. The two flags might often be used together, but perhaps they should be different.
The dry run is not really a flag - it specifies the behaviour when the Output context is disabled.
The only alternatives would be a dry run that checks all values, or one that checks none at all (with the actual check happening on the proper execution).
I don't think we should be adding additional flags for this - the parameter would not apply at all when the context is enabled.
We can reconsider whether we want to run the check at all. I think checking a subset is a compromise that will allow us to catch some errors while retaining the dry run behaviour and not sabotaging the performance. Do you think we should instead not check at all in dry run mode? Or check all entries?
I like checking all entries because it actually does a 'proper' dry run - it verifies the behaviour that will happen on actual execution. The only worry was that it may be too expensive.
In the in-memory backend, this operation will be as expensive as any operation preparing the data beforehand (roughly O(N) cost).
The problem is a bit bigger in the DB backend, as all operations are done 'lazily' by default - they just construct more and more complex SQL queries, but do not run them. This check would require actually running the query. Still, the cost is comparable to the cost of attaching a Table visualization to any of the queries.
And then there is the issue that while `update_database_table` does the check but does not actually retain the data, `select_into_database_table` is meant to create a temporary dry run table. If we create it with all the data, it's not much different from the 'proper' run; the only differences are the table name and that it is a temporary table. Still, processing all the data gives us the closest experience to the actual run, apart from side effects.
I like processing all data in the dry runs. I would consider trying it and going back to smaller samples if the performance is really unsatisfactory. @jdunkerley what do you think?
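The trade-off discussed above - validating key clashes on a sample of rows versus on the full data - can be sketched in Python. This is a hypothetical illustration only, not the actual Enso implementation; the function name and row representation are invented:

```python
import random

def check_clashing_keys(rows, key_columns, sample_size=1000):
    """Return the first clashing key found, or None.

    Checking only a sample keeps a dry run cheap, at the cost of
    possibly missing clashes outside the sampled rows.
    """
    sample = rows if len(rows) <= sample_size else random.sample(rows, sample_size)
    seen = set()
    for row in sample:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            return key  # a duplicate key was detected
        seen.add(key)
    return None

rows = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 1, "x": "c"}]
print(check_clashing_keys(rows, ["id"]))  # (1,)
```

With `sample_size=1000` a three-row table is checked exhaustively, so the clash on `id=1` is always found; for a million-row table the same call would only inspect a random subset, which is exactly the compromise debated here.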
distribution/lib/Standard/Database/0.0.0-dev/src/Internal/Postgres/Postgres_Connection.enso (resolved)
(force-pushed 90a21d4 → fce9a4f)
Generally looks good.
A few comment style suggestions.
And some code style suggestions.
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso (resolved)
Some operations, like writing to tables, require their target to be a
trivial query.
is_trivial_query : Boolean ! Table_Not_Found
This feels like something that the Context should answer.
Likewise, we can choose to insert with some columns removed?
- Why would it be the context? It's a property of the whole table, exactly because of (2) - I also want to check that the columns are intact and unmodified.
For example, `table.set "[X] + 2" "X"` will have its context completely unchanged, but the contents of the column X will all be shifted by 2. So when inserting into this table, should we join along the values of the original X or the updated X? It is ill-defined.
- And that's why I don't think we should allow any column modifications, be it rename or removal. After such operations it's no longer the same table, and we are inserting into the original one, not the modified one.
E.g. what if I have table T with columns A, B and C and remove C. Then I append to this new table a table with columns A and B and have `error_on_missing_columns=True`. C is missing from the table I will actually append into, but not from the table that I set as my target. Do I error on this missing C or not? It's unclear.
So due to these examples, IMO it only makes sense to append to a 'trivial' table.
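The "trivial target" rule argued above can be modelled in a few lines of Python. This is a hypothetical sketch of the idea, not the Enso implementation: a target counts as trivial only while it is a direct, unmodified table reference, and any derived query (set, rename, remove) disqualifies it:

```python
# Hypothetical model: a write target is "trivial" only if no
# transformations have been applied since the raw table reference.
class TableRef:
    def __init__(self, name, derived=False):
        self.name = name
        self.derived = derived

    def set(self, expression, column_name):
        # Any transformation yields a derived query,
        # which is no longer the original table itself.
        return TableRef(self.name, derived=True)

    def is_trivial_query(self):
        return not self.derived

t = TableRef("T")
print(t.is_trivial_query())                      # True
print(t.set("[X] + 2", "X").is_trivial_query())  # False
```

In this model the `table.set "[X] + 2" "X"` example from the discussion produces a non-trivial target, so a write operation would be rejected rather than face the ill-defined join semantics described above.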
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso (outdated, resolved)
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso (outdated, resolved)
? Side Effects

  Note that the `read` method is running without restrictions when the
  output context is disabled, but it can technically cause side effects,
  if it is provided with a DML query. Usually it is preferred to use
  `execute_update` for DML queries, or if they are supposed to return
  results, the `read` should be wrapped in an execution context check.
I feel this needs to be reworded - on such a primitive and core function this feels like it will confuse end users, but I agree we need some message here. One for the doc review work.
Fair. Just a note that it is a core but pretty advanced function. I guess once someone knows SQL they should be aware that executing an `UPDATE RETURNING` will not only read but will cause changes. What may not be as clear even to experienced SQL developers is that such a read in the IDE may be re-run many times while the workflow is being modified, and thus the side effect may be invoked multiple times as well (that's why we hide the effects behind the execution context, which we do not have here).
Maybe we should actually detect some keywords like `UPDATE, CREATE, DROP, ALTER, INSERT, DELETE` and warn here?
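The keyword-warning idea suggested here could look roughly like the following Python sketch. It is purely hypothetical (no such check exists in the source under review), and a token scan like this is deliberately crude - it ignores string literals and comments - but it illustrates the proposal:

```python
import re

# Keywords whose presence suggests the query may modify data.
SIDE_EFFECT_KEYWORDS = {"UPDATE", "CREATE", "DROP", "ALTER", "INSERT", "DELETE"}

def may_have_side_effects(sql: str) -> bool:
    """Heuristic: does the query contain a side-effecting SQL keyword?"""
    tokens = {tok.upper() for tok in re.findall(r"[A-Za-z_]+", sql)}
    return not tokens.isdisjoint(SIDE_EFFECT_KEYWORDS)

print(may_have_side_effects("SELECT * FROM t"))                 # False
print(may_have_side_effects("UPDATE t SET x = 1 RETURNING x"))  # True
```

A `read` wrapper could emit a warning (rather than refuse to run) when this heuristic fires, which matches the cautious tone of the suggestion: catch the common `UPDATE RETURNING` case without blocking advanced users.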
distribution/lib/Standard/Database/0.0.0-dev/src/Internal/Upload_Table.enso (five outdated threads, all resolved)
(force-pushed 26cbf3f → dfed355)
(force-pushed dfed355 → 5b8fc83)
(force-pushed 5b8fc83 → c6d5914)
build.sbt (outdated)
  "org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
  "org.xerial" % "sqlite-jdbc" % sqliteVersion,
  "org.postgresql" % "postgresql" % "42.4.0"
  "org.graalvm.truffle" % "truffle-api" % graalVersion % "provided",
We need the GraalVM dependency for:
- importing the `Value` type - we need to return `Value`, not `Object`, as `Object` will cause a polyglot conversion that loses Enso warnings,
- accessing `TruffleLogger` to log that a maintenance operation failed.
You cannot use `truffle-api` in standard libraries. Try `sdk` - that's the JAR which exposes `Value`.
Btw. `truffle-api` transitively depends on `sdk` - bringing in more may seem to work, but it is not a good idea.
`truffle-sdk` does not seem to exist:
[warn] Note: Unresolved dependencies path:
[error] stack trace is suppressed; run 'last common-polyglot-core-utils / update' for the full output
[error] (common-polyglot-core-utils / update) sbt.librarymanagement.ResolveException: Error downloading org.graalvm.truffle:truffle-sdk:22.3.1
[error] Not found
[error] Not found
[error] not found: C:\Users\progr\.ivy2\localorg.graalvm.truffle\truffle-sdk\22.3.1\ivys\ivy.xml
[error] not found: https://repo1.maven.org/maven2/org/graalvm/truffle/truffle-sdk/22.3.1/truffle-sdk-22.3.1.pom
distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso (outdated, resolved)
`truffle-api` is an API for those who write interpreters. That is a different "level of Java" than the one used in standard libraries. Moreover, I don't think the Java types are even accessible.
(force-pushed 9f58d7d → 046c91c)
(force-pushed bc62843 → 7bbc16f)
"com.ibm.icu" % "icu4j" % icuVersion, | ||
"org.graalvm.truffle" % "truffle-api" % graalVersion % "provided" | ||
"com.ibm.icu" % "icu4j" % icuVersion, | ||
"org.graalvm.sdk" % "graal-sdk" % graalVersion % "provided" |
`graal-sdk` is the API to use in Java parts of standard libraries. Its javadoc is available here: https://www.graalvm.org/sdk/javadoc/
Btw. the Truffle javadoc also contains the `org.graalvm` classes, but that's because of the transitive dependency.
(force-pushed 2f06f55 → 516bba6)
- Add a check for transaction support. remove outdated check grow builder on seal to ensure all present CR1: Connection CR2: rephrasing docs CR2: Dry_Run_Operation clearer method names CR4: code style javafmt
- improve ref counting checkpoint add a test for not overwriting pre-existing tables OperationSynchronizer
- …before some DB operations. Better keep track of allocated dry run tables and ensure that a dry run name does not collide with a pre-existing user table.
- fix fixes
- …ibs" This reverts commit 0c98ff81dd4d95ad38cfbfe5e8872b9c546cbeb7.
- `"org.graalvm.sdk" % "graal-sdk" % graalVersion % "provided"` in helper Java libs
- …n failure will now be reported to stderr
(force-pushed 37598ba → fdafcab)
Pull Request Description
Closes #6887

Important Notes

Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
- Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
- `./run ide build`.