Add 'rows affected' output when using sql engine #2051

edalfon · 2021-09-25T22:21:12Z

Adds a default output for sql chunks where is_sql_update_query evaluates to TRUE, as proposed in #2050.

It does it simply by:

Commit 1: propagating DBI::DBI::dbExecute's result to data, so that users can set output.var and get the number of affected rows assigned to a variable.
Commit 2: adding a default output, indicating the number of rows affected.

I included a line in NEWS.md describing this. However, I did not include a test. I was not familiar with knitr's testing procedures, but I learned from this recent PR (#1987) that there is one major test for knitr's sql engine in another repository (https://raw.githubusercontent.com/yihui/knitr-examples/master/115-engine-sql.Rmd). This already covers the changes in this PR. I used it to test the changes and everything works. So I didn't really think it was necessary to include additional tests.

A final note: as discussed in knitr's issue #1637, I think a similar change would be needed on RStudio's side, to make it also work when running chunks interactively (RStudio's Run current chunk, Run all, etc.). I haven't looked into that, though, and I have no idea who to poke for this, but I'd be happy to take a look.

Huge thanks for knitr and hope this helps a tiny bit.

also return data ("scalar numeric that specifies the number of rows affected") in case of an is_sql_update_query, and not just SELECT-like queries

and print by default a message with number of rows affected

CLAassistant · 2021-09-25T22:21:16Z

All committers have signed the CLA.

cderv

I had a quick look. This looks fine overall but my main concern is about knitr writing text output. It is not the same to write a resulting table than writing text including a data result. I believe we must not presuppose what a user would like to have written as body text in its output document. Also, it seems the output text is not really relevant. Let's look into that.

First, I am all for saving data for the result of dbExecute() so that it can be saved in output_var and then used later.

Regarding printing, I think it would be ok to have like a informative message() printed, written in english and formatted as we want, but that would output like code result by default. knitr already give such messages in several places. However, it is not the same to return as body text (using results = 'asis'), all the more a formatted text in English with no way to customize. See my inline comments for why.

If we want to stay with the idea of outputting body text (results = 'asis'), we could offer configuration so that it can be customized. This is done already for the table when no sql.print function is provided. options$tab.cap can be specified to customize the table caption. Otherwise a default caption is shown.
See line 611 in the engine.R file for the SQL engine.
However, a table caption is not the same a text in a report. Personally, I would prefer we output a log-like message, like a message.

If we think customization, a user could save the result in output_var, and format itself the text he would like to show, as he would do with any other chunk output I believe, using the variable created and inline r chunks. So is this important to print a formatted output asis ?

Also, regarding the textual output, I am thinking about your comment

it seems SQLite does not actually return the number of rows affected (apparently always returns 0). Yet, this is pretty standard in most RDBMS and it is also the default in DBI::dbExecute, so I guess this is just a quirk specific to SQLite.

If we are not sure that all RDBMS outputs a number of rows, I guess it would be puzzling to have written in your report

Number of affected rows: 0

after a update query... 🤔

From your RStudio projects (https://rstudio.cloud/project/2931829) (thanks for this by the way, that is awesome!), it seems that not all requests are returning a number of rows. For example,

DROP TABLE IF EXISTS toy;

after having created and computed on toy also returns 0 which writes in the output as Markdown text that 0 zeros are affected. It could be puzzling to users.

I have published your HTML document into R Pubs to easily see it: https://rpubs.com/cderv/knitr-pr-2051

So overall, about the SQL engine, my thinking is :

We should stay clause to what the DB specialized tools are doing.
DBI::DBexecute() returns a value, and it seems fair that we allow it to be accessible to the user
DBI::DBexecute() does not print log message, and just return value. So we would in knitr do more than the tools offers. However, it makes sense to a publishing to tool to format some output (like we do with tables). But we must do this properly without assuming too much on what a user would like.

After this long review, does anyone of you two @edalfon or @yihui are sharing my concerns ?

cderv · 2021-09-27T09:16:09Z

NEWS.md

@@ -1,3 +1,7 @@
+- The `sql` engine now produces an output for update-like SQL queries, indicating the number of rows affected (as returned by DBI::dbExecute). So far, `knitr` produces a nicely formatted output for queries returning a result set (typically SELECT...), but gives no feedback at all for queries making changes to the DB (e.g. INSERT|UPDATE|DELETE|CREATE|DROP; see not exported function knitr:::is_sql_update_query). For such queries, `knitr` now shows the number of affected rows. You can also set the chunk option `output.var` to assign the number of affected rows to a variable.


Just so you know, this should be under the last header 1 you see.
1.35 is the next version so this should be moved under.

I'll do it before merging.

cderv · 2021-09-27T09:19:41Z

R/engine.R

+  if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
+    options$results = 'asis'
+    # format = "fg" instead of "d". Row counts on DB may be greater than integer max value
+    output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))
+  }


To follow the style of the previous output assignment, and making it more clear about the different output possible, I would use a else here for the previous if

Suggested change

if (is.numeric(data) && length(data) == 1 && is.null(varname)) {

options$results = 'asis'

# format = "fg" instead of "d". Row counts on DB may be greater than integer max value

output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))

}

else if (is.numeric(data) && length(data) == 1 && is.null(varname)) {

options$results = 'asis'

# format = "fg" instead of "d". Row counts on DB may be greater than integer max value

paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))

}

cderv · 2021-09-27T09:22:37Z

R/engine.R

@@ -633,6 +633,11 @@ eng_sql = function(options) {

    } else print(display_data) # fallback to standard print
  })
+  if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
+    options$results = 'asis'


I don't really know if we want to force results = 'asis' here. The user would have not control over it.

I understand why it makes sens to have it as markdown text in the output document, but it may not please everyone who would prefer to have it code output.

I don't really know but usually users have control over how the output is show. 🤔

@yihui what do you think ?

cderv · 2021-09-27T09:33:30Z

R/engine.R

+  if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
+    options$results = 'asis'
+    # format = "fg" instead of "d". Row counts on DB may be greater than integer max value
+    output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))


In addition to previous comment, just wondering about internationalization:

We are enforcing a english sentence to be output in a report. This is not perfect for non English report. With result = 'asis', this sentence would be shown as usual text. If I am writing a French report, that would be weird to have such English output in the middle of other text, wouldn't it ?
That is why at least, not enforcing asis output seems better to me

Also, about big.mark = ",", I am just wondering if this is necessary. It is not something usual worldwide, and could even be confusing for some international user. For example, we don't have that in Europe, we often have no big mark or a space. A comma is used as decimal too. However, the context of number of rows makes 12,500 clear enough for a european even if it can be confusing. We would use 12 500 or just 12500. Is it something common for such update query output ?

Maybe we could / should make at least all that configurable ?

Agree!

What I did was trying to keep it similar to how the caption is formatted (line 614 on the same file)

rows_formatted = formatC(rows, format = "d", big.mark = ',')

but you are right that in that case, the user has options to configure the output

edalfon · 2021-09-27T14:01:30Z

Many thanks for reviewing this. I agree with pretty much everything you said. In particular, that saving data for the result of dbExecute() is key. And although I share your concerns on printing, perhaps I just feel less strongly about them.

And the reason I feel less strongly about those concerns is that I think the most common use cases for this output are:

to get quick feedback while interactively trying out queries
to leave documented the result of data processing on the DB side

But I did not think such an output would be mainly used in -let's say- a public facing document (e.g. final report or the like). In that case, as you said, the users can set output.var and format and use it as they wish.

But for the two use cases above, I thought a quick default without configuring anything else would be useful (the user simply keep using a {sql} chunk and there you go). Therefore, I put a quick-and-dirty approach in that second commit.

I totally share your concerns. It's not cool forcing results = 'asis', hard-coding output in English and a particular number format. Ideally, the user should be able to customize all that. But I was not sure it was worth doing it. After all, if the users do want a particular format for that info, they can do it via output.var. So, I guess one should weigh the benefits of doing that vs. the costs in terms of code complexity, maintainability, and so on. Perhaps I perceived that the costs outweigh the benefits. But you are for sure in a better position to make such assessment.

edalfon · 2021-09-27T14:02:18Z

a note on this

Also, regarding the textual output, I am thinking about your comment

it seems SQLite does not actually return the number of rows affected (apparently always returns 0). Yet, this is pretty standard in most RDBMS and it is also the default in DBI::dbExecute, so I guess this is just a quirk specific to SQLite.

If we are not sure that all RDBMS outputs a number of rows, I guess it would be puzzling to have written in your report

My bad here!!! SQLite do also return the number of rows affected. I wrote that after a couple of failed attempts, but I cannot really reproduce them now (perhaps I just looked at the results of DROP TABLE, which are in fact usually 0).

Having said that, DBI::dbExecute can indeed return puzzling results. For example, when you combine several queries, I've seen weird numbers returned by DBI::dbExecute with a Postgres backend (PostgreSQL do let you send multiple statements at once, but duckdb, for example, throws an error. SQLite, I think, only processes the first statement and ignores the rest). But my take on this is that's DBI's deal and usage, and knitr cannot and should not solve that.

edalfon added 3 commits September 24, 2021 11:18

Update engine.R

f78e1c6

also return data ("scalar numeric that specifies the number of rows affected") in case of an is_sql_update_query, and not just SELECT-like queries

Update engine.R

d1f8cbf

and print by default a message with number of rows affected

Update NEWS.md

e45a204

cderv linked an issue Sep 27, 2021 that may be closed by this pull request

produce output for UPDATE-like queries using sql engine #2050

Open

3 tasks

cderv requested changes Sep 27, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add 'rows affected' output when using sql engine #2051

Add 'rows affected' output when using sql engine #2051

edalfon commented Sep 25, 2021

CLAassistant commented Sep 25, 2021 •

edited

Loading

cderv left a comment

cderv Sep 27, 2021

cderv Sep 27, 2021

cderv Sep 27, 2021

cderv Sep 27, 2021

edalfon Sep 27, 2021

edalfon commented Sep 27, 2021

edalfon commented Sep 27, 2021

		@@ -1,3 +1,7 @@
		- The `sql` engine now produces an output for update-like SQL queries, indicating the number of rows affected (as returned by DBI::dbExecute). So far, `knitr` produces a nicely formatted output for queries returning a result set (typically SELECT...), but gives no feedback at all for queries making changes to the DB (e.g. INSERT\|UPDATE\|DELETE\|CREATE\|DROP; see not exported function knitr:::is_sql_update_query). For such queries, `knitr` now shows the number of affected rows. You can also set the chunk option `output.var` to assign the number of affected rows to a variable.

Add 'rows affected' output when using sql engine #2051

Are you sure you want to change the base?

Add 'rows affected' output when using sql engine #2051

Conversation

edalfon commented Sep 25, 2021

CLAassistant commented Sep 25, 2021 • edited Loading

cderv left a comment

Choose a reason for hiding this comment

cderv Sep 27, 2021

Choose a reason for hiding this comment

cderv Sep 27, 2021

Choose a reason for hiding this comment

cderv Sep 27, 2021

Choose a reason for hiding this comment

cderv Sep 27, 2021

Choose a reason for hiding this comment

edalfon Sep 27, 2021

Choose a reason for hiding this comment

edalfon commented Sep 27, 2021

edalfon commented Sep 27, 2021

CLAassistant commented Sep 25, 2021 •

edited

Loading