Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quote SQL identifiers only when necessary? #11147

Closed
roji opened this issue Mar 4, 2018 · 15 comments
Closed

Quote SQL identifiers only when necessary? #11147

roji opened this issue Mar 4, 2018 · 15 comments
Labels
closed-no-further-action The issue is closed and no further action is planned.
Milestone

Comments

@roji
Copy link
Member

roji commented Mar 4, 2018

In an effort to make EF Core-generated SQL as minimal and as readable as possible, I was considering quoting identifiers only when necessary. In other words, I'd override the appropriate methods in my ISqlGenerationHelper to check the content of the identifier and quote accordingly.

I was wondering if this has already been discussed... Of course the value of doing this isn't enormous, but if there's no major reason not to do it, then why not...

@ErikEJ
Copy link
Contributor

ErikEJ commented Mar 4, 2018

Not sure I agree on readability if only some identifiers are quoted

@roji
Copy link
Member Author

roji commented Mar 4, 2018

@ErikEJ really? You prefer all identifiers to be quoted, always, just because some of them may contain a space or some other character? Personally, when I look at SQL generated from EF Core I find myself wishing all the quote noise would be cleaned out of there...

@ErikEJ
Copy link
Contributor

ErikEJ commented Mar 4, 2018

I do, yes! :-)

@roji
Copy link
Member Author

roji commented Mar 4, 2018

OK, let's see what the others think :)

@smitpatel
Copy link
Contributor

+1. I already do that in Cosmos Db provider. And the readability difference is rather large there.

@ErikEJ
Copy link
Contributor

ErikEJ commented Mar 4, 2018

Good luck with that - rule number 3 (of five):

The identifier must not be a Transact-SQL reserved word. SQL Server reserves both the uppercase and lowercase versions of reserved words. When identifiers are used in Transact-SQL statements, the identifiers that do not comply with these rules must be delimited by double quotation marks or brackets. The words that are reserved depend on the database compatibility level. This level can be set by using the ALTER DATABASE statement.

https://docs.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers

@austindrenski
Copy link
Contributor

+1. This would be very useful.

To address reserved word concerns, how about making it opt-in via DbContext.OnConfiguring(...)?

public enum QuoteIdentifiers
{
    Always,
    AsRequired,
    Never
}

builder.QuoteIdentifiers(Always | AsRequired | Never)

@ErikEJ
Copy link
Contributor

ErikEJ commented Mar 4, 2018

"Never" will break!

@roji
Copy link
Member Author

roji commented Mar 4, 2018

@ErikEJ you're right that keywords are an issue. FWIW, the ADO.NET schema API has a collection which is supposed to provide the list of all reserved words. To what extent this is a reliable list on any given provider is another question... At least for PostgreSQL there's an official, hopefully exhaustive list,

@roji
Copy link
Member Author

roji commented Mar 4, 2018

It's definitely not just a change you can slap on without looking at stuff like keywords, but it should be possible to do correctly...

@ErikEJ
Copy link
Contributor

ErikEJ commented Mar 4, 2018

And doing it must not impact EF performance

@roji
Copy link
Member Author

roji commented Mar 4, 2018

And doing it must not impact EF performance

Unless I'm mistaken these queries are cached, i.e. the delimiting process doesn't happen on every execution (but someone from the EF team should be able to confirm).

Aside from that, delimiting involves quite a bit of allocations for all the concatenations, which this would reduce.

@smitpatel
Copy link
Contributor

Unless I'm mistaken these queries are cached, i.e. the delimiting process doesn't happen on every execution (but someone from the EF team should be able to confirm).

True. First level cache is based on ExpressionTree in linq. The same entry is used when parameter values are changed. Second level cache is at SQL tree level. Same entry is used as long as nullability of parameters does not change. Else we may need to generate a different SQL.

For Performance thing, we have #10513 to actually move most stuff out of SQL gen to first level cache so second level cache doesn't do much work apart from working with parameters.

@ajcvickers
Copy link
Contributor

@roji @ErikEJ We discussed this in triage and we're not going to make any change to the providers we own primarily because it creates an additional coupling for reserved names, etc., but for your providers we're happy to do what you want. Please be sure that this is safe with regard to any kind of SQL injection attack--should be if it is limited to identifiers, but just saying...

@ajcvickers ajcvickers added this to the Discussions milestone Mar 5, 2018
@ajcvickers ajcvickers added the closed-no-further-action The issue is closed and no further action is planned. label Mar 5, 2018
@roji
Copy link
Member Author

roji commented Mar 6, 2018

FYI, implemented this for Npgsql in npgsql/efcore.pg#327. The criteria for being an unquoted identifier in PostgreSQL is actually quite strict (lowercase letters, digits, underscore, dollar), and the reserved word list is pulled from the ADO.NET layer with GetSchema(). Only one keyword has been added in like 5 years, so it feels pretty safe :)

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned.
Projects
None yet
Development

No branches or pull requests

5 participants