Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[10.x] Escaping functionality within the Grammar #46558

Merged
merged 8 commits into from
May 24, 2023

Conversation

tpetry
Copy link
Contributor

@tpetry tpetry commented Mar 23, 2023

So in Laravel 10, a PR of me (#44784) was merged to support database expressions with grammar-specific formatting. The idea is to bring more capabilities of abstracting the SQL flavors of different databases to Laravel. You shouldn't have to use any vendor-specific raw() statements anymore.

The idea is to replace those hardcoded SQL constructs with classes (like tpetry/laravel-query-expressions) that automatically change the generated SQL based on the used database:

// Instead of:
User::query()
    ->when(isPostgreSQL(), fn ($query) => $query->selectRaw('coalesce("user", "admin") AS "value"'))
    ->when(isMySQL(), fn ($query) => $query->selectRaw('coalesce(`user`, `admin`) AS `value`'))

// You can use:
User::select(new Alias(new Coalesce(['user', 'admin']), 'count'));

But one problem still needs to be solved: For some queries you have to use values with a query:

Movie::select([
    new CountFilter(new Equal('released', new Value(2021))),
    new CountFilter(new Equal('released', new Value(2022))),
    new CountFilter(new Equal('genre', new Value('drama'))),
    new CountFilter(new Equal('genre', new Value('action'))),
])->where('streamingservice', 'netflix');

Embedding numeric values is easy as they can not be used for SQL injections. But any user-provided string could be an SQL injection vector. This is solved in Laravel by using bindings (placeholders) in queries. But these can't be used for expressions as many parts of the Query Builder and Eloquent would have to be rewritten - the needed changes are too broad.

To solve the problem, I am proposing a solution to add support for database grammars to escape any value for safe embedding into SQL queries. PHP provides this natively with the PDO::quote method. But the documentation of it states that it should be theoretically safe to use a quoted string within a SQL query.

In my opinion (and after extensive research) this implementation is safe because of these reasons:

  • The PHP MySQL driver emulates prepares by default and does not use actual prepared statements. So the default behavior is already embedding placeholders/bindings into the SQL query. So this approach must be safe.
  • All known SQL injections with PDO::quote() are always some strange charset conversion tricks that begin by using invalid UTF-8 sequences. But those attacks are fixed by setting the proper connection charset. And Laravel's configuration has a charset setting that will be used.
  • Any value is additionally checked for invalid UTF-8 sequences to further secure the implementation. This eliminates the complete attack vector of inaccurate charset conversions.

What are your opinions?

@tpetry
Copy link
Contributor Author

tpetry commented Mar 23, 2023

@taylorotwell This is an implementation of your proposed approach for the initial database expressions which feature that I wanted to postpone. This implementation is backwards compatible in the meaning that any 3rd party driver will still work without any code change (because no constructor is added). Those drivers will just not be compatible with the escaping databas expressions until the add the setGrammar call. But any drivers won't break because of this change.

@valorin
Copy link
Contributor

valorin commented Mar 24, 2023

I'm out of my depth with database stuff, so take this with a grain of salt. I'm just looking at this from a security POV and hopefully it's constructive, but feel free to push back if you think I'm being too paranoid. 🙂

It worries me that it "should be theoretically safe", and although all you're doing is exposing a core PHP function, similar to other aspects of Laravel with some extra validation, it's specifically intended to perform a security function.

Do you know how widely used the PDO::quote() function is?
Do other frameworks/ORMs/DB packages use it?
Have you checked for any open bugs reported with it?

How much extra effort would it be to use a prepared statement instead?
I recently discovered you can use named parameter markers instead of ? - I assume you're aware of them, but if not, would they be any help?

@tpetry
Copy link
Contributor Author

tpetry commented Mar 24, 2023

I'm just looking at this from a security POV and hopefully it's constructive, but feel free to push back if you think I'm being too paranoid. 🙂

That's why I asked you to comment it ;)

It worries me that it "should be theoretically safe", and although all you're doing is exposing a core PHP function, similar to other aspects of Laravel with some extra validation, it's specifically intended to perform a security function.

Yes, the "should be theoretically safe" bugs me too. The PHP devs never state why the say it is only theoretical safe. Before prepared statements had been available mysql_real_escape_string was the solution and no one had any problem with it... And as said is the PHP MySQL driver not using prepared statements too by default.

Do you know how widely used the PDO::quote() function is? Do other frameworks/ORMs/DB packages use it? Have you checked for any open bugs reported with it?

Everyone is using prepared statements. And I only found some references:

  • Doctrine DBAL also supports PDO::quote and states it is usable for constructing a query string. But advises in the next sentence to use prepared statements because it is a lot easier.
  • Wordpress is still not using prepared statements. But they use mysql_real_escape_string which should be doing the same and is not marked as only thoreticall safe.

How much extra effort would it be to use a prepared statement instead? I recently discovered you can use named parameter markers instead of ? - I assume you're aware of them, but if not, would they be any help?

A lot of stuff within the DB core has to be rewritten for bindings support with expressions. Its a very big effort.

@crynobone
Copy link
Member

Would there be any issue for a project with custom connection and grammar (such as Oracle, MongoDb etc) since we planning to add this on a minor release.

@tpetry
Copy link
Contributor Author

tpetry commented Mar 24, 2023

Would there be any issue for a project with custom connection and grammar (such as Oracle, MongoDb etc) since we planning to add this on a minor release.

No problem. When those new drivers don't call Grammar::setConnection everything will still work. Expressions will just not be able to escape values and fail. As nothing in the core depends on it, this is not a problem. When those drivers implement escaping and they do the call they will work without any problems as any other driver.

@morloderex
Copy link
Contributor

@tpetry Looking at the PHP manual for the PDO quote function it actually states at it can take 2 parameters a string and a integer specifying what datatype PDO is suppose to qoute.

Maybe we should somehow consider implementing that sense i can see someone complain in the future that they cannot escape binary data?

@tpetry
Copy link
Contributor Author

tpetry commented Mar 24, 2023

Maybe we should somehow consider implementing that sense i can see someone complain in the future that they cannot escape binary data?

Do you have a use-case or reason for this? Binary data is still just some string with just different content within. The PDO::PARAM_LOB is for the large object type which needs special handling. I don't believe any part within Laravel can currently work with that. Nevertheless shouldn't large objects (LOB) not stored in the database anyway.

And as @crynobone said, there are plans to support MongoDB. Which couldn't work with PDO escaping params either.

@tontonsb
Copy link
Contributor

Some notes on why I would be reluctant to rely on quote.

Not all drivers have it

The PHP quote() docs state that:

Not all PDO drivers implement this method (notably PDO_ODBC). Consider using prepared statements instead.
And we can see that this is true indeed.

Not all drivers handle all strings

The sqlite driver apparently "doesn't handle binary strings" (see next).

Binary strings may break in this

Do you have a use-case or reason for this? Binary data is still just some string with just different content within.

I'd say that for the purpose of these drivers binary data is a string that can contain a null byte. It's not uncommon (and occasionally it is even recommended) to store some values like UUIDs in binary.

The PGSQL driver will truncate such strings:

If a terminating zero byte is found before length bytes are processed, PQescapeStringConn stops at the zero

Btw regarding charsets

But those attacks are fixed by setting the proper connection charset.

Wouldn't this interfere with using different charsets for different columns?

They seem less battle tested

This is a bit more subjective, but to me quotes appear to be an afterthought on many drivers. I'd only really trust the mysql one.

Have you checked for any open bugs reported with it?

@valorin https://bugs.php.net/bug.php?id=81740 (recently fixed) and https://bugs.php.net/bug.php?id=81053 might provide an insight on what kind of errors have arisen lately.

@tontonsb
Copy link
Contributor

A lot of stuff within the DB core has to be rewritten for bindings support with expressions. Its a very big effort.

I haven't really tried as it only seemed really necessary in one place, but I doubt that it would be so bad.

Many (all?) cases go through Builder:castBinding where one could extract bindings from Expression instead of skipping those values.

Besides, most of the query methods have their raw counterparts, so a lot of the cases could also be handled at that level in a manner like this:

public function method($expr)
{
    if ($expr instanceof Expression)
        return $this->methodRaw($expr->getSql(), $expr->getBindings());

    // the current code of `method`
}

The main issue with such change would be adding something like getBindings() to Expression which would break BC. That's why I previously proposed an additional new expression interface instead.

@tpetry
Copy link
Contributor Author

tpetry commented Mar 25, 2023

Not all drivers have it

The PHP quote() docs state that:

Not all PDO drivers implement this method (notably PDO_ODBC). Consider using prepared statements instead.
And we can see that this is true indeed.

The drivers supported by Laravel all implement it. ODBC is not used by Laravel.

Not all drivers handle all strings

The sqlite driver apparently "doesn't handle binary strings" (see next).

Binary strings may break in this

Do you have a use-case or reason for this? Binary data is still just some string with just different content within.

I'd say that for the purpose of these drivers binary data is a string that can contain a null byte. It's not uncommon (and occasionally it is even recommended) to store some values like UUIDs in binary.

The PGSQL driver will truncate such strings:

If a terminating zero byte is found before length bytes are processed, PQescapeStringConn stops at the zero

Thats a bad behaviour of them. I've tested it and the statement is true. But it is no big deal in my opinion. The PostgreSQL and SQLite grammar can just split any string by the zero byte, quote the parts independently and join them again. In reality it even doesnt have to join them, it can generate a concatenated value: 'AB'||'\x00'||'CD'

Btw regarding charsets

But those attacks are fixed by setting the proper connection charset.

Wouldn't this interfere with using different charsets for different columns?

The connection charset and column charsets are different things. Relevant in this case is only the connection charset which specifies how the SQL query will be parsed. The database will then internally transform the value to the column charset for storage. And vice versa retrieve values of the column charset and cast to the connection charset when querying data.

A lot of stuff within the DB core has to be rewritten for bindings support with expressions. Its a very big effort.

I haven't really tried as it only seemed really necessary in one place, but I doubt that it would be so bad.

Many (all?) cases go through Builder:castBinding where one could extract bindings from Expression instead of skipping those values.

Besides, most of the query methods have their raw counterparts, so a lot of the cases could also be handled at that level in a manner like this:

public function method($expr)
{
    if ($expr instanceof Expression)
        return $this->methodRaw($expr->getSql(), $expr->getBindings());

    // the current code of `method`
}

This may sound simple but you're only looking at provided binding values. You could also use database expressions as a column reference, selected value, group by etc. And within a GROUP BY placeholders are not allowed, this means quoting is again required - with the added complexity that provided values now need to be different handled. Sometimes you would quote them into the string, sometimes use bindings. That's impossible.

The main issue with such change would be adding something like getBindings() to Expression which would break BC. That's why I previously proposed an additional new expression interface instead.

There are approaches to do it without breaking BC. An BindingExpression implements Expression would be completely BC compliant with the added benefit that not two completely different expressions exist.

@tontonsb
Copy link
Contributor

Thats a bad behaviour of them. I've tested it and the statement is true. But it is no big deal in my opinion. The PostgreSQL and SQLite grammar can just split any string by the zero byte, quote the parts independently and join them again. In reality it even doesnt have to join them, it can generate a concatenated value: 'AB'||'\x00'||'CD'

Would it be very hard to support the PARAM_LOB option instead?

I have no idea about SQLite, but Postgres doesn't support the null byte in a string field. The contents for a bytea field, on the other hand, is properly prepared by the PQescapeByteaConn method which is invoked if you specify PARAM_LOB.

@tpetry
Copy link
Contributor Author

tpetry commented Mar 29, 2023

I am working on a new implementation that will manually fix the zero-byte issues with PostgreSQL/SQLite and also supports embedding binary data. Will take some days.

@tpetry tpetry force-pushed the db-expressions-escape branch from 5eb1c54 to b3c68e7 Compare April 4, 2023 15:36
@tpetry
Copy link
Contributor Author

tpetry commented Apr 4, 2023

I've completely rewritten the implementation and also added tests:

  • A new $isBinary parameter has been added to encode the value as a binary string. This will use the native binary SQL literal for every database. This is safe against SQL injections because it is just a hex value within SQL.
  • Numbers are now escaped as a number literals instead of strings with PDO::quote
  • Booleans are now escaped in the native format for every database. The number 0 and 1are used with every one except PostgreSQL which has native boolean literals (true and false).
  • The discussed null-bytes in string bytes had been a big problem. Only MySQL supports them in an escaped way within SQL queries. PostgreSQL doesn't support null bytes in string and SQLite and SQL Server will use the native null byte within a string. This last behaviour is not good either as you can't see them when viewing or copying a query and would result in a lot of issues. So I dropped support for null-bytes within strings by throwing an exception. If you want null bytes you shoul use the $isBinary parameter. By dropping it all databases are again behaving the same.

@crynobone To support MongoDB in the future the encodeString method has been added to the connection. The mongo driver would overwrite that one to replace PDO::quote with a mongo specific one.

@tontonsb Whats your opinion?

@tontonsb
Copy link
Contributor

tontonsb commented Apr 5, 2023

Thank you for your work @tpetry! This is a lot more polished now. The case-by-case handling of value types feels like it provides more control, especially if one of the cases will have to be adjusted.

So I dropped support for null-bytes within strings by throwing an exception. If you want null bytes you shoul use the $isBinary parameter.

I like this. Although it kind of takes away a MySQL feature, putting binary strings like a string is not a feature that should be encouraged.

I will add a couple of minor inline comments in a few moments, but I no longer have any objections against the implementation :)

src/Illuminate/Database/Connection.php Outdated Show resolved Hide resolved
src/Illuminate/Database/Connection.php Outdated Show resolved Hide resolved
src/Illuminate/Database/Connection.php Outdated Show resolved Hide resolved
src/Illuminate/Database/Connection.php Outdated Show resolved Hide resolved
src/Illuminate/Database/Connection.php Outdated Show resolved Hide resolved
src/Illuminate/Database/Grammar.php Outdated Show resolved Hide resolved
src/Illuminate/Database/PostgresConnection.php Outdated Show resolved Hide resolved
@tpetry
Copy link
Contributor Author

tpetry commented Apr 6, 2023

Another updates based on the good feedback by @tontonsb

  • The type handling has been improved. I formerly used is_numeric to detect strings. But this function is problematic with leading/trailing spaces. To make the implementation more consistent and easy the exact type provided will now be escaped. If you provide a number as a string it will be a string! This has the added benefit that no complex type casting logic needs to be maintained. It is up to the caller to provide the correct types.
  • I completely missed the null type. It is now added.
  • The test have been extended to reflect the changes of both former changes.

@tpetry
Copy link
Contributor Author

tpetry commented Apr 14, 2023

After more testing and using it in a various ways I can say that the PR reached its final state. It is working really great in escaping the different values specific for every db. Additionally, there is absolutely no breaking change for existing drivers. The full implementation is opt-in by drivers.

While the PR was built to improve the query expressions feature, there are more usefull ways it can be used:

  • A new eloquent binary cast can be built with the new binary escaping mode.
  • Additionally to Builder::toSql() a new implementation (Builder::toRawSql()) could generate a SQL string with all bindings applied and being safe from SQL injections. This was not possible until now. Multiple PRs had been opened for toRawSqlin the past that got closed. We could finally have a good implementation.
  • DB::listen (which I use quite often) could add sqlRaw that also has the bindings applied.
  • Ray could extend the debugging functionality to apply bindings to queries /cc @freekmurze
  • Ignition currently displays invalid SQL queries for SQL exceptions because values are not correctly escaped. This could be fixed. /cc @freekmurze

Sorry @taylorotwell that the implementation took so long. I promised it the day after Laracon EU but I needed several attempts to get one implementation with the least changes. I am now happy with that one :)

@tpetry tpetry marked this pull request as ready for review April 14, 2023 07:13
@laravel laravel deleted a comment from Mastacow Apr 24, 2023
@willrowe
Copy link
Contributor

I would rather see support for expressions returning bindings. Rolling our own escaping seems a bit risky when bindings would solve most use cases without adding and maintaining extra code.

I agree with @tontonsb that adding support for bindings in expressions would not be extremely difficult. I have drafted several PRs in the past to add support for this and the changes were not that involved.

The argument that escaping needs to be done in places where bindings are not allowed is somewhat valid, but I'm not sure if that's currently within the scope of what Laravel should be offering a solution for. If we add support for bindings that would bring expressions into line with the rest of the query builder, whereas it would seem odd to add in a custom escaping solution prior to expressions even supporting basic bindings.

@tpetry
Copy link
Contributor Author

tpetry commented Apr 26, 2023

I agree with @tontonsb that adding support for bindings in expressions would not be extremely difficult.

I disagree with that. Adding binding support only looks simple at first. But you need to support more than adding bindings for where. You also have to support expressions in selects, joins, unions, subqueries and more.

And at some places, bindings are even not allowed. You can't use bindings for ORDER BY, GROUP BY, HAVING and possibly more. So you need escaping functionality again and also complex logic where bindings are ok to use and where you have to escape values.

I would have implemented expressions returning bindings if I had seen a way to do it in an easy way without rewriting half the database core. Because that pull request would get closed because of too much complexity. Rightly so!

If we add support for bindings that would bring expressions into line with the rest of the query builder, whereas it would seem odd to add in a custom escaping solution prior to expressions even supporting basic bindings.

You are free to disagree. But I have plans for the expressions package and what it should support. Being limited again to only what bindings in the query builder can do will limit the use case massively. Expressions exist because the query builder is too limited in some cases. Not to do the same as the query builder. Because then we wouldn't need it.

@willrowe
Copy link
Contributor

But you need to support more than adding bindings for where. You also have to support expressions in selects, joins, unions, subqueries and more.

Yes, every place that supports expressions would need to be updated to also get any bindings as well, but that's not an impossible task and would make expression much more powerful. All of those places in the query builder already store their own bindings, so I'm not sure I understand why it would be as complex as you think. You had to update all those places in your last PR as well.

And at some places, bindings are even not allowed. You can't use bindings for ORDER BY or HAVING. So you need escaping functionality again and also complex logic where bindings are ok to use and where you have to escape values.
But I have plans for the expressions package and what it should support.

As I said, I agree there are some places that bindings would not work, but I'm not sure if that's something that Laravel should tackle. If you are doing this so that you can build a package, then maybe you could implement the escaping there first and prove out its stability and usefulness.

Expressions exist because the query builder is too limited in some cases. Not to do the same as the query builder. Because then we wouldn't need it.

Including support for bindings in expressions would not make them obsolete, just a bit easier to work with.

@tpetry
Copy link
Contributor Author

tpetry commented Apr 26, 2023

All of those places in the query builder already store their own bindings, so I'm not sure I understand why it would be as complex as you think. You had to update all those places in your last PR as well.

I updated just a few places that worked with bindings. There are a ton of places that have yet no idea what expressions are as they can just cast everything to a string. You underestimate the involved work. I am constantly changing things in the database core. Trust me. A ton of the methods of the query build would have to be rewritten.

As I said, I agree there are some places that bindings would not work, but I'm not sure if that's something that Laravel should tackle.

Sure, feel free to disagree. But I implemented it for a purpose. To get rid of many raw() calls that are currently not possible to replace. So expressions should do that. And also in places that you don't think are needed, but I think. Again, feel free to disagree. I don't care. Feel free to work on a different implementation.

If you are doing this so that you can build a package, then maybe you could implement the escaping there first and prove out its stability and usefulness.

I already do as much as possible without changing the Laravel core. At this point, it is not really possible to do this within a package.


You're all free to agree or disagree. And to work on a different implementation. This one already took 20h to research & implement. And it is an implementation the most likely to get included as it is not changing big things in the core and won't break anything. Not even third-party packages. Implementing custom binding-logic, as requested often, will need big changes and will most likely break many 3rd packages until they update their builder functions to also add special grammar-binding support. A way I don't want to go to make the adoption of this much easier.

@taylorotwell taylorotwell merged commit e953137 into laravel:10.x May 24, 2023
@taylorotwell
Copy link
Member

Thanks!

@tpetry
Copy link
Contributor Author

tpetry commented May 25, 2023

Awesome! Thanks Taylor, now I can work on the next extension to expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants