Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9.x] Fix postgres reference parsing #35530

Merged
merged 4 commits into from
Dec 10, 2020

Conversation

cbj4074
Copy link
Contributor

@cbj4074 cbj4074 commented Dec 8, 2020

Preface

I'm submitting this to master because it relies on #35463 , which was also sent to master due to breaking changes.

Regarding tests, there are no existing tests that cover this functionality. While I can write tests to cover what I changed and added, it'll take me a while, and given the significance of these issues for the small number of users that they affect, my recommendation is to merge this, even without new tests, because the current implementation should be fixed with urgency. I'm willing to commit to writing the tests when I have more time, and propose them in a separate PR.

(UPDATE: I added tests.)

I should mention also that I inadvertently introduced a slight regression (undefined variable warning) on

if (is_array($searchPath = $this->connection->getConfig('search_path'))) {
in the part 1 of 2 PR, which I fixed properly in this PR. (Rolling back that method wouldn't be a "fix", either, because its logic is not quite correct to begin with.)

Substance

This PR is essentially "part 2 of 2" relative to the above-mentioned PR, in that it "normalizes" the search_path parsing behavior such that the new parseSearchPath() method returns the same result whether the search_path input is an array (with one or more schemas), a string with one schema, or a string of comma-separated schemas. This method's presence makes it much simpler to retrieve the search_path as configured on the connection and use it for more complex schema-related grammar construction.

This PR also fixes several issues with the parseSchemaAndTable() method.

If the database object reference is unqualified (i.e., the $table string does not contain a schema + . prefix), the existing logic is reasonably sound. But if the reference is qualified, there are a few issues that arise.

Assuming the input $table is 'public.users':

  1. If the search_path is the default, 'public' (or if it is null, in which case 'public' is used), the following query is executed, which will never return a result, because the public. portion is never removed from the table name:
select * from information_schema.tables where table_schema = 'public' and table_name = 'public.users' and table_type = 'BASE TABLE'

For this logic to be sound, if the schema is found in the table reference, it should be removed, e.g.:

select * from information_schema.tables where table_schema = 'public' and table_name = 'users' and table_type = 'BASE TABLE'

And furthermore, PostgreSQL table names cannot contain periods, so there's just no reason for a period ever to be included in the table name binding here.

But there's still another issue, which is that the database name is not considered, neither in this method nor in the SQL grammar that is used to construct the queries for hasTable($table) and getColumnListing($table).

This omission is problematic because it causes these methods to return inaccurate results in some cases. For example, failing to qualify these queries with a database name (in the table_catalog column on the WHERE clause) could cause hasTable($table) to return true just because another/different database has a schema and table combination with the same names.

Seeking Feedback...

I don't like the fact that I added identical parseSearchPath($searchPath) methods to both the PostgresConnetor and the PostgresBuilder classes. Is there a better location for this method that both of these classes can leverage without adding new dependencies? This method is essentially a "utility function" that ingests any reasonably-formatted search_path (list of schemas) from the connection's configuration and returns a neatly-formatted array of schemas that can be queried or implode()d, for example.

Also, are the comments in the parseSchemaAndTable() method helpful? Sure, they seem self-evident, but the underlying logic here is deceptively complicated, due to the complexities inherent to PostgreSQL's search_path related nuances, so I thought some explicit commentary might be warranted.

Finally, that method should perhaps be renamed to parseDatabaseSchemaAndTable() or parseDatabaseObjectReference(), or something that better reflects what it actually does. I say this specifically because the method now supports a database part. For example, if the $table input is homestead.laravel.users, the homestead portion is actually used and passed to the underlying information_schema query (and if it's not passed, the database configuration parameter on the connection is used). So, technically, the method is parsing a full database object reference, and not a "schema and a table", per se.

Primarily, these changes "normalize" the search_path parsing behavior such that the new parseSearchPath() method returns the same result whether the search_path input is an array (with one or more schemas), a string with one schema, or a string of comma-separated schemas.

This method's presence makes it much simpler to retrieve the search_path as configured on the connection and use it for more complex schema-related grammar construction.
The manner in which object references were parsed in certain scenarios caused methods such as hasTable() to return incorrect results.

Among other issues, the underlying SQL grammar omitted the "table_catalog" (i.e., database) in the WHERE clause, which caused inaccurate results in certain cases, e.g., the method returned true incorrectly because a schema and table with the same name exist in a *different database*.
@driesvints driesvints changed the title Fix postgres reference parsing [9.x] Fix postgres reference parsing Dec 8, 2020
@driesvints
Copy link
Member

Just FYI: Taylor doesn't reviews draft PRs so once you're ready with this feel free to mark it as ready for review.

@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 8, 2020

@driesvints Thanks!

In the meantime, is your inclination that this will be closed unless I add tests to cover anything I changed or added, even though I'm only aiming to fix code that was effectively broken and with no existing test coverage, and not add any new features?

I suppose what I'm asking is, should I add tests before I mark this as ready for review? :)

@driesvints
Copy link
Member

Adding tests is always a good idea yeah.

@cbj4074 cbj4074 marked this pull request as ready for review December 9, 2020 16:16
@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 9, 2020

Please note that if merged, laravel/ideas#918 is satisfied and complete in my estimation.

@taylorotwell
Copy link
Member

Does this PR introduce any additional breaking changes?

@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 9, 2020

@taylorotwell No, I don't believe that this PR introduces any additional breaking changes.

@taylorotwell
Copy link
Member

taylorotwell commented Dec 9, 2020

So just on my cursory review, it looks like if I did a query like DB::select('select from foo.users'); that would look for stuff in the "foo" schema and not the "foo" database? Is that how it works on Laravel 8 as well?

@cbj4074

@cbj4074
Copy link
Contributor Author

cbj4074 commented Dec 9, 2020

@taylorotwell Yes, that's correct, and yes, that's how PostgreSQL has always behaved in Laravel, and that is the correct and expected behavior. And just to be clear, this PR doesn't change that behavior.

Edit to add: To clarify this point, in PostgreSQL, tables reside within a schema, not a database, so one cannot query a database directly. When selecting records, one is always querying tables within a schema (not a database). That said, the effective user might have access to several different databases, each of which could have several different schemas, which is why the ability to qualify hasTable($table) and similar on a given database and schema is crucial, and that's what this PR seeks to address.

As a point of comparison, MySQL has no concept of "schema", and its references are qualified with database.table, whereas PostgreSQL essentially has the "schema" sandwiched in between, e.g., database.schema.table.

In both databases, the qualifier (everything before the table name) is optional. In MySQL, when the database part is omitted, MySQL uses the database defined on the connection. In PostgreSQL, the same is true, but if the schema part is omitted, PostgreSQL consults the search_path, and tries to resolve the reference in every schema listed therein (checking each one, in the order specified). That's just a bit of background.

Primarily, this PR fixes how the framework determines what qualified reference to use when querying PostgreSQL's internal information_schema table. If you look at the sample queries in my OP, you can see that the information_schema requires all three parts to the qualifier, so if not all of them are known/passed into that parseSchemaAndTable($table) method, the values must be inferred by some sensible criteria.

Just let me know if this doesn't answer your question fully, or if you have any others, and I'll be happy to respond.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants