Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[columnar] fix create table as parallelism bug #138

Merged
merged 2 commits into from
Aug 25, 2023
Merged

Conversation

JerrySievert
Copy link
Contributor

This fixes #50.

This is not the solution that I wanted to write, but it is a solution and it works.

Access to a hook that gives us the plan for the CREATE TABLE portion of a CREATE TABLE foo AS statement is available in the utility hook, but that hook is fired after the table is created and the insert is finished. A planner hook has access to the SELECT statement, but does not show the CREATE TABLE portion of the plan.

I searched for an alternate method of getting access at the correct time to mutate the plan, but after posting on the postgres slack, discord, and pgsql-hackers mailing list, there was not an answer. If that somehow changes in the future, I will happily revert this for a much better solution.

For now, this is what it does:

  • creates a planner hook
  • if the query is of type CMD_SELECT it creates a lowercase copy of the query string
  • it then searches the lower case query string for create, table, and as in that order
  • if all are found, then it sets the parallelism to 0 (no parallel) and returns the plan

it is possible to trigger this with a query like: SELECT "create", "table", "as", FROM foo or some other permutation of this, but the likelihood of collision is fairly low.

as noted, it's not a perfect fix, but it at least fixes the issue.

@JerrySievert JerrySievert added the bug Something isn't working label Aug 24, 2023
@JerrySievert JerrySievert self-assigned this Aug 24, 2023
Copy link
Member

@wuputah wuputah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, parsing the query string is certainly not ideal.

I believe this bug can occur on a insert into ... select statement as well - though not documented in #50, that may be a more common issue than create table ... as select.

@JerrySievert
Copy link
Contributor Author

unfortunately, I've found no other path to fixing this. if it also fails on an insert then we can blacklist that as well.

as I noted, I'm happy to fix this in a correct way if ever there is a correct way, but as of now, there doesn't appear to be a way to connect them, as even getting the queryId of the create statement occurs after the table is created. there are only a couple of hooks that allow for plan mutation or even access to the Query * itself, which is unfortunate.

until someone comes forward with a solution that I've not found (I've combed through postgres code repeatedly), I'm not convinced there's another solution.

that said, if you can create a test case that still fails with this patch, I'm happy to include it, at least until we can toss this code out with a better solution.

@JerrySievert JerrySievert force-pushed the parallelism_bug branch 5 times, most recently from fe3a10c to c5630b3 Compare August 25, 2023 00:37
@JerrySievert
Copy link
Contributor Author

adding that it does not fail on insert, nothing to do with these code changes.

@mkaruza
Copy link
Contributor

mkaruza commented Aug 25, 2023

This issue related to CommandCounterIncrement function that is used when we write heap metadata tables. Maybe solution to fix this can be addressed there. If not, although not prettiest solution, should be good enough to mitigate this problem.

@JerrySievert
Copy link
Contributor Author

agreed that the problem should be addressed around CommandCounterIncrement and was one of my earliest attempts. unfortunately, it still gets called later, and only gets called during the parallel select.

@wuputah wuputah self-requested a review August 25, 2023 16:05
Copy link
Member

@wuputah wuputah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

until such time as a better solution present itself, I'm OK with shipping this; we can always change it later.

@mkaruza mkaruza self-requested a review August 25, 2023 17:15
@JerrySievert JerrySievert merged commit 0151073 into main Aug 25, 2023
@JerrySievert JerrySievert deleted the parallelism_bug branch August 25, 2023 17:28
@wuputah wuputah added this to the 1.0.0-rc milestone Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelism can block creation of columnar tables or views
3 participants