fix(pyspark): set catalog and database with `USE` instead of pyspark api #9620
Conversation
Hard approve.
Is there any chance of us testing this?
Let me look at what's involved with setting up a separate external catalog, because I would like better testing around this stuff.
Thanks for the PR @gforsyth. I've just tested this and have the following error:

ParseException: [[PARSE_SYNTAX_ERROR](https://docs.microsoft.com/azure/databricks/error-messages/error-classes#parse_syntax_error)] Syntax error at or near 'USE': extra input 'USE'. (line 2, pos 0)

```sql
== SQL ==
USE CATALOG hive_metastore;
USE SCHEMA default;
```

Maybe they need to be executed as separate statements?

```python
    135 catalog = table_loc.catalog or None
    136 database = table_loc.db or None
--> 138 table_schema = self.get_schema(name, catalog=catalog, database=database)
    139 return ops.DatabaseTable(
    140     name,
    141     schema=table_schema,
    142     source=self,
    143     namespace=ops.Namespace(catalog=catalog, database=database),
    144 ).to_expr()
```
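For what it's worth, `spark.sql()` parses a single statement per call, so issuing both `USE` commands in one string is what trips the parser here. A minimal sketch of running them separately (assuming an active Databricks `SparkSession` named `spark`):

```python
# One statement per spark.sql() call; combining them in a single
# string raises the PARSE_SYNTAX_ERROR shown above.
spark.sql("USE CATALOG hive_metastore")
spark.sql("USE SCHEMA default")
```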
Ahh, interesting, I bet you're right. Let me try that. Thanks for testing it out!!
Force-pushed from 668f20a to f4db01b
Hey @jstammers -- can you give this PR another shot when you have a chance? Many thanks!

That seems to work great for me. Thanks for resolving this so quickly!

We should probably have at least one test for this if possible. Is creating another catalog a huge pain?

I agree. I'm trying to sift through some docs to get that working.
Force-pushed from f4db01b to 426c361
The Unity Catalog on Databricks has weird permissioning where it doesn't allow (at least one) user to run `setCurrentCatalog` or `setCurrentDatabase`. It _does_ allow setting those via `USE CATALOG mycat;` and `USE DATABASE mydb;`. This, however, is not part of standard Spark SQL and is special Databricks Spark SQL. So we try the special Databricks SQL first, and if that throws a parser error, we fall back to using the pyspark API methods.
If you call `setCurrentCatalog`, Spark will default to using the `default` database in that catalog, so we need to note which database was being used beforehand in order to switch back to it correctly.
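Here's a minimal sketch of that try-and-fall-back logic; the function name and session handling are illustrative, not the exact code in this PR:

```python
try:
    # PySpark >= 3.4
    from pyspark.errors import ParseException as PySparkParseException
except ImportError:
    # older PySpark
    from pyspark.sql.utils import ParseException as PySparkParseException


def set_catalog_and_database(session, catalog: str, database: str) -> str:
    """Switch catalog/database, returning the previously active database."""
    # setCurrentCatalog resets the active database to `default`, so note
    # the current one in case we need to switch back later.
    previous_db = session.catalog.currentDatabase()
    try:
        # Databricks-specific SQL; vanilla Spark raises a parse error.
        session.sql(f"USE CATALOG {catalog};")
        session.sql(f"USE DATABASE {database};")
    except PySparkParseException:
        # Fall back to the standard PySpark API methods.
        session.catalog.setCurrentCatalog(catalog)
        session.catalog.setCurrentDatabase(database)
    return previous_db
```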
Force-pushed from 426c361 to 30020cd
Ok, this can probably be cleaned up at some point, but I think it's all working now.
```python
try:
    from pyspark.errors import ParseException as PySparkParseException
except ImportError:
    from pyspark.sql.utils import ParseException as PySparkParseException
```
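This import shuffle is needed because `ParseException` moved to the `pyspark.errors` module in PySpark 3.4; the `ImportError` fallback keeps things working on older PySpark versions, where it lives in `pyspark.sql.utils`.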
Description of changes

As reported in #9604, there is apparently a mismatch in permissioning structure on Databricks, so that you can set a catalog and database (schema, ugh) with SQL by running `USE CATALOG`, but not via the pyspark API calls that (ostensibly) do the same thing.

This, however, is not part of standard Spark SQL and is special Databricks Spark SQL.

So we try the special Databricks SQL first, and if that throws a parser error, we fall back to using the pyspark API methods.
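For reference, a hypothetical end-to-end call of the kind that #9604 reported failing on Databricks (the connection setup and the table/catalog names are illustrative):

```python
import ibis

# `spark` is an existing SparkSession, e.g. the one Databricks provides.
con = ibis.pyspark.connect(spark)

# Looking up a table in another catalog makes the backend switch
# catalogs: USE CATALOG/USE DATABASE on Databricks, with a fallback to
# setCurrentCatalog/setCurrentDatabase elsewhere.
t = con.table("my_table", database=("hive_metastore", "default"))
```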
Issues closed
Resolves #9604