Upgrade to Spark 3.1.1 with testing (#349)
* Testing Spark 3 upgrade. WIP

* Skip tests. WIP

* Update readme and setup for pyspark. WIP

* Fix circle ci version and bump mem value

* Bump memory, fix nit, bump pyhive version

* Pyhive version change

* Enable SASL for metastore

* Explicit server2 host port

* Try showing debug-level logs

* Rm -n4

* Move to godatadriven latest Spark image

* restore to 2 to check output

* Restore debug and parallelized to check output

* Revert to 3.0

* Revert to normal state

* Open-source Spark image

* Change to pyspark image

* Testing with gdd spark 3.0 for thrift

* Switch back to dbt user pass

* Spark 3.1.1 gdd image without configs

* Clean up

* Skip session test

* Clean up for review

* Update CHANGELOG

Co-authored-by: Jeremy Cohen <[email protected]>
nssalian and jtcohen6 authored Jun 28, 2022
1 parent 120ec42 commit 0082e73
Showing 6 changed files with 10 additions and 23 deletions.
19 changes: 1 addition & 18 deletions .circleci/config.yml
@@ -33,29 +33,12 @@ jobs:
       DBT_INVOCATION_ENV: circle
     docker:
       - image: fishtownanalytics/test-container:10
-      - image: godatadriven/spark:2
+      - image: godatadriven/spark:3.1.1
         environment:
           WAIT_FOR: localhost:5432
         command: >
           --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
-          --name Thrift JDBC/ODBC Server
-          --conf spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://localhost/metastore
-          --conf spark.hadoop.javax.jdo.option.ConnectionUserName=dbt
-          --conf spark.hadoop.javax.jdo.option.ConnectionPassword=dbt
-          --conf spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
-          --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
-          --conf spark.jars.packages=org.apache.hudi:hudi-spark-bundle_2.11:0.9.0
-          --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
-          --conf spark.driver.userClassPathFirst=true
-          --conf spark.hadoop.datanucleus.autoCreateTables=true
-          --conf spark.hadoop.datanucleus.schema.autoCreateTables=true
-          --conf spark.hadoop.datanucleus.fixedDatastore=false
-          --conf spark.sql.hive.convertMetastoreParquet=false
-          --hiveconf hoodie.datasource.hive_sync.use_jdbc=false
-          --hiveconf hoodie.datasource.hive_sync.mode=hms
-          --hiveconf datanucleus.schema.autoCreateAll=true
-          --hiveconf hive.metastore.schema.verification=false
       - image: postgres:9.6.17-alpine
         environment:
           POSTGRES_USER: dbt
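For context, the dropped `--conf` flags correspond one-to-one to Spark configuration properties. A minimal PySpark sketch of the same metastore and Hudi setup, with values copied from the removed flags (the session-builder form is an illustration, not part of this PR):

```python
# Illustrative only: the removed `--conf` flags, expressed as SparkSession
# settings. Assumes a Postgres metastore at localhost, as in the CI setup.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Thrift JDBC/ODBC Server")
    # Hive metastore backed by Postgres (the javax.jdo.* flags)
    .config("spark.hadoop.javax.jdo.option.ConnectionURL",
            "jdbc:postgresql://localhost/metastore")
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "dbt")
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "dbt")
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName",
            "org.postgresql.Driver")
    # Hudi support (serializer, bundle jar, session extension)
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.jars.packages",
            "org.apache.hudi:hudi-spark-bundle_2.11:0.9.0")
    .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .enableHiveSupport()
    .getOrCreate()
)
```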
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@
 ### Features
 - Add session connection method ([#272](https://github.com/dbt-labs/dbt-spark/issues/272), [#279](https://github.com/dbt-labs/dbt-spark/pull/279))
 - rename file to match reference to dbt-core ([#344](https://github.com/dbt-labs/dbt-spark/pull/344))
+- Upgrade Spark version to 3.1.1 ([#348](https://github.com/dbt-labs/dbt-spark/issues/348), [#349](https://github.com/dbt-labs/dbt-spark/pull/349))

 ### Under the hood
 - Add precommit tooling to this repo ([#356](https://github.com/dbt-labs/dbt-spark/pull/356))
@@ -29,6 +30,7 @@
 ### Contributors
 - [@JCZuurmond](https://github.com/dbt-labs/dbt-spark/pull/279) ([#279](https://github.com/dbt-labs/dbt-spark/pull/279))
 - [@ueshin](https://github.com/ueshin) ([#320](https://github.com/dbt-labs/dbt-spark/pull/320))
+- [@nssalian](https://github.com/nssalian) ([#349](https://github.com/dbt-labs/dbt-spark/pull/349))

 ## dbt-spark 1.1.0b1 (March 23, 2022)
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ more information, consult [the docs](https://docs.getdbt.com/docs/profile-spark)

 ## Running locally
 A `docker-compose` environment starts a Spark Thrift server and a Postgres database as a Hive Metastore backend.
-Note that this is spark 2 not spark 3 so some functionalities might not be available.
+Note: dbt-spark now supports Spark 3.1.1 (formerly on Spark 2.x).

 The following command would start two docker containers
 ```
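Once the two containers are up, the Thrift endpoint on port 10000 can be smoke-tested directly. A sketch using PyHive (the driver dbt-spark's `thrift` connection method builds on); the host, port, and `dbt` username mirror the compose/CI defaults, not anything added by this PR:

```python
# Sketch: verify the local Thrift server started by docker-compose is
# reachable. Connection details assume the repo's default local setup.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000, username="dbt")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")   # should at least return `default`
print(cursor.fetchall())
cursor.close()
conn.close()
```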
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -1,8 +1,8 @@
 version: "3.7"
 services:

-  dbt-spark2-thrift:
-    image: godatadriven/spark:3.0
+  dbt-spark3-thrift:
+    image: godatadriven/spark:3.1.1
     ports:
       - "10000:10000"
       - "4040:4040"
4 changes: 3 additions & 1 deletion docker/spark-defaults.conf
@@ -1,7 +1,9 @@
+spark.driver.memory 2g
+spark.executor.memory 2g
 spark.hadoop.datanucleus.autoCreateTables true
 spark.hadoop.datanucleus.schema.autoCreateTables true
 spark.hadoop.datanucleus.fixedDatastore false
 spark.serializer org.apache.spark.serializer.KryoSerializer
-spark.jars.packages org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0
+spark.jars.packages org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0
 spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
 spark.driver.userClassPathFirst true
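The bumped bundle (`hudi-spark3-bundle_2.12:0.10.0`) can be sanity-checked with a small Hudi round trip once a session picks up these defaults. A sketch; the table name, path, and columns are illustrative, not from the PR:

```python
# Sketch: write and read back a tiny Hudi table to confirm the bundle loads.
# Assumes the session was launched with docker/spark-defaults.conf applied.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "a", "2022-06-28"), (2, "b", "2022-06-28")], ["id", "msg", "ds"]
)
(df.write.format("hudi")
   .option("hoodie.table.name", "hudi_smoke")                  # illustrative
   .option("hoodie.datasource.write.recordkey.field", "id")
   .option("hoodie.datasource.write.precombine.field", "ds")
   .option("hoodie.datasource.write.partitionpath.field", "ds")
   .mode("overwrite")
   .save("/tmp/hudi_smoke"))                                   # illustrative

spark.read.format("hudi").load("/tmp/hudi_smoke").show()
```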
2 changes: 1 addition & 1 deletion tests/functional/adapter/test_basic.py
@@ -82,4 +82,4 @@ def project_config_update(self):

 @pytest.mark.skip_profile('spark_session')
 class TestBaseAdapterMethod(BaseAdapterMethod):
-    pass
\ No newline at end of file
+    pass
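The `skip_profile` marker used here is custom, defined in the test suite's conftest. For readers unfamiliar with the pattern, a minimal sketch of how such a marker can be wired up, assuming the active profile arrives via a `--profile` pytest option (dbt-spark's real conftest may differ):

```python
# conftest.py — minimal sketch of a profile-based skip marker.
# The `--profile` option and its default are assumptions for illustration.
import pytest

def pytest_addoption(parser):
    parser.addoption("--profile", action="store", default="apache_spark")

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "skip_profile(name): skip test for the given target profile"
    )

def pytest_collection_modifyitems(config, items):
    profile = config.getoption("--profile")
    for item in items:
        for marker in item.iter_markers(name="skip_profile"):
            if profile in marker.args:
                item.add_marker(
                    pytest.mark.skip(reason=f"skipped on profile {profile}")
                )
```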
