remove references to tox
benc-db committed Sep 14, 2023
1 parent d269f11 commit b10f624
Showing 5 changed files with 57 additions and 38 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -81,7 +81,7 @@ jobs:
run: poetry install --no-interaction --no-root

- name: Run linting
run: poetry run nox -t lint
run: poetry run nox -t lint_check

unit:
name: unit test / python ${{ matrix.python-version }}
47 changes: 22 additions & 25 deletions CONTRIBUTING.MD
@@ -5,6 +5,7 @@ We happily welcome contributions to the `dbt-databricks` package. We use [GitHub
Contributions are licensed on a license-in/license-out basis.

## Communication

Before starting work on a major feature, please reach out to us via GitHub, Slack, email, etc. We will make sure no one else is already working on it and ask you to open a GitHub issue. A "major feature" is defined as any change that alters more than 100 LOC (not including tests) or changes any user-facing behavior.

We will use the GitHub issue to discuss the feature and come to agreement. This is to prevent your time being wasted, as well as ours. The GitHub review process for major features is also important so that organizations with commit access can come to agreement on design.
@@ -18,21 +19,20 @@ If it is appropriate to write a design document, the document must be hosted eit
1. [Run the unit tests](#unit-tests) (and the [integration tests](#functional--integration-tests) if you [can](#please-test-what-you-can))
1. [Sign your commits](#sign-your-work)
1. [Open a pull request](#pull-request-review-process)
- Answer the PR template questions as best as you can
- _Recommended: [Allow edits from Maintainers]_

## Pull request review process

dbt-databricks uses a **two-step review process** to merge PRs to `main`. We first squash the patch onto a staging branch so that we can securely run our full matrix of integration tests against a real Databricks workspace. Then we merge the staging branch to `main`.

> **Note:** When you create a pull request we recommend that you _[Allow Edits from Maintainers]_. This smooths our two-step process and also lets your reviewer easily commit minor fixes or changes.

A dbt-databricks maintainer will review your PR and may suggest changes for style and clarity, or they may request that you add unit or integration tests.

Once your patch is approved, a maintainer will create a staging branch and either you or the maintainer (if you allowed edits from maintainers) will change the base branch of your PR to the staging branch. Then a maintainer will squash and merge the PR into the staging branch.

dbt-databricks uses staging branches to run our full matrix of functional and integration tests via GitHub Actions. This extra step is required for security because GitHub Actions workflows that run on pull requests from forks can't access our testing Databricks workspace.

If the functional or integration tests fail as a result of your change, a maintainer will work with you to fix it _on your fork_ and then repeat this step.

@@ -46,19 +46,20 @@ See [docs/local-dev.md](docs/local-dev.md).

## Code Style

We follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) with one exception: lines can be up to 100 characters in length, not 79. You can run [`tox` linter command](#linting) to automatically format the source code before committing your changes.
We follow [PEP 8](https://www.python.org/dev/peps/pep-0008/) with one exception: lines can be up to 100 characters in length, not 79. You can run the [`nox` lint command](#linting) to detect issues in your source code before committing your changes.

### Linting

This project uses [Black](https://pypi.org/project/black/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](https://www.mypy-lang.org/) for linting and static type checks. Run all three with the `linter` command and commit before opening your pull request.
This project uses [Black](https://pypi.org/project/black/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](https://www.mypy-lang.org/) for linting and static type checks. Run all three with the `lint` command and commit before opening your pull request.

```
tox -e linter
```zsh
poetry run nox -t lint
```

To simplify reviews, you can commit any formatting changes in a separate commit.

## Sign your work

The sign-off is a simple line at the end of the explanation for the patch. Your signature certifies that you wrote the patch or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the below (from developercertificate.org):

```
@@ -110,40 +111,36 @@ Use your real name (sorry, no pseudonyms or anonymous contributions.)

If you set your `user.name` and `user.email` git configs, you can sign your commit automatically with `git commit -s`.


## Unit tests

Unit tests do not require a Databricks account. Please confirm that your change passes our unit test suite before opening a pull request.

```bash
tox -e unit
```zsh
poetry run nox -s unit
```

## Functional & Integration Tests

Functional and integration tests require a Databricks account with access to a workspace containing four compute resources. These four comprise a matrix of multi-purpose cluster vs SQL warehouse with and without Unity Catalog enabled. The `tox` commands to run each set of these tests appear below:
Functional and integration tests require a Databricks account with access to a workspace containing four compute resources. These four comprise a matrix of multi-purpose cluster vs SQL warehouse with and without Unity Catalog enabled. The `nox` commands to run each set of these tests appear below:

|Compute Type |Unity Catalog |Command|
|-|-|-|
|SQL warehouse| Yes | `tox -e integration-databricks-uc-sql-endpoint` |
|SQL warehouse| No | `tox -e integration-databricks-sql-endpoint` |
|Multi-purpose| Yes | `tox -e integration-databricks-uc-cluster` |
|Multi-Purpose| No | `tox -e integration-databricks-cluster` |
| Compute Type | Unity Catalog | Command |
| ------------- | ------------- | ----------------------------------- |
| SQL warehouse | Yes | `poetry run nox -t uc_sql_endpoint` |
| Multi-purpose | Yes | `poetry run nox -t uc_cluster` |
| Multi-purpose | No            | `poetry run nox -t cluster`          |

These tests are configured with environment variables that `tox` reads from a file called [test.env](/test.env.example) which you can copy from the example:
These tests are configured with environment variables that `pytest` reads from a file called [test.env](/test.env.example) which you can copy from the example:

```sh
cp test.env.example test.env
```

Update `test.env` with the relevant HTTP paths and tokens.

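The tagged nox sessions that these commands invoke live in `noxfile.py` and are not shown in this diff. Purely as a hypothetical sketch of what such a session could look like (the session body, pytest arguments, and test path are assumptions, not the project's actual definitions):

```python
# Hypothetical sketch only: the tag matches the table above, but the body,
# pytest arguments, and test path are assumptions, not the real noxfile.py.
from nox_poetry import session


@session(tags=["uc_sql_endpoint"])
def uc_sql_endpoint(session):
    # Install the project and its dependencies through Poetry.
    session.run_always("poetry", "install", external=True)
    # Run the functional suite against the Unity Catalog SQL warehouse
    # described by the variables in test.env.
    session.run("pytest", "tests/functional")
```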
### Please test what you can

We understand that not every contributor will have all four types of compute resources in their Databricks workspace. For this reason, once a change has been reviewed and merged into a staging branch, we will run the full matrix of tests against our testing workspace at our expense (see the [pull request review process](#pull-request-review-process) for more detail).

That said, we ask that you include integration tests where relevant and that you indicate in your pull request description the environment type(s) you tested the change against.



[Allow Edits from Maintainers]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork
8 changes: 7 additions & 1 deletion README.md
@@ -25,6 +25,7 @@ The `dbt-databricks` adapter contains all of the code enabling dbt to work with
- **Performance**. The adapter generates SQL expressions that are automatically accelerated by the native, vectorized [Photon](https://databricks.com/product/photon) execution engine.

## Choosing between dbt-databricks and dbt-spark

If you are developing a dbt project on Databricks, we recommend using `dbt-databricks` for the reasons noted above.

`dbt-spark` is an actively developed adapter that works with Databricks, as well as with Apache Spark wherever it is hosted, e.g. on AWS EMR.
@@ -34,11 +35,13 @@ If you are developing a dbt project on Databricks, we recommend using `dbt-datab
### Installation

Install using pip:

```nofmt
pip install dbt-databricks
```

Upgrade to the latest version:

```nofmt
pip install --upgrade dbt-databricks
```
@@ -61,6 +64,7 @@ your_profile_name:
### Quick Starts

The following quick starts will get you up and running with the `dbt-databricks` adapter:

- [Developing your first dbt project](https://github.com/databricks/dbt-databricks/blob/main/docs/local-dev.md)
- Using dbt Cloud with Databricks ([Azure](https://docs.microsoft.com/en-us/azure/databricks/integrations/prep/dbt-cloud) | [AWS](https://docs.databricks.com/integrations/prep/dbt-cloud.html))
- [Running dbt production jobs on Databricks Workflows](https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-workflows.md)
@@ -73,11 +77,13 @@ These following quick starts will get you up and running with the `dbt-databrick

The `dbt-databricks` adapter has been tested:

- with Python 3.7 or above.
- with Python 3.8 or above.
- against `Databricks SQL` and `Databricks runtime releases 9.1 LTS` and later.

### Tips and Tricks

## Choosing compute for a Python model

You can override the compute used for a specific Python model by setting the `http_path` property in model configuration. This can be useful if, for example, you want to run a Python model on an All Purpose cluster, while running SQL models on a SQL Warehouse. Note that this capability is only available for Python models.

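The configuration example that follows this paragraph in the README is truncated in this view. Purely as an illustrative sketch (the `http_path` value and the upstream model name are placeholders), overriding the compute for a Python model could look like this:

```python
# Illustrative sketch only: the http_path value and model name are placeholders.
def model(dbt, session):
    # Route this Python model to an All Purpose cluster instead of the
    # SQL Warehouse used for the project's SQL models.
    dbt.config(http_path="sql/protocolv1/o/0000000000000000/0000-000000-placeholder")

    # Reference an upstream model and return a DataFrame as usual.
    return dbt.ref("my_upstream_model")
```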
30 changes: 23 additions & 7 deletions docs/local-dev.md
@@ -1,41 +1,55 @@
# Local development with dbt-databricks

This page describes how to develop a dbt project on your computer using `dbt-databricks`. We will create an empty dbt project with information on how to connect to Databricks. We will then run our first dbt models.

## Prerequisites

- Access to a Databricks workspace
- Ability to create a Personal Access Token (PAT)
- Python 3.8+
- dbt-core v1.1.0+
- dbt-databricks v1.1.0+
- [Poetry 1.6+](https://python-poetry.org/docs/)

To install the project and all its dependencies, run

```
poetry install
```

from within the project directory.

## Prepare to connect

### Collect connection information

Before you scaffold a new dbt project, you have to collect some information which dbt will use to connect to Databricks. Where you find this information depends on whether you are using Databricks Clusters or Databricks SQL endpoints. We recommend that you develop dbt models against Databricks SQL endpoints as they provide the latest SQL features and optimizations.

#### Databricks SQL endpoints

1. Log in to your Databricks workspace
2. Click the _SQL_ persona in the left navigation bar to switch to Databricks SQL
3. Click _SQL Endpoints_
4. Choose the SQL endpoint you want to connect to
5. Click _Connection details_
6. Copy the value of _Server hostname_. This will be the value of `host` when you scaffold a dbt project.
7. Copy the value of _HTTP path_. This will be the value of `http_path` when you scaffold a dbt project.

![image](/docs/img/sql-endpoint-connection-details.png "SQL endpoint connection details")

#### Databricks Clusters

1. Log in to your Databricks workspace
2. Click the _Data Science & Engineering_ persona in the left navigation bar
3. Click _Compute_
4. Click on the cluster you want to connect to
5. Near the bottom of the page, click _Advanced options_
6. Scroll down some more and click _JDBC/ODBC_
7. Copy the value of _Server Hostname_. This will be the value of `host` when you scaffold a dbt project.
7. Copy the value of _HTTP Path_. This will be the value of `http_path` when you scaffold a dbt project.
8. Copy the value of _HTTP Path_. This will be the value of `http_path` when you scaffold a dbt project.

![image](/docs/img/cluster-connection-details.png "Cluster connection details")

## Scaffold a new dbt project

Now, we are ready to scaffold a new dbt project. Switch to your terminal and type:

```nofmt
@@ -63,6 +77,7 @@ In `schema`, enter `databricks_demo`, which is the schema you created earlier.
Leave threads at `1` for now.

## Test connection

You are now ready to test the connection to Databricks. In the terminal, enter the following command:

```nofmt
@@ -72,6 +87,7 @@ dbt debug
If all goes well, you will see a successful connection. If you cannot connect to Databricks, double-check the PAT and update it accordingly in `~/.dbt/profiles.yml`.

## Run your first models

At this point, you simply run the demo models in the `models/example` directory. In your terminal, type:

```nofmt
8 changes: 4 additions & 4 deletions noxfile.py
@@ -1,27 +1,27 @@
from nox_poetry import session


@session(tags=["lint"])
@session(tags=["lint", "lint_check"])
def black(session):
session.install("black")
session.run("black", "--check", "dbt", "tests")


@session
@session(tags=["lint"])
def black_fix(session):
session.install("black")
session.run("black", "dbt", "tests")


@session(tags=["lint"])
@session(tags=["lint", "lint_check"])
def flake8(session):
session.install("flake8")
session.run(
"flake8", "--select=E,W,F", "--ignore=E203,W503", "--max-line-length=100", "dbt", "tests"
)


@session(tags=["lint"])
@session(tags=["lint", "lint_check"])
def mypy(session):
session.run_always("poetry", "install", external=True)
session.run("mypy", "--explicit-package-bases", "dbt", "tests")
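CONTRIBUTING.MD above refers to `poetry run nox -s unit`; that session sits outside the hunks shown here. As a hypothetical sketch only, following the same pattern as the lint sessions above (the test path is an assumption):

```python
# Hypothetical sketch only: the real `unit` session is not shown in this diff,
# and the test path is an assumption.
@session
def unit(session):
    # Install the project and its dependencies through Poetry.
    session.run_always("poetry", "install", external=True)
    # Run the unit test suite.
    session.run("pytest", "tests/unit")
```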
