Feature/add listagg macro (#530)
* Update README.md

* Mutually excl range examples in disclosure triangle

* Fix union_relations error when no include/exclude provided

* Fix union_relations error when no include/exclude provided (#509)

* Update CHANGELOG.md

* Add to_condition to relationships where

* very minor nit - update "an new" to "a new" (#519)

* add quoting to split_part (#528)

* add quoting to split_part

* update docs for split_part

* typo

* corrected readme syntax

* revert and update to just documentation

* add new line

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Joel Labes <[email protected]>

* add macro to get columns (#516)

* add macro to get columns

* star macro should use get_columns

* add adapter.

* swap adapter for dbt_utils

Co-authored-by: Joel Labes <[email protected]>

* update documentation

* add output_lower arg

* update name to get_filtered_columns_in_relation from get_columns

* add tests

* forgot args

* too much whitespace removal

    -----------
    Actual:
    -----------
    --->"field_3"as "test_field_3"<---

    -----------
    Expected:
    -----------
    --->"field_3" as "test_field_3"<---

* didnt mean to move a file that i did not create. moving things back.

* remove lowercase logic

* limit_zero

Co-authored-by: Joel Labes <[email protected]>

* Add listagg macro and integration test

* remove type in listagg macro

* updated integration test

* Add redshift to listagg macro

* remove redshift listagg

* explicitly named group by column

* updated default values

* Updated example to use correct double vs. single quotes

* whitespace control

* Added redshift specific macro

* Remove documentation

* Update integration test so less likely to accidentally work

Co-authored-by: Joel Labes <[email protected]>

* default everything but measure to none

* added limit functionality for other dbs

* syntax bug for postgres

* update redshift macro

* fixed block def control

* Fixed bug in redshift

* Bug fix redshift

* remove unused group_by arg

* Added additional test without order by col

* updated to regex replace

* typo

* added more integration_tests

* attempt to make redshift less complicated

* typo

* update redshift

* replace to substr

* More explicit versions with added complexity

* handle special characters

Co-authored-by: Joel Labes <[email protected]>
Co-authored-by: Jamie Rosenberg <[email protected]>
Co-authored-by: Pat Kearns <[email protected]>
4 people authored Apr 6, 2022
1 parent 31577cb commit 1a517d2
Showing 15 changed files with 358 additions and 20 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

# dbt-utils v0.8.3
## New features
- A macro for deduplicating data ([#335](https://github.com/dbt-labs/dbt-utils/issues/335), [#512](https://github.com/dbt-labs/dbt-utils/pull/512))
Expand Down
61 changes: 54 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this

- [Introspective macros](#introspective-macros):
- [get_column_values](#get_column_values-source)
- [get_filtered_columns_in_relation](#get_filtered_columns_in_relation-source)
- [get_relations_by_pattern](#get_relations_by_pattern-source)
- [get_relations_by_prefix](#get_relations_by_prefix-source)
- [get_query_results_as_dict](#get_query_results_as_dict-source)
Expand Down Expand Up @@ -59,6 +60,7 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
- [split_part](#split_part-source)
- [last_day](#last_day-source)
- [width_bucket](#width_bucket-source)
- [listagg](#listagg)

- [Jinja Helpers](#jinja-helpers)
- [pretty_time](#pretty_time-source)
Expand All @@ -69,11 +71,11 @@ For compatibility details between versions of dbt-core and dbt-utils, [see this
- [insert_by_period](#insert_by_period-source)

----
=======
### Generic Tests
#### equal_rowcount ([source](macros/generic_tests/equal_rowcount.sql))
Asserts that two relations have the same number of rows.


**Usage:**
```yaml
version: 2
Expand Down Expand Up @@ -387,7 +389,6 @@ models:
```
<details>
<summary>Additional `gaps` and `zero_length_range_allowed` examples</summary>

**Understanding the `gaps` argument:**

Here are a number of examples for each allowed `gaps` argument.
Expand Down Expand Up @@ -435,7 +436,6 @@ models:
| 0 | 1 |
| 2 | 2 |
| 3 | 4 |

</details>

#### sequential_values ([source](macros/generic_tests/sequential_values.sql))
Expand Down Expand Up @@ -551,7 +551,7 @@ These macros run a query and return the results of the query as objects. They ar
#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

Arguments:
**Args:**
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column you wish to find the column values of
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this as `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
Expand Down Expand Up @@ -592,6 +592,28 @@ Arguments:
...
```

#### get_filtered_columns_in_relation ([source](macros/sql/get_filtered_columns_in_relation.sql))
This macro returns an iterable Jinja list of column names for a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) (i.e. not a CTE).
- Optionally excludes the columns listed in `except`.
- Input values are not case-sensitive (uppercase and lowercase both work).

> Note: The native [`adapter.get_columns_in_relation` macro](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#get_columns_in_relation) returns columns unfiltered and brings along other (potentially unwanted) information, such as dtype, char_size, numeric_precision, etc.

**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The names of the columns you wish to exclude. (case-insensitive)

**Usage:**
```sql
-- Returns a list of the columns from a relation, so you can then iterate in a for loop
{% set column_names = dbt_utils.get_filtered_columns_in_relation(from=ref('your_model'), except=["field_1", "field_2"]) %}
...
{% for column_name in column_names %}
max({{ column_name }}) ... as max_{{ column_name }},
{% endfor %}
...
```
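The filtering rule described above (drop every column whose name matches an `except` entry, ignoring case) can be modelled outside dbt. This is a minimal Python sketch of that rule only, not the macro's implementation. `filter_columns` is a hypothetical helper, and the column names mirror the three-column seed file used by the integration test.

```python
def filter_columns(columns, except_=()):
    """Return the column names not listed in except_, comparing case-insensitively."""
    # Normalise the exclusion list once so lookups ignore case.
    excluded = {name.lower() for name in except_}
    return [col for col in columns if col.lower() not in excluded]


columns = ["field_1", "field_2", "field_3"]

# An uppercase exclusion still matches the lowercase column name.
print(filter_columns(columns, except_=["FIELD_1"]))  # ['field_2', 'field_3']
```

Relative order of the remaining columns is preserved here, which is what a `for` loop over the result would typically rely on.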

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.
Expand Down Expand Up @@ -770,9 +792,20 @@ group by 1,2,3
```

#### star ([source](macros/sql/star.sql))
This macro generates a comma-separated list of all fields that exist in the `from` relation, excluding any fields listed in the `except` argument. The construction is identical to `select * from {{ref('my_model')}}`, replacing star (`*`) with the star macro. This macro also has an optional `relation_alias` argument that will prefix all generated fields with an alias (`relation_alias`.`field_name`).
This macro generates a comma-separated list of all fields that exist in the `from` relation, excluding any fields
listed in the `except` argument. The construction is identical to `select * from {{ref('my_model')}}`, replacing star (`*`) with
the star macro.
This macro also has an optional `relation_alias` argument that will prefix all generated fields with an alias (`relation_alias`.`field_name`).
The macro also has optional `prefix` and `suffix` arguments. When one or both are provided, they will be concatenated onto each field's alias
in the output (`prefix` ~ `field_name` ~ `suffix`). NB: This prevents the output from being used in any context other than a select statement.


The macro also has optional `prefix` and `suffix` arguments. When one or both are provided, they will be concatenated onto each field's alias in the output (`prefix` ~ `field_name` ~ `suffix`). NB: This prevents the output from being used in any context other than a select statement.
**Args:**
- `from` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `except` (optional, default=`[]`): The names of the columns you wish to exclude. (case-insensitive)
- `relation_alias` (optional, default=`''`): will prefix all generated fields with an alias (`relation_alias`.`field_name`).
- `prefix` (optional, default=`''`): will prefix the output `field_name` (`field_name as prefix_field_name`).
- `suffix` (optional, default=`''`): will suffix the output `field_name` (`field_name as field_name_suffix`).

**Usage:**
```sql
Expand All @@ -789,6 +822,13 @@ from {{ ref('my_model') }}

```

```sql
select
{{ dbt_utils.star(from=ref('my_model'), except=["exclude_field_1", "exclude_field_2"], prefix="max_") }}
from {{ ref('my_model') }}

```
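To make the interaction of `except`, `relation_alias`, `prefix` and `suffix` concrete, here is a rough Python model of the text the macro is described as generating. It is a sketch under assumptions, not the macro's source: `star` below is a hypothetical stand-in, and double-quoting of identifiers is assumed (real quoting is adapter-specific).

```python
def star(columns, except_=(), relation_alias="", prefix="", suffix=""):
    """Build a comma-separated select list from column names."""
    excluded = {name.lower() for name in except_}  # case-insensitive exclusion
    parts = []
    for col in columns:
        if col.lower() in excluded:
            continue
        field = f'"{col}"'  # assumption: identifiers are double-quoted
        if relation_alias:
            field = f"{relation_alias}.{field}"
        if prefix or suffix:
            # Aliasing is what restricts the output to select statements.
            field = f'{field} as "{prefix}{col}{suffix}"'
        parts.append(field)
    return ",\n".join(parts)


print(star(["field_1", "field_2", "field_3"],
           except_=["FIELD_1", "field_2"], prefix="test_"))
# "field_3" as "test_field_3"
```

The printed line matches the expected string quoted in the commit message, including the space before `as` that the whitespace fix restored.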

#### union_relations ([source](macros/sql/union.sql))

This macro unions together an array of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation),
Expand Down Expand Up @@ -987,9 +1027,16 @@ This macro calculates the difference between two dates.
#### split_part ([source](macros/cross_db_utils/split_part.sql))
This macro splits a string of text using the supplied delimiter and returns the supplied part number (1-indexed).

**Args**:
- `string_text` (required): Text to be split into parts.
- `delimiter_text` (required): Text representing the delimiter to split by.
- `part_number` (required): Requested part of the split (1-based). If the value is negative, the parts are counted backward from the end of the string.

**Usage:**
When referencing a column, use one pair of quotes. When referencing a string, use single quotes enclosed in double quotes.
```
{{ dbt_utils.split_part(string_text='1,2,3', delimiter_text=',', part_number=1) }}
{{ dbt_utils.split_part(string_text='column_to_split', delimiter_text='delimiter_column', part_number=1) }}
{{ dbt_utils.split_part(string_text="'1|2|3'", delimiter_text="'|'", part_number=1) }}
```
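The indexing rules in the argument list (1-based parts, negative values counted from the end) can be illustrated with a small Python model. This is only a sketch of the documented behaviour: `split_part` here is a plain Python function, and out-of-range handling, which differs between warehouses, is deliberately not modelled.

```python
def split_part(string_text, delimiter_text, part_number):
    """Return the requested part of string_text split on delimiter_text."""
    if part_number == 0:
        raise ValueError("part_number is 1-based; 0 is not a valid part")
    parts = string_text.split(delimiter_text)
    # Positive numbers are 1-based from the start; negative numbers count
    # backward from the end, which matches Python's negative indexing.
    index = part_number - 1 if part_number > 0 else part_number
    return parts[index]  # raises IndexError when out of range


print(split_part("1|2|3", "|", 1))   # 1
print(split_part("1|2|3", "|", -1))  # 3
```

Note that the quoting rule above concerns how the Jinja arguments are rendered into SQL (column reference versus string literal); the model takes plain Python strings and sidesteps it.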

#### date_trunc ([source](macros/cross_db_utils/date_trunc.sql))
Expand Down
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
group_col,string_text,order_col
1,a,1
1,b,2
1,c,3
2,a,2
2,1,1
2,p,3
3,g,1
3,g,2
3,g,3
10 changes: 10 additions & 0 deletions integration_tests/data/cross_db/data_listagg_output.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
group_col,expected,version
1,"a_|_b_|_c",bottom_ordered
2,"1_|_a_|_p",bottom_ordered
3,"g_|_g_|_g",bottom_ordered
1,"a_|_b",bottom_ordered_limited
2,"1_|_a",bottom_ordered_limited
3,"g_|_g",bottom_ordered_limited
3,"g, g, g",comma_whitespace_unordered
3,"g",distinct_comma
3,"g,g,g",no_params
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
field_1,field_2,field_3
a,b,c
d,e,f
g,h,i
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
field_2,field_3
h,i
32 changes: 32 additions & 0 deletions integration_tests/macros/assert_equal_values.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{% macro assert_equal_values(actual_object, expected_object) %}
{% if not execute %}

{# pass #}

{% elif actual_object != expected_object %}

{% set msg %}
Expected did not match actual

-----------
Actual:
-----------
--->{{ actual_object }}<---

-----------
Expected:
-----------
--->{{ expected_object }}<---

{% endset %}

{{ log(msg, info=True) }}

select 'fail'

{% else %}

select 'ok' {{ limit_zero() }}

{% endif %}
{% endmacro %}
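The control flow of `assert_equal_values` can be read as a three-way branch. The Python sketch below models only which SQL text is produced. It assumes that `limit_zero()` renders as `limit 0` and that the macro is called from a dbt test, which passes when its query returns no rows; neither assumption is confirmed by this diff.

```python
def assert_equal_values(actual_object, expected_object, execute=True):
    """Return the SQL the macro would render for the given inputs."""
    if not execute:
        # Parse phase: the macro renders nothing.
        return ""
    if actual_object != expected_object:
        # Mirror the macro's log message; the arrows expose stray whitespace.
        print("Expected did not match actual")
        print(f"Actual:   --->{actual_object}<---")
        print(f"Expected: --->{expected_object}<---")
        return "select 'fail'"  # one row returned, so the test fails
    # Assumption: limit_zero() renders as "limit 0", so no rows come back.
    return "select 'ok' limit 0"


print(assert_equal_values('"field_3"as "test_field_3"',
                          '"field_3" as "test_field_3"'))
```

The call reproduces the whitespace mismatch quoted in the commit message, which is the kind of difference the `--->` and `<---` markers are meant to expose.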
6 changes: 6 additions & 0 deletions integration_tests/models/cross_db_utils/schema.yml
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,12 @@ models:
- assert_equal:
actual: actual
expected: expected

- name: test_listagg
tests:
- assert_equal:
actual: actual
expected: expected

- name: test_safe_cast
tests:
Expand Down
69 changes: 69 additions & 0 deletions integration_tests/models/cross_db_utils/test_listagg.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
with data as (

select * from {{ ref('data_listagg') }}

),

data_output as (

select * from {{ ref('data_listagg_output') }}

),

calculate as (

select
group_col,
{{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col") }} as actual,
'bottom_ordered' as version
from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "'_|_'", "order by order_col", 2) }} as actual,
'bottom_ordered_limited' as version
from data
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text', "', '") }} as actual,
'comma_whitespace_unordered' as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('DISTINCT string_text', "','") }} as actual,
'distinct_comma' as version
from data
where group_col = 3
group by group_col

union all

select
group_col,
{{ dbt_utils.listagg('string_text') }} as actual,
'no_params' as version
from data
where group_col = 3
group by group_col

)

select
calculate.actual,
data_output.expected
from calculate
left join data_output
on calculate.group_col = data_output.group_col
and calculate.version = data_output.version
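The expected values in `data_listagg_output.csv` can be cross-checked with a small reference model of the aggregation. This Python sketch is an assumption-laden stand-in, not the macro: `listagg` below is a hypothetical function, `distinct` is modelled as a flag (the test passes `DISTINCT` inside the measure string), and the default delimiter is taken to be a bare comma, as the `no_params` row suggests.

```python
def listagg(rows, delimiter=",", order_key=None, limit=None, distinct=False):
    """Concatenate the text of (string_text, order_col) rows from one group."""
    if order_key is not None:
        rows = sorted(rows, key=order_key)
    values = [text for text, _ in rows]
    if distinct:
        values = list(dict.fromkeys(values))  # de-duplicate, keep first seen
    if limit is not None:
        values = values[:limit]  # keep only the first `limit` values
    return delimiter.join(values)


def by_order_col(row):
    return row[1]


group_2 = [("a", 2), ("1", 1), ("p", 3)]  # rows of group_col = 2
group_3 = [("g", 1), ("g", 2), ("g", 3)]  # rows of group_col = 3

print(listagg(group_2, "_|_", order_key=by_order_col))           # 1_|_a_|_p
print(listagg(group_2, "_|_", order_key=by_order_col, limit=2))  # 1_|_a
print(listagg(group_3, ", "))                                    # g, g, g
print(listagg(group_3, distinct=True))                           # g
```

The unordered cases in the test are restricted to group 3, whose values are all `g`, so their expected output does not depend on the row order the warehouse happens to use.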
10 changes: 10 additions & 0 deletions integration_tests/models/sql/schema.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,11 @@ models:
values:
- '5'

- name: test_get_filtered_columns_in_relation
tests:
- dbt_utils.equality:
compare_model: ref('data_filtered_columns_in_relation_expected')

- name: test_get_relations_by_prefix_and_union
columns:
- name: event
Expand Down Expand Up @@ -121,6 +126,11 @@ models:
- dbt_utils.equality:
compare_model: ref('data_star_aggregate_expected')

- name: test_star_uppercase
tests:
- dbt_utils.equality:
compare_model: ref('data_star_expected')

- name: test_surrogate_key
tests:
- assert_equal:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{% set exclude_field = 'field_1' %}
{% set column_names = dbt_utils.get_filtered_columns_in_relation(from= ref('data_filtered_columns_in_relation'), except=[exclude_field]) %}

with data as (

select

{% for column_name in column_names %}
max({{ column_name }}) as {{ column_name }} {% if not loop.last %},{% endif %}
{% endfor %}

from {{ ref('data_filtered_columns_in_relation') }}

)

select * from data
13 changes: 13 additions & 0 deletions integration_tests/models/sql/test_star_uppercase.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
{% set exclude_field = 'FIELD_3' %}


with data as (

select
{{ dbt_utils.star(from=ref('data_star'), except=[exclude_field]) }}

from {{ ref('data_star') }}

)

select * from data