Add argument to get_column_values to allow for alternate sorting of column values #289

Closed
wants to merge 7 commits into from
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
# dbt-utils v0.6.3

## Features
- Adds the ability to specify `order_by` and `sort_direction` arguments in `get_column_values`.

# dbt-utils v0.6.2

## Fixes
37 changes: 37 additions & 0 deletions README.md
@@ -442,6 +442,8 @@ group by 1
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation).
It takes an optional `default` argument for compiling when the relation does not already exist.

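The optional `order_by` argument controls how values are sorted. By default, values are sorted from highest to lowest frequency (`count(*)`, descending). Pass the column's own name to sort the values themselves (ascending by default), or pass an aggregate expression such as `max(created_at)` to sort by that expression (descending by default). Use `sort_direction` (`'asc'` or `'desc'`) to override the default direction.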

Usage:
```
-- Returns a list of the top 50 states in the `users` table
@@ -453,6 +455,41 @@ Usage:

...
```

```
-- Returns a list of user names sorted by name from the `users` table
{% set names = dbt_utils.get_column_values(table=ref('users'), column='name', default=[], order_by='name') %}

{% for name in names %}
...
{% endfor %}

...
```

```
-- Returns a list of user cities sorted by name from the `users` table
{% set cities = dbt_utils.get_column_values(table=ref('users'), column='city_name', default=[], order_by='city_name') %}

{% for city in cities %}
...
{% endfor %}

...
```


```
-- Returns a list of user cities from the `users` table, ordered by each city's most recent `created_at` (descending by default)
{% set cities = dbt_utils.get_column_values(table=ref('users'), column='city_name', default=[], order_by='max(created_at)') %}

{% for city in cities %}
...
{% endfor %}

...
```
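One way to sanity-check the three orderings in the examples above is to run the same query shape against a toy table. The sketch below uses Python's standard-library `sqlite3`, with SQLite standing in for the warehouse. The rows, the `make_users_db` helper, and this `get_column_values` function are hypothetical illustrations and are not part of dbt-utils. The SQL is hand-written to resemble the macro's `sorted_column_values` query. It adds an explicit outer `order by` so the final order does not depend on the engine preserving the CTE's ordering.

```python
import sqlite3

# Hypothetical rows standing in for the README's `users` table.
USERS = [
    ("ann", "Boston", "2020-01-05"),
    ("bob", "Austin", "2020-03-01"),
    ("cat", "Boston", "2020-02-10"),
    ("dan", "Chicago", "2020-04-01"),
    ("eve", "Boston", "2020-01-15"),
    ("fay", "Austin", "2020-01-02"),
]


def make_users_db():
    """Build an in-memory SQLite database holding the toy `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (name text, city_name text, created_at text)")
    conn.executemany("insert into users values (?, ?, ?)", USERS)
    return conn


def get_column_values(conn, column, order_by, sort_direction, max_records=None):
    """Run a hand-written query shaped like the macro's compiled SQL.

    `column` and `order_by` are spliced in as raw SQL, as the Jinja macro does,
    so they must come from trusted code, never from user input.
    """
    limit = f"limit {int(max_records)}" if max_records is not None else ""
    sql = f"""
        with sorted_column_values as (
            select {column} as value, {order_by} as sort_column
            from users
            group by 1
            order by {order_by} {sort_direction}
            {limit}
        )
        select value from sorted_column_values
        order by sort_column {sort_direction}
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_users_db()
    # Default: most frequent city first.
    print(get_column_values(conn, "city_name", "count(*)", "desc"))
    # ['Boston', 'Austin', 'Chicago']
    # Sorting by the column itself (wrapped in max()), ascending.
    print(get_column_values(conn, "city_name", "max(city_name)", "asc"))
    # ['Austin', 'Boston', 'Chicago']
    # Aggregate expression: city with the most recent signup first.
    print(get_column_values(conn, "city_name", "max(created_at)", "desc"))
    # ['Chicago', 'Austin', 'Boston']
```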

#### get_relations_by_prefix
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given prefix, with an optional exclusion pattern. It's particularly
2 changes: 1 addition & 1 deletion integration_tests/models/sql/test_get_column_values.sql
@@ -1,5 +1,5 @@

-{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default = []) %}
+{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default = [], order_by="field", sort_direction="desc") %}


{% if target.type == 'snowflake' %}
38 changes: 27 additions & 11 deletions macros/sql/get_column_values.sql
@@ -10,18 +10,29 @@ Returns:
A list of distinct values for the specified column
#}

-{% macro get_column_values(table, column, max_records=none, default=none) -%}
+{% macro get_column_values(table, column, max_records=none, default=none, order_by=none, sort_direction=none) -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
{{ return('') }}
{% endif %}
{#-- #}

+{# If no order_by is supplied, default to descending frequency, count(*). Pick the default direction before order_by is wrapped in max(). #}
+{%- set order_by = order_by if order_by else 'count(*)' -%}
+{%- set sort_direction -%}
+{%- if order_by == column -%}
+{{ sort_direction or 'asc' }}
+{%- else -%}
+{{ sort_direction or 'desc' }}
+{%- endif -%}
+{%- endset -%}
+{%- set order_by = 'max(' ~ order_by ~ ')' if order_by == column else order_by -%}
+
{%- set target_relation = adapter.get_relation(database=table.database,
schema=table.schema,
identifier=table.identifier) -%}

{%- call statement('get_column_values', fetch_result=true) %}

{%- if not target_relation and default is none -%}
@@ -36,16 +47,21 @@ Returns:

{%- else -%}

-select
-{{ column }} as value
-
-from {{ target_relation }}
-group by 1
-order by count(*) desc
-
-{% if max_records is not none %}
-limit {{ max_records }}
-{% endif %}
+with sorted_column_values as (
+
+    select
+        {{ column }} as value,
+        {{ order_by }} as sort_column
+    from {{ target_relation }}
+    group by 1
+    order by {{ order_by }} {{ sort_direction }}
+    {% if max_records is not none %}
+    limit {{ max_records }}
+    {% endif %}
+
+)
+select value from sorted_column_values
+order by sort_column {{ sort_direction }}

{% endif %}
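The defaulting rules for `order_by` and `sort_direction` can be modeled in a few lines of Python. This is a sketch, not dbt code. `resolve_sort` and `resolve_sort_wrap_first` are hypothetical names, and the model assumes Jinja's `or` and inline `if` treat these string arguments the way Python does. The second function shows why the order of steps matters. If `order_by` is wrapped in `max()` before the default direction is chosen, the comparison `order_by == column` is never true, so the `asc` default cannot be reached.

```python
def resolve_sort(column, order_by=None, sort_direction=None):
    """Model of the macro's defaulting rules; returns (order_by, sort_direction)."""
    # No order_by supplied: fall back to frequency, count(*).
    order_by = order_by if order_by else "count(*)"
    # Choose the default direction while order_by can still equal the column name.
    default_direction = "asc" if order_by == column else "desc"
    sort_direction = sort_direction or default_direction
    # The query groups by the column, so a bare column becomes max(column).
    if order_by == column:
        order_by = f"max({column})"
    return order_by, sort_direction


def resolve_sort_wrap_first(column, order_by=None, sort_direction=None):
    """Same rules, but wrapping in max() before picking the default direction."""
    order_by = order_by if order_by else "count(*)"
    if order_by == column:
        order_by = f"max({column})"
    # order_by is now "max(column)", so this comparison is never true.
    default_direction = "asc" if order_by == column else "desc"
    return order_by, sort_direction or default_direction


if __name__ == "__main__":
    print(resolve_sort("name"))                              # ('count(*)', 'desc')
    print(resolve_sort("name", order_by="name"))             # ('max(name)', 'asc')
    print(resolve_sort_wrap_first("name", order_by="name"))  # ('max(name)', 'desc')
    # Arguments used by the integration test:
    print(resolve_sort("field", order_by="field", sort_direction="desc"))  # ('max(field)', 'desc')
```

The last call uses the integration test's arguments. That test passes `sort_direction` explicitly, so it never exercises the default direction and would not catch a wrong default.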
