-
Notifications
You must be signed in to change notification settings - Fork 27
Reference Hubs
Compared to a Standard Hub, a Reference Hub is created to store reference Data. The source model of a reference Hub would be a stage model, but compared to a standard Hub, there is no Hub Hashkey required. Instead, a reference Hub only contains the unhashed one or multiple reference keys.
If a reference Hub is loaded from multiple sources, each source is required to have the same number of reference keys. Additionally each source needs to have a rsrc_static Attribute defined.
In general, a reference Hub shares the same features as a general standard Hub, which are:
- Loadable by multiple sources
- Supports multiple updates per batch and therefore initial loading
- Using a dynamic high-water-mark to optimize loading performance of multiple loads
- Allows source mappings for deviations between source column names and hub column names
Parameter | Data type | Required | Default Value | Explanation |
---|---|---|---|---|
ref_keys | string | list of strings | mandatory | - | Name of the reference key(s) inside the source system. If multiple keys are used, then the ref_keys need to be given as a list of strings. |
source_models | string | list of dictionaries | dictionary | mandatory | - | If single source, just a string holding the name of the stage model is required. For multi soure reference Hubs, a list of dictionaries with information of each source is required. Please see this wiki page for more details. The inner dictionaries must have 'name' as a key, and optionally the keys 'rsrc_static' & 'ref_keys'.
For further information about the rsrc_static attribute, please visit the following wiki page: rsrc_static Attribute |
src_ldts | string | optional | datavault4dbt.ldts_alias | Name of the ldts column inside the source models. Needs to use the same column name as defined as alias inside the staging model |
src_rsrc | string | optional | datavault4dbt.rsrc_alias | Name of the rsrc column inside the source models. Needs to use the same column name as defined as alias inside the staging model. |
{{ config(materialized='incremental',
schema='Core') }}
{%- set yaml_metadata -%}
source_models: stg_nation
ref_keys: N_NATIONKEY
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ datavault4dbt.ref_hub(source_models=metadata_dict.get('source_models'),
ref_keys=metadata_dict.get('ref_keys')) }}
-
ref_keys: This source model has one reference key, the column "N_NATIONKEY".
-
source_models: This would create a reference Hub loaded from only one source. It uses the model 'stg_nation' and it is not defined as a dictionary because the parameters for this source (only the ref_keys) match the higher-level definition.
- The 'rsrc_static' attribute is not set, because it is not required for single source entities. For more information see rsrc_static Attribute.
Click Me!
WITH
src_new_1 AS (
SELECT
N_NATIONKEY,
ldts,
rsrc
FROM datavault4dbt_demo.core_Stages.stg_nation src
),
earliest_ref_key_over_all_sources AS (
SELECT
lcte.*
FROM src_new_1 AS lcte
QUALIFY ROW_NUMBER() OVER (PARTITION BY N_NATIONKEY ORDER BY ldts) = 1),
records_to_insert AS (
SELECT
N_NATIONKEY,
ldts,
rsrc
FROM earliest_ref_key_over_all_sources)
SELECT * FROM records_to_insert
{{ config(materialized='incremental',
schema='Core') }}
{%- set yaml_metadata -%}
source_models:
- name: stg_nation
rsrc_static: 'TPC_H_SF1.Nation'
- name: stg_customers
ref_keys: C_NATIONKEY
rsrc_static: 'TPC_H_SF1.Customer'
ref_keys: N_NATIONKEY
{%- endset -%}
{% set metadata_dict = fromyaml(yaml_metadata) %}
{{ datavault4dbt.ref_hub(source_models=metadata_dict.get('source_models'),
ref_keys=metadata_dict.get('ref_keys')) }}
-
ref_keys: This source model only has one reference key, the column "N_NATIONKEY".
-
source_models: This would create a reference Hub loaded from two different sources. From the model 'stg_nation' it will select the column 'N_NATIONKEY' as the reference key, because there is not source-specific definition for this parameter. For the model 'stg_customers' it will select the column 'C_NATIONKEY' as defined.
- The 'rsrc_static' attribute is set for each source. For more information see rsrc_static Attribute.
Click Me!
WITH
src_new_1 AS (
SELECT
N_NATIONKEY,
ldts,
rsrc
FROM datavault4dbt_demo.core_Stages.stg_nation src
),src_new_2 AS (
SELECT
C_NATIONKEY,
ldts,
rsrc
FROM datavault4dbt_demo.core_Stages.stg_customers src
),
source_new_union AS (SELECT
N_NATIONKEY AS N_NATIONKEY,
ldts,
rsrc
FROM src_new_1
UNION ALL
SELECT
C_NATIONKEY AS N_NATIONKEY,
ldts,
rsrc
FROM src_new_2),
earliest_ref_key_over_all_sources AS (
SELECT
lcte.*
FROM source_new_union AS lcte
QUALIFY ROW_NUMBER() OVER (PARTITION BY N_NATIONKEY ORDER BY ldts) = 1),
records_to_insert AS (
SELECT
N_NATIONKEY,
ldts,
rsrc
FROM earliest_ref_key_over_all_sources)
SELECT * FROM records_to_insert
Table of Content
- Staging
- DV-Entities
- Hubs
- Links
- Satellites
- Standard Satellite
- Multi-Active Satellite
- Record-Tracking Satellite
- Non-Historized Satellite
-
Reference Data
- Reference Hub
- Reference Satellite
- Reference Tables
- Business Vault
- PIT
- Snapshot Control
- General