Performance: Remove unused models, reduce memory usage (openedx#111)
* refactor!: Remove unused models

These models are currently shown as unused in the base project. This usually means they either have performance issues and require prewhere in Superset, or were added on speculation that they might be used later but never were.

This is a breaking change for anyone who uses these models downstream in a custom dbt project or Superset report; however, they can simply be copied wholesale into a downstream dbt project.

dbt doesn't delete removed models from the database, so if a running service has those resources they can either be manually dropped or left in place if they are being used.

* refactor: Reduce memory usage on dict queries

* refactor: Add / use most_recent_course_blocks mv

* feat: Add ability to remove deprecated objects from the database.
bmtcril authored Aug 9, 2024
1 parent e231fab commit c855dc6
Showing 30 changed files with 74 additions and 1,403 deletions.
9 changes: 9 additions & 0 deletions README.rst
@@ -48,6 +48,15 @@ These require tables to be seeded first. To do this, add 'unit-test-seeds' to ``
``dbt test --selector all_tests`` will run all data/generic/unit tests.


Removing old models
*******************

dbt does not automatically remove models that have been deleted from this project. As we remove models, we will add them to a macro that can be run manually to clean up relations that are no longer needed. This can be important to prevent stale materialized views from breaking when schemas change, and to prevent unnecessary writes to tables that aren't used.

If you need a model that has been removed, for example for custom reporting, you should move that model to the system you use to manage your custom schema (such as your own dbt package) instead of letting the old version remain. This will let you explicitly upgrade it as necessary.

``dbt run-operation remove_deprecated_models`` will drop the relations, and ``dbt -d run-operation remove_deprecated_models`` will drop them with debug information showing the commands that are run.
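
For deployments that script this cleanup, the two invocations above could be wrapped as in the following Python sketch. It is an illustration only and not part of this commit: the helper names ``cleanup_command`` and ``run_cleanup`` are hypothetical, and it assumes ``dbt`` is on the ``PATH`` and that the script runs from the dbt project directory.

```python
import subprocess


def cleanup_command(debug=False):
    """Build the dbt invocation described in the README text above."""
    cmd = ["dbt"]
    if debug:
        # -d enables debug output, which shows the commands being run.
        cmd.append("-d")
    cmd += ["run-operation", "remove_deprecated_models"]
    return cmd


def run_cleanup(debug=False):
    """Run the cleanup and return dbt's exit code (assumes dbt is installed)."""
    return subprocess.run(cleanup_command(debug), check=False).returncode
```

Calling ``run_cleanup(debug=True)`` corresponds to the ``dbt -d run-operation remove_deprecated_models`` form.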

More Help
=========

48 changes: 48 additions & 0 deletions macros/remove_deprecated_models.sql
@@ -0,0 +1,48 @@
{% macro do_drop(type, schema, relation) %}
    -- Drop a relation, types are "view", "table", or "mv".
    -- "mv" will drop both the expected view and destination table.
    {% if type == "mv" %}
        {% do do_drop("view", schema, relation ~ "_mv") %}
        {% do do_drop("table", schema, relation) %}
    {% else %}
        {% set cmd = "drop " ~ type ~ " if exists " ~ schema ~ "." ~ relation ~ ";" %}
        {% print(cmd) %}
        {% do run_query(cmd) %}
    {% endif %}
{% endmacro %}

{% macro remove_deprecated_models() %}
    {% set xapi = env_var("ASPECTS_XAPI_DATABASE", "xapi") %}
    {% set reporting = env_var("DBT_PROFILE_TARGET_DATABASE", "reporting") %}
    {% set event_sink = env_var("ASPECTS_EVENT_SINK_DATABASE", "event_sink") %}

    {{
        print(
            "Running remove_deprecated_models on "
            ~ xapi
            ~ ", "
            ~ reporting
            ~ ", "
            ~ event_sink
            ~ "."
        )
    }}

    -- https://github.com/openedx/aspects-dbt/pull/111/
    {% do do_drop("mv", xapi, "completion_events") %}
    {% do do_drop("view", reporting, "fact_completions") %}
    {% do do_drop("view", xapi, "fact_forum_interactions") %}
    {% do do_drop("mv", xapi, "forum_events") %}
    {% do do_drop("view", reporting, "fact_grades") %}
    {% do do_drop("view", reporting, "learner_summary") %}
    {% do do_drop("view", reporting, "fact_navigation_dropoff") %}
    {% do do_drop("view", reporting, "fact_learner_problem_summary") %}
    {% do do_drop("view", reporting, "fact_problem_engagement") %}
    {% do do_drop("view", reporting, "fact_problem_engagement_per_subsection") %}
    {% do do_drop("view", reporting, "fact_problem_responses_extended") %}
    {% do do_drop("view", reporting, "dim_at_risk_learners") %}
    {% do do_drop("view", reporting, "fact_transcript_usage") %}
    {% do do_drop("view", reporting, "fact_watched_video_segments") %}
    {% do do_drop("mv", reporting, "video_transcript_events") %}

{% endmacro %}
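
To make the effect of `do_drop` concrete, the following Python sketch mirrors the string building in the macro above and returns the statements it would pass to `run_query`. It is an illustration only and not part of this commit: the function names `resolve_databases` and `drop_statements` are hypothetical, and nothing here connects to ClickHouse.

```python
import os


def resolve_databases(env=None):
    """Mirror the env_var(name, default) lookups in remove_deprecated_models."""
    env = os.environ if env is None else env
    return {
        "xapi": env.get("ASPECTS_XAPI_DATABASE", "xapi"),
        "reporting": env.get("DBT_PROFILE_TARGET_DATABASE", "reporting"),
        "event_sink": env.get("ASPECTS_EVENT_SINK_DATABASE", "event_sink"),
    }


def drop_statements(kind, schema, relation):
    """Return the SQL that do_drop would run for one deprecated relation."""
    if kind == "mv":
        # An "mv" is two objects: the view named <relation>_mv and the
        # destination table named <relation>.
        return drop_statements("view", schema, relation + "_mv") + drop_statements(
            "table", schema, relation
        )
    return [f"drop {kind} if exists {schema}.{relation};"]


if __name__ == "__main__":
    dbs = resolve_databases()
    for stmt in drop_statements("mv", dbs["xapi"], "completion_events"):
        print(stmt)
```

With none of the three environment variables set, this prints `drop view if exists xapi.completion_events_mv;` followed by `drop table if exists xapi.completion_events;`, which is why the "mv" entries in the macro each remove two relations.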
26 changes: 0 additions & 26 deletions models/completion/completion_events.sql

This file was deleted.

61 changes: 0 additions & 61 deletions models/completion/fact_completions.sql

This file was deleted.

77 changes: 0 additions & 77 deletions models/completion/schema.yml

This file was deleted.

1 change: 0 additions & 1 deletion models/courses/course_block_names.sql
@@ -19,7 +19,6 @@
},
)
}}

select
location, block_name, course_key, graded, course_order, display_name_with_location
from {{ ref("most_recent_course_blocks") }}
19 changes: 17 additions & 2 deletions models/courses/schema.yml
@@ -45,7 +45,7 @@ models:
description: "The type of block. This can be a section, subsection, unit, or the block type"

- name: course_block_names
- description: "A table of course blocks with their names"
+ description: "An in-memory dictionary of course blocks with their display names and additional metadata. Only stores the most recent row per block location."
columns:
- name: location
data_type: String
@@ -65,6 +65,21 @@ models:
- name: course_order
data_type: Int32
description: "The sort order of this block in the course across all course blocks"
- name: section
data_type: Int32
description: "The section number that this block falls under in the course. Starts at 1."
- name: subsection
data_type: Int32
description: "The subsection number that this block falls under in the section. Starts at 1."
- name: unit
data_type: Int32
description: "The unit number that this block falls under in the subsection. Starts at 1."
- name: dump_id
data_type: UUID
description: "The UUID of the event sink run that published this block to ClickHouse. When a course is published all blocks inside it are sent with the same dump_id."
- name: time_last_dumped
data_type: String
description: "The Datetime of the event sink run that published this block to ClickHouse. When a course is published all blocks inside it are sent with the same time_last_dumped."

- name: most_recent_course_blocks
description: "A materialized view of course blocks with their display names and additional metadata. Only stores the most recent row per block location."
@@ -101,7 +116,7 @@ models:
description: "The UUID of the event sink run that published this block to ClickHouse. When a course is published all blocks inside it are sent with the same dump_id."
- name: time_last_dumped
data_type: String
- description: "The Datetime of the event sink run that published this block to ClickHouse. When a course is published all blocks inside it are sent with the same time_last_dumped."
+ description: "The datetime of the event sink run that published this block to ClickHouse. When a course is published all blocks inside it are sent with the same time_last_dumped."

- name: course_names
description: "A table of courses with their names"
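
The phrase "only stores the most recent row per block location" in the descriptions above can be illustrated with a small Python sketch. It assumes that recency is decided by `time_last_dumped` and that those strings share one sortable format. The source does not say how ClickHouse performs the deduplication, so this models the outcome and not the implementation. The sample rows and the name `most_recent_blocks` are made up for illustration.

```python
def most_recent_blocks(rows):
    """Keep one row per location: the one with the greatest time_last_dumped."""
    latest = {}
    for row in rows:
        current = latest.get(row["location"])
        # Assumption: time_last_dumped strings use one format that sorts
        # chronologically, so plain string comparison is enough here.
        if current is None or row["time_last_dumped"] > current["time_last_dumped"]:
            latest[row["location"]] = row
    return latest


# Hypothetical sample data: the same block published twice, plus one other block.
rows = [
    {"location": "block-a", "block_name": "Intro", "time_last_dumped": "2024-08-01 10:00:00"},
    {"location": "block-a", "block_name": "Introduction", "time_last_dumped": "2024-08-09 12:00:00"},
    {"location": "block-b", "block_name": "Quiz 1", "time_last_dumped": "2024-08-01 10:00:00"},
]

blocks = most_recent_blocks(rows)
```

Here `blocks["block-a"]["block_name"]` is `"Introduction"`, because the later publish replaces the earlier row for the same location.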
17 changes: 0 additions & 17 deletions models/forum/fact_forum_interactions.sql

This file was deleted.

24 changes: 0 additions & 24 deletions models/forum/forum_events.sql

This file was deleted.

67 changes: 0 additions & 67 deletions models/forum/schema.yml

This file was deleted.
