-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running edits with no tests #8
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ster into running_edits_with_no_tests
…ster into running_edits_with_no_tests
freiksenet
pushed a commit
that referenced
this pull request
Jun 7, 2018
Running edits with no tests
rexledesma
added a commit
that referenced
this pull request
Nov 14, 2023
## Summary & Motivation Following up on #17980, one of the issues with memory usage in `dagster-dbt` is that if the `manifest` is specified as a path, we potentially hold multiple copies of the dictionary manifest in memory when creating our assets. This is because we read from the path to create the dictionary manifest, and then we hold a reference to the manifest dictionary in our asset metadata. When the same path is read multiple times, multiple copies ensure. We can optimize this by implementing a cache when reading from the path, so that the same dictionary manifest is returned even when a manifest path is specified. To further optimize this to hold no reference to the manifest dictionary, we could instead hold a reference to the manifest path in our asset metadata. However, 1. This only works if a manifest path is passed. If a manifest is already specified as a dictionary, then we still have to hold it in memory. This is the worst case. 2. (1) can be solved by preventing a manifest from being specified as a dictionary, but that would be a breaking change. ## How I Tested These Changes Use `tracemalloc`. 1. Create a `jaffle_dagster` project using `dagster-dbt project scaffold` on `jaffle_shop`. 2. Update the code to read the manifest from a path in three separate `@dbt_assets` definitions. 3. Run `PYTHONTRACEMALLOC=1 DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev` **Before Change** ```diff Top 10 lines #1: <frozen importlib._bootstrap_external>:672: 37114.4 KiB #2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00') - #3: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-- dbt/dagster_dbt/dbt_manifest.py:17: 3262.9 KiB - manifest = cast(Mapping[str, Any], orjson.loads(manifest.read_bytes())) #4: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/abc.py:106: 1313.7 KiB cls = super().__new__(mcls, name, bases, namespace, **kwargs) #5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB result = type(typename, (tuple,), class_namespace) #6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB self.globals = globals().copy() #7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 891.8 KiB exec(code, self.globals, self.__dict__) #8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB message_class = _reflection.GeneratedProtocolMessageType( #9: <frozen importlib._bootstrap_external>:128: 655.2 KiB #10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB 53203 other: 38030.8 KiB Total allocated size: 88177.9 KiB ``` **After Change** ```diff Top 10 lines #1: <frozen importlib._bootstrap_external>:672: 37114.9 KiB #2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00') #3: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10 /lib/python3.10/abc.py:106: 1313.7 KiB cls = super().__new__(mcls, name, bases, namespace, **kwargs) + #4: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-dbt/dagster_dbt/dbt_manifest.py:13: 1057.4 KiB + return cast(Mapping[str, Any], orjson.loads(manifest_path.read_bytes())) #5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB result = type(typename, (tuple,), class_namespace) #6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB self.globals = globals().copy() #7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 893.6 KiB exec(code, self.globals, self.__dict__) #8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB message_class = _reflection.GeneratedProtocolMessageType( #9: <frozen importlib._bootstrap_external>:128: 655.2 KiB #10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB 53195 other: 38013.7 KiB Total allocated size: 85957.5 KiB ``` **Baseline (one `@dbt_assets` definition)** ```diff Top 10 lines #1: <frozen importlib._bootstrap_external>:672: 37114.2 KiB #2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00') #3: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/abc.py:106: 1313.7 KiB cls = super().__new__(mcls, name, bases, namespace, **kwargs) + #4: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-dbt/dagster_dbt/dbt_manifest.py:13: 1056.9 KiB + return cast(Mapping[str, Any], orjson.loads(manifest_path.read_bytes())) #5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB result = type(typename, (tuple,), class_namespace) #6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB self.globals = globals().copy() #7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 893.7 KiB exec(code, self.globals, self.__dict__) #8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB message_class = _reflection.GeneratedProtocolMessageType( #9: <frozen importlib._bootstrap_external>:128: 655.2 KiB #10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB 53175 other: 37976.1 KiB Total allocated size: 85918.9 KiB ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Merging abes stuff. I need to figure out a better merge process here as this polluted the commit history. I don't really care for now.