Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tox to CI/CD pipeline #7

Closed
schrockn opened this issue Jun 1, 2018 · 0 comments
Closed

Add tox to CI/CD pipeline #7

schrockn opened this issue Jun 1, 2018 · 0 comments

Comments

@schrockn
Copy link
Member

schrockn commented Jun 1, 2018

Need to support different versions of python

rexledesma added a commit that referenced this issue Nov 14, 2023
## Summary & Motivation
Following up on #17980, one
of the issues with memory usage in `dagster-dbt` is that if the
`manifest` is specified as a path, we potentially hold multiple copies
of the dictionary manifest in memory when creating our assets. This is
because we read from the path to create the dictionary manifest, and
then we hold a reference to the manifest dictionary in our asset
metadata. When the same path is read multiple times, multiple copies
ensure.

We can optimize this by implementing a cache when reading from the path,
so that the same dictionary manifest is returned even when a manifest
path is specified.

To further optimize this to hold no reference to the manifest
dictionary, we could instead hold a reference to the manifest path in
our asset metadata. However,

1. This only works if a manifest path is passed. If a manifest is
already specified as a dictionary, then we still have to hold it in
memory. This is the worst case.
2. (1) can be solved by preventing a manifest from being specified as a
dictionary, but that would be a breaking change.

## How I Tested These Changes
Use `tracemalloc`.

1. Create a `jaffle_dagster` project using `dagster-dbt project
scaffold` on `jaffle_shop`.
2. Update the code to read the manifest from a path in three separate
`@dbt_assets` definitions.
3. Run `PYTHONTRACEMALLOC=1 DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster
dev`

**Before Change**
```diff
Top 10 lines
#1: <frozen importlib._bootstrap_external>:672: 37114.4 KiB
#2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB
    _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00')
- #3: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-- dbt/dagster_dbt/dbt_manifest.py:17: 3262.9 KiB
-    manifest = cast(Mapping[str, Any], orjson.loads(manifest.read_bytes()))
#4: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/abc.py:106: 1313.7 KiB
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
#5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB
    result = type(typename, (tuple,), class_namespace)
#6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB
    self.globals = globals().copy()
#7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 891.8 KiB
    exec(code, self.globals, self.__dict__)
#8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB
    message_class = _reflection.GeneratedProtocolMessageType(
#9: <frozen importlib._bootstrap_external>:128: 655.2 KiB
#10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB
53203 other: 38030.8 KiB
Total allocated size: 88177.9 KiB
```

**After Change**
```diff
Top 10 lines
#1: <frozen importlib._bootstrap_external>:672: 37114.9 KiB
#2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB
    _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00')
#3: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10 /lib/python3.10/abc.py:106: 1313.7 KiB
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
+ #4: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-dbt/dagster_dbt/dbt_manifest.py:13: 1057.4 KiB
+    return cast(Mapping[str, Any], orjson.loads(manifest_path.read_bytes()))
#5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB
    result = type(typename, (tuple,), class_namespace)
#6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB
    self.globals = globals().copy()
#7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 893.6 KiB
    exec(code, self.globals, self.__dict__)
#8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB
    message_class = _reflection.GeneratedProtocolMessageType(
#9: <frozen importlib._bootstrap_external>:128: 655.2 KiB
#10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB
53195 other: 38013.7 KiB
Total allocated size: 85957.5 KiB
```

**Baseline (one `@dbt_assets` definition)**

```diff
Top 10 lines
#1: <frozen importlib._bootstrap_external>:672: 37114.2 KiB
#2: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/text_unidecode/__init__.py:6: 3675.4 KiB
    _replaces = pkgutil.get_data(__name__, 'data.bin').decode('utf8').split('\x00')
#3: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/abc.py:106: 1313.7 KiB
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
+ #4: /Users/rexledesma/dagster-labs/dagster/python_modules/libraries/dagster-dbt/dagster_dbt/dbt_manifest.py:13: 1056.9 KiB
+    return cast(Mapping[str, Any], orjson.loads(manifest_path.read_bytes()))
#5: /Users/rexledesma/.pyenv/versions/3.10.13/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py:481: 987.5 KiB
    result = type(typename, (tuple,), class_namespace)
#6: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:114: 978.7 KiB
    self.globals = globals().copy()
#7: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:263: 893.7 KiB
    exec(code, self.globals, self.__dict__)
#8: /Users/rexledesma/.pyenv/versions/3.10.13/envs/dagster/lib/python3.10/site-packages/google/protobuf/internal/builder.py:85: 697.9 KiB
    message_class = _reflection.GeneratedProtocolMessageType(
#9: <frozen importlib._bootstrap_external>:128: 655.2 KiB
#10: <frozen importlib._bootstrap_external>:1616: 569.6 KiB
53175 other: 37976.1 KiB
Total allocated size: 85918.9 KiB
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants