Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle op outputs in default asset IO manager #8074

Merged
merged 7 commits into from
May 31, 2022

Conversation

clairelin135
Copy link
Contributor

Addresses #7713

This PR adjusts the default fs_asset_io_manager to handle op outputs by storing op outputs in a file directory /run_id/step_key to ensure unique directories for each step.

@vercel
Copy link

vercel bot commented May 25, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
dagster ✅ Ready (Inspect) Visit Preview May 26, 2022 at 11:50PM (UTC)
1 Ignored Deployment
Name Status Preview Updated
dagit-storybook ⬜️ Ignored (Inspect) May 26, 2022 at 11:50PM (UTC)

@clairelin135 clairelin135 marked this pull request as ready for review May 25, 2022 21:54
@clairelin135 clairelin135 requested review from sryza and OwenKephart May 25, 2022 21:54
if context.has_asset_key:
path = context.get_asset_output_identifier()
else:
path = [context.run_id, context.step_key]
Copy link
Contributor

@OwenKephart OwenKephart May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this be context.get_output_identifier() for parity with PickledObject...

maybe even return super()._get_path(context), but I'm not picky on that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! didn't realize that function existed

Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment, otherwise 🚢

@sryza
Copy link
Contributor

sryza commented May 25, 2022

If I understand correctly, this makes the behavior of fs_asset_io_manager identical to the behavior of fs_io_manager, when the output is not an asset. Thoughts on just merging them together? I.e. getting rid of fs_asset_io_manager and adding logic inside fs_io_manager? A small advantage is that we'd now have a name that's not as long.

Also, we should merge this after this external contribution lands so that we don't force them to rebase: #8007.

Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@@ -131,7 +131,7 @@ def asset2(asset1):

return graph.to_job(
resource_defs=merge_dicts(
{"io_manager": fs_asset_io_manager}, all_resource_defs, {"root_manager": root_manager}
{"io_manager": fs_io_manager}, all_resource_defs, {"root_manager": root_manager}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the to_job() function will by default supply fs_io_manager as the default io_manager, so it's possible that we can remove this bit. (not 100% sure)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed ✅

@clairelin135
Copy link
Contributor Author

@sryza I've adjusted this PR to merge functionality of fs_asset_io_manager with fs_io_manager (and deleted references to fs_asset_io_manager). I think this is ready for you to take another look!

Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Love to see PRs that remove more lines than they add. We'll need to make sure to message this as a breaking change in our CHANGES.md. Also, as followups, we'll need this for the gcs, s3, and azure IO managers as well.

@clairelin135 clairelin135 merged commit 6b8f907 into master May 31, 2022
@clairelin135 clairelin135 deleted the claire/handle-op-outputs-asset-io-mgr branch May 31, 2022 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants