Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-aws] add PipesEMRServerlessClient #24318

Merged
merged 1 commit into from
Sep 13, 2024

Conversation

danielgafni
Copy link
Contributor

@danielgafni danielgafni commented Sep 9, 2024

Summary & Motivation

resolve DS-443

Implemented message reading for CloudWatch logging configuration (using CloudWatchMessageReader). Mapping Spark driver stdout/stderr to Dagster's stdout/stderr.

S3-backed logs are currently blocked by #24098

How I Tested These Changes

  1. Manually tested with real AWS infra (personal account)
  2. Added a very simple test which checks that context env vars are set correctly. This is as far as we can get with moto since they don't have start_job_run implemented.

I'm planning to add integration tests in CI with real AWS infra.

Changelog

Insert changelog entry or "NOCHANGELOG" here.

  • NEW ([dagster-aws] added PipesEMRServerlessClient, but it's untested yet so please don't include into release notes until we have tests)
  • BUGFIX (fixed a bug)
  • DOCS (added or updated documentation)

Copy link
Contributor Author

danielgafni commented Sep 9, 2024

@danielgafni danielgafni changed the title docs: update automation.mdx (#23193) [dagster-aws] add PipesEMRServerlessClient Sep 9, 2024
@danielgafni danielgafni marked this pull request as ready for review September 9, 2024 16:09
@graphite-app graphite-app bot added the area: docs Related to documentation in general label Sep 9, 2024
@danielgafni danielgafni marked this pull request as draft September 9, 2024 16:10
Copy link

github-actions bot commented Sep 9, 2024

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-b2obvn9ya-elementl.vercel.app
https://aws-pipes-emr-serverless-client.dagster.dagster-docs.io

Direct link to changed pages:

@danielgafni
Copy link
Contributor Author

Example Dagster logs:

image
image
image

@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch 3 times, most recently from 0f6b936 to 2095d0b Compare September 10, 2024 13:35
@danielgafni danielgafni changed the base branch from master to dagster-pipes-route-logs-to-file September 10, 2024 13:35
@danielgafni danielgafni force-pushed the dagster-pipes-route-logs-to-file branch from c8aa00b to a191a5c Compare September 10, 2024 14:24
@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch 2 times, most recently from 431c7ec to 3710769 Compare September 10, 2024 15:11
@danielgafni danielgafni requested review from alangenfeld and schrockn and removed request for erinkcochran87 September 10, 2024 15:16
@danielgafni danielgafni marked this pull request as ready for review September 10, 2024 15:18
@schrockn schrockn requested a review from smackesey September 10, 2024 15:51
@schrockn
Copy link
Member

Now that @smackesey is back he is also a good reviewer for Pipes PRs!

@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch from 3710769 to 8a5e287 Compare September 10, 2024 19:57
Comment on lines 41 to 49
f"""A pipes client for running workloads on AWS {AWS_SERVICE_NAME}.

Args:
client (Optional[boto3.client]): The boto3 {AWS_SERVICE_NAME} client used to interact with AWS {AWS_SERVICE_NAME}.
context_injector (Optional[PipesContextInjector]): A context injector to use to inject
context into AWS {AWS_SERVICE_NAME} workload. Defaults to :py:class:`PipesEnvContextInjector`.
message_reader (Optional[PipesMessageReader]): A message reader to use to read messages
from the {AWS_SERVICE_NAME} workload. Defaults to :py:class:`PipesCloudWatchMessageReader`.
forward_termination (bool): Whether to cancel the {AWS_SERVICE_NAME} workload if the Dagster process receives a termination signal.
Copy link
Member

@schrockn schrockn Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure doing this variable Interpolation adds much, and it adds a layer of indirection. I'd rather just see the docstring as it will appear in our docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover from the BaseAWSPipesClient PR.
I wanted to generalize docstrings over the AWS service.
I'll fix this here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to enforce consistency recommend doing it via a unit test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, just trying to standardize boilerplate such as this one.
Basically, a lot of AWS pipes clients will share these exact docstrings with the only difference being the service name.
Same with log messages.
But let's keep it simple until we refactor different classes to extend BaseAWSPipesClient.

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes that are worth a roundtrip

@danielgafni danielgafni changed the base branch from dagster-pipes-route-logs-to-file to graphite-base/24318 September 11, 2024 16:07
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@danielgafni
Copy link
Contributor Author

danielgafni commented Sep 12, 2024

Cool. I'm planning to keep this PR open (if it's possible) until we have integration tests against real AWS infra. This is because these tests are likely to reveal some problems, I expect to have a few more fixes for this PR.

Me and @yuhan to merge this to master anyway and to add tests next

@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch from 0ee18fd to ff14610 Compare September 12, 2024 14:05
@danielgafni danielgafni changed the base branch from graphite-base/24318 to master September 12, 2024 14:05
@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch 3 times, most recently from 31ce713 to c7d4d4a Compare September 13, 2024 06:49
@danielgafni danielgafni force-pushed the aws-pipes-emr-serverless-client branch from c7d4d4a to 2e45a72 Compare September 13, 2024 07:06
@danielgafni danielgafni changed the title [dagster-aws] add PipesEMRServerlessClient [dagster-aws] add PipesEMRServerlessClient Sep 13, 2024
Copy link
Contributor Author

danielgafni commented Sep 13, 2024

Merge activity

@danielgafni danielgafni merged commit f4cff85 into master Sep 13, 2024
1 of 2 checks passed
@danielgafni danielgafni deleted the aws-pipes-emr-serverless-client branch September 13, 2024 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: docs Related to documentation in general
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants