Remove subprocess.PIPE usage by using a temp file #22654

Merged: 3 commits merged on Aug 11, 2022

Contributor

@chamikaramj chamikaramj commented Aug 10, 2022

This is to address concerns regarding deadlocks. See #22533 for details.


@codecov

codecov bot commented Aug 10, 2022

Codecov Report

Merging #22654 (5901176) into master (4799828) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #22654      +/-   ##
==========================================
- Coverage   74.20%   74.18%   -0.02%     
==========================================
  Files         708      708              
  Lines       93465    93473       +8     
==========================================
- Hits        69352    69347       -5     
- Misses      22838    22851      +13     
  Partials     1275     1275              
Flag Coverage Δ
python 83.59% <0.00%> (-0.02%) ⬇️

Impacted Files Coverage Δ
sdks/python/apache_beam/utils/subprocess_server.py 56.54% <0.00%> (-2.20%) ⬇️
.../python/apache_beam/testing/test_stream_service.py 88.09% <0.00%> (-4.77%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.06% <0.00%> (-1.33%) ⬇️
sdks/python/apache_beam/internal/metrics/metric.py 93.00% <0.00%> (-1.00%) ⬇️
...hon/apache_beam/runners/direct/test_stream_impl.py 93.28% <0.00%> (-0.75%) ⬇️
sdks/python/apache_beam/transforms/combiners.py 93.05% <0.00%> (-0.39%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.94% <0.00%> (-0.16%) ⬇️
...on/apache_beam/runners/dataflow/dataflow_runner.py 82.87% <0.00%> (-0.14%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.42% <0.00%> (-0.13%) ⬇️
sdks/python/apache_beam/runners/direct/executor.py 97.01% <0.00%> (+0.54%) ⬆️
... and 1 more

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @TheNeuralBit for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@chamikaramj
Contributor Author

Run XVR_PythonUsingJava_Dataflow PostCommit

@chamikaramj
Contributor Author

Run XVR_Direct PostCommit

Comment on lines 118 to 119
with tempfile.NamedTemporaryFile(delete=False) as stdout_file:
  self._stdout_file_name = stdout_file.name
Member

From the tempfile docs:

The resulting object can be used as a context manager (see Examples). On completion of the context or destruction of the file object the temporary file will be removed from the filesystem.

Doesn't this mean that the temp file will get deleted when this context completes? I think instead we can just not use it in a context, and the file will get deleted when stdout_file is destroyed:

Suggested change
- with tempfile.NamedTemporaryFile(delete=False) as stdout_file:
-   self._stdout_file_name = stdout_file.name
+ self._stdout_file = tempfile.NamedTemporaryFile(delete=False)

Contributor Author

I don't think the file will be deleted, since we set delete=False, but it's probably safer not to use a context manager here.
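The behavior discussed in this thread can be checked with a small standalone sketch. It assumes nothing about the Beam code itself; the helper name `create_persistent_temp_file` is hypothetical and only illustrates that leaving a `with` block closes a `NamedTemporaryFile(delete=False)` without removing it from disk.

```python
import os
import tempfile


def create_persistent_temp_file():
    """Create a temp file inside a `with` block and return its path.

    Hypothetical helper for illustration only. Because delete=False is
    passed, leaving the `with` block closes the file but keeps it on disk.
    """
    with tempfile.NamedTemporaryFile(delete=False) as stdout_file:
        stdout_file.write(b"hello")
        name = stdout_file.name
    return name


path = create_persistent_temp_file()
survived_context = os.path.exists(path)  # the file outlives the context

# With delete=False the caller owns cleanup.
os.remove(path)
removed = not os.path.exists(path)
```

The quoted documentation sentence describes the default `delete=True` case, which is why the two readings in this thread differ.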

  self._process = subprocess.Popen(
-     cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+     cmd,
+     stdout=open(self._stdout_file_name, 'wb'),
Member

The object returned by NamedTemporaryFile is already an open file object with mode w+b. If you apply the above suggestion, this could be:

Suggested change
- stdout=open(self._stdout_file_name, 'wb'),
+ stdout=self._stdout_file,
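As a rough illustration of this suggestion, the sketch below passes the already-open temp file object straight to `subprocess.Popen` and reads the output back afterwards. The function name `run_with_temp_stdout` is an assumption made for this example and is not part of `subprocess_server.py`.

```python
import os
import subprocess
import tempfile


def run_with_temp_stdout(cmd):
    """Run cmd with stdout/stderr redirected to a temp file.

    Hypothetical helper: returns (returncode, captured_bytes). The temp
    file is created with delete=False, so it is removed explicitly.
    """
    stdout_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        # The file object is already open in w+b mode, so it can be
        # handed to Popen directly instead of reopening it by name.
        process = subprocess.Popen(
            cmd, stdout=stdout_file, stderr=subprocess.STDOUT)
        returncode = process.wait()
        # Rewind before reading what the child wrote.
        stdout_file.seek(0)
        output = stdout_file.read()
    finally:
        stdout_file.close()
        os.remove(stdout_file.name)
    return returncode, output
```

Because the child writes to a regular file rather than a pipe, waiting on the process cannot block on a full pipe buffer, which is the concern this PR targets.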

except Exception as e:
  logging.error((
      'Could not remove temporary file %s due to %r' %
      (self._stdout_file_name, e)))
Member

I think we should be able to omit this:

It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)

Or we can explicitly request it with self._stdout_file.close()

Contributor Author

I think we have to manually delete since we set "delete=False" when creating the temp file.
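To make the difference concrete, here is a minimal sketch contrasting the default `delete=True` with `delete=False`, followed by a best-effort cleanup in the spirit of the `except` block under review. The helper `remove_temp_file` is a hypothetical name, not the function used in the PR.

```python
import logging
import os
import tempfile


def remove_temp_file(file_name):
    """Best-effort removal; log instead of raising. Hypothetical helper."""
    try:
        os.remove(file_name)
        return True
    except Exception as e:
        logging.error(
            'Could not remove temporary file %s due to %r', file_name, e)
        return False


# Default (delete=True): closing the file also removes it.
default_file = tempfile.NamedTemporaryFile()
default_name = default_file.name
default_file.close()
default_gone = not os.path.exists(default_name)

# delete=False: closing leaves the file behind, so cleanup is manual.
kept_file = tempfile.NamedTemporaryFile(delete=False)
kept_file.close()
kept_after_close = os.path.exists(kept_file.name)
first_removal = remove_temp_file(kept_file.name)
second_removal = remove_temp_file(kept_file.name)  # already gone, logged
```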

Contributor Author

@chamikaramj chamikaramj left a comment

Thanks.

@chamikaramj chamikaramj merged commit 0c2e235 into apache:master Aug 11, 2022
chamikaramj added a commit that referenced this pull request Aug 24, 2022
@chamikaramj
Contributor Author

@robertwb I think you had concerns about this change.

Do you think this should be reverted?

MarcoRob pushed a commit to MarcoRob/beam that referenced this pull request Sep 5, 2022
* Remove subprocess.PIPE usage by using a temp file

* Remove context manager usage

* Fix yapf
@TheNeuralBit
Member

> @robertwb I think you had concerns about this change.
>
> Do you think this should be reverted?

I'm curious what the concerns were. My understanding was that this was a net positive, orthogonal to the grpc issue.

@chamikaramj
Contributor Author

@TheNeuralBit I reverted it before the new release cut since new grpcio has been released and we haven't run into issues. I think @robertwb believed that PIPE is the correct construct to use here (instead of temp files) and we should not run into deadlocks due to the way we use PIPE. We can get it in again in the future if needed.
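For context on why PIPE can be safe, one common pattern is to keep reading the pipe in a background thread so the child never blocks on a full pipe buffer. The sketch below illustrates that general pattern only; it is an assumption for this example and not necessarily how `subprocess_server.py` consumes its output, and `run_and_drain` is a hypothetical name.

```python
import subprocess
import threading


def run_and_drain(cmd):
    """Run cmd with stdout=PIPE while a thread continuously drains it.

    Hypothetical helper: returns (returncode, list_of_output_lines).
    """
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = []

    def drain():
        # Iterating over the pipe reads until the child closes it (EOF).
        for line in process.stdout:
            lines.append(line)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    # Safe to wait here because the reader keeps the pipe from filling up.
    returncode = process.wait()
    reader.join()
    process.stdout.close()
    return returncode, lines
```

Calling `wait()` on a PIPE-backed process without any reader is the case that the Python `subprocess` documentation warns can deadlock once the child produces enough output.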
