Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3777: [C++] Add Slow input streams and slow filesystem #5439

Closed
wants to merge 2 commits into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Sep 19, 2019

No description provided.

@@ -28,12 +28,10 @@

# examine the output of clang-format and if changes are
# present assemble a (unified)patch of the difference
def _check_one_file(completed_processes, filename):
def _check_one_file(filename, formatted):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to this PR, but I finally got annoyed enough by the slow format-check that I spent some time figuring out what happened. The bottom line is that each call to _check_one_file in a child process was serializing and transmitting the entire results for all files. Instead, we just transmit the result for the single file being checked, it's much faster.

@fsaintjacques @bkietz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pitrou pitrou force-pushed the ARROW-3777-slow-fs branch 3 times, most recently from bcfd933 to b46487d Compare September 23, 2019 12:47
@codecov-io
Copy link

Codecov Report

Merging #5439 into master will decrease coverage by <.01%.
The diff coverage is 84.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5439      +/-   ##
==========================================
- Coverage   89.69%   89.69%   -0.01%     
==========================================
  Files         810      710     -100     
  Lines      114853   108385    -6468     
  Branches     1498        0    -1498     
==========================================
- Hits       103021    97219    -5802     
+ Misses      11821    11166     -655     
+ Partials       11        0      -11
Impacted Files Coverage Δ
cpp/src/arrow/filesystem/filesystem_test.cc 100% <100%> (ø) ⬆️
cpp/src/arrow/io/slow.h 100% <100%> (ø)
cpp/src/arrow/io/memory_test.cc 100% <100%> (ø) ⬆️
cpp/src/arrow/filesystem/filesystem.h 100% <100%> (ø) ⬆️
cpp/src/arrow/io/slow.cc 66.66% <66.66%> (ø)
cpp/src/arrow/filesystem/filesystem.cc 85.96% <88.67%> (+0.96%) ⬆️
cpp/src/arrow/filesystem/s3_internal.h 90.74% <0%> (-3.71%) ⬇️
js/src/util/fn.ts
js/src/builder/index.ts
js/src/enum.ts
... and 102 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06dc863...2ca64c5. Read the comment docs.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Minor comments but can be merged whenever. I think we should opt for reducing the size of header files (and hiding more implementation details) when possible

@@ -28,12 +28,10 @@

# examine the output of clang-format and if changes are
# present assemble a (unified)patch of the difference
def _check_one_file(completed_processes, filename):
def _check_one_file(filename, formatted):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -265,5 +265,47 @@ class ARROW_EXPORT SubTreeFileSystem : public FileSystem {
Status FixStats(FileStats* st) const;
};

/// \brief EXPERIMENTAL: a FileSystem implementation that delegates to another
/// implementation but inserts latencies at various points.
class ARROW_EXPORT SlowFileSystem : public FileSystem {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've brought this topic up before, but what do you think about putting the entire implementation of SlowFS in a .cc file and returning std::shared_ptr from the function that creates it? This can always be done later so refactor need not happen now. It's probably better to use factory methods for instantiating most FS classes anyway

Unless you anticipate adding additional methods to the class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments hold regarding the slow stream classes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I don't think our conventions should vary too much. If some classes are exposed and other hidden it feels a bit weird. But strong opinion from me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to hide the implementations in all cases except where we expose additional methods. If only that it offers more freedom to refactor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we discuss this in a separate JIRA. Will merge.

@pitrou pitrou closed this in 5a918ce Sep 24, 2019
@pitrou pitrou deleted the ARROW-3777-slow-fs branch September 24, 2019 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants