Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor File Handling Code to Fix S3 Incompatability Issues #338

Merged
merged 10 commits into from
Jul 30, 2024

Conversation

zprobst
Copy link
Contributor

@zprobst zprobst commented Jul 29, 2024

In this PR, we a complete refactoring to the internals of the file handling logic that allows for increased interoperability and maintainability between file handling logic and sources. To accomplish this, the PR introduces the following changes:

  • Rename SupportedFileFormat and SupportedCompressedFileFormat to FileCodec and CompressionCodec respectively to provide a better naming for their changed roles. Now, these types simply provided the decode behaviors and do not own a file in anyway.
  • Introduce ReadableFile and its subclasses which define methods for building reader types from self as well as defining the path of the file (which can be a non-real path).
  • Introduce FileSource and its subclasses which essentially work to build instances of ReadableFile and return them to a single unified file extractor behavior.
  • Add dramatically better documentation as to what the code does across the entire files module.
  • Deprecate the old extractor types for future removal.

Copy link

codecov bot commented Jul 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.93%. Comparing base (83b0200) to head (714fcb6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #338      +/-   ##
==========================================
+ Coverage   97.88%   97.93%   +0.05%     
==========================================
  Files         144      144              
  Lines        5142     5122      -20     
==========================================
- Hits         5033     5016      -17     
+ Misses        109      106       -3     
Flag Coverage Δ
3.10-macos-latest 97.91% <99.44%> (+0.03%) ⬆️
3.10-ubuntu-latest 97.91% <99.44%> (+0.03%) ⬆️
3.10-windows-latest 97.87% <99.44%> (+0.03%) ⬆️
3.11-macos-latest 97.91% <99.44%> (+0.03%) ⬆️
3.11-ubuntu-latest 97.91% <99.44%> (+0.03%) ⬆️
3.11-windows-latest 97.87% <99.44%> (+0.03%) ⬆️
3.12-macos-latest 97.91% <99.44%> (+0.03%) ⬆️
3.12-ubuntu-latest 97.91% <99.44%> (+0.03%) ⬆️
3.12-windows-latest 97.87% <99.44%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zprobst zprobst marked this pull request as ready for review July 29, 2024 22:10
@zprobst zprobst requested a review from ccloes as a code owner July 29, 2024 22:10
@@ -27,6 +25,17 @@ def fixture_directory():
yield tempdir


@pytest.fixture
def unspported_file(fixture_directory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I'll patch that un in a follow up PR.

)


class S3Extractor(UnifiedFileExtractor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this class specifying the file source is to be an S3FileSource, how would we do:

UnifiedFileExtractor.from_file_data(**s3FileSourceParams) in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a really good question. I've documented the approach here.

#339

@zprobst zprobst merged commit 760ef38 into main Jul 30, 2024
11 checks passed
@zprobst zprobst deleted the refactor/file-handling branch July 30, 2024 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants