Refactor File Handling Code to Fix S3 Incompatibility Issues #338
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #338      +/-   ##
==========================================
+ Coverage   97.88%   97.93%   +0.05%
==========================================
  Files         144      144
  Lines        5142     5122      -20
==========================================
- Hits         5033     5016      -17
+ Misses        109      106       -3
@@ -27,6 +25,17 @@ def fixture_directory():
    yield tempdir


@pytest.fixture
def unspported_file(fixture_directory):
*supported
Oops. I'll patch that un in a follow up PR.
)


class S3Extractor(UnifiedFileExtractor):
Without this class specifying the file source is to be an S3FileSource, how would we do:
UnifiedFileExtractor.from_file_data(**s3FileSourceParams) in the future?
That's a really good question. I've documented the approach here.
In this PR, we make a complete refactoring of the internals of the file handling logic that allows for increased interoperability and maintainability between file handling logic and sources. To accomplish this, the PR introduces the following changes:
- Renamed `SupportedFileFormat` and `SupportedCompressedFileFormat` to `FileCodec` and `CompressionCodec` respectively to provide better naming for their changed roles. Now, these types simply provide the decode behaviors and do not own a file in any way.
- `ReadableFile` and its subclasses, which define methods for building reader types from `self` as well as defining the `path` of the file (which can be a non-real path).
- `FileSource` and its subclasses, which essentially work to build instances of `ReadableFile` and return them to a single unified file extractor behavior.
- `files` module.
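
To make the division of responsibilities above concrete, here is a minimal sketch of how the four roles could fit together. Only the names `FileCodec`, `CompressionCodec`, `ReadableFile`, and `FileSource` come from this PR; every method name, every concrete subclass (`GzipCodec`, `JsonLinesCodec`, `InMemoryFile`, `InMemoryFileSource`), and the `extract_records` function are hypothetical stand-ins chosen for illustration, not the project's actual API. An in-memory source is used in place of S3 so the example runs without network access.

```python
import gzip
import io
import json
from abc import ABC, abstractmethod
from typing import IO, Dict, Iterator, Optional


class CompressionCodec(ABC):
    """Knows how to decompress a byte stream; does not own a file."""

    @abstractmethod
    def decompress(self, raw: IO[bytes]) -> IO[bytes]: ...


class GzipCodec(CompressionCodec):  # hypothetical concrete codec
    def decompress(self, raw: IO[bytes]) -> IO[bytes]:
        return gzip.GzipFile(fileobj=raw, mode="rb")


class FileCodec(ABC):
    """Knows how to decode records from a byte stream; does not own a file."""

    @abstractmethod
    def decode(self, stream: IO[bytes]) -> Iterator[dict]: ...


class JsonLinesCodec(FileCodec):  # hypothetical concrete codec
    def decode(self, stream: IO[bytes]) -> Iterator[dict]:
        for raw_line in stream:
            line = raw_line.decode("utf-8").strip()
            if line:
                yield json.loads(line)


class ReadableFile(ABC):
    """Builds readers from itself and exposes a path, which need not be real."""

    def __init__(self, path: str) -> None:
        self.path = path

    @abstractmethod
    def open_binary(self) -> IO[bytes]: ...  # hypothetical method name


class InMemoryFile(ReadableFile):  # stand-in for a remote object such as one in S3
    def __init__(self, path: str, data: bytes) -> None:
        super().__init__(path)
        self._data = data

    def open_binary(self) -> IO[bytes]:
        return io.BytesIO(self._data)


class FileSource(ABC):
    """Builds ReadableFile instances for the extractor to consume."""

    @abstractmethod
    def files(self) -> Iterator[ReadableFile]: ...  # hypothetical method name


class InMemoryFileSource(FileSource):
    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def files(self) -> Iterator[ReadableFile]:
        for key, data in self._objects.items():
            yield InMemoryFile(key, data)


def extract_records(
    source: FileSource,
    file_codec: FileCodec,
    compression: Optional[CompressionCodec] = None,
) -> Iterator[dict]:
    """A single extraction routine that never needs to know where files live."""
    for file in source.files():
        with file.open_binary() as raw:
            stream = compression.decompress(raw) if compression else raw
            yield from file_codec.decode(stream)


# Usage: the same routine handles any source that yields ReadableFile objects.
payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
demo_source = InMemoryFileSource({"s3://bucket/data.jsonl.gz": payload})
demo_records = list(extract_records(demo_source, JsonLinesCodec(), GzipCodec()))
```

Under this arrangement, supporting a new backend would only mean adding a `FileSource` and `ReadableFile` pair; the codecs and the extraction routine stay unchanged, which is the interoperability the description is aiming for.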