Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix streaming from arrow files #7083

Merged
merged 1 commit into from
Aug 30, 2024
Merged

fix streaming from arrow files #7083

merged 1 commit into from
Aug 30, 2024

Conversation

fschlatt
Copy link
Contributor

No description provided.

@@ -386,7 +386,7 @@ def resolve_pattern(
matched_paths = [
filepath if filepath.startswith(protocol_prefix) else protocol_prefix + filepath
for filepath, info in fs.glob(pattern, detail=True, **glob_kwargs).items()
if info["type"] == "file"
if info["type"] in ("file", "other")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand why you did this change ? can you elaborate ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my Ubuntu machine, the info of symlinked files is "other". Currently, if datafiles are symlinks, they are not found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is an issue for the LocalFileSystem implementation of fsspec no ?
I'm afraid of future errors if we support "other" as a valid file type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It was just my quick hack to get it to work locally for me.

On the other hand, I'm not sure why checking for the type is necessary at all since a user should probably only specify glob patterns to valid files anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, maybe let's try to run the tests without this line ?

anyway for this PR feel free to just keep the "arrow" change and we can tackle your other issue in another PR - feel free to ping me there :)

Copy link
Contributor Author

@fschlatt fschlatt Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I've reverted the second commit and merged the recent changes from main. The second PR is here #7133

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks ! I started the CI on your other PR

@lhoestq lhoestq merged commit ca58154 into huggingface:main Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants