Memory mapped datasource does not allow reading data beyond the mapped range #15186

Closed
vuule opened this issue Feb 29, 2024 · 1 comment
Labels
bug Something isn't working cuIO cuIO issue

vuule commented Feb 29, 2024

When creating a memory-mapped datasource, we can optionally pass the range within the file that we want mapped. The primary use for this is to avoid memory mapping the entire file when the byte_range option is used with the CSV/JSON readers.
Because we often need data beyond the exact byte_range, the mapped source adds padding to the mapped range. However, we cannot guarantee that every read will fall within this padded range.
Currently, the source does not read beyond the mapped range, which can lead to incorrect output when the padding is insufficient.
https://github.com/rapidsai/cudf/blob/branch-24.04/cpp/src/io/utilities/datasource.cpp#L163

Desired behavior:
The memory-mapped datasource should read from the file when the mapping is insufficient, instead of clamping the returned data to the mapped range.
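
To make the difference concrete, here is a minimal sketch of the two behaviors. It is a simplified model, not cudf's implementation: `mapped_source_model`, `read_clamped`, `read_with_fallback`, and `fallback_reads` are hypothetical names, the "file" is an in-memory string, and the "mapping" is a copy of a window of that string standing in for the mmap'ed region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a memory-mapped source (not cudf's actual class).
// `file_` plays the role of the on-disk file; `mapped_` is a copy of the window
// [map_offset, map_offset + map_size) that would have been memory mapped.
class mapped_source_model {
 public:
  mapped_source_model(std::string file, std::size_t map_offset, std::size_t map_size)
    : file_(std::move(file)), map_offset_(map_offset), mapped_(file_.substr(map_offset, map_size))
  {
  }

  // Behavior described in the issue: the read is clamped to the mapped window,
  // so bytes past the end of the mapping are silently dropped.
  std::string read_clamped(std::size_t offset, std::size_t size) const
  {
    assert(offset >= map_offset_ && offset <= map_offset_ + mapped_.size());
    // substr clamps the length to the data available in the mapping
    return mapped_.substr(offset - map_offset_, size);
  }

  // Desired behavior: serve the request from the mapping when it fits,
  // otherwise fall back to reading from the file itself.
  std::string read_with_fallback(std::size_t offset, std::size_t size)
  {
    assert(offset <= file_.size());
    auto const end     = std::min(offset + size, file_.size());
    auto const map_end = map_offset_ + mapped_.size();
    if (offset >= map_offset_ && end <= map_end) {
      return mapped_.substr(offset - map_offset_, end - offset);
    }
    ++fallback_reads_;  // in a real source this would be a direct file read
    return file_.substr(offset, end - offset);
  }

  std::size_t fallback_reads() const { return fallback_reads_; }

 private:
  std::string file_;
  std::size_t map_offset_;
  std::string mapped_;
  std::size_t fallback_reads_ = 0;
};
```

With a 16-byte file `"0123456789abcdef"` and a mapping of 8 bytes at offset 4, a 6-byte read at offset 10 returns only `"ab"` through `read_clamped`, while `read_with_fallback` returns the full `"abcdef"` by going to the file. Reads that fit inside the mapping take the mapped path in both cases.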

vuule commented Nov 5, 2024

fixed by #16865

vuule closed this as completed Nov 5, 2024