Does csv_index::RandomAccessSimple store approximate or exact byte offsets? #375

gavinwahl · 2024-09-05T23:52:27Z

One piece of documentation says csv_index::RandomAccessSimple stores indices to byte offsets corresponding to the start of records, while another piece of documentation says it stores /approximate/ offsets. Which is it? If approximate, how would an approximate index be used to locate the actual start of a record?

exact: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/lib.rs#L19
approximate: https://github.com/BurntSushi/rust-csv/blob/master/csv-index/src/simple.rs#L14

BurntSushi · 2024-09-06T11:54:02Z

I can't remember, sadly. I think what that's referring to is that there may be cases where the byte offset is before what a human might consider to be the start of a CSV record when reading the CSV data, but the byte offset is still correct assuming you use the csv crate (or its underlying csv-core implementation) to read the record from that position. This might sound weird, and that's because it is. For example, csv-core ignores empty lines, so if your data begins with an empty line:


foo,bar,baz

Then there are technically 2 valid byte offsets for the start of the foo,bar,baz record: 0 or 1 (assuming \n record delimiters). I think the language in the docs is just being a bit sneaky about not guaranteeing one or the other. It should (but doesn't) mention the fundamental invariant though: if you seek to that byte offset in the data and start the csv reader at that point, then you'll get the corresponding ith record.
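
To make that invariant concrete, here is a minimal std-only sketch. It does not use the csv or csv-index crates: `first_record_from` is a hypothetical toy reader written for this illustration. It assumes `\n` record delimiters, does no quote handling, and skips empty lines the way the comment above describes csv-core doing. It only shows why more than one offset can be "correct" for the same record.

```rust
/// Toy stand-in for a CSV reader (NOT the csv crate's API).
/// Starting at byte `offset`, skip empty lines and return the fields of the
/// first non-empty line. Assumes `\n` delimiters and no quoted fields.
fn first_record_from(data: &str, offset: usize) -> Option<Vec<&str>> {
    data.get(offset..)? // None if `offset` is out of range
        .split('\n')
        .find(|line| !line.is_empty()) // empty lines are ignored
        .map(|line| line.split(',').collect())
}

/// An offset is acceptable for a record if reading from it yields that record.
/// This is the invariant from the comment above, restated for the toy reader.
fn offset_is_valid_for(data: &str, offset: usize, expected: &[&str]) -> bool {
    first_record_from(data, offset).as_deref() == Some(expected)
}
```

With the data `"\nfoo,bar,baz\n"`, `offset_is_valid_for` accepts both 0 and 1 for the record `["foo", "bar", "baz"]` and rejects 2, since reading from 2 starts in the middle of `foo`. An index is free to store either 0 or 1, which is one plausible reading of "approximate" in the docs.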

BurntSushi added the doc label Sep 6, 2024
