Support reading from Text and/or ByteString #92
The pure reading code should clearly be separated from the IO part. That it isn't is a serious flaw, and fixing it would be a significant improvement. File IO is baked in so deeply because of the expectation that compile time and run time look at the same data. That said, the reliance on IO is incidental. Out of curiosity, are you not using the TH pieces, or perhaps using another file to establish the types before streaming more data at run time?
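This is a minimal sketch of the separation described above, assuming only the `text` package. A pure function turns already-loaded `Text` into rows, and the file-reading wrapper is a one-liner on top of it. The names `parseRowsText` and `readRowsFile` are hypothetical and not part of Frames. The splitter is deliberately naive: one record per line, with no handling of quoted fields.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical pure core (not the Frames API): split text that is already
-- in memory into rows of fields. Naive on purpose: one record per line,
-- blank lines skipped, no handling of quoted fields.
parseRowsText :: T.Text -> [[T.Text]]
parseRowsText = map (T.splitOn ",") . filter (not . T.null) . T.lines

-- Thin IO wrapper: reading the file is the only effect, and all parsing is
-- delegated to the pure function, so callers that already hold Text can
-- skip this wrapper entirely.
readRowsFile :: FilePath -> IO [[T.Text]]
readRowsFile path = parseRowsText <$> TIO.readFile path
```

With this split, data that never touches the disk, such as a downloaded body or a literal in a test, can go straight to the pure function. In principle, a compile-time inference pass and run-time loading could then share one parser, provided both feed it the same input.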
Yeah, I probably will be (though it's more likely I'll write the types by hand in the end, the TH will help me find them). Another reason I need to parse `Text` values in memory is that the CSVs I deal with aren't nice columnar data: they often have a row of metadata at the beginning and a final row to signify the end of the document (yeah, it's pretty awful). Something I realised this morning, which probably needs a robust fix, is that all the code assumes CSV is a line-based format. It's perfectly valid to have newlines within quoted fields, and some of the data I work with does that a lot. This means the code in my PR is likely broken, as is the `hGetLine`-based IO code. I know there's been some discussion of using cassava, which I believe handles this problem. It may be worth investigating taking at least the parser from cassava, since a lot of the work in Frames duplicates it.
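The following sketch illustrates why line-based reading breaks. It is a minimal record splitter in plain Haskell (`String`, no dependencies) that follows RFC 4180-style quoting: a quoted field may contain commas, doubled quotes (`""`) and line breaks, so one record can span several lines. `parseCsv` and `stripFraming` are hypothetical names. `stripFraming` assumes exactly one metadata row at the top and one terminator row at the bottom. A production parser, such as cassava's, would also need to address streaming, error reporting and performance, which this sketch ignores.

```haskell
-- Hypothetical quote-aware splitter: records are NOT assumed to be one per
-- line, because a quoted field may contain '\n'.
parseCsv :: String -> [[String]]
parseCsv [] = []
parseCsv input = go [] input
  where
    -- go: collect the fields of one record (accumulated in reverse)
    go acc s =
      let (f, rest) = field s
          acc' = f : acc
      in case rest of
           ',' : more  -> go acc' more
           '\n' : more -> reverse acc' : parseCsv more
           _           -> [reverse acc']  -- end of input

    -- field: read one field; the remaining input starts at ',' or '\n'
    -- or is empty. Junk after a closing quote is skipped (lenient).
    field ('"' : s) = let (f, rest) = quoted [] s
                      in (f, dropWhile (`notElem` ",\n") rest)
    field s = let (f, rest) = break (`elem` ",\n") s
              in (stripCR f, rest)

    -- quoted: read up to the closing quote; "" is an escaped quote and
    -- newlines are kept as part of the field
    quoted acc ('"' : '"' : s) = quoted ('"' : acc) s
    quoted acc ('"' : s)       = (reverse acc, s)
    quoted acc (c : s)         = quoted (c : acc) s
    quoted acc []              = (reverse acc, [])  -- unterminated quote

    -- drop a trailing carriage return left over from CRLF line endings
    stripCR f = if not (null f) && last f == '\r' then init f else f

-- Hypothetical helper for files with one leading metadata row and one
-- trailing end-of-document row: keep only the rows in between.
stripFraming :: [a] -> [a]
stripFraming rows = case drop 1 rows of
  []   -> []
  body -> init body
```

For example, `parseCsv "id,note\n1,\"line one\nline two\"\n"` yields two records, the second being `["1","line one\nline two"]`. A reader that splits on lines first would instead see three broken rows.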
I have a use case where I want to build frames from data that has been downloaded from the internet (in the form of a `.csv.zip`); most of the API seems centred around reading directly from files. I haven't seen pure alternatives to `readTableOpt'` or `readTable` that accept `Text` or `ByteString` inputs. Is there any fundamental reason why this doesn't exist? If not, would it be difficult to add? I assume these existing functions already have a pure core that processes the data read in `MonadIO`, so it feels like it should be possible to split it out (though there are possibly optimisations specific to each situation: streaming data from IO versus reading a known-length chunk of data).
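A `ByteString` entry point could be a thin layer over a `Text` one. This sketch assumes the `bytestring` and `text` packages. It decodes strict bytes as UTF-8 with `decodeUtf8'`, reporting invalid input as a `Left` instead of throwing, and then hands the text to a pure splitter. The names `rowsFromBytes` and `rowsFromText` are hypothetical. The bytes might come, for instance, from a CSV member extracted from a downloaded `.csv.zip`, but the unzipping step is not shown. The naive line-based splitter only stands in for a real parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical in-memory entry point: decode UTF-8 bytes, then parse.
-- Invalid UTF-8 is returned as an error message rather than thrown.
rowsFromBytes :: BS.ByteString -> Either String [[T.Text]]
rowsFromBytes bytes =
  case decodeUtf8' bytes of
    Left err  -> Left (show err)
    Right txt -> Right (rowsFromText txt)

-- Hypothetical pure parser shared by every input source. Naive: one
-- record per line, blank lines skipped, no quote handling.
rowsFromText :: T.Text -> [[T.Text]]
rowsFromText = map (T.splitOn ",") . filter (not . T.null) . T.lines
```

Keeping decoding separate from parsing lets the caller choose the encoding (for example, `decodeLatin1` from the same module) without the parser knowing anything about bytes.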