switch from .rData to .fst #202

Open
drnickisaac opened this issue Dec 9, 2020 · 5 comments

Comments

@drnickisaac
Contributor

@AugustT has suggested that we switch from saving sparta outputs as .rData to saving them as .fst, which would save time and disk space. It should be easily implemented with a single line of code.

@AugustT
Member

AugustT commented Dec 9, 2020

This is not a single line of code job! The implications are very far reaching and should be mapped out first!

@drnickisaac
Contributor Author

@AugustT I have started looking into this. The fst format is certainly fast for data frames, but is it really optimal for the list structure that comes out of sparta? Also, this blog post implies other potential drawbacks. I don't fully understand the details:
http://svmiller.com/blog/2020/02/comparing-qs-fst-rds-for-bigger-datasets/

@03rcooke
Contributor

I'm also slightly concerned about this shift. For me I don't see an issue with the current read/write speed, and it seems like a lot of work!

I've always preferred .rds, as you can formally assign objects to an object name when you read them in, although you can get round this for .rdata with a small function. And .rdata potentially seems better for lists.
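As a rough illustration of the "small function" workaround mentioned above, here is a minimal sketch in base R. The helper name `load_rdata` is hypothetical and not part of sparta; it loads an .rdata file into a private environment so that the contents can be assigned to whatever name you choose.

```r
# Hypothetical helper (an assumption, not a sparta function): read an .rdata
# file without letting it overwrite objects in the calling environment.
load_rdata <- function(path) {
  env <- new.env()
  # load() returns the names of the objects it restored
  nms <- load(path, envir = env)
  if (length(nms) == 1) {
    # A single stored object is returned directly
    return(get(nms, envir = env))
  }
  # Several stored objects are returned as a named list
  mget(nms, envir = env)
}

# Usage with a throwaway file and made-up contents
tmp <- tempfile(fileext = ".rdata")
out <- list(species = "a", trend = c(0.1, 0.2))
save(out, file = tmp)
res <- load_rdata(tmp)  # assigned to a name of our choosing
```

With .rds the same thing is a single call, `res <- readRDS(path)`, which is the convenience being described.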

@mlogie
Contributor

mlogie commented Dec 21, 2020

I would be in favour of a switch to rds format (in fact, all my current occ mod functions work with rds files). If the implications of changing to a slightly faster format are potentially far reaching for only a minor benefit, I would be in favour of not changing. If this became the default output, other users of sparta would also have to install a new package to get it to work.

@03rcooke
Contributor

I'd vote for a change to .rds files instead of .rdata files. We could add an argument, filetype, with which users could specify .rds (the default) or .rdata (the previous method).
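To make the filetype proposal concrete, here is one way it might look, using base R only. This is a sketch under assumptions: the function name `save_output`, the `path_stem` argument, and the object name `out` used inside the .rdata file are all hypothetical and are not taken from the sparta code.

```r
# Hypothetical sketch of the proposed argument; not sparta's actual code.
save_output <- function(x, path_stem, filetype = c("rds", "rdata")) {
  # match.arg() picks "rds" when filetype is not supplied and rejects other values
  filetype <- match.arg(filetype)
  path <- paste0(path_stem, ".", filetype)
  if (filetype == "rds") {
    saveRDS(x, file = path)
  } else {
    # save() stores the object under its variable name; "out" is an assumed name
    out <- x
    save(out, file = path)
  }
  invisible(path)
}

# Usage with made-up model output
stem <- file.path(tempdir(), "example_species")
model_out <- list(species = "example_species", n_iter = 100)
rds_path <- save_output(model_out, stem)                        # default: .rds
rdata_path <- save_output(model_out, stem, filetype = "rdata")  # previous method
```

Keeping .rdata available as an option would let existing downstream code continue to work while new code moves to `readRDS()`.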
