
[R] Progress bar for read_feather for R and a verbose version #43404

Open
ajinkya-k opened this issue Jul 24, 2024 · 6 comments

Comments

@ajinkya-k

Describe the enhancement requested

I would like to request that a progress bar be shown when using the read_feather function in R, especially for large files, so that the user can see whether the file is actually being read and how far along the read is. data.table::fread does this with a simple progress bar enabled via its showProgress argument. My use case is reading a large file into R from a network drive with read_feather: on some runs there is no indication that R is making any progress on loading the file, while on others it loads in ~300 seconds. fread also has a verbose option that dumps much more output and would be well worth implementing too, but a progress bar at minimum would be great!
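For reference, a minimal sketch of the difference as it stands today (the file paths are hypothetical): data.table::fread exposes a showProgress argument, while arrow::read_feather gives no feedback until the read completes.

```r
library(data.table)
library(arrow)

# fread prints a progress bar while parsing a large CSV
dt <- fread("//networkdrive/share/big_file.csv", showProgress = TRUE)

# read_feather currently gives no indication of progress for a large file
df <- read_feather("//networkdrive/share/big_file.feather")
```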

Component(s)

R

@thisisnic thisisnic changed the title Progress bar for read_feather for R and a verbose version [R] Progress bar for read_feather for R and a verbose version Jul 27, 2024
@thisisnic
Member

I think this is a great idea, @ajinkya-k, though I'm not sure how feasible it is: this has been discussed in relation to another piece of functionality, and we concluded it would be tricky because it would require non-trivial updates to the Arrow C++ library.

Out of interest, once you've loaded the file, are you performing further dplyr manipulations? You might get better performance by calling open_dataset() on the file (so it isn't pulled into your R session), running whatever manipulations you need, and then calling collect() to pull only the relevant bits into memory.
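A minimal sketch of that pattern, assuming a hypothetical path and column names:

```r
library(arrow)
library(dplyr)

# Open the Feather file lazily; nothing is read into memory at this point
ds <- open_dataset("//networkdrive/share/big_file.feather", format = "feather")

# dplyr verbs are pushed down to Arrow; only collect() materialises the result in R
result <- ds |>
  filter(year == 2023) |>   # hypothetical filter
  select(id, value) |>      # hypothetical columns
  collect()
```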

@ajinkya-k
Author

Thanks for the update, @thisisnic. I do a join and a few filters that drop less than 1% of the rows and then collect, but it's still a huge dataset after that, which I plug into a Bayesian model. The Bayesian model does work; the issue is that, due to DUA constraints, I have to keep the file on a network drive and pull it from there, so it's hard to tell whether the file is even being loaded. A progress bar would help me figure out whether the read is progressing at all, or whether network throttling means the process is hung.

@thisisnic
Member

Ah, that makes sense. It doesn't sound like there's much else to suggest in terms of temporary workarounds, then!

@ajinkya-k
Author

@thisisnic I ran the code a few more times, and it turns out that read_feather was indeed working, but it was very slow compared to loading the exact same data stored as a CSV using fread. Is there a known issue with network drives on Windows?

@thisisnic
Member

I believe I've seen issues like this when reading across a network drive on Windows, though I'm not sure. It could be worth comparing with a local file to test.
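One rough way to check, assuming a copy of the same Feather file could be placed on local disk (both paths are hypothetical):

```r
library(arrow)

network_path <- "//networkdrive/share/big_file.feather"
local_path   <- "C:/temp/big_file.feather"

# Compare wall-clock read times to see whether the network drive is the bottleneck
system.time(read_feather(network_path))
system.time(read_feather(local_path))
```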

@ajinkya-k
Author

Yeah, unfortunately I can't make a copy of the data on my local machine due to DUA constraints. I might try an open-source dataset to test this, though. Any suggestions for a dataset?
