Read action takes too long on a large csv #6

Open
briankariuki opened this issue May 6, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@briankariuki

I have a CSV of about 25,000 records and a resource that uses this CSV. Using Api.read takes a very long time, even when reading only the first 5 records.

briankariuki added the bug label May 6, 2024
@zachdaniel
Contributor

zachdaniel commented May 6, 2024

What is a very long time? I'm on a pretty overpowered machine, but a file with 20k records just returned for me in 0.25s. That said, I'll make some improvements to our reading logic so that it takes advantage of the streaming nature of the file read.
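
For readers following along, the idea of taking advantage of a streaming file read can be sketched roughly as below. This is only an illustration built on Elixir's standard library, not ash_csv's actual implementation: the module name `LazyCsvSketch`, the assumption that the file has a header row, and the naive comma splitting (no quoted-field support) are all hypothetical.

```elixir
defmodule LazyCsvSketch do
  # Hypothetical helper, not part of ash_csv.
  # Returns the first `count` data rows of the file at `path`.
  # Assumes the first line is a header and that fields contain no quoted commas.
  def first_rows(path, count) do
    path
    # File.stream!/1 is lazy: lines are read only as they are demanded.
    |> File.stream!()
    # Skip the assumed header line.
    |> Stream.drop(1)
    # Strip the trailing newline from each line.
    |> Stream.map(&String.trim_trailing/1)
    # Naive field split; a real CSV parser would handle quoting.
    |> Stream.map(&String.split(&1, ","))
    # Enum.take/2 stops pulling from the stream once `count` rows are collected.
    |> Enum.take(count)
  end
end
```

Because every step before `Enum.take/2` is lazy, asking for 5 rows should only read the first few lines of the file rather than all 25,000 records.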

@zachdaniel
Contributor

After I wrote that, I realized that 0.25s is a pretty crazy long time for this to take :) When you said "very long time", I was just expecting it to be much longer. I'm pushing something to main shortly that improves this a fair amount, but there are likely lots of small optimizations still to be made in the way we load the data from the CSV file into the actual resource structs.

@zachdaniel
Contributor

I'll be looking more into this, as it highlights a few places worth optimizing. I also want to see how much of the slowdown is coming from the CSV parser vs. Ash, but I'd venture a guess that most of the time is spent loading stored values into memory and validating their types.
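
One rough way to separate the two costs is to time the raw read-and-split step apart from whatever happens to each row afterwards. The sketch below is a hedged illustration only: `StageTimingSketch`, `time_ms/1`, and `compare/2` are hypothetical names, the comma split stands in for a real CSV parser, and `cast_fun` is a caller-supplied stand-in for the type loading and validation that Ash would do.

```elixir
defmodule StageTimingSketch do
  # Hypothetical helper: runs `fun` and returns {elapsed_milliseconds, result}.
  def time_ms(fun) when is_function(fun, 0) do
    # :timer.tc/1 returns {microseconds, result}.
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end

  # Times parsing and per-row casting separately.
  # `cast_fun` is a stand-in for the real per-record work (an assumption).
  def compare(path, cast_fun) when is_function(cast_fun, 1) do
    {parse_ms, rows} =
      time_ms(fn ->
        path
        |> File.stream!()
        |> Stream.map(&String.trim_trailing/1)
        # Naive split; stands in for the CSV parser.
        |> Enum.map(&String.split(&1, ","))
      end)

    {cast_ms, records} = time_ms(fn -> Enum.map(rows, cast_fun) end)

    %{parse_ms: parse_ms, cast_ms: cast_ms, records: records}
  end
end
```

Comparing `parse_ms` with `cast_ms` on the real file would give a first indication of where the time goes; a proper profiler would be the next step if the split is not clear-cut.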

@zachdaniel
Contributor

zachdaniel commented May 6, 2024

I've pushed up some non-trivial performance improvements. Give main a try and let me know how it goes :)

@briankariuki
Author

> What is a very long time? I'm on a pretty overpowered machine, but a file w/ 20k records just returned for me in .25s. With that said, I'll make some improvements to how our reading logic works to make it leverage the streaming nature of the file read.

More than a minute, I'm afraid. I can share the CSV and resource for you to try out.

@briankariuki
Author

> I've pushed some non trivial performance improvements up. Give main a try and let me know how it goes :)

Let me try the main branch and report back.

Also, does ash_csv work with ash_json_api? I got an error while trying to read the resource via an API endpoint.

@zachdaniel
Contributor

No reason it shouldn't. What's the error?


2 participants