Ability to iterate over rows? #42

Closed
maciej-jaworski opened this issue Nov 28, 2023 · 2 comments · Fixed by #44
Comments

@maciej-jaworski

(I don't know how the underlying Rust code works, so maybe this is not feasible.)

Would it be possible to add support for iterating over the rows of a sheet without loading all of them into memory, similar to the iter_rows that openpyxl has?

I'm dealing with some larger files, and while the compute performance is amazing, I end up allocating a lot of memory: 400+ MB for an 80 MB file. Using iter_rows from openpyxl brings this down to 40 MB, but it takes 5-6x longer, so obviously I'd prefer to use this package.
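To make the trade-off concrete, here is a minimal standard-library sketch of eager versus row-by-row consumption. It does not use openpyxl or this package: generate_rows is a hypothetical stand-in for a sheet reader, and the numbers it produces say nothing about either library's real memory use.

```python
import tracemalloc


def generate_rows(n_rows, n_cols=5):
    """Hypothetical stand-in for a sheet reader: yields one row at a time."""
    for r in range(n_rows):
        yield [f"r{r}c{c}" for c in range(n_cols)]


def peak_bytes(consume):
    """Run consume() and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = consume()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def count_eager(n_rows):
    # Materialize every row first, so all rows are alive at the same time.
    rows = list(generate_rows(n_rows))
    return sum(len(row) for row in rows)


def count_lazy(n_rows):
    # Consume rows one by one, so each row can be freed before the next.
    return sum(len(row) for row in generate_rows(n_rows))


eager_cells, eager_peak = peak_bytes(lambda: count_eager(20_000))
lazy_cells, lazy_peak = peak_bytes(lambda: count_lazy(20_000))
print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

Both functions count the same cells; only the number of rows held in memory at once differs, which is the effect the request is after.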

@dimastbk
Owner

Hi! I created a PoC, but:

  1. Calamine doesn't support lazy loading yet (see "support DataTypeRef for shared strings and worksheet CellsReader", tafia/calamine#370). I'd prefer to wait until that PR is merged.
  2. Due to pyo3 and calamine limitations, we would have to use unsafe code to iterate over the Rust structure, which is risky. I need some time to research it.

@dimastbk
Owner

Due to a limitation of pyo3, we can't expose true iteration over a Rust iterator (see PyO3/pyo3#1085). So in #43 I added iteration over the Rust range instead (after calamine has read the whole sheet into memory). This can decrease memory allocation in some cases (see the benchmark).
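A rough Python sketch of that compromise, under stated assumptions: LoadedRange, all_rows and iter_rows are hypothetical names for illustration, not the types or methods added in #43. The whole sheet is already held in one buffer, and the only saving is that per-row Python lists are built one at a time instead of all at once.

```python
from typing import Any, Iterator, List, Sequence


class LoadedRange:
    """Hypothetical stand-in for a sheet range that is already fully in memory."""

    def __init__(self, cells: Sequence[Any], width: int) -> None:
        if width <= 0 or len(cells) % width != 0:
            raise ValueError("cells must fill a whole number of rows")
        self._cells = list(cells)  # flat row-major buffer, loaded up front
        self._width = width

    def all_rows(self) -> List[List[Any]]:
        """Eager path: build every row list before returning."""
        return list(self.iter_rows())

    def iter_rows(self) -> Iterator[List[Any]]:
        """Row-by-row path: build one row list per step from the loaded buffer."""
        for start in range(0, len(self._cells), self._width):
            yield self._cells[start:start + self._width]


sheet = LoadedRange(["a", 1, "b", 2, "c", 3], width=2)
for row in sheet.iter_rows():
    print(row)
```

The buffer itself never shrinks, which matches the caveat above: memory drops only in some cases, namely when the converted rows would otherwise dominate the footprint.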
