feat(docs): add benchmarks and plots in readme #367

Merged
merged 5 commits into from
Oct 25, 2023


RoloEdits
Contributor

I went through and benchmarked some other libraries to see where calamine stands compared to other ecosystems, and decided to add the results to the docs and, after seeing them, to file an issue for excelize.

I wanted to add umya-spreadsheet, but it didn't seem to have any way to directly iterate over the rows. At least I couldn't tell from the wording in the docs or the function signatures. If you manage to figure out a way to do that and want another Rust comparison, I don't mind adding it.

The Git history is a bit messy with fixes, so squashing might be best.

@RoloEdits
Contributor Author

I got some pointers from a maintainer of excelize, so I need to update the data. I'll try to get to it as soon as I can.

@dimastbk
Contributor

calamine vs openpyxl (read_only mode), python3.11 on my PC:

Benchmark 1: calamine
  Time (mean ± σ):     21.299 s ±  0.093 s    [User: 20.361 s, System: 0.931 s]
  Range (min … max):   21.193 s … 21.512 s    10 runs

Benchmark 1: openpyxl
  Time (mean ± σ):     134.424 s ±  0.582 s    [User: 133.749 s, System: 0.654 s]
  Range (min … max):   133.057 s … 135.192 s    10 runs

Code:

from openpyxl import load_workbook


wb = load_workbook(filename='NYC_311_SR_2010-2020-sample-1M.xlsx', read_only=True)
ws = wb['NYC_311_SR_2010-2020-sample-1M']

for row in ws.rows:
    _ = row

# Close the workbook after reading
wb.close()
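
For a sense of scale, the two hyperfine means reported above can be turned into a single ratio. This is only a minimal sketch of that arithmetic; the variable names are made up for illustration, and the numbers are copied from the output above without re-measuring anything.

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
calamine_mean_s = 21.299
openpyxl_mean_s = 134.424

# Ratio of the means: how many times longer the openpyxl run took.
speedup = openpyxl_mean_s / calamine_mean_s

print(f"openpyxl took {speedup:.2f}x as long as calamine")
```

A ratio of means from one machine says nothing about other hardware or files, so treat it as a rough indicator only.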

@dimastbk
Contributor

I wanted to add umya-spreadsheet, but it didn't seem to have any way to directly iterate over the rows?

I couldn't find one either. With this code, the application allocated over 10 GB of memory, so I killed it.

    let path = std::path::Path::new("NYC_311_SR_2010-2020-sample-1M.xlsx");
    let book = umya_spreadsheet::reader::xlsx::read(path).unwrap();
    let sheet = book.get_sheet_by_name("NYC_311_SR_2010-2020-sample-1M").unwrap();
    let _ = sheet.get_collection_to_hashmap();

    // OR
    let path = std::path::Path::new("NYC_311_SR_2010-2020-sample-1M.xlsx");
    let book = umya_spreadsheet::reader::xlsx::lazy_read(path).unwrap();
    let _ = book.get_lazy_read_sheet_cells(&0).unwrap();

The previous `excelize` data was obtained using an improper iterator. The new code comes from [here](qax-os/excelize#1695 (comment)).
@dimastbk
Contributor

What version of python did you use?

python3.11 138.470 s
python3.10 158.893 s
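
To put the gap between the two interpreter versions in relative terms, here is a small sketch of the arithmetic on the two timings above. The variable names are illustrative only, and the figures are taken as-is from this comment.

```python
# Timings in seconds, copied from the two lines above.
py311_s = 138.470
py310_s = 158.893

# Fraction of the python3.10 run time saved by running under python3.11.
reduction = (py310_s - py311_s) / py310_s

print(f"python3.11 took {reduction:.1%} less time than python3.10")
```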

@RoloEdits
Contributor Author

@dimastbk Python 3.11.5. What kind of hardware are you using?

@dimastbk
Contributor

Thanks. I was just surprised by such a big difference between python3.10 and 3.11.
Intel® Core™ i7-9700, KDE Neon 5.27

@RoloEdits
Contributor Author

I'm also interested in how much slower mine is compared to yours. 100 seconds. I'm not even sure what could account for that much difference.

@tafia tafia merged commit c66195c into tafia:master Oct 25, 2023
3 of 4 checks passed
@tafia
Owner

tafia commented Oct 25, 2023

Thanks!
Very informative

@RoloEdits RoloEdits deleted the docs/lib_comp branch October 25, 2023 05:10
3 participants