Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: reduce memory in to_python() (return PyList instead of Vec) #44

Merged
merged 1 commit into from
Dec 1, 2023

Conversation

dimastbk
Copy link
Owner

@dimastbk dimastbk commented Dec 1, 2023

Partial fix #42. Return PyList instead of Vec - remove unnecessary reallocation.

Memory:
example file (saved as xlsx in LibreOffice).
image

Perormance:
A benchmark for reading files from https://github.com/MarkPflug/Benchmarks/tree/main/source/Benchmarks/Data using pandas.read_excel (first - all file, second - first 10 rows).

Before
[75.00%] ··· io.excel.ReadExcel.time_read_excel
[75.00%] ··· ========== ============ ============ ============ ============
             --                                 ext                        
             ---------- ---------------------------------------------------
               engine       xlsx         xlsm         xlsb         xls     
             ========== ============ ============ ============ ============
              calamine   1.29±0.01s   1.28±0.01s    821±4ms      816±7ms   
                None     6.62±0.01s   6.66±0.02s   4.83±0.01s   1.65±0.01s 
             ========== ============ ============ ============ ============

[100.00%] ··· io.excel.ReadExcelNRows.time_read_excel 
[100.00%] ··· ========== ============ ============ ========== =========
              --                              ext                      
              ---------- ----------------------------------------------
                engine       xlsx         xlsm        xlsb       xls   
              ========== ============ ============ ========== =========
               calamine    576±2ms      578±3ms     99.1±1ms   102±2ms 
                 None     16.8±0.6ms   16.4±0.6ms   61.6±2ms   1.04±0s 
              ========== ============ ============ ========== =========
After
[75.00%] ··· io.excel.ReadExcel.time_read_excel
[75.00%] ··· ========== ============ ============ ========== ============
             --                                ext                       
             ---------- -------------------------------------------------
               engine       xlsx         xlsm        xlsb        xls     
             ========== ============ ============ ========== ============
              calamine   1.23±0.01s    1.23±0s     760±10ms    753±3ms   
                None     6.60±0.02s   6.59±0.04s   4.66±0s    1.64±0.01s 
             ========== ============ ============ ========== ============

[100.00%] ··· io.excel.ReadExcelNRows.time_read_excel 
[100.00%] ··· ========== ============ ============ ========== =========
              --                              ext                      
              ---------- ----------------------------------------------
                engine       xlsx         xlsm        xlsb       xls   
              ========== ============ ============ ========== =========
               calamine    581±5ms      577±3ms     99.3±3ms   103±2ms 
                 None     17.7±0.3ms   17.5±0.5ms   61.4±2ms   1.04±0s 
              ========== ============ ============ ========== =========

@dimastbk
Copy link
Owner Author

dimastbk commented Dec 1, 2023

Using PyTuple (black line) get more performance, but it's breaking changes for some cases..
image

@dimastbk dimastbk marked this pull request as ready for review December 1, 2023 08:23
@dimastbk dimastbk merged commit c71f3c8 into master Dec 1, 2023
27 checks passed
@dimastbk dimastbk deleted the fix-memory branch December 1, 2023 08:31
@dimastbk dimastbk mentioned this pull request Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to iterate over rows?
1 participant