Odd Performance Characteristics In Benchmarks #1695
Comments
Thanks for your issue. There are two kinds of functions in the excelize library: normal mode functions and stream mode functions. The stream mode functions are used to generate or read worksheets with a large amount of data at lower resource usage. Please try using the rows iterator like this:

    package main

    import (
        "fmt"

        "github.com/xuri/excelize/v2"
    )

    func main() {
        // Open the workbook.
        file, err := excelize.OpenFile(`NYC_311_SR_2010-2020-sample-1M.xlsx`)
        if err != nil {
            fmt.Println(err)
            return
        }
        defer func() {
            // Close the spreadsheet.
            if err := file.Close(); err != nil {
                fmt.Println(err)
            }
        }()
        // Get a streaming row iterator for the worksheet.
        rows, err := file.Rows("NYC_311_SR_2010-2020-sample-1M")
        if err != nil {
            fmt.Println(err)
            return
        }
        // Walk every row; the loop body is intentionally empty.
        for rows.Next() {
        }
        // Close the row iterator once the loop is done.
        if err := rows.Close(); err != nil {
            fmt.Println(err)
        }
    }

2.6 GHz 6-Core Intel Core i7, 16 GB 2667 MHz DDR4, 500GB SSD, macOS Sonoma 14.0, go1.20 darwin/amd64
Thanks for the quick response. This is the updated data.

    Benchmark 1: excelize.exe
      Time (mean ± σ): 44.254 s ± 0.574 s [User: 46.071 s, System: 7.754 s]
      Range (min … max): 42.947 s … 44.911 s 10 runs

I still notice the writes that are being done. I guess this is just part of the implementation? I'll be sure to update the benchmarks in
To avoid high memory usage when reading large files, this library lets you specify the UnzipXMLSizeLimit option when opening the workbook. It sets the memory limit, in bytes, for unzipping the worksheet and shared string table. When a worksheet's XML is larger than this value, it is extracted to the system temporary directory, which is why you see data being written in reading mode. You can change the default to avoid this behavior. See also the docs and issue #1581.
The previous `excelize` data was obtained using an improper iterator. The new code comes from qax-os/excelize#1695 (comment).
I've closed this issue. If you have any further questions, please let me know and I can reopen it at any time.
Description
When benchmarking for `calamine` and updating the readme with the info to try to see where its performance is in the language ecosystems, I used `excelize` as the library for Go. During the benchmarking I noticed odd behavior.

This is the program I put together. Taken and modified from the example.
The benchmarks gave this result:
I'm an outsider coming in with basically zero Go knowledge, so excuse me if this is for nothing, but in most Rust vs Go benchmarks the two are usually not that far apart. A 7.9x difference seems out of the ordinary.
In another benchmark, I noticed some excessive reading. 11x the file size on disk:
As well as writing, when there is no writing logic in the program:
The CPU also shows a lot of spikes, which I presume come from garbage collection.
The dataset was from https://raw.githubusercontent.com/wiki/jqnatividad/qsv/files/NYC_311_SR_2010-2020-sample-1M.7z saved as an `xlsx` file. 1M rows, 41 columns, 28M cells with values in it.

Output of `go version`:

Excelize version or commit ID:

Environment details (OS, Microsoft Excel™ version, physical, etc.):

OS: Windows 11
CPU: RYZEN 9 5900X @ 4GHz
SSD: Sabrent 2TB Gen 4 PCIE