Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow Performance when there are lots of merge cells & calculations #1448

Closed
nathj07 opened this issue Jan 13, 2023 · 1 comment
Closed

Slow Performance when there are lots of merge cells & calculations #1448

nathj07 opened this issue Jan 13, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@nathj07
Copy link
Contributor

nathj07 commented Jan 13, 2023

Description

I am using this library to read text out of XLSX files, and generally it works incredibly well. Typically the performance is fantastic, well, well under 1 second. However, I have on file, which sadly I cannot share, where the time to extract the text is ~10minutes. What I have been able to do isprofile that file and see that the issue is in mergeCellsParser. The exact sheet in question does have merged cells, among other things.

Steps to reproduce the issue:

Using a file I can share you can still see the issue. The file,
merge_test.xlsx, has two columns with ~5,000 rows each, but these columns are merged A-B, and merged C-D.

This file takes 1m50s to extract text from.

A snippet of the code I am using, though not the full code:

    f, err := excelize.OpenFile(xlsxFile, excelOpts)
	if err != nil {
		return nil, fmt.Errorf("excelize failed to open file %s: %w", xlsxFile, err)
	}
	defer func() {
		// Close the spreadsheet.
		if err := f.Close(); err != nil {
			logger.WithError(err).Error("failed to close XLSX file")
		}
	}()

	// Get extras that we need, comments first
	comments, err := f.GetComments()
	if err != nil {
		return nil, fmt.Errorf("failed to GetComments: %w", err)
	}

	sheetRels, err := msx.findRels(arc, ExcelRelsXML, "xl/")
	if err != nil {
		return nil, err
	}

	// Now we can process the main data from the file using excelize
	for i, sheet := range f.GetSheetList() {
		rows, err := f.GetRows(sheet, excelOpts)
		if err != nil {
			return nil, fmt.Errorf("failed to GetRows for %s: %w", xlsxFile, err)
		}
		if _, err := sb.WriteString(sheet + "\n"); err != nil {
			return nil, fmt.Errorf("failed to write the sheet title %q: %s", sheet, err)
		}
		for j, row := range rows {
			for k, colCell := range row {
				txt := colCell
				// cellName e.g. "A1" is needed to get cell hyperlinks.
				cellName, err := excelize.CoordinatesToCellName(k+1, j+1) // this is not 0 based
				if err != nil {
					return nil, fmt.Errorf("failed to get cell name for %d-%d: %w", k+1, j+1, err)
				}
				hasLink, link, err := f.GetCellHyperLink(sheet, cellName)
				if err != nil {
					// log this, it is not worth aborting the whole extraction for
					logger.WithError(err).WithField("cellName", cellName).Warn("failed to get link for cell")
				}
				if hasLink {
					txt = colCell + " " + link
				}
				if _, err := sb.WriteString(txt + " "); err != nil {
					return nil, fmt.Errorf("failed to write string to builder: %w", err)
				}
			}
		}

As I say go's profiling shows the issue is.
pprof001
This profile was generated using the worst case scenario file that takes 10mins to read

Describe the results you received:
The text is extracted correctly, however it takes ~10minutes

Describe the results you expected:
I would expect it to be faster, even if it were longer than normal, I think under 30s is reasonable

Output of go version:

go version go1.19.4 darwin/amd64

Excelize version or commit ID:

github.com/xuri/excelize/v2 v2.7.0

Environment details (OS, Microsoft Excel™ version, physical, etc.):
This performance is particularly

@xuri xuri added the enhancement New feature or request label Jan 13, 2023
@nathj07 nathj07 changed the title Slow Performance when there are lots of calculations Slow Performance when there are lots of merge cells & calculations Jan 16, 2023
@xuri xuri closed this as completed in a34c81e Mar 20, 2023
@xuri
Copy link
Member

xuri commented Mar 20, 2023

This issue has been fixed, please try to upgrade master branch code. This patch will be released in the next version.

fudali113 pushed a commit to fudali113/excelize that referenced this issue Apr 17, 2023
xuri pushed a commit to JDavidVR/excelize that referenced this issue Jul 11, 2023
jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants