Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive Memory Usage During the Creation of Large Excel Documents #480

Open
gamerover98 opened this issue Sep 21, 2024 · 0 comments
Open

Comments

@gamerover98
Copy link

Problem

I am experiencing a memory usage issue when creating very large Excel documents using fastexcel. While inserting cells into a worksheet with maximum dimensions (16,384 columns by 1,048,576 rows), I noticed that the memory consumption was extremely high.

Here are some specific observations:

  • Memory Usage: I measured memory consumption with VisualVM and observed usage of over 8 GB of RAM.
    image

  • Memory Profiling: The profiling showed more than 180 million instances of org.dhatim.fastexcel.Cell and over 1 million instances of org.dhatim.fastexcel.Cell[].
    image

Code Example to Reproduce the Issue

package org.example;

import lombok.extern.slf4j.Slf4j;
import org.dhatim.fastexcel.Workbook;
import org.dhatim.fastexcel.Worksheet;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

@Slf4j
public class Main {

    private static final String EXCEL_RELATIVE_PATH = "./target/data/example.xlsx";
    private static final int COLUMNS = Worksheet.MAX_COLS; // 16_384 columns
    private static final int ROWS = Worksheet.MAX_ROWS;    // 1_048_576 rows

    public static void main(String... args) throws Exception {
        try (var outputStream = new FileOutputStream(getExcelFile());
             var workbook = new Workbook(outputStream, "MyApplication", "1.0");
             var worksheet = workbook.newWorksheet("Sheet 1")) {

            log.debug("Generating data...");

            for (int columnIndex = 0; columnIndex < COLUMNS; columnIndex++) {
                for (int rowIndex = 0; rowIndex < ROWS; rowIndex++) {
                    worksheet.value(rowIndex, columnIndex, "test");
                }
            }

            log.debug("Waiting...");
            // Stop current thread to see memory usage.
            Thread.sleep(Long.MAX_VALUE);
        }
    }

    public static File getExcelFile() throws IOException {
        var file = new File(EXCEL_RELATIVE_PATH);

        if (file.mkdirs()) {
            log.debug("Excel file created");
        }

        if (file.exists()) {
            Files.delete(Path.of(file.toURI()));
            log.debug("Excel file deleted");
        }

        return file;
    }
}

Proposed Solution

I propose implementing a new version of the Worksheet class that writes cells directly to the document without keeping them in memory for potential future modifications. This sequential writing method is particularly useful for those, like me, who need to create large documents "cascadingly" without needing to know the content of previously written cells.

This solution would significantly reduce memory usage for large documents, improving the library’s efficiency in similar scenarios.

Thanks a lot 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant