Initial implementation of memory-efficient Excel writing for large documents #481

Open
wants to merge 1 commit into
base: master

Conversation

gamerover98

I have implemented an initial version that writes a document without holding its contents in memory. This solution is useful when large Excel documents are created sequentially, meaning that cells never need to be revisited once they have been added to the document.
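
To illustrate the general idea of sequential writing, here is a minimal sketch. It is not the code from this pull request: the class name StreamingRowSketch and its methods are hypothetical, and the output is a simplified SpreadsheetML-like fragment rather than a complete .xlsx file. Each row goes to the Writer as soon as it is appended, so memory use should not grow with the number of rows.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.IntFunction;

/** Hypothetical sketch: emits sheet rows straight to a Writer, keeping none of them. */
final class StreamingRowSketch implements AutoCloseable {

    private final Writer out;
    private final int columns;
    private int rowNumber = 0;

    StreamingRowSketch(Writer out, int columns) throws IOException {
        this.out = out;
        this.columns = columns;
        out.write("<sheetData>");
    }

    /** Writes one row immediately; only the row counter is kept afterwards. */
    void appendRow(IntFunction<String> valueForColumn) throws IOException {
        rowNumber++;
        out.write("<row r=\"" + rowNumber + "\">");
        for (int column = 0; column < columns; column++) {
            out.write("<c t=\"inlineStr\"><is><t>");
            out.write(escape(valueForColumn.apply(column)));
            out.write("</t></is></c>");
        }
        out.write("</row>");
    }

    /** Escapes the characters that are not allowed in XML text content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    @Override
    public void close() throws IOException {
        out.write("</sheetData>");
        out.flush();
    }

    /** Renders a small sheet into a String, for demonstration only. */
    static String render(int rows, int columns) {
        StringWriter buffer = new StringWriter();
        try (StreamingRowSketch sheet = new StreamingRowSketch(buffer, columns)) {
            for (int i = 0; i < rows; i++) {
                sheet.appendRow(column -> "test");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(2, 3));
    }
}
```

The trade-off is the one described above: once a row has been written, it cannot be changed.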

The issue is described here: #480


Currently, this code only allows text to be written into a cell, and there is no functionality yet for changing its style. I would also like to point out that, due to the rush in which this modification was made, you should see this pull request not as a final change but as a way to demonstrate that even more can be achieved with Fastexcel!

You can compare the memory usage difference using VisualVM with the images from the issue:

  • Memory usage:
    image

  • Memory Profiling:
    image

The difference is remarkable, isn’t it? You can now create huge files without wasting memory!
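
If VisualVM is not at hand, a rough figure can also be logged from inside the JVM. The sketch below is only an illustration: HeapUsageSketch and usedHeapMiB are hypothetical names, the loop is a stand-in workload rather than the Fastexcel test, and the numbers are approximate because they depend on when the garbage collector last ran.

```java
/** Hypothetical sketch: logs approximate heap usage around a workload. */
final class HeapUsageSketch {

    /** Approximate used heap in mebibytes, as reported by the running JVM. */
    static long usedHeapMiB() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        long before = usedHeapMiB();

        // Stand-in workload that deliberately keeps everything in memory.
        StringBuilder sink = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sink.append("test");
        }

        long after = usedHeapMiB();
        System.out.printf("Used heap: %d MiB before, %d MiB after (%d chars retained)%n",
                before, after, sink.length());
    }
}
```

Calling usedHeapMiB() at intervals inside the row loop would give a coarse view of whether memory stays flat while rows are appended.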

Here’s the code used for the tests:

package org.example;

import lombok.extern.slf4j.Slf4j;
import org.dhatim.fastexcel.Workbook;
import org.dhatim.fastexcel.Worksheet;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

@Slf4j
public class Main {

    private static final String EXCEL_RELATIVE_PATH = "./target/data/example.xlsx";
    private static final int COLUMNS = 100;
    private static final int ROWS = 10_000_000;

    public static void main(String... args) throws Exception {
        try (var outputStream = new FileOutputStream(getExcelFile());
             var workbook = new Workbook(outputStream, "MyApplication", "1.0");
             var worksheet = workbook.newStreamWorksheet("Sheet 1")) {

            // Start writing to the document.
            worksheet.start(COLUMNS);

            log.debug("Generating data...");

            for (int rowIndex = 0; rowIndex < ROWS; rowIndex++) {
                worksheet.appendRow(column -> "test"); // Append a new row and set "test" at each cell.
            }

            // try-with-resources closes the worksheet, the workbook, and the stream automatically.
        }
    }

    public static File getExcelFile() throws IOException {
        var file = new File(EXCEL_RELATIVE_PATH);
        Path path = file.toPath();

        // Create the parent directory, not a directory named after the file itself.
        Files.createDirectories(path.getParent());

        // Remove any file left over from a previous run.
        if (Files.deleteIfExists(path)) {
            log.debug("Previous Excel file deleted");
        }

        return file;
    }
}

Lastly, I’d like to mention that test cases haven’t been added due to the nature of this pull request. If this approach moves forward, they will be included.

Thank you all <3

@tiago-s-vieira-alb

Hi @gamerover98, how can I test with multiple different values per row?

This replicates the same value in all the columns:
worksheet.appendRow(column -> "test"); // Append a new row and set "test" at each cell.

How can I append different values to that row?

@gamerover98
Author

gamerover98 commented Nov 20, 2024

Hi @tiago-s-vieira-alb,

In worksheet.appendRow(column -> "test"); the column parameter is the index of the column.

9adf2ae#diff-70d50d3cdef0a64c1a61b46213e78789bfd3f304132a741dcc93bb87afde5627R128

If I remember correctly, the Cell class is not exposed in the API, so I only included the index.


So, to answer your question:

for (int rowIndex = 0; rowIndex < ROWS; rowIndex++) {
    worksheet.appendRow(column -> {
        return switch (column) {
            case 0 -> "first column";
            case 1 -> "second column";
            // ... one case per further column index, up to the last column ...
            default -> ""; // placeholder value for any index not listed above
        };
    });
}
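
When the values for a row already sit in an array, the switch can be replaced by an index lookup. This is a minimal sketch under one assumption: that appendRow accepts a lambda from column index to String, as the examples in this thread suggest (its exact parameter type is not shown here). RowValuesSketch and rowOf are hypothetical helper names, not part of Fastexcel.

```java
import java.util.function.IntFunction;

/** Hypothetical helper: turns an array of values into a column-index lookup. */
final class RowValuesSketch {

    /** Returns the value at the given column index, or "" when the index is out of range. */
    static IntFunction<String> rowOf(String... values) {
        return column -> column >= 0 && column < values.length ? values[column] : "";
    }

    public static void main(String[] args) {
        IntFunction<String> row = rowOf("id-1", "Alice", "42");
        // Index 3 is past the end of the array, so it maps to the empty string.
        for (int column = 0; column < 4; column++) {
            System.out.println(column + " -> \"" + row.apply(column) + "\"");
        }
    }
}
```

Under that assumption, a call such as worksheet.appendRow(column -> row.apply(column)) would write one value per column.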

@tiago-s-vieira-alb

tiago-s-vieira-alb commented Nov 20, 2024

Thank you. We already tested with some lambda functions and it works!

Did you try applying any styles? Merging cells?

@gamerover98
Author

Did you try applying any styles? Merging cells?

Nope, this pull request is only meant to demonstrate that it is possible to do better than what Fastexcel currently offers. However, if I remember correctly, I left comments in the source to point out the lack of decorative features for the cells.


2 participants