-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Applying a lot of styles causes spreadsheet to need repair #32
Comments
Hmmm... it seems that Excel has an undocumented limit of about 65k styles. xlsx-populate currently creates a new style for each cell, which isn't the most efficient. There are 3 style items on the roadmap that each should help with your problem (in order of priority):
Unfortunately, none of these are in place right now. The only suggestion I have for the moment is to style a workbook in Excel, save it in your repo, and then populate the template with xlsx-populate. |
Ahhh got it thanks. # 3 would be the best approach for us. I'll see if the template idea works, thanks so much for this library! |
@dtjohnson Are you aware of any memory issues when working with very large datasets (I.E. inserting basic text values into tens of thousands of rows across dozens of columns)? My company would be willing to sponsor your time to look into this issue we are having. Please let me know if you would be open to that. |
@rickharrison, I have a very demanding day job already so I'm not interested in compensation, but I'd love it if you'd support the project by submitting code, fielding issues, writing docs, etc. :) There was an earlier issue with regard to memory usage. I did some enhancements and was able to populate about 2 million cells in V8 (less in IE). That was about the upper limit though. There is some more optimization possible to maybe get it to 3 or 4 million, but I'd prefer to get more of the XLSX spec covered before optimizing too much. If you are adding styles that will chew up a lot more memory. I did just push this branch which has initial support for row/column styles. I'm happy to look at your workbook (or a representative example) to see if anything strange pops out. |
Hmm do you have any suggestions on how to get around all the object instantiation? All we actually need to do is insert basic text columns. Should we try to edit the sheet1.xml file directly? |
From my tests it seems the major culprit is that for each cell node in the XML we instantiate two child arrays to represent the XML structure. Those arrays take up a lot of memory. It's certainly possible to eliminate them, but it may only get us to 3 million cells... You might also consider trying two other libraries: xlsx and ExcelJS. I can't vouch for them personally. In order to preserve existing workbook features, xlsx-populate uses an inefficient data model. These other libraries might perform better. |
Thanks for the suggestion. I am using your library to insert super basic text data, but the caveat is I'm inserting it into a pretty complicated spreadsheet. That is why your library has worked so well thus far (and thanks so much for creating it!) Are those arrays created when reading a workbook with data already in it? I could close and re-open the workbook in pages to drop memory |
They are, I'm afraid. Is all of your data in one sheet? Or is it split in multiple? |
There are 7 sheets in the workbook and I am inserting data into the first 2 sheets both of which are blank initially (I could just use the first sheet if that would solve something) |
There's not really a way to get around having an entire sheet in memory. Currently, all sheets are loaded into memory when you parse a workbook. If the data is split into multiple sheets, we could limit the memory hit by only loading sheets that are needed. Otherwise, the only solution is to optimize the data structure for the single sheet. |
@dtjohnson This library has been super awesome to us, but I think I am going to try manually editing the workbook. If I am just inserting very basic data to one sheet is it as simple as adding |
That's the basic idea, but it depends on what exactly you are doing. There are some annoying issues that come up though. On the bright side, I did think of a solution to massively decrease memory usage while still retaining the original workbook features. In my tests I was able to raise the max number of populated cells from 2 million to almost 10 million. It looks like it could go higher, but I ran into the V8 max string length limit when generating the XML. There's more work to do on the code to get it out the door, but stay tuned. |
Awesome, I'm looking forward to your solution :) |
I just pushed the improved performance to https://github.com/dtjohnson/xlsx-populate/tree/performance It's not fully tested yet but you can give it a whirl as-is. I was able to produce a workbook with 9.5 million numbers. You'll get less with other data types and with styles, etc. |
Great, thanks for working on this! I'll try it out |
I just published v1.6.0, which included fully-tested support for this higher performance feature. |
Thanks again for all your hard work! |
Hello, I am facing this issue in an excel export for which I'm applying individual cell styles. @dtjohnson I would like to know how to take the number 1 approach you mentioned: Because a style is created per cell, there are many duplicate styles in place. Instead we should be reusing nodes across styles if they are identical. That's a little tricky to do, but it's the right approach. I have a few styles that I would like to reuse across cells. Could you please help me with that? Thank you in advance |
For anyone interested: if you need per-column custom styles, there's an easy way to prevent style overflow. I left a comment in this issue #188 |
When applying a ton of styles to a large amount of cells in a workbook, I then get the message "Excel could not open filename because some content is unreadable. Do you want to open and repair this workbook?"
As an example, I am using this code:
lineItemRowOffset
in this case is around 15,000 so styles are being applied to tens of thousands of cells.Is there a better way to apply styles or is this a bug?
The text was updated successfully, but these errors were encountered: