Support large excel files #61
It's certainly "possible". Here are the roadblocks:
On a side note, have you tried saving the file as XLSB? IME it's about half the size of the equivalent XLSX/XLSM.
I am interested in this too. Going through XLSB will help with regard to the zip library, but what I am most interested in is number 2, to avoid building the parsed object in memory: nextSheet, nextRow is the interface I would use. Also @SheetJSDev, do you intend to add callbacks to the interface for async processing of the parsed data?
@hmalphettes I used the phrase 'event-driven' but yes that will involve callback functions. We'd have to look through the test suite, but my initial concern is that certain data won't be available until you process the entire file (for example, if dimensions are not included in the sheet, the parser keeps track of the first and last seen column as well as the first and last seen row when calculating the range for the worksheet). Comments, merge cells and other metadata potentially could be stored after the actual sheet data (although we'd have to dig into some test cases to see what writers do).
OK @SheetJSDev.
@SheetJSDev, it would be nice to figure out what the maximum supported file size is today.
@antonama it's a tricky question because XLSX/XLSM/XLSB are zip files, so some very large sheets can be stored in a small file (and vice versa -- a small string-heavy sheet will appear to be big). For example, the command-line tool "handles" the file insofar as it took a full minute on my computer. YMMV. IE6 seems to crash on files with more than 30 cells (it appears that IE6 tries to warn about a slow-running script, but that is thwarted by the fact that the script locks up the browser while processing). Due to GitHub's file size limitation, I don't have files larger than 50MB in the test suite.
@SheetJSDev, your file sizes are really impressive. I'm trying to convert a file ~2MB in size and I get blank output. And I don't have any other 'jazz' in it except raw text. Can you guess why?
@antonama can you share that file (either email me directly -- sheetjs at the google email service -- or put the file up in a public location) and describe your conditions (OS, browser version, webpage/script)?
@SheetJSDev, hmmm, I'm really sorry. I've just noticed macros in the file. Could that be the issue?
I doubt macros are the issue. If you are seeing a blank screen (no output), can you check the JavaScript console? In Chrome, hit control-shift-j on Windows / command-option-j on Mac.
Is a large document supposed to crash the tab process? I have a 4MB xlsx converted from a 17MB xls file; after 15 seconds of processing the Chrome tab crashes.
Could you share a file? If not, can you at least share how many cells are in it? Also, can you check if the node module can process the file?
Sure, here it is: https://yadi.sk/d/gB5Qe8yMbXosL
The website uses the gh-pages branch. Can you check against the master branch? Then go to http://localhost:8000 and try to repeat the test.
I thought gh-pages was running on master. Anyway, the master branch works correctly for this file. But I tried another, even more complex file (9.2MB) and this time the master branch crashes the process too. It's a more complex file with 126 sheets, with different numbers of rows per sheet. I can share it privately if you let me know how.
You can email it to the gmail account listed at https://github.com/SheetJS or to the email address in the git log (the commit email is hosted on my own server directly).
So it appears that disabling "Web Workers" helps with larger files, and from what I can see it's even a little bit faster.
I could offer a different API if it makes things simpler, and it should not require a heavy rewrite: the options may accept an event emitter (or any object with an "emit(name, ...args)" method).
I'd recommend that all events be passed a context reference on which the passed arguments can be found.
The only thing I cannot consider in full by myself is the implication of calculated fields; I could offer something for this.
What do you think?
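As a rough illustration of that proposal (the emitter option and the event names below are hypothetical, not part of the current API), the caller side could look something like this:

var EventEmitter = require('events').EventEmitter;
var XLSX = require('xlsx');

// Hypothetical: the parser accepts any object exposing emit(name, ...args)
// and fires each event with a single context argument, per the suggestion above.
var emitter = new EventEmitter();
emitter.on('sheet', function(ctx) { console.log('sheet:', ctx.name); });
emitter.on('row', function(ctx) { console.log('row', ctx.r, ctx.cells); });
emitter.on('end', function() { console.log('done'); });

XLSX.readFile('big.xlsx', { emitter: emitter }); // "emitter" option is made up for illustration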
We've started addressing the general performance problem in node:
function get_cell(ws/*:Worksheet*/, addr/*:string*/) {
if(!Array.isArray(ws)) return ws[addr];
var a = X.utils.decode_cell(addr);
return (ws[a.r]||[])[a.c];
}
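For context, get_cell above reads the same cell whether the worksheet was parsed normally (cells keyed by A1-style addresses) or with the dense option (rows stored as arrays, in the versions where dense worksheets are plain arrays of rows). A small usage sketch, assuming test.xlsx exists and X is the xlsx module referenced in the snippet:

var X = require('xlsx');
var wb_dense = X.readFile('test.xlsx', { dense: true }); // dense: rows stored as arrays
var wb_sparse = X.readFile('test.xlsx');                 // sparse: cells keyed by address
console.log(get_cell(wb_dense.Sheets[wb_dense.SheetNames[0]], 'B2'));
console.log(get_cell(wb_sparse.Sheets[wb_sparse.SheetNames[0]], 'B2'));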
Besides the general issue of non-conformant files, there are more hurdles preventing a true streaming read solution:
Now, it's possible to conserve memory at the expense of time by performing two passes over the worksheets: collect relevant metadata in the first pass, then emit cell/row events in the second pass. Most formats store data in row-major order, so it's fairly straightforward to implement (some Quattro Pro formats use column-major order, but those aren't show-stoppers).

@osher the original stumbling block was the ZIP/CFB container format. You basically have to read the whole file into memory to start processing. If that behavior is acceptable, and if it's acceptable to limit the feature set, we could easily patch in support for streaming read.
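A hand-rolled sketch of the two-pass idea, assuming the sheet XML has already been unzipped into a string (illustrative only; it skips shared strings and self-closing rows and is not the library's parser):

var EventEmitter = require('events').EventEmitter;

function scan_sheet(xml, emitter) {
  // pass 1: compute the true extent (useful when <dimension> is missing or wrong)
  var maxR = 0, maxC = 0, m;
  var cellRE = /<c r="([A-Z]+)(\d+)"/g;
  while ((m = cellRE.exec(xml)) != null) {
    var col = m[1].split('').reduce(function(a, ch) { return a * 26 + ch.charCodeAt(0) - 64; }, 0);
    if (+m[2] > maxR) maxR = +m[2];
    if (col > maxC) maxC = col;
  }
  emitter.emit('dimension', { rows: maxR, cols: maxC });

  // pass 2: emit one event per row so the caller never holds the whole sheet at once
  var rowRE = /<row[^>]*r="(\d+)"[^>]*>([\s\S]*?)<\/row>/g;
  while ((m = rowRE.exec(xml)) != null) emitter.emit('row', { r: +m[1], xml: m[2] });
  emitter.emit('end');
}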
FWIW, I've experienced much-worse-than-linear performance on very large spreadsheets even stored in plaintext SpreadsheetML (bookType: ...). The spreadsheet I was filling out was pretty sparse, but it had columns going out to ...
(This was on node 12.20, via node-xlsx.)
@SheetJSDev Is it possible to read an Excel file with 1M rows where the file size is around 220MB? How can I read such a file?
Had an experience where a browser extension caused infinite parsing. Smaller files were parsed successfully even with the MetaMask extension enabled.
Hello, how do you deal with this problem? I came across this too.
I'm getting the following errors in a JavaScript file that I'm trying to run:
Here is the code I'm trying to run:
Would really appreciate some help with this, thanks.
I am trying to use the json_to_sheet API to create an Excel file on the client side/browser (I return 300K rows from a backend API). sheet_to_json seems to error out with this much data. Any help or alternative? Please help. Though I have created an issue for this, I thought the active community could help if they have already resolved it for themselves.
Small suggestion about this: I was looking into the source code and augmented it to write the result of the parsed XML data to a file instead of a variable. This way there is no memory issue, although the final result must be rendered as well.
This line can be replaced by a for loop that writes to a file if the result is larger than a given number. I didn't want to open a PR with the feature because I ended up using the files without parsing them into a final result, but maybe there is a more generic solution that can be implemented in the library at some point? Something like
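(The suggestion is cut off above; as a rough sketch of the write-to-disk idea it describes, with made-up names and an NDJSON output format chosen purely for illustration:)

var fs = require('fs');

// Hypothetical helper: append each parsed row to a file instead of an in-memory array.
function make_row_sink(path) {
  var out = fs.createWriteStream(path);
  return {
    push: function(row) { out.write(JSON.stringify(row) + '\n'); },
    close: function() { out.end(); }
  };
}

// wherever the parser would do rows.push(row), a sink could do sink.push(row) instead
var sink = make_row_sink('parsed-rows.ndjson');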
@davidfoxio ODS files are ZIP files under the hood and the library tries to decompress the contents in memory.

@devgrigor The general approach would use a callback, iterator, or stream, generating X rows at a time and allowing the end code to process them. This would avoid building up the entire worksheet in memory. It works well for simple row-level operations like generating SQL statements or CSV rows, but it does not have the flexibility of the in-memory object approach.
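A minimal sketch of the row-chunking flavor of that approach, built on top of today's in-memory parse (so it only limits what downstream code holds at once, not the parse itself). decode_range, encode_range and sheet_to_json's range option are existing utilities; the function name and chunk size are made up:

var XLSX = require('xlsx');

// Yield rows from an already-parsed worksheet, chunkSize rows at a time.
function* rows_in_chunks(ws, chunkSize) {
  var ref = XLSX.utils.decode_range(ws['!ref']);
  for (var r = ref.s.r; r <= ref.e.r; r += chunkSize) {
    var end = Math.min(r + chunkSize - 1, ref.e.r);
    var win = XLSX.utils.encode_range({ s: { r: r, c: ref.s.c }, e: { r: end, c: ref.e.c } });
    yield XLSX.utils.sheet_to_json(ws, { header: 1, range: win });
  }
}

// usage: for (var chunk of rows_in_chunks(ws, 10000)) { /* emit SQL/CSV for this chunk */ }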
@SheetJSDev I am afraid the file does not exceed the limit. The file I used was 55 MB. And it didn't throw an error; it just stopped working once it reached the 400231st row :) When I opened the task manager it showed that the node process was using 2.7GB of RAM. I believe it was waiting for memory to become available in order to continue. This was a simple node process (an empty js file with nothing but the read operation of XLSX). Basically the code was the following:
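(The snippet itself did not survive the copy; based on the description above, a bare read of the file, it was presumably something along these lines, with the file name made up:)

var XLSX = require('xlsx');
var wb = XLSX.readFile('large-file.xlsx'); // ~55 MB workbook
var json = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]); // stalls around row 400231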
It is about having a single variable of that size; once the underlying code was modified to use files to write down the JSON, the memory issue went away.
@SheetJSDev I'm trying to parse a ".xlsx" file uploaded from the client/browser to JSON, with a sheet that has 30k rows and 685 columns, but I'm getting an "empty string" value in "Sheets". When I reduced it from 30k to 22.5k rows it worked correctly. The following code uses React as the client framework.

const onUploadFile = useCallback(
(event: React.ChangeEvent<HTMLInputElement>) => {
const reader = new FileReader();
const file = event.target.files?.[0];
if (!file) return;
reader.readAsBinaryString(file);
reader.onload = (e: ProgressEvent<FileReader>) => {
const data = e.target?.result;
const workbook = read(data, { type: "binary", dense: true });
const result: Record<string, unknown> = {};
workbook.SheetNames.forEach((sheetName: string) => {
const roa = utils.sheet_to_json(workbook.Sheets[sheetName]);
result[sheetName] = roa.length > 0 ? roa : [];
});
};
},
[]
);

The "Employee Data" sheet has a value of "empty string":
I need some help understanding the row, column, and memory limits of this library. I had a 126 MB XLSX file with more than 1,000,000 rows and 27 columns which failed to ingest - the library found zero rows in the worksheet.
You guys facing the issue of the browser freezing: try to use workers; for me they really helped. Read the excel file in a worker:
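A sketch of that approach (file names, message shapes and the CDN URL are assumptions; the point is that parsing happens off the main thread so the page stays responsive, even though the worker itself still blocks while parsing):

// worker.js
importScripts('https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js');
self.onmessage = function(e) {
  var wb = XLSX.read(e.data, { type: 'array' });
  var rows = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]);
  self.postMessage(rows);
};

// main thread (fileInput is a hypothetical <input type="file"> element)
var worker = new Worker('worker.js');
worker.onmessage = function(e) { console.log('parsed rows:', e.data.length); };
fileInput.files[0].arrayBuffer().then(function(buf) { worker.postMessage(buf, [buf]); });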
Currently this module loads the excel file into memory and builds the JS model, but it doesn't work for large files because of memory limitations.
Is it possible to provide a stream-like API so that we can handle only part of the data at a time?
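(One partial answer available in later versions of the library: the write side has streaming helpers such as XLSX.stream.to_csv, which return Node streams, though reading/parsing is still done fully in memory. A small sketch, with the file names made up:)

var fs = require('fs');
var XLSX = require('xlsx');

var wb = XLSX.readFile('big.xlsx');  // the parse itself is still fully in-memory
var ws = wb.Sheets[wb.SheetNames[0]];
XLSX.stream.to_csv(ws).pipe(fs.createWriteStream('big.csv'));  // output streams row by row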