Data source size #36
I've definitely used PivotTable.js with up to 200k rows of input data, and it does slow down at that point, but it's certainly quite snappy in the 25k-50k range of input. The time it takes to generate a table varies with the number of records and with the size of the table (i.e. the cardinality of the attributes you use as rows and columns): if you are making a table that is more than around 200x200, things will definitely start to take a while. The issue in #29 was with actually loading the data up into memory before passing it into the pivot table. What does your data look like and what sort of table are you trying to generate?
[{"ID": "1","Geography": "United States","Disease Area": "","Valu Version": "","Product": "","Type": "A. Population","Variable": "Population","Column": "Country","Value": "USA","Formula": "","For Calc": "","Is Protected": "Y","Format": ""},{"ID": "2","Geography": "United States","Disease Area": "","Valu Version": "","Product": "","Type": "A. Population","Variable": "Population","Column": "YoY Growth (%)","Value": "0.00975","Formula": "","For Calc": "","Is Protected": "N","Format": "0.000%"},{"ID": "3","Geography": "United States","Disease Area": "","Valu Version": "","Product": "","Type": "A. Population","Variable": "Population","Column": "2010","Value": "317640000","Formula": "","For Calc": "","Is Protected": "N","Format": "#,##0"},{"ID": "4","Geography": "United States","Disease Area": "","Valu Version": "","Product": "","Type": "A. Population","Variable": "Population","Column": "2011","Value": "320736990","Formula": "C15+(C15_B15)","For Calc": "","Is Protected": "Y","Format": "#,##0"},{"ID": "5","Geography": "United States","Disease Area": "","Valu Version": "","Product": "","Type": "A. Population","Variable": "Population","Column": "2012","Value": "323864175.6525","Formula": "D15+(D15_B15)","For Calc": "","Is Protected": "Y","Format": "#,##0"}] All kinds of tables. When it works, the code I'm using lets me pick from a dropdown. List of values: Table, Table Barchart, Heatmap, Row Heatmap, Col Heatmap, Line Chart, Bar Chart, Area Chart I understand it will take longer to render more data, but I've walked away for over five minutes on 20k and nothing has displayed. Yet the data was transferred within seconds... |
That's very strange; I can't replicate this behaviour on my end: it's working fine right now with 25k rows of data in my browser. What OS/browser versions are you running? If you look in the JavaScript debug console, are you seeing any helpful output or error messages? Is your data proprietary, or can you send me a to-scale sample of what you're running into problems with?
Very proprietary. Windows 7 x64, latest versions of Firefox and Chrome, tested on three machines.
Hi again, OK, so I've generated a random file to test with, it's here: https://www.dropbox.com/s/05yifl75chcws5y/bigdata.csv It's 20k lines of 26 columns of random letters: each attribute has a cardinality of 26. If I browse to the example which uses in-browser CSV parsing here http://nicolaskruchten.github.io/pivottable/examples/local.html, I'm able to load it up and make some fairly large 26x26 tables and charts very quickly, and 26x26x26 tables and charts in around 1.5 seconds. Admittedly, when I make tables that have two attributes in the columns and two in the rows (i.e. 26x26x26x26 cells) things slow down a LOT, but it does eventually render a table. Are you seeing different behaviour on your end? Does your data have much higher cardinality than 25 or so? How many cells are you expecting to see in the final table which is not generating correctly?
Just following up... Did you get a chance to try with my sample data?
Hi Nicolas, I was able to test this dataset this morning, and although it loaded smoothly, I do not think this can be considered a valid test case since you're changing multiple variables: content and data size. Your data takes up less than a quarter of the space of mine, so I would assume this has an impact on how the pivot table is able to load the data into memory pre-rendering. I'm not really sure what you're asking: "Are you seeing different behaviour on your end? Does your data have much higher cardinality than 25 or so? How many cells are you expecting to see in the final table which is not generating correctly?"
Hi again, OK, I've regenerated the CSV file; it's at the same URL as before, but now it's 7.5MB instead of 1MB and 50k lines instead of 20k lines. Still works well on my end... You too? My third question wasn't about how many cells were generating correctly; I get that you're seeing 0 :) My question was more: if you saw what you were expecting, how many cells would that table have? Let me know, I'd love to help find a solution! Cheers,
It still works for me too, despite sometimes giving a JavaScript error when the table gets too big. I'm not sure how to answer that. When I am able to load the pivot table I'm not using any default parameters; the pivot table is instantiated with no columns or rows set, so I guess I expect to see 0?
Hmm OK, in that case I'm sort of stumped. Is the sample data file comparable to yours in terms of size now (number of rows, number of attributes, cardinality of attributes, total size in megabytes)?
Sure, it has more rows than I've been able to load so far, it has more columns than I'm trying to use, I think my cardinality must be lower, and it's smaller in megabytes. The only differences I can see are content (you have a single letter per record... obviously this is easier for the JavaScript to parse) and formatting (JSON vs CSV).
So the attribute values aren't parsed unless some aggregator requires parsing. The default aggregator is Count, which doesn't need to look at the values at all. Perhaps it's just a JSON parsing thing, but I've worked with larger JSON objects in the past. I'll try to regenerate my data in JSON and we'll see what's what :) Is the code you're using similar to the code in one of the examples, or is there a lot of other stuff going on as well?
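(To make the distinction above concrete, here is a minimal sketch of how the aggregator choice affects parsing; the `#output` selector and the `data` variable are placeholders, not taken from this thread:)

```js
// Count never inspects the cell values, so string values are never parsed;
// Sum must coerce each selected value (here the "Value" attribute) to a number.
$("#output").pivotUI(data, {
    aggregatorName: "Count"                    // the default: no per-value parsing
    // aggregatorName: "Sum", vals: ["Value"]  // would force numeric parsing of "Value"
});
```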
The configuration of the pivot table is pretty basic:

    google.load("visualization", "1", {packages:["corechart", "charteditor"]});
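(For context, a minimal sketch of how such a basic pivotUI setup with the Google Charts renderers typically looks; the `#output` selector, the `data` variable, and the option values are assumptions, not taken from this thread:)

```js
google.load("visualization", "1", {packages: ["corechart", "charteditor"]});
google.setOnLoadCallback(function() {
    // combine the plain table/heatmap renderers with the Google Charts ones
    var renderers = $.extend(
        $.pivotUtilities.renderers,
        $.pivotUtilities.gchart_renderers
    );
    $("#output").pivotUI(data, {renderers: renderers});
});
```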
OK. Have you ruled out the possibility that the browser is choking on parsing the incoming JSON? I.e. if you put an alert() call right before the call to pivotUI(), does it appear promptly once the data has arrived?
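(A rough illustration of the timing check being suggested; the variable names and selector are assumptions:)

```js
alert("data received, about to parse " + jsonString.length + " characters");
var data = JSON.parse(jsonString);   // if the page hangs here, JSON parsing is the culprit
alert("parsed " + data.length + " records, calling pivotUI()");
$("#output").pivotUI(data, {});      // if it hangs here instead, the pivot/render step is
```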
Hi, is there any update on this issue?
Sorry Nicolas, we've had to focus on other areas of development, so we are setting this aside for now. Thanks for developing such a great plugin.
Hi Nicolas, We're taking another pass at getting the PivotTable to accommodate our data volume rather than restricting the input. I was wondering if you could provide a copy of the dataset used to support the following "Why is it good?" statement: "works acceptably fast in Chrome on commodity hardware up to around a hundred thousand records, depending on the cardinality of the attributes." As I began re-testing this I started where you advised, including an alert before the pivotUI call. In lower-volume sets (roughly under 20,000 rows) the alert loads in a timely fashion; however, when increasing the number of rows towards 50,000, I never get an alert (or only after many, many minutes of seeing a blank page). I believe this confirms your initial suspicion that the browser is choking when it tries to parse my JSON, right? If so, I'm not sure how to proceed. I guess I can accept the fact that I just can't load that many records due to the complexity of my data in the JSON; however, that would be a shame. A colleague suggested that this problem may be alleviated, at least in part, by moving some of the data processing (especially the aggregation functions) to PHP rather than JavaScript. This will require many more AJAX calls when the user changes chart type, plotted columns, etc., but I think the performance increase would justify the development time. Did you ever consider or put any time into creating a version that supports some level of server-side processing? Do aggregated values get calculated when the PivotTable is initialized or when a particular aggregation type is selected?
Hi again, Thanks for getting back to me! It seems from your tests with the alert() calls that the problem is indeed happening outside the pivot library itself, in the data parsing/preparation step... If the issue is strictly with JSON parsing (rather than simply manipulating a very large array in memory), then perhaps you could try the same solution as the one that worked for issue #29, which was to do multiple requests to the server for smaller chunks of JSON-formatted data, which are then parsed separately and merged before being passed to the pivot table. You could also switch to a simpler JSON format: right now you seem to be using the array-of-objects format (based on one of your much-earlier answers), but the JSON would be far smaller if you used the array-of-arrays format. (Examples here: https://github.com/nicolaskruchten/pivottable/wiki/Input-Formats) Here are my answers to your other questions:
Wow, thanks for the fast reply. Thanks for the suggestion on JSON data formatting; I am going to try converting my data source from array of objects to array of arrays later today to see if that affects performance at all. However, as to your suggestion of chunking the JSON, I will quote my original post: "But I'm not really sure how to do this. I can't find anything in the documentation that lets data be loaded in piecemeal, after the initial instantiation of the table. Is this possible?" Your reply seemed to indicate that there was no way to load data into the PivotTable in segments, so I did not continue thinking about this. Can you provide any pointers as to how the author of #29 was able to modify the plugin's core to allow for this? I revisited the cardinality discussion from prior posts and researched the actual meaning a little bit more. Unfortunately I think our data has an unlimited level of cardinality, if such a thing is possible? I don't believe any two values in our set are the same, and if they are, it is purely coincidence, as they are very precise: e.g. they have many decimal places.
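(For reference, a minimal sketch of the two input formats being compared; the field names echo the sample posted earlier and the values are invented for illustration:)

```js
// Array of objects: every record repeats every attribute name, so the JSON is large.
var arrayOfObjects = [
    {"Geography": "United States", "Variable": "Population", "Value": 317640000},
    {"Geography": "United States", "Variable": "Population", "Value": 320736990}
];

// Array of arrays: the first row holds the attribute names, so the payload is much smaller.
var arrayOfArrays = [
    ["Geography", "Variable", "Value"],
    ["United States", "Population", 317640000],
    ["United States", "Population", 320736990]
];

// Either form can be passed directly to pivot() / pivotUI().
```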
Since we've narrowed the issue down to JSON parsing, let's leave cardinality issues aside for the moment... My chunking suggestion, and the solution to issue #29, is not to load the data into the pivot table piecemeal, but rather to parse it piecemeal, merge it, and load it in one go. Compare the following pseudocode:

    var input = JSON.parse(string_that_is_too_big); // this takes way too long
    // now call x.pivot(input...)

with this:

    // step 1: parse
    var input_chunk1 = JSON.parse(chunk1_string); // chunk1_string is small enough to work
    var input_chunk2 = JSON.parse(chunk2_string); // chunk2_string is small enough to work
    // step 2: merge
    var input = [].concat(input_chunk1, input_chunk2);
    // now call x.pivot(input...)

Does this make sense?
Yes, that makes a lot of sense, thanks for the walkthrough. I'm going to try this out later tonight. Do you recommend I try this before changing from array of objects to array of arrays? I'm going to include some more info I uncovered from my testing, where the values correspond to number of rows, size of JSON response, time to retrieve the JSON from the server, and time to render the PivotTable after $(document).ready(), using ---- to indicate that the PivotTable never became visible. As you can see, as before in my OP, some sort of threshold is being reached after 15.2k records. This led me to think that there was some kind of data problem, so I changed some things and found:

    20k   2.8MB   2.7s    9.547s   every column (replacing " and ' in value and column)
    60k   8.2MB   4.44s   ----     every column (replacing " and ' in value and column) - ERRORS: Uncaught RangeError: Maximum call stack size exceeded

As you can see, when I use str_replace in PHP to remove quotes from the values before feeding the JSON to the PivotTable, I am able to load far more records (up to 55.5k); however, the table takes upwards of thirty seconds to render (after jQuery's ready event). Is there anything I can do to optimize this time? And I've just been informed that our prior target of being able to handle 300,000 rows is no longer accurate. We now have 1.7 million records, and that is potentially the tip of the iceberg. Do you think there's a chance that, if I'm chunking, using arrays of arrays, and doing a lot of the math on the server side, it could potentially handle this much data?
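(One concrete way the chunked parse-and-merge could look with jQuery; the URLs, chunk count, empty options object, and `#output` selector are all assumptions for illustration:)

```js
// Fetch several smaller JSON responses, let the browser parse each one separately,
// then merge the parsed arrays and hand the result to the pivot table in one go.
var chunkUrls = ["/data?page=1", "/data?page=2", "/data?page=3"];
var requests = chunkUrls.map(function(url) { return $.getJSON(url); });

$.when.apply($, requests).done(function() {
    // each argument is [data, statusText, jqXHR]; keep only the parsed data
    var chunks = Array.prototype.slice.call(arguments).map(function(arg) { return arg[0]; });
    var input = [].concat.apply([], chunks);
    $("#output").pivotUI(input, {});
});
```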
Another issue we're having is that the PivotTable is not rendering within IE7 or IE8: I see the skeleton of a table which only says "Total," and no configuration options or data are visible. Have you encountered this at all? I know from browsing to the demo that the PivotTable is supported in these browsers, and that just the Google Charts portion is excluded, but I'm not sure what could be different between your demo page and my test environment that's breaking IE7/8. Any insights?
OK, lots going on here!
Hi @nicolaskruchten
I would be very interested in seeing a 500-row input which will cause this to hang! Can you share it please?
Simply construct an array like: [...]. This does not allow any grouping, and table build-up and DOM node creation become the bottlenecks.
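(The example array did not survive above; a hypothetical input with the same character might be 500 records whose row and column attributes are unique per record, e.g.:)

```js
// Hypothetical stand-in: with "a" on rows and "b" on columns, every record lands in its
// own row and column, so the table has 500 x 500 = 250,000 cells, almost all of them empty.
var data = [];
for (var i = 0; i < 500; i++) {
    data.push({a: "row" + i, b: "col" + i, value: Math.random()});
}
$("#output").pivotUI(data, {rows: ["a"], cols: ["b"]});
```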
Ah OK, if you actually try to build a 10M-cell table then, yes, rendering will be a problem. But the initial parse of the data and the default render (i.e. one cell containing the count) will still work.
Yes, parsing is not the issue; HTML table build-up and rendering are. Have you considered using an alternative to an HTML table that uses virtual rendering of the viewport (e.g., SlickGrid)? Or is there any other architectural strategy you can recommend to prevent the browser from hanging?
Certainly one could write a new Renderer which uses a different strategy than DOM operations to build a table. That said, I'm not sure how useful it is to be able to render tables that big and sparse from a UI point of view, so I have not put any effort into solving this problem... :)
Fully agree that it is not useful. It is important, though, to prevent the user from hanging the browser by mistake (e.g., automatic projections of rows in a spreadsheet); I am talking about defensive coding. It would be great if you could point us to a place in the code where we can stop DOM operations in case of high cardinality. Thanks.
Sorry for the long delay in answering here... You could stop DOM operations in the TableRenderer based on the number of rows/cols.
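(A minimal sketch of that kind of guard, assuming the standard jQuery plugin API; the 100,000-cell threshold, `#output` selector, and `data` variable are arbitrary placeholders:)

```js
// Wrap the built-in Table renderer so it refuses to build the DOM for huge tables.
var MAX_CELLS = 100000;
var baseRenderers = $.pivotUtilities.renderers;

var guardedRenderers = $.extend({}, baseRenderers, {
    "Table": function(pivotData, opts) {
        var cells = pivotData.getRowKeys().length * pivotData.getColKeys().length;
        if (cells > MAX_CELLS) {
            return $("<div>").text("Refusing to render " + cells +
                " cells; please choose lower-cardinality attributes.");
        }
        return baseRenderers["Table"](pivotData, opts);
    }
});

$("#output").pivotUI(data, {renderers: guardedRenderers});
```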
Hi Nicolas, how about getting data from a URL?
@yolnda please don't comment on old, closed issues. Please create a new issue with a clearer description of your question.
Hi, I'm having trouble using a rather large data source with the pivot table. The source is currently about 300,000 rows with 12 columns.
As I begin to increase the number of rows I'm selecting, it gets slower and slower, eventually not loading at all. I mean, the data is returned from the server, but the page is just blank. Here are some numbers:
    Records   Response size   Response time
    1,000     1.12MB          581ms
    10,000    2.4MB           939ms
    15,000    3.6MB           1.46s
    50,000    12.0MB          4.76s
These are the number of records, size of the response, and response time in Chrome x64.
Anything after 15,000 does not render a chart or table at all.
I see someone posted a similar issue in #29 and said that he resolved it by breaking his AJAX requests apart. But I'm not really sure how to do this. I can't find anything in the documentation that lets data be loaded in piecemeal after the initial instantiation of the table. Is this possible?