Optimize Dataverse page, to issue fewer queries and load faster. #2777

Closed
landreev opened this issue Nov 30, 2015 · 20 comments

@landreev
Contributor

Dataverse page still issues 1,200+ queries to load the default home page in production.
Custom/native queries can be used to speed things up. (Raman already worked out some solutions for the mydata page).
Also, an important subtask of this is #2776 - optimizing handling of thumbnails. This alone generates a surprising number of database requests; hence it has a dedicated ticket of its own.
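To illustrate why a custom query helps, here is a minimal sketch of the general pattern (one query per card versus a single joined query). It is not Dataverse code: the `dataset` and `thumbnail` tables, the `CountingConnection` wrapper, and both loader functions are hypothetical names made up for this example, and it uses an in-memory SQLite database rather than the production database.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE thumbnail (dataset_id INTEGER, path TEXT)")
    for i in range(1, 11):
        conn.execute("INSERT INTO dataset VALUES (?, ?)", (i, f"Dataset {i}"))
        conn.execute("INSERT INTO thumbnail VALUES (?, ?)", (i, f"/thumbs/{i}.png"))
    conn.commit()
    return conn


class CountingConnection:
    """Thin wrapper that counts how many queries the page code issues."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def load_cards_per_row(db):
    """One query for the list, then one more query per card (N + 1 total)."""
    cards = []
    for ds_id, title in db.query("SELECT id, title FROM dataset ORDER BY id"):
        rows = db.query("SELECT path FROM thumbnail WHERE dataset_id = ?", (ds_id,))
        cards.append((title, rows[0][0] if rows else None))
    return cards


def load_cards_joined(db):
    """The same cards fetched with a single hand-written join."""
    return db.query(
        "SELECT d.title, t.path FROM dataset d "
        "LEFT JOIN thumbnail t ON t.dataset_id = d.id ORDER BY d.id"
    )


if __name__ == "__main__":
    naive = CountingConnection(make_db())
    joined = CountingConnection(make_db())
    load_cards_per_row(naive)
    load_cards_joined(joined)
    print("per-row queries:", naive.count, "joined queries:", joined.count)
```

With ten cards the per-row loader issues eleven queries and the joined loader issues one; the gap grows with every extra attribute that is fetched per card.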

fyi: Card attribute list can be found under "Next Steps": #2474

@landreev landreev added UX & UI: Design This issue needs input on the design of the UI and from the product owner Status: Dev labels Nov 30, 2015
@landreev landreev added this to the 4.2.2 milestone Nov 30, 2015
@landreev landreev self-assigned this Nov 30, 2015
landreev added a commit that referenced this issue Nov 30, 2015
landreev added a commit that referenced this issue Dec 1, 2015
@landreev landreev assigned kcondon and unassigned landreev Dec 2, 2015
@landreev
Contributor Author

landreev commented Dec 2, 2015

I can see two areas of QA for this issue:

  1. regression: compare the functionality against 4.2.1;
  2. performance: verify the savings in the number of queries against 4.2.1? Stress test and compare to 4.2.1?

For best results, the same db should be used for both versions.
I also have a preserved version of a 4.2.2 war file, built before my dataverse page fixes were added.

@landreev
Contributor Author

landreev commented Dec 2, 2015

(a new build is needed; I've already made a commit today)

@landreev
Contributor Author

landreev commented Dec 3, 2015

OK, here are the known limitations/things that may be messed up:

  • no "linked" icon;
  • extra info for tabular files may not be shown;
  • harvested cards - not sure if displayed correctly.
  • mydata (and potentially any other components using SearchServiceBean) are not available...

But it is fast!! lol

I have solutions for these things; will wire them on Friday. Really had to leave today.
Anything else you find - please add.

@kcondon
Contributor

kcondon commented Dec 3, 2015

Missing from the harvested dataset card are the harvested icon (the card shows the linked icon instead) and the note at the bottom, which reads:
This dataset is harvested from our partners at the ICPSR at the University of Michigan. Clicking the link will take you directly to the archival source of the data.

@kcondon
Contributor

kcondon commented Dec 3, 2015

For tabular data, for comparison on what is missing, for the same file:

production:
Tabular Data - 228.3 KB - 1559 Variables, 77 Observations - UNF: UNF:5:OJzZAAxyY+KzHZUBuzGj1A==

4.2.2:
Tab-Delimited - 228.3 KB - MD5:

@mheppler
Contributor

mheppler commented Dec 3, 2015

I am going to take the to-do items from both Leonid's comment and Kevin's comment and combine them here. Please add any additional items here, so we have one easy-to-read checklist rather than a running dialog of new comments.

  • Linked datasets card
    • No "linked" icon
  • Harvested dataset card
    • No "harvested" icon
    • No info msg: "This dataset is harvested from our partners at the ICPSR at the University of Michigan. Clicking the link will take you directly to the archival source of the data."
  • Files card
    • No file tags
    • No extra info (variables, observations, UNF, etc.) for tabular data
  • My Data (and potentially any other components using SearchServiceBean) are not available...

@kcondon
Contributor

kcondon commented Dec 4, 2015

Benchmark summary:
Both 4.2.1 and 4.2.2 were tested up to the same maximum load of 240 concurrent users, but the response time at that maximum, and the growth in response time under load, differed considerably: 4.2.1 increased sharply, while 4.2.2 stayed relatively flat.
Query count summary:
4.2.2 query counts were lower than 4.2.1 by a factor of 10 or more:
4.2.2 was roughly 280 for the homepage and 250 for the files facet; 4.2.1 was roughly 3500 for the homepage and 7200 for the files facet.

@kcondon
Contributor

kcondon commented Dec 4, 2015

Benchmark

4.2.1
Users  Resp (ms), homepage (/)  Resp (ms), file facet  Total requests   RPS  Fail  Max mem   CPU
    5                     1689                   3187              99   0.4     0    6.5GB  4.7%
  100                     2811                   8228             518     6     0    7.2GB   50%
  200                    16665                  56335             518   5.5     0     10GB   60%
  240                    22538                  73538             625     5     0     10GB   64%

4.2.2
Users  Resp (ms), homepage (/)  Resp (ms), file facet  Total requests   RPS  Fail  Max mem   CPU
    5                      746                    780             100   0.4     0      7GB  3.5%
  100                      609                    645             791     8     0      7GB   35%
  200                      699                    752            1120    15     0      8GB   70%
  240                      775                    811            1426    18     0     10GB   80%
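
As a quick sanity check on "increased sharply" versus "stayed relatively flat", the sketch below computes how much the homepage response time grows between the lowest and highest user counts. The numbers are copied from the benchmark tables above; the `RESPONSE_MS` dictionary and the `growth_factor` helper are names introduced only for this example.

```python
# Homepage ("/") response times in ms, keyed by concurrent users,
# copied from the 4.2.1 and 4.2.2 benchmark tables in this thread.
RESPONSE_MS = {
    "4.2.1": {5: 1689, 100: 2811, 200: 16665, 240: 22538},
    "4.2.2": {5: 746, 100: 609, 200: 699, 240: 775},
}


def growth_factor(samples):
    """Response time at the highest user count divided by that at the lowest."""
    users = sorted(samples)
    return samples[users[-1]] / samples[users[0]]


if __name__ == "__main__":
    for version, samples in RESPONSE_MS.items():
        print(f"{version}: x{growth_factor(samples):.2f} from 5 to 240 users")
```

On these figures the 4.2.1 homepage slows down by a factor of about 13 between 5 and 240 users, while 4.2.2 stays within a few percent of its 5-user time.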

Query Count

4.2.1
User status    Homepage (/)  File facet
Not logged in          3586        7695
dvAdmin                3306        6196

4.2.2
User status    Homepage (/)  File facet
Not logged in           283         246
dvAdmin                 326         283
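
The "factor of 10" claim can be checked directly against these counts. The following is a small arithmetic sketch; the `QUERY_COUNTS` dictionary and `reduction_ratios` helper are illustrative names, and the numbers are the ones reported in the tables above.

```python
# (user status, page) -> (4.2.1 query count, 4.2.2 query count),
# copied from the query count tables in this thread.
QUERY_COUNTS = {
    ("Not logged in", "homepage"): (3586, 283),
    ("Not logged in", "file facet"): (7695, 246),
    ("dvAdmin", "homepage"): (3306, 326),
    ("dvAdmin", "file facet"): (6196, 283),
}


def reduction_ratios(counts):
    """How many times fewer queries 4.2.2 issues than 4.2.1, per scenario."""
    return {key: before / after for key, (before, after) in counts.items()}


if __name__ == "__main__":
    for (user, page), ratio in reduction_ratios(QUERY_COUNTS).items():
        print(f"{user:<13}  {page:<10}  {ratio:5.1f}x fewer queries")
```

Every scenario comes out above 10:1, with the file facet showing the largest reduction.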

@mheppler
Contributor

mheppler commented Dec 4, 2015

😵

@landreev
Contributor Author

landreev commented Dec 4, 2015

Tabular data files: cards should now have proper display of UNF, var counts, etc.

@kcondon
Contributor

kcondon commented Dec 4, 2015

My Data and tabular data file cards are now OK.
However, icons for cards now appear to be sized and positioned incorrectly. Opened as a separate ticket, #2799, and assigned to @mheppler

@landreev
Contributor Author

landreev commented Dec 4, 2015

Cards for harvested dataverses and datasets should be properly formatted now.
[Kevin]
Type icons are present now, but the citation is missing from harvested datasets. Discussed with Leonid; it requires some extra work, and he will open a separate ticket.

@landreev
Contributor Author

landreev commented Dec 4, 2015

Cards for harvested files should also be properly formatted now.
[Kevin]
Yes, same as in production but may want to revisit some aspects of the harvested file card:
data/various-formats - 0 bytes - MD5:
Opened as a suggestion, #2801

@landreev
Contributor Author

landreev commented Dec 7, 2015

Linked objects - should now be displayed properly.
[Kevin]
Linked objects are displayed correctly.

@landreev
Contributor Author

landreev commented Dec 7, 2015

File categories - should now be displayed in the file card.
[Kevin]
Yes, file category tags are appearing correctly.

@landreev
Contributor Author

landreev commented Dec 7, 2015

Per @mheppler 's list of TODO items:

To the best of my knowledge, all the issues have been resolved, with the single exception of the tabular data tags. I can easily retrieve them (at the cost of one extra db query per page); however, I feel we should just start indexing these things, the same way we are already indexing file categories. This way they'll be available from SOLR, with no need to retrieve them from the db on page load...
I'll bring this up at the meeting in the AM - whether it's ok to add extra indexable fields at this point.
And I will fix it, one way or another, after the meeting.
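
For reference, "one extra db query per page" could look roughly like the sketch below: a single `IN (...)` lookup covering every file shown on the page, rather than one lookup per file card. This is only an illustration of the idea, not the actual fix; the `datafile_tag` table, its columns, the sample tag values, and the `tags_for_page` helper are all hypothetical, and it runs against an in-memory SQLite database.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical file-to-tag table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datafile_tag (file_id INTEGER, tag TEXT)")
    conn.executemany(
        "INSERT INTO datafile_tag VALUES (?, ?)",
        [(1, "Survey"), (1, "Panel"), (3, "Time Series")],
    )
    conn.commit()
    return conn


def tags_for_page(conn, file_ids):
    """Fetch tags for all files on the page with a single query."""
    if not file_ids:
        return {}
    placeholders = ", ".join("?" for _ in file_ids)
    rows = conn.execute(
        "SELECT file_id, tag FROM datafile_tag "
        f"WHERE file_id IN ({placeholders}) ORDER BY file_id, tag",
        list(file_ids),
    ).fetchall()
    # Files without tags still get an (empty) entry, so cards can render uniformly.
    tags = {file_id: [] for file_id in file_ids}
    for file_id, tag in rows:
        tags[file_id].append(tag)
    return tags


if __name__ == "__main__":
    print(tags_for_page(make_db(), [1, 2, 3]))
```

Indexing the tags instead would remove even this one query, since the values would arrive with the search results.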

@pdurbin
Member

pdurbin commented Dec 7, 2015

tabular data tags. I can easily retrieve them (at the cost of one extra db query per page); however, I feel we should just start indexing these things - same way we are already indexing file categories.

A few things.

Here's how the current "File Tag" facet ("categories" in the code) looks in production as of v. 4.2.1:

[screenshot: harvard_dataverse_-_2015-12-07_09 10 14]

Here's how you can edit "File Tags" vs "Tabular File Tags":

[screenshot: spruce_goose_-spruce_dataverse-_2015-12-07_09 23 22]

Here's the DataTags homepage/branding:

[screenshot: datatags_-_2015-12-07_09 29 11]

I don't have any objection to indexing Tabular Data Tags in principle but I want us to think about the use of the word "tags". It feels overloaded to me already, and we haven't integrated DataTags yet (#871). /cc @mcrosas @eaquigley

@kcondon
Contributor

kcondon commented Dec 7, 2015

Found some issues:

  1. @landreev Citation is missing from harvested datasets:
    Production:
    ICPSR, 2014, "Toward a Healthy America: Selected Research Data from the Health and Medical Care Archive at ICPSR", http://hdl.handle.net/1902.29/CD-11511
    4.2.2:
    blank, but with a thin blue line, which would be the background color of the citation.
    Opened a separate ticket: "Make sure citations are shown for harvested datasets, just as in 4.2.1." #2800
  2. Card icons for dv, ds, and file on My Data are incorrectly sized and positioned.
    Opened a separate ticket: "My Data: Card icons are incorrectly sized and positioned, regression introduced in 4.2.2." #2799, assigned to @mheppler

All that's left for this ticket is tabular data tags, opened as a separate ticket. Closing this ticket.

@kcondon kcondon closed this as completed Dec 7, 2015
@landreev
Contributor Author

landreev commented Dec 7, 2015

Just an info update:
We realized today the 4.2.1 query-counting test, above, was done with a pre-production build of 4.2.1.
But I re-ran the tests today, with the final, production 4.2.1 build - and the numbers are virtually the same. (they don't have to be exactly the same, because some datasets have been reindexed since last week - so the data objects loaded on the home page are not the same).
This makes sense, because the dataverse page wasn't touched much in 4.2.1; but some changes were made to it (the number of the Setting table lookups must have been reduced); so we weren't sure.

So yeah, 10:1 reduction or better, in query numbers...

@pdurbin
Member

pdurbin commented Dec 11, 2015

All that's left for this ticket is tabular data tags, opened as a separate ticket.

That issue is #2802
