-
-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compute an ngram field for all admin data #347
Conversation
fd88383
to
196760b
Compare
This looks really promising. It looks like the disk usage is the same now, which is awesome. I wonder if the cost of what I imagine is a fairly complex inverted index is actually in memory usage. What's the memory usage of an Elasticsearch instance running this branch, vs master? |
|
I tested this full planet build out a little bit. I'd like to do a bit more testing but here are my findings so far:
|
I spent a day investigating the memory implications of adding this additional field. I loaded a 'control' snapshot with a full planet build and also the 'admin-ngrams' snapshot to compare memory. What I noticed is that the heap naturally goes through size increases followed by GC purges, these cycles are perfectly normal in a garbage-collected language and so, even on a cold-started ES server you'll see it climb to ~6% and go up to ~11% and then fall back down and repeat (I was allocating ~10GB RAM to the heap in my tests). When I executed the acceptance-tests (no changes to API to use the new index, so the index should never be touched) I noticed that the JVM usage climbed slightly, now hitting a peak of around ~12% instead of ~11% and falling to around ~8% instead of 6% after a GC cycle. I then change the I'm going to go ahead and merge this, the only significant impact is the additional ~20% disk space used, we'd now suggest allocating a minimum of 450GB for the full planet build. |
This PR is a replacement for #345 which uses the multi-fields feature of elasticsearch in order to achieve the same thing but using less HDD space.
See linked PR for original issue notes and comments.
Closes #345