Enable prefix compression for arrays of values #85893
Comments
Pinging @elastic/es-search (Team:Search)
It's also worth considering a dedicated mapping type for this, rather than an array field of keywords. That would open up more options for customizing aggregations and allow finer-grained control of the storage.
+1 @not-napoleon to achieve this via a dedicated field type (
+1 Given the constraints on "synthetic source" regarding reconstructing arrays, a
To provide some more context about this issue, the goal is to store path-like data such as file paths or stack traces, where we expect lots of redundancy across prefixes. There seem to be some users doing this, and I recently found out about this path hierarchy plugin, which can build a tree of the file structure from an index that stores file paths as keywords. At a high level, I can think of two main routes to improve support for this use case:
My intuition is that we'd index data exactly the same way in both cases, so this is really a question of how it would be best exposed?
Since I started the discussion, it's only fair to mention that in the meantime we have started to "flatten" the arrays into a single value by concatenating the array values. There are two reasons to do so:
So this issue doesn't need to be kept open just for us.
Agreed, closing.
Description
For continuous profiling we store many documents (on the order of more than 100 million) and therefore strive for better compression to reduce storage size and costs.
We noticed that prefix compression works well for single-value fields, but not for arrays.
The arrays we store often start with the same values, so prefix compression seems a natural way to reduce storage size.
We tested "flattening" the arrays (concatenating the values and storing a single value) to "forcefully" enable prefix compression. This reduced the storage size by more than 50% (before: 59.7 bytes/record; after: 28.7 bytes/record).
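To illustrate why flattening helps, here is a minimal sketch of prefix compression (front coding) applied to flattened arrays. It is a toy model only and not how Lucene or Elasticsearch actually encode doc values. The function names, the `;` separator, and the sample stack traces are all assumptions made up for this example.

```python
from os.path import commonprefix  # compares strings character by character


def flatten(values, separator=";"):
    """Concatenate array values into one string (separator is an assumption)."""
    return separator.join(values)


def front_code(terms):
    """Encode sorted terms as (shared_prefix_length, remaining_suffix) pairs."""
    encoded = []
    previous = ""
    for term in sorted(terms):
        shared = len(commonprefix([previous, term]))
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def decode(encoded):
    """Rebuild the sorted terms from their front-coded form."""
    terms = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


# Hypothetical stack traces that share their leading frames.
traces = [
    ["main", "serve", "handle", "parse"],
    ["main", "serve", "handle", "render"],
    ["main", "serve", "flush"],
]

flattened = [flatten(trace) for trace in traces]
encoded = front_code(flattened)

raw_chars = sum(len(term) for term in flattened)
suffix_chars = sum(len(suffix) for _, suffix in encoded)
print(f"characters before: {raw_chars}, suffix characters after: {suffix_chars}")
```

Because each flattened value carries the whole array, a shared leading run of elements becomes a shared string prefix that only needs to be stored once per run of neighbouring terms. The sketch assumes the separator never occurs inside a value; otherwise the original array could not be recovered by splitting.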
Example of the field mapping (it's a doc_values field):
cc @jpountz