Data View: cross-field metadata and their relationship to data visualization #97278
Comments
Pinging @elastic/datavis (Team:DataVis)
Field metadata drives some of the recommendations: https://data.humdata.org/dataviz-guide/dataviz-elements/#/data-visualization/bar-charts h/t @maartenzam
Related: #73152
Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)
In order to provide better transparency of priorities, issues that will not be prioritized within the next 24 months are being closed. Tracking request in Lens general improvements ice box #184648
Keywords: metadata, field, recommender, data view, shared visual attributes, datavis best practices
Examples in response to Vijay's request in moving from index patterns to Data View.
Cross-field metadata
Not all metadata neatly belongs to a specific field or to an entire index. Sometimes it's about the relationship between two or more fields within an index, or even across indices. Examples of metadata across fields, and their utility for visual exploration:
Fields whose contents relate to one another
Hierarchical relationship between fields
One field breaks down another. Examples:
It's good to know if the subunit can even be used on its own. Eg. "Paris" can be "France/Île-de-France/Paris" or "US/Texas/Paris", so, on its own, it's ambiguous, unless the City field is a unique code.
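A minimal sketch of how such ambiguity could be detected from sample documents. The document shape, the field names and the function name are hypothetical, chosen only to mirror the "Paris" example; this is not an existing Kibana API.

```typescript
// Hypothetical document shape; the field names are illustrative only.
interface PlaceDoc {
  country: string;
  region: string;
  city: string;
}

// Returns the city values that occur under more than one country/region
// parent path, i.e. values that are ambiguous when the city field is used alone.
function ambiguousCities(docs: PlaceDoc[]): string[] {
  const parentsByCity: Record<string, string[]> = Object.create(null);
  docs.forEach((doc) => {
    const parent = doc.country + "/" + doc.region;
    const parents = parentsByCity[doc.city] || [];
    if (parents.indexOf(parent) === -1) {
      parents.push(parent);
    }
    parentsByCity[doc.city] = parents;
  });
  return Object.keys(parentsByCity)
    .filter((city) => parentsByCity[city].length > 1)
    .sort();
}

const sampleDocs: PlaceDoc[] = [
  { country: "France", region: "Île-de-France", city: "Paris" },
  { country: "US", region: "Texas", city: "Paris" },
  { country: "US", region: "Texas", city: "Austin" },
];

// "Paris" appears under two different parents, "Austin" under one.
const ambiguous = ambiguousCities(sampleDocs);
```

An empty result would suggest that the sub-unit field can be used on its own, at least for the sampled documents.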
Visualizations that work well across the hierarchy:
Styling of hierarchical data might follow a primary breakdown, eg. also projected to color, while the deeper nodes inherit that (or fade out, like the sunburst):
Multidimensional variables
Usually, there are several discrete (categorical or ordinal) variables associated with documents. They collectively represent slicing and dicing ability (explorability, drilldown, drill-through etc.). In a given chart, usually only one (very rarely, two) can utilize a color mapping.
Functional dependency: independent variables vs dependent variables
Knowledge or inference of which field(s) determine the value of other field(s).
Examples:
Often, exploratory interaction is about filtering or navigating in the realm of independent variables / dimensions, while the quantities and categories of dependent variables are aggregated (or in contrast, disaggregated) and visualized.
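Where the dependency is not declared as metadata, it could be inferred from a sample. The sketch below checks whether a set of fields determines another field; the function name and the product fields are made up for illustration.

```typescript
type Doc = Record<string, string | number>;

// True if, within the sample, every combination of `determinants` values maps
// to exactly one value of `dependent`. A sample can only refute a functional
// dependency; it cannot prove that it holds for the whole index.
function holdsInSample(docs: Doc[], determinants: string[], dependent: string): boolean {
  const seen: Record<string, string | number> = Object.create(null);
  for (let i = 0; i < docs.length; i++) {
    const doc = docs[i];
    const key = JSON.stringify(determinants.map((field) => doc[field]));
    if (key in seen) {
      if (seen[key] !== doc[dependent]) {
        return false;
      }
    } else {
      seen[key] = doc[dependent];
    }
  }
  return true;
}

// Hypothetical sample: product_id determines product_name, but not quantity.
const orderDocs: Doc[] = [
  { product_id: 1, product_name: "kettle", quantity: 2 },
  { product_id: 1, product_name: "kettle", quantity: 5 },
  { product_id: 2, product_name: "toaster", quantity: 1 },
];

const idDeterminesName = holdsInSample(orderDocs, ["product_id"], "product_name");
const idDeterminesQuantity = holdsInSample(orderDocs, ["product_id"], "quantity");
```

Fields that are never determined by others are candidates for dimensions to filter and navigate by; the rest are candidates for aggregation.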
Time and space dependency
Most metrics in an index may change over time and/or across spatial dimensions, where available. It's useful to default to eg. a time series view or map view (recommender) and offer suitable visualization choices, eg. lines, if the time series is reasonably continuous.
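As a rough illustration of such a default, the sketch below picks a view from the kinds of fields present and chooses lines only when most time buckets contain data. The simplified field kinds, the preference of time over space, and the 0.8 threshold are all assumptions made for the example.

```typescript
// Simplified field kinds for the sketch; not the full set of Elasticsearch types.
type FieldKind = "date" | "geo_point" | "number" | "keyword";

interface FieldInfo {
  name: string;
  kind: FieldKind;
}

// Assumption: a date field takes precedence over a geo field.
function suggestView(fields: FieldInfo[]): "time_series" | "map" | "table" {
  const kinds = fields.map((field) => field.kind);
  if (kinds.indexOf("date") !== -1) {
    return "time_series";
  }
  if (kinds.indexOf("geo_point") !== -1) {
    return "map";
  }
  return "table";
}

// Treats the series as "reasonably continuous" when the share of non-empty
// time buckets reaches `minFilledRatio` (an arbitrary default).
function suggestTimeSeriesMark(bucketDocCounts: number[], minFilledRatio = 0.8): "line" | "bar" {
  if (bucketDocCounts.length === 0) {
    return "bar";
  }
  const filled = bucketDocCounts.filter((count) => count > 0).length;
  return filled / bucketDocCounts.length >= minFilledRatio ? "line" : "bar";
}
```

For instance, a field list containing a date field would default to the time series view, and a sparse histogram such as `[1, 0, 0, 0]` would fall back to bars.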
Explanatory relationship
Key and text field pair:
Visualization and data exploration impact:
Redundant metrics
Certain metrics may redundantly encode the same information (eg. same phenomenon, different unit) or may contain precomputed values (eg. elapsed time, MB, MB/s).
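One way to spot a precomputed metric is to test whether it can be recomputed from the others. The sketch below does this for the elapsed time, MB, MB/s example; the field names and the tolerance are hypothetical.

```typescript
// Hypothetical document shape mirroring the elapsed time, MB, MB/s example.
interface TransferDoc {
  megabytes: number;
  elapsed_seconds: number;
  megabytes_per_second: number;
}

// True if, for every sampled document, the stored rate matches
// megabytes / elapsed_seconds within a relative tolerance.
function rateIsRedundant(docs: TransferDoc[], relativeTolerance = 1e-6): boolean {
  if (docs.length === 0) {
    return false; // nothing to base a conclusion on
  }
  return docs.every((doc) => {
    if (doc.elapsed_seconds <= 0) {
      return false; // rate cannot be recomputed for this document
    }
    const expected = doc.megabytes / doc.elapsed_seconds;
    const scale = Math.max(1, Math.abs(expected));
    return Math.abs(expected - doc.megabytes_per_second) <= relativeTolerance * scale;
  });
}

const transfers: TransferDoc[] = [
  { megabytes: 100, elapsed_seconds: 4, megabytes_per_second: 25 },
  { megabytes: 30, elapsed_seconds: 6, megabytes_per_second: 5 },
];

const redundant = rateIsRedundant(transfers);
```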
Physical data representation changes over time
For example, the user name of a given user changes; the name of a country changes; or an upstream logging system gets fixed. The new values may be in another field. A Data View may make the change disappear by abstracting over it. Benefits:
Independence of metrics
If there's no established relationship among certain fields, they can be assumed independent of one another. This doesn't mean no correlation, and showing correlations is probably a good idea, eg. via scatterplot, SPLOM, parcoords.
Shared attributes
Here, multiple fields relate to one another through common properties. This can happen across fields within the same index, or among fields that are in disparate indices.
Shared nominal types (semantic domains)
While field types are present in Elasticsearch, they represent physical domains.
For example, a part-to-whole ratio may be represented as float in the index. A "megabytes transferred" metric may be represented as integer in the index. The physical type doesn't give much useful information about which transforms and visualizations may even be legitimate.
Nominal (semantic) types are required for
Nominal typing may include these, and more:
0 means no error, 404 means page not found, etc. Or even some kind of index number.
Note: such typing information may eventually enable more compact representation in Elasticsearch.
Several fields that reference a shared semantic type are meaningfully related. Example: both buildings_index and roads_index have a field for occupied land area. They share a unit (eg. square meters) and they share the property of additivity. These two fields may even be linked to a common metadata descriptor (DRY principle in data modeling). Therefore, a report, visualization or data transform may safely add the land areas of buildings and roads to get the total land occupancy.
Even just the knowledge of a shared or convertible unit is useful for dataviz, because the fields can then be projected to a common vertical scale.
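The shared descriptor idea can be sketched as follows. The `SemanticType` and `FieldRef` shapes, the field name `occupied_area` and the function names are hypothetical; only buildings_index and roads_index come from the example above.

```typescript
// Hypothetical shared descriptor for a semantic (nominal) type.
interface SemanticType {
  id: string;
  unit: string;
  additive: boolean;
}

// A field that references a shared descriptor instead of repeating it (DRY).
interface FieldRef {
  index: string;
  field: string;
  semanticType: SemanticType;
}

const landArea: SemanticType = { id: "land_area", unit: "m2", additive: true };
const shareOfTotal: SemanticType = { id: "part_to_whole_ratio", unit: "1", additive: false };

const buildingArea: FieldRef = { index: "buildings_index", field: "occupied_area", semanticType: landArea };
const roadArea: FieldRef = { index: "roads_index", field: "occupied_area", semanticType: landArea };

// Two fields can be summed if they share the same additive type and unit.
function canSum(a: FieldRef, b: FieldRef): boolean {
  return (
    a.semanticType.id === b.semanticType.id &&
    a.semanticType.unit === b.semanticType.unit &&
    a.semanticType.additive
  );
}

function sumAcrossFields(a: FieldRef, aValues: number[], b: FieldRef, bValues: number[]): number {
  if (!canSum(a, b)) {
    throw new Error("Cannot add " + a.index + "." + a.field + " and " + b.index + "." + b.field);
  }
  const add = (total: number, value: number) => total + value;
  return aValues.reduce(add, 0) + bValues.reduce(add, 0);
}

// Total occupied land in square meters across both indices.
const totalOccupied = sumAcrossFields(buildingArea, [120, 80], roadArea, [300]);
```

A field typed as `shareOfTotal` would be rejected by `canSum`, which is the kind of guard a physical float type alone cannot provide.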
Shared visual attributes
Due to compatible nominal types,
It's desirable that visual recommenders and defaults exploit common value=>aesthetic mapping when possible. Besides compatible nominal types, the default value=>aesthetic mapping can be associated with specific Data View fields, or even, across multiple Data Views.
Therefore, default mappings are first class entities which can be referenced by fields in Data Views (this still allows the implicit creation of mappings, if not shared among Data Views, for the user's convenience; can be made explicit and extracted when needed)
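A minimal sketch of a mapping as a first class entity that fields reference by id. All names, ids and colors below are invented for illustration and do not describe an existing Data View schema.

```typescript
// Hypothetical first class value=>aesthetic mapping.
interface AestheticMapping {
  id: string;
  colors: Record<string, string>;
  fallbackColor: string;
}

// A Data View field may reference a shared mapping by id.
interface DataViewField {
  dataView: string;
  field: string;
  mappingId?: string;
}

const mappings: Record<string, AestheticMapping> = {
  "status-class": {
    id: "status-class",
    colors: { ok: "#54b399", error: "#e7664c" },
    fallbackColor: "#98a2b3",
  },
};

// Resolves the color for a value, or undefined if the field has no mapping.
function colorFor(field: DataViewField, value: string): string | undefined {
  if (field.mappingId === undefined) {
    return undefined;
  }
  const mapping = mappings[field.mappingId];
  if (mapping === undefined) {
    return undefined;
  }
  const known = Object.prototype.hasOwnProperty.call(mapping.colors, value);
  return known ? mapping.colors[value] : mapping.fallbackColor;
}

// Two fields in different Data Views share one mapping, so "error" gets the
// same color in charts built on either of them.
const webStatus: DataViewField = { dataView: "web-logs", field: "status", mappingId: "status-class" };
const jobStatus: DataViewField = { dataView: "batch-jobs", field: "outcome", mappingId: "status-class" };
const sameColor = colorFor(webStatus, "error") === colorFor(jobStatus, "error");
```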
See also Beyond palettes
Multi-index Data Views
Sometimes data that relate to one another are not in the same index or index* group. Eg.
A future Data View may reference multiple index (or index*) entities, with metadata in the Data View describing the relationships among indices and their fields (see cross-index fields)
Derived information in Data Views
Eventually, a Data View should be able to represent an aggregation, filtering or other data transformation of its input (indices, or another, more granular Data View).
Even in this case, field level metadata is useful, per field and across fields, because the ultimate use in visual analytics is the same, and it requires various kinds of metadata.
So, Data Views may eventually become composable. Example: different parts of the organization may need
Even if there's a single dashboard, or a set of dashboards that share a bunch of fields, it may be worth creating a common Data View for that, atop a possibly preexisting Data View, so that theming and mappings can be shared:
(Vavaliya et al: Online Performance Assessment System for Urban Water Supply and Sanitation Services in India)
A Data View that represents data transformation actually generates metadata. For example, a grouping aggregation will yield unique rows in terms of the values in fields that are part of the grouping dimensions.
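To illustrate, the sketch below groups rows and sums a metric, and returns the grouping dimensions as a uniqueness key alongside the result. The `GroupedResult` shape and the `uniqueKey` name are hypothetical.

```typescript
type Row = Record<string, string | number>;

// Result of a grouping transform, plus the metadata it generates: the
// grouping dimensions together form a unique key of the output rows.
interface GroupedResult {
  rows: Row[];
  uniqueKey: string[];
}

function groupAndSum(rows: Row[], dimensions: string[], metric: string): GroupedResult {
  const groups: Record<string, Row> = Object.create(null);
  const order: string[] = [];
  rows.forEach((row) => {
    const key = JSON.stringify(dimensions.map((dimension) => row[dimension]));
    if (key in groups) {
      const existing = groups[key];
      existing[metric] = Number(existing[metric]) + Number(row[metric]);
    } else {
      const created: Row = {};
      dimensions.forEach((dimension) => {
        created[dimension] = row[dimension];
      });
      created[metric] = Number(row[metric]);
      groups[key] = created;
      order.push(key);
    }
  });
  return { rows: order.map((key) => groups[key]), uniqueKey: dimensions.slice() };
}

const grouped = groupAndSum(
  [
    { country: "FR", city: "Paris", bytes: 10 },
    { country: "FR", city: "Paris", bytes: 5 },
    { country: "US", city: "Paris", bytes: 7 },
  ],
  ["country", "city"],
  "bytes"
);
```

A downstream consumer could rely on `uniqueKey` to know that each (country, city) pair appears at most once in the output.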