[PoC] adding basic apache arrow support to expressions #183909

Draft
wants to merge 5 commits into
base: main

Conversation

ppisljar
Member

@ppisljar ppisljar commented May 21, 2024

Summary

The goal of this PR is to test how feasible it is to convert from a kibana datatable to an arrow table and vice versa. This will allow us to further test converting various parts of kibana to this new data format.

This adds basic support for apache arrow to expressions:

  • new 'arrow' datatype and conversion from/to datatable were added
  • sample arrowlog function which logs first row of the table to the console
  • sample demoarrow function which generates a simple arrow dataset
  • legacy metric visualization was converted to use new arrow format
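
For illustration, here is a minimal sketch of what a datatable/columnar round trip could look like. It assumes a simplified datatable shape (column descriptors plus rows of plain objects) and uses one plain array per column as a stand-in for arrow vectors. The names `SimpleDatatable`, `ColumnarTable`, `toColumnar` and `toDatatable` are hypothetical and this is not the code in this PR, which relies on the apache-arrow library.

```typescript
// Simplified stand-in for a kibana datatable (assumption: only id/name per column).
interface SimpleColumn {
  id: string;
  name: string;
}

interface SimpleDatatable {
  columns: SimpleColumn[];
  rows: Array<Record<string, unknown>>;
}

// Columnar stand-in for an arrow table: one array of values per column id.
interface ColumnarTable {
  columns: SimpleColumn[];
  data: Record<string, unknown[]>;
  length: number;
}

// Row objects -> one array per column. Missing cells become null.
function toColumnar(table: SimpleDatatable): ColumnarTable {
  const data: Record<string, unknown[]> = {};
  for (const column of table.columns) {
    data[column.id] = table.rows.map((row) => row[column.id] ?? null);
  }
  return { columns: table.columns, data, length: table.rows.length };
}

// One array per column -> row objects, rebuilt index by index.
function toDatatable(table: ColumnarTable): SimpleDatatable {
  const rows: Array<Record<string, unknown>> = [];
  for (let i = 0; i < table.length; i++) {
    const row: Record<string, unknown> = {};
    for (const column of table.columns) {
      row[column.id] = table.data[column.id][i];
    }
    rows.push(row);
  }
  return { columns: table.columns, rows };
}

const sample: SimpleDatatable = {
  columns: [
    { id: 'name', name: 'Name' },
    { id: 'price', name: 'Price' },
  ],
  rows: [
    { name: 'a', price: 1 },
    { name: 'b', price: 2 },
  ],
};

const columnar = toColumnar(sample);
const roundTripped = toDatatable(columnar);
```

The round trip touches every cell twice, which is why the conversion cost scales with rows times columns regardless of which columnar library sits underneath.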

With this we can test converting in both directions. It also allows us to:

  • add data fetching functions with support for arrow format. The new functions can coexist with the legacy data fetching methods.
  • add support for arrow format to charts. We can update charts one by one and leave legacy ones as they are.
  • test performance of converting from/to kibana datatable/arrow

How to test

To test this you will need to add the following to your kibana.yml, because the arrow library uses unsafe eval under the hood:

csp.script_src: ['unsafe-eval']

Then go to canvas and try these expressions:

demoarrow rows=1000 | table
  generates an arrow table and converts it to a datatable so it can be rendered by table

demoarrow | arrowlog
  generates an arrow table and logs the first row to the console

demodata | arrowlog | table
  generates a datatable, converts it to an arrow table, logs the first row, converts back to a datatable and renders the table

demoarrow rows=1 | arrowlog | legacyMetricVis percentageMode=false colorMode="None" showLabels=true metric={visdimension accessor=0 format="number"}
  to test with the legacy metric vis

Issues discovered

  • the current implementation of arrow for js requires unsafe-eval to be enabled.
  • there is very little documentation available
  • the APIs seem to be changing rapidly, so most information online is outdated

Relevant content online

https://arrow.apache.org/docs/js/
https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets
https://arrow.apache.org/docs/js/classes/Arrow_dom.Table.html
https://github.com/apache/arrow/blob/main/js/src/table.ts

Outcome

  • converting from kibana datatable to apache arrow and vice versa is straightforward
  • adapting existing visualizations is not hard, but might involve a significant amount of work depending on the implementation of the visualization. Legacy metric vis was converted in a matter of hours.
  • it's hard to measure the performance improvement without full stack support for the arrow format. Since arrow is a tabular format, similar to a js array of objects, we should not expect any relevant speedup merely from reading arrow rather than an array of js objects. Performance benefits might become apparent in the steps before the data is consumed by visualisations, or if in the future we start working with bigger datasets and/or processing data in web assembly using the gpu.
  • performance:
    • it takes around 700ms to convert 1 million rows to arrow and around 5s to convert from arrow to datatable.
    • it takes around 10ms to convert 10,000 rows to arrow and around 50ms to convert from arrow to datatable.
    • a single pass over a table with 1 million rows (just calculating the sum of a single column) takes 500ms on a datatable and 5000ms on an arrow table
    • accessing data in a columnar way (column by column rather than row by row) is faster, but still slow: converting the same table takes around 3s. A single pass over a single column is fast (300ms).
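
The difference between the two access patterns can be sketched as follows. This is an illustrative micro-benchmark only: it compares an array of row objects with a `Float64Array` used as a stand-in for a columnar vector, does not use the apache-arrow library, and is not the code that produced the numbers above. The helper names (`sumRows`, `sumColumn`, `timeIt`) and the row count are assumptions, and timings will vary by machine and engine.

```typescript
// Build the same numeric data in two layouts: row objects and a typed column.
const rowCount = 100000;
const rowData: Array<{ value: number }> = [];
const columnData = new Float64Array(rowCount);
for (let i = 0; i < rowCount; i++) {
  rowData.push({ value: i });
  columnData[i] = i;
}

// Row-per-row access: read one property from every row object.
function sumRows(input: Array<{ value: number }>): number {
  let total = 0;
  for (const row of input) {
    total += row.value;
  }
  return total;
}

// Columnar access: walk one contiguous typed array.
function sumColumn(input: Float64Array): number {
  let total = 0;
  for (let i = 0; i < input.length; i++) {
    total += input[i];
  }
  return total;
}

// Runs fn once, logs the elapsed wall-clock time, and returns fn's result.
function timeIt(label: string, fn: () => number): number {
  const start = Date.now();
  const result = fn();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}

const rowSum = timeIt('row objects', () => sumRows(rowData));
const columnSum = timeIt('typed column', () => sumColumn(columnData));
```

Both functions must return the same sum. Only the memory layout being traversed differs, which is the variable the measurements above are probing.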

What's required to actually move forward:

  • arrow implementation up to the expression level (search service, kibana core, elasticsearch client, elasticsearch)
  • solution for unsafe-eval issue
  • estimated cost of implementation in lens/elasticcharts
