Add unit context for 0-sided views #1235
- Add unit context - ctx0 with no configuration at all that reads straight from gstate
- WIP: use unit ctx in JS, indexed updates/removes still broken
- WIP: fix JS tests
- WIP: get_pkeys no longer push_back
- column order no longer matters for unit context as long as num_columns == table.num_columns
- more tests, print inside traversal::step_end
- unit context = no pivot/sort/filter/computed, any column order/num of columns
- read m_delta_pkeys instead of get_delta_pkeys()
- cleanup
- Implement unit context in python
- add more python tests, make get_row_expanded return bool
- fix windows build
Looks good! Thanks for the PR!
I can independently confirm the benchmark results, too. Some improvements I'd like to look into in the future from review:
- In Emscripten I believe there is quite a bit of code generation associated with these repeated Context APIs, which leads to larger client assets in JS and WASM. Is there? If so, does embind support virtual dispatch? If not, can we perform a switch within a single dispatch C++ function so we do not need embind to generate the entire context API for each of 4 (eventually 5) context types?
- Contexts could use a cleanup, e.g. `FMODE_SIMPLE_CLAUSES`, `combiner`, etc.
- I concur with e.g. `size()` -> `num_rows()`, and IMO this is worth just applying consistently across the board.
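A minimal sketch of the single-dispatch idea from the first point, assuming hypothetical stand-in types `UnitCtx`, `Ctx0`, `Ctx1` and `Ctx2` (not Perspective's real context classes) and a hypothetical wrapper `ContextHandle`. The wrapper holds whichever context is active in a `std::variant`, and each exported method dispatches with `std::visit` in place of a hand-written `switch`, so a binding generator such as embind would only need to see one class rather than the full API once per context type.

```cpp
#include <cstdint>
#include <utility>
#include <variant>

// Hypothetical stand-ins for the context types; each returns a fixed row
// count so the dispatch is easy to observe.
struct UnitCtx { std::uint64_t num_rows() const { return 3; } };
struct Ctx0    { std::uint64_t num_rows() const { return 5; } };
struct Ctx1    { std::uint64_t num_rows() const { return 7; } };
struct Ctx2    { std::uint64_t num_rows() const { return 11; } };

using AnyContext = std::variant<UnitCtx, Ctx0, Ctx1, Ctx2>;

// Single wrapper type: only this class would need bindings, instead of
// generating the whole context API separately for every context type.
class ContextHandle {
public:
    explicit ContextHandle(AnyContext ctx) : m_ctx(std::move(ctx)) {}

    // One exported method; std::visit selects the active context type at
    // runtime, much like a switch over a context-type enum would.
    std::uint64_t num_rows() const {
        return std::visit([](const auto& ctx) { return ctx.num_rows(); }, m_ctx);
    }

private:
    AnyContext m_ctx;
};
```

For example, `ContextHandle{Ctx2{}}.num_rows()` returns 11. The per-type code still exists, but it is generated on the C++ side by `std::visit` rather than duplicated in the JS/WASM bindings.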
```cpp
auto columns = view_config->get_columns();
auto filter_op = view_config->get_filter_op();
auto fterm = view_config->get_fterm();
auto computed_columns = view_config->get_computed_columns();
```
Can we add a `t_config` which initializes these to the empty values we already know them to be?
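A minimal sketch of that suggestion, assuming a hypothetical `UnitConfig` type with simplified member types (plain `std::string` vectors and a placeholder `FilterOp` enum stand in for Perspective's real types). Default member initializers give the filter, filter-term and computed-column fields the empty values a unit context always has, so the construction path would not need to query `view_config` for them.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder for the filter combinator (not Perspective's type).
enum class FilterOp { And, Or };

// Hypothetical unit-context config: everything except the column list is
// fixed to the empty value a unit context is known to have.
struct UnitConfig {
    std::vector<std::string> columns;           // the only caller-supplied field
    FilterOp filter_op = FilterOp::And;         // unused: a unit context has no filters
    std::vector<std::string> fterm;             // always empty: no filter terms
    std::vector<std::string> computed_columns;  // always empty: no computed columns

    explicit UnitConfig(std::vector<std::string> cols) : columns(std::move(cols)) {}
};
```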
```cpp
// TODO: int/float/date/datetime pkeys are already sorted here, so if
// there was a way to assert that `psp_pkey` is a string typed column,
// we could make the sort conditional on whether m_sortby.size() > 0 or
// psp_pkey is a string column.
```
I think this still needs to be re-sorted; this `std::sort` just guarantees that overlapping indices will be contiguous.
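One reading of this comment, sketched as a generic illustration rather than Perspective's code (`dedupe_pkeys` is a hypothetical helper over integer keys): sorting is what places duplicate keys next to each other so that `std::unique` can drop them in a single linear pass. The ascending order left behind is incidental, so output that must follow a different order still needs its own sort.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: remove duplicate primary keys.
std::vector<std::int64_t> dedupe_pkeys(std::vector<std::int64_t> pkeys) {
    // Sorting makes equal keys contiguous...
    std::sort(pkeys.begin(), pkeys.end());
    // ...which is the precondition for std::unique, since it only removes
    // *adjacent* duplicates.
    pkeys.erase(std::unique(pkeys.begin(), pkeys.end()), pkeys.end());
    return pkeys;
}
```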
```cpp
 *
 * @return t_uindex
 */
t_uindex size() const;
t_uindex num_columns() const;
```
We may as well go all the way and apply this change to the `Table.size()` API in JS and Python!
This PR enhances the performance of certain 0-sided Views in Perspective by 2x-10x, depending on data size.
Each View is backed by a context object, which maintains its own traversal of the underlying master table. This traversal lets the user read data out in primary-key order, lets pivots traverse the underlying dataset, lets sorts be applied to the subset of data in a context, etc. When a context's traversal is essentially trivial, i.e. it reads data out in the same order the data is stored in the underlying table and applies no sorts, filters, or computed columns, we can skip creating a traversal entirely and avoid the overhead of storing primary keys, sorting them, and converting row indices to primary keys.

The unit context is a context object that has no traversal and reads directly from the underlying master table of the gnode. Internally, it offers the same API as all other context types; all construction of unit contexts happens in internal code and has no bearing on the public API.
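To make the contrast concrete, here is a heavily simplified model, assuming a hypothetical single-column `MasterTable` and representing the traversal as a vector of row indices (the real traversal tracks primary keys and supports pivots and sorts). The traversal-backed context must allocate and maintain its own ordering, while the unit context maps view row `i` straight to table row `i`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical master table: one column of values in storage order.
struct MasterTable {
    std::vector<std::string> values;
};

// Traversal-backed context: keeps its own row ordering, which costs memory
// and must be built (and, in general, sorted and updated) per view.
class TraversalContext {
public:
    explicit TraversalContext(const MasterTable& table) : m_table(table) {
        m_order.reserve(table.values.size());
        for (std::size_t i = 0; i < table.values.size(); ++i) m_order.push_back(i);
    }
    std::size_t num_rows() const { return m_order.size(); }
    // Every read goes through the traversal's indirection.
    const std::string& get(std::size_t row) const { return m_table.values[m_order[row]]; }

private:
    const MasterTable& m_table;
    std::vector<std::size_t> m_order;
};

// Unit context: no traversal; view row i is table row i.
class UnitContext {
public:
    explicit UnitContext(const MasterTable& table) : m_table(table) {}
    std::size_t num_rows() const { return m_table.values.size(); }
    const std::string& get(std::size_t row) const { return m_table.values[row]; }

private:
    const MasterTable& m_table;
};
```

Both classes expose the same `num_rows`/`get` shape, mirroring how the unit context offers the same API as the other context types while skipping the traversal.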
Externally, the unit context offers a massive performance improvement in a common use case: when the View has no pivots, sorts, filters, or computed columns, and the Table does not have a user-specified index. On a Table with a user-specified index, data must be read out in primary-key order, which differs from the order in which rows are stored in the master table. However, this PR paves the way for future improvements to that case.
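The eligibility rule above can be expressed as a single predicate. This is a sketch with a hypothetical `ViewConfigSummary` type and `can_use_unit_context` function, not Perspective's actual check; per the commit history, which columns are selected and in what order does not affect eligibility, so columns are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of a view's configuration (not Perspective's API).
struct ViewConfigSummary {
    std::vector<std::string> row_pivots;
    std::vector<std::string> column_pivots;
    std::vector<std::string> sort;
    std::vector<std::string> filter;
    std::vector<std::string> computed_columns;
};

// A unit context is usable only when the view leaves row order and contents
// untouched, and the table has no user-specified index (otherwise rows must
// be read in primary-key order, which differs from storage order).
bool can_use_unit_context(const ViewConfigSummary& config, bool table_has_user_index) {
    return config.row_pivots.empty() && config.column_pivots.empty() &&
           config.sort.empty() && config.filter.empty() &&
           config.computed_columns.empty() && !table_has_user_index;
}
```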
Changelog
Benchmarks
JavaScript benchmarks show a massive improvement in View creation time, and slight improvements in serialization time and in the time to create a delta.
In Python, where I've benchmarked this PR against much larger datasets (5m rows), the performance of `view()` is almost equivalent to that of `open_view()`, which simply provides a handle to an already-created view on the server. Over large datasets and multiple parallel clients, the unit context massively reduces the overhead of `view()`, resulting in a 5x-10x performance improvement over a regular `ctx0`.