Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add map_union_min and map_union_max aggregate functions #9123

Open
whutjs opened this issue Mar 18, 2024 · 5 comments
Open

Add map_union_min and map_union_max aggregate functions #9123

whutjs opened this issue Mar 18, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@whutjs
Copy link
Contributor

whutjs commented Mar 18, 2024

Description

While Presto does not currently include the functions map_union_min and map_union_max, these functions are frequently used in our own use cases. We are interested in knowing whether the Velox community would be open to considering the addition of these functions. Any feedback is highly appreciated.

@whutjs
Copy link
Contributor Author

whutjs commented Mar 18, 2024

@mbasmanova Would you mind to review this PR or give us some feedback? Thanks.

@mbasmanova
Copy link
Contributor

@whutjs

these functions are frequently used in our own use cases.

Would you clarify a bit about this use case? Which engine do you use?

Have you considered opening an issue in PrestoDB to add these functions?

CC: @kagamiori @kgpai @kaikalur

@whutjs
Copy link
Contributor Author

whutjs commented Mar 19, 2024

@mbasmanova

Would you clarify a bit about this use case? Which engine do you use?

We don't utilize any open-source query engines; instead, we develop our own query engines from scratch. Our engine is designed for querying data from a time-series database and is primarily implemented in Java. Last year, we began integrating Velox into our Java query engine, resulting in impressive performance improvements. Bravo to the Velox project! :)
Due to the distinct data model of our time-series database, we utilize the map<long, double> type to store the time-series data. This is why we require additional aggregation functions beyond map_union_sum.

Have you considered opening an issue in PrestoDB to add these functions?

I am not familiar with PrestoDB actually.

@mbasmanova
Copy link
Contributor

@whutjs Thank you for sharing this context. Happy to hear you found Velox useful.

Due to the distinct data model of our time-series database, we utilize the map<long, double> type to store the time-series data. This is why we require additional aggregation functions beyond map_union_sum.

I see. Would it be fair to say that you may need to apply any aggregate function in a map_union_xxx style, not only sum, min, max?

I am not familiar with PrestoDB actually.

PrestoDB project is hosted at https://github.com/prestodb/presto . Velox comes with 2 sets of function packages: PrestoSQL and SparkSQL. Presto package includes functions from Presto. These functions signatures and semantics match Presto 100%. Spark package includes functions from Spark and these match Spark semantics with ANSI SQL configuration disabled.

Typically, engines other than Presto (Prestissimo) and Spark (Gluten) choose which function package to use to provide their own package. In the latter case these functions are placed in the engine's code repo.

To add map_union_min/max to Velox repo, these need to be available either in Presto or Spark. Hence, if you are interested, you may open an issue in PrestoDB GitHub repo and see if folks there would be open to add these functions. Once they are available in PrestoDB, we can add them to Velox.

Here is a recent issue that added a number of linear regression aggregate functions to PrestoDB: prestodb/presto#21328 and then to Velox: #8381

Hope this helps.

@whutjs
Copy link
Contributor Author

whutjs commented Mar 19, 2024

@mbasmanova

Would it be fair to say that you may need to apply any aggregate function in a map_union_xxx style, not only sum, min, max?

You are right. We need map_union_min/max/avg/count/sum aggregate functions currently.

To add map_union_min/max to Velox repo, these need to be available either in Presto or Spark. Hence, if you are interested, you may open an issue in PrestoDB GitHub repo and see if folks there would be open to add these functions. Once they are available in PrestoDB, we can add them to Velox.

I understand now. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants