Add map_union_min and map_union_max aggregate functions #9123

whutjs · 2024-03-18T11:10:46Z

Description

While Presto does not currently include the functions map_union_min and map_union_max, these functions are frequently used in our own use cases. We are interested in knowing whether the Velox community would be open to considering the addition of these functions. Any feedback is highly appreciated.

The text was updated successfully, but these errors were encountered:

whutjs · 2024-03-18T11:29:56Z

@mbasmanova Would you mind to review this PR or give us some feedback? Thanks.

mbasmanova · 2024-03-19T07:09:19Z

@whutjs

these functions are frequently used in our own use cases.

Would you clarify a bit about this use case? Which engine do you use?

Have you considered opening an issue in PrestoDB to add these functions?

CC: @kagamiori @kgpai @kaikalur

whutjs · 2024-03-19T07:41:52Z

@mbasmanova

Would you clarify a bit about this use case? Which engine do you use?

We don't utilize any open-source query engines; instead, we develop our own query engines from scratch. Our engine is designed for querying data from a time-series database and is primarily implemented in Java. Last year, we began integrating Velox into our Java query engine, resulting in impressive performance improvements. Bravo to the Velox project! :)
Due to the distinct data model of our time-series database, we utilize the map<long, double> type to store the time-series data. This is why we require additional aggregation functions beyond map_union_sum.

Have you considered opening an issue in PrestoDB to add these functions?

I am not familiar with PrestoDB actually.

mbasmanova · 2024-03-19T12:13:59Z

@whutjs Thank you for sharing this context. Happy to hear you found Velox useful.

Due to the distinct data model of our time-series database, we utilize the map<long, double> type to store the time-series data. This is why we require additional aggregation functions beyond map_union_sum.

I see. Would it be fair to say that you may need to apply any aggregate function in a map_union_xxx style, not only sum, min, max?

I am not familiar with PrestoDB actually.

PrestoDB project is hosted at https://github.com/prestodb/presto . Velox comes with 2 sets of function packages: PrestoSQL and SparkSQL. Presto package includes functions from Presto. These functions signatures and semantics match Presto 100%. Spark package includes functions from Spark and these match Spark semantics with ANSI SQL configuration disabled.

Typically, engines other than Presto (Prestissimo) and Spark (Gluten) choose which function package to use to provide their own package. In the latter case these functions are placed in the engine's code repo.

To add map_union_min/max to Velox repo, these need to be available either in Presto or Spark. Hence, if you are interested, you may open an issue in PrestoDB GitHub repo and see if folks there would be open to add these functions. Once they are available in PrestoDB, we can add them to Velox.

Here is a recent issue that added a number of linear regression aggregate functions to PrestoDB: prestodb/presto#21328 and then to Velox: #8381

Hope this helps.

whutjs · 2024-03-19T12:35:02Z

@mbasmanova

Would it be fair to say that you may need to apply any aggregate function in a map_union_xxx style, not only sum, min, max?

You are right. We need map_union_min/max/avg/count/sum aggregate functions currently.

To add map_union_min/max to Velox repo, these need to be available either in Presto or Spark. Hence, if you are interested, you may open an issue in PrestoDB GitHub repo and see if folks there would be open to add these functions. Once they are available in PrestoDB, we can add them to Velox.

I understand now. Thanks

whutjs added the enhancement New feature or request label Mar 18, 2024

whutjs mentioned this issue Mar 18, 2024

Implement aggregate function map_union_min and map_union_max #9122

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add map_union_min and map_union_max aggregate functions #9123

Add map_union_min and map_union_max aggregate functions #9123

whutjs commented Mar 18, 2024

whutjs commented Mar 18, 2024

mbasmanova commented Mar 19, 2024

whutjs commented Mar 19, 2024

mbasmanova commented Mar 19, 2024

whutjs commented Mar 19, 2024

Add map_union_min and map_union_max aggregate functions #9123

Add map_union_min and map_union_max aggregate functions #9123

Comments

whutjs commented Mar 18, 2024

Description

whutjs commented Mar 18, 2024

mbasmanova commented Mar 19, 2024

whutjs commented Mar 19, 2024

mbasmanova commented Mar 19, 2024

whutjs commented Mar 19, 2024