Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional linear regression functions #21630

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

8dukongjian
Copy link
Contributor

@8dukongjian 8dukongjian commented Jan 4, 2024

Description

Fix #21328
Add additional linear regression functions, include:

  • REGR_AVGX
  • REGR_AVGY
  • REGR_COUNT
  • REGR_R2
  • REGR_SXX
  • REGR_SXY
  • REGR_SYY

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_R2, REGR_SXX, REGR_SXY, and REGR_SYY.

@steveburnett
Copy link
Contributor

Two things I wanted to mention:

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile to add a couple light utilities to reduce the duplication among the aggregation tests. Otherwise code LGTM>

testNonTrivialAggregation(new Double[] {1.0, 4.0, 9.0, 16.0, 25.0}, new Double[] {1.0, 2.0, 3.0, 4.0, 5.0});
}

private void testNonTrivialAggregation(Double[] y, Double[] x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods all look nearly identical, can you create a utility method to consolidate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated. Some methods have been put into the abstract class.

Copy link

github-actions bot commented Jan 8, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff e01dc52...bf2d675.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/functions/aggregate.rst

@8dukongjian
Copy link
Contributor Author

@steveburnett Thanks.

  1. Updated the release note in this PR.
  2. In presto-docs/src/main/sphinx/functions/aggregate.rst, the introduction to the function is added.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding all the functions into the doc! I only had a formatting nit that I noticed when I did a local build of the documentation: the new entries need a line added between the function and the definition. Here's a screenshot of what the doc build looks like:
Screenshot 2024-01-08 at 11 25 13 AM

@@ -590,6 +590,34 @@ Statistical Aggregate Functions
Returns linear regression slope of input values. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_avgx(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_avgx(y, x) -> double
.. function:: regr_avgx(y, x) -> double

Returns the average of the independent value in a group. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_avgy(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_avgy(y, x) -> double
.. function:: regr_avgy(y, x) -> double

Returns the average of the dependent value in a group. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_count(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_count(y, x) -> double
.. function:: regr_count(y, x) -> double

Returns the number of non-null pairs of input values. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_r2(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_r2(y, x) -> double
.. function:: regr_r2(y, x) -> double

Returns the coefficient of determination of the linear regression. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_sxy(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_sxy(y, x) -> double
.. function:: regr_sxy(y, x) -> double

Returns the sum of the product of the dependent and independent values in a group. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_syy(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_syy(y, x) -> double
.. function:: regr_syy(y, x) -> double

Returns the sum of the squares of the dependent values in a group. ``y`` is the dependent
value. ``x`` is the independent value.

.. function:: regr_sxx(y, x) -> double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. function:: regr_sxx(y, x) -> double
.. function:: regr_sxx(y, x) -> double

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Updated. Here's the new screenshot of what the doc build looks like:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Updated. Here's the new screenshot of what the doc build looks like:

Thanks! I just now did a new pull and local build of the doc and everything looks good! Updating my review to Approve for the docs.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

New pull of branch and new local build of docs, everything looks good. Thanks!

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please squash commits and I can help merge. Thank you!

@8dukongjian
Copy link
Contributor Author

@tdcmeehan Thanks, done.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, but please make sure the commit follows our guidelines.. Consider Add additional linear regression functions as a title and list out the functions in the body.

@8dukongjian 8dukongjian changed the title Add more Linear Regression functions to Presto Add additional linear regression functions Jan 11, 2024
@8dukongjian
Copy link
Contributor Author

@tdcmeehan Updated the title and description.

@tdcmeehan
Copy link
Contributor

Thanks @8dukongjian, but I meant the commit message--the message in Git history. The link above describes this.

Add REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_R2, REGR_SXX, REGR_SXY, and REGR_SYY.
@8dukongjian
Copy link
Contributor Author

@tdcmeehan Thanks, done.

@tdcmeehan tdcmeehan merged commit d0b658e into prestodb:master Jan 12, 2024
57 checks passed
@wanglinsong wanglinsong mentioned this pull request Feb 12, 2024
64 tasks
@8dukongjian 8dukongjian deleted the add_regr_functions branch March 5, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more Linear Regression functions to Presto
3 participants