-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add trendline PPL command #3071
Add trendline PPL command #3071
Conversation
I have this almost hooked up. I loaded the students table which has name, gpa, and grad_year fields. I get the following JSON result of null arrays: However if change the PPL to use an alias that happens to have the same name as the original field: Is it correct that ProjectOperator does not use the schema from its input? |
I would expect the only field out of this schema to be the one computation in trendline ("foo"), rather than all 3 fields in the real index, but perhaps I'm mistaken here. |
@vamsi-amazon @penghuo can you please verify ? |
Possible design for trendline output schema:
|
Requesting reviews from @LantaoJin @MaxKsyunz |
@jduo did you manage to review the spark
|
@YANG-DB , I used the PPL parser code from the Spark PR. The schema semantics seem to be the same AFAIK, but I haven't tried the Spark one out. Same with the handling of results when there aren't enough samples (returning NULL) @kt-eliatra ? |
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Also evaluate the computation type in the parser Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Sort by creating a LogicalSort between the input plan and LogicalTrendline Signed-off-by: James Duong <[email protected]>
Also add examples with the sort option and without an alias Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
Signed-off-by: James Duong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM !!
@jduo can u plz check the failed CI tasks ?
Looks like a merge error in the grammar (lost a semi-colon). I'll post in an update shortly. |
Add back missing semi-colon Signed-off-by: James Duong <[email protected]>
docs/user/ppl/cmd/trendline.rst
Outdated
* sort-field: mandatory when sorting is used. The field used to sort. | ||
* number-of-datapoints: mandatory. number of datapoints to calculate the moving average (must be greater than zero). | ||
* field: mandatory. the name of the field the moving average should be calculated for. | ||
* alias: optional. the name of the resulting column containing the moving average. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add something like:
By default, the column name appends "_trendline" to the field name.
Signed-off-by: Andrew Carbonetto <[email protected]>
@jduo LGTM ! |
linkchecker will be fixed here: #3193 (review) |
Description
Adds the trendline command
Related Issues
Resolves #3013
#3011
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.