Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert in-memory Column.round to Java #7521

Merged
merged 41 commits into from
Aug 16, 2023
Merged

Conversation

GregoryTravis
Copy link
Contributor

@GregoryTravis GregoryTravis commented Aug 8, 2023

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, I think the direction is good.

I'm glad that the refactor splitting into unary and binary ops actually made it a bit cleaner to add a ternary op as you just did.

I think the direction is good to proceed, just needs polishing. In particular, please ensure that the result type is like we expect.

@GregoryTravis
Copy link
Contributor Author

Benchmarks: significant improvement for floats; not much for ints:

float old round decimal_places=0 use_bankers=True average: 374.8ms
float new round decimal_places=0 use_bankers=True average: 21.4ms
float old round decimal_places=-2 use_bankers=True average: 266.25ms
float new round decimal_places=-2 use_bankers=True average: 16.02ms
float old round decimal_places=2 use_bankers=True average: 259.43ms
float new round decimal_places=2 use_bankers=True average: 19.17ms
float old round decimal_places=0 use_bankers=False average: 258.15ms
float new round decimal_places=0 use_bankers=False average: 18.69ms
float old round decimal_places=-2 use_bankers=False average: 211.43ms
float new round decimal_places=-2 use_bankers=False average: 19.06ms
float old round decimal_places=2 use_bankers=False average: 205.5ms
float new round decimal_places=2 use_bankers=False average: 15.09ms

int old round decimal_places=0 use_bankers=True average: 0.28ms
int new round decimal_places=0 use_bankers=True average: 0.26ms
int old round decimal_places=-2 use_bankers=True average: 0.1ms
int new round decimal_places=-2 use_bankers=True average: 0.09ms
int old round decimal_places=2 use_bankers=True average: 0.05ms
int new round decimal_places=2 use_bankers=True average: 0.06ms
int old round decimal_places=0 use_bankers=False average: 0.12ms
int new round decimal_places=0 use_bankers=False average: 0.07ms
int old round decimal_places=-2 use_bankers=False average: 0.04ms
int new round decimal_places=-2 use_bankers=False average: 0.05ms
int old round decimal_places=2 use_bankers=False average: 0.03ms
int new round decimal_places=2 use_bankers=False average: 0.05ms

@GregoryTravis GregoryTravis added the CI: No changelog needed Do not require a changelog entry for this PR. label Aug 8, 2023
@GregoryTravis GregoryTravis marked this pull request as ready for review August 8, 2023 19:41
@GregoryTravis
Copy link
Contributor Author

@radeusgd value_type tests are already in Column_Spec.

@GregoryTravis
Copy link
Contributor Author

New benchmarks; indeed, I commented out the parameters and forgot I had done it. The benchmarks show improvements for both types. (Except in the fast-return case, where it isn't doing anything anyway.)

.round floats
old float round decimal_places=0 use_bankers=True average: 368.67ms
new float round decimal_places=0 use_bankers=True average: 23.86ms
.round ints
old int round decimal_places=0 use_bankers=True average: 0.33ms
new int round decimal_places=0 use_bankers=True average: 0.34ms
.round floats
old float round decimal_places=-2 use_bankers=True average: 287.51ms
new float round decimal_places=-2 use_bankers=True average: 19.88ms
.round ints
old int round decimal_places=-2 use_bankers=True average: 313.95ms
new int round decimal_places=-2 use_bankers=True average: 14.59ms
.round floats
old float round decimal_places=2 use_bankers=True average: 239.04ms
new float round decimal_places=2 use_bankers=True average: 18.98ms
.round ints
old int round decimal_places=2 use_bankers=True average: 0.06ms
new int round decimal_places=2 use_bankers=True average: 0.07ms
.round floats
old float round decimal_places=0 use_bankers=False average: 233.73ms
new float round decimal_places=0 use_bankers=False average: 21.27ms
.round ints
old int round decimal_places=0 use_bankers=False average: 0.11ms
new int round decimal_places=0 use_bankers=False average: 0.12ms
.round floats
old float round decimal_places=-2 use_bankers=False average: 210.69ms
new float round decimal_places=-2 use_bankers=False average: 21.3ms
.round ints
old int round decimal_places=-2 use_bankers=False average: 274.66ms
new int round decimal_places=-2 use_bankers=False average: 28.59ms
.round floats
old float round decimal_places=2 use_bankers=False average: 295.04ms
new float round decimal_places=2 use_bankers=False average: 19.86ms
.round ints
old int round decimal_places=2 use_bankers=False average: 0.04ms
new int round decimal_places=2 use_bankers=False average: 0.03ms

@@ -6,7 +6,7 @@ from Standard.Test import Bench, Faker
## Bench Utilities ============================================================

vector_size = 1000000
iter_size = 100
iter_size = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we leave this at 100? or is it so long now we want it shorter!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes 47s with iter_size = 10 so I assume we don't want it much longer than that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future - I think in that case we may want to consider to reduce the vector_size instead, to make each iteration a bit shorter - but to keep the number of iterations. 10 may be too few to see warmup and also to see the distribution of the timings (i.e. if the variance is low or high).

@GregoryTravis GregoryTravis added the CI: Ready to merge This PR is eligible for automatic merge label Aug 14, 2023
@jdunkerley jdunkerley linked an issue Aug 15, 2023 that may be closed by this pull request
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested in ensuring consistency between the builtin and the column implementation. Either by sharing the algorithm or by sharing the tests. Otherwise OK as far as I can say.

@@ -31,8 +37,7 @@ public class Core_Math_Utils {
* @throws IllegalArgumentException if `n` is outside the allowed range.
* @throws IllegalArgumentException if `decimalPlaces` is outside the allowed range.
*/
public static double roundDouble(double n, int decimalPlaces, boolean useBankers) {

public static double roundDouble(double n, long decimalPlaces, boolean useBankers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a copy of the code in RoundNode? Shall not we seek a way to share the code among both versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing it, we decided it was sufficient to share the tests; sharing the implementation between builtins and regular polyglot libraries is a bigger task.

*
* @param <I> the supported storage type.
*/
public abstract class TernaryMapOperation<T, I extends Storage<? super T>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the naming - I.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @radeusgd for that.

@Override
public Storage<Long> runTernaryMap(AbstractLongStorage storage, Object decimalPlacesObject, Object useBankersObject, MapOperationProblemBuilder problemBuilder) {
if (decimalPlacesObject == null) {
throw new IllegalArgumentException("decimalPlacesObject must not be null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: shall not the exception be NullPointerException? Or use java.util.Objects.requireNonNull...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the null check.

}

if (!(decimalPlacesObject instanceof Long decimalPlaces)) {
throw new UnexpectedTypeException("a long.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception would be thrown if decimalPlacesObject was null. E.g. do we need the above == null check at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the null check.

@@ -518,6 +518,19 @@ spec =
3.1416 . round 3 . should_equal 3.142
3.9999 . round 3 . should_equal 4.0

Test.specify "Can round negative decimals to a specified number of decimal places" <|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure the behavior of RoundNode builtin and the Column.round is the same, can we generalize this Numbers_Spec and run it twice on the same values? Once using regular round builtin, once creating a Column with the same value and rounding via Column.round? I assume the result should be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@GregoryTravis GregoryTravis removed the CI: Ready to merge This PR is eligible for automatic merge label Aug 15, 2023
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing the tests this way shall ensure the two rounding implementations are as identical as we need.


polyglot java import java.math.BigInteger

spec prefix round_fun =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -30,6 +31,8 @@ spec setup =
result.to_vector.at 0
do_round n dp=0 use_bankers=False = do_op n (_.round dp use_bankers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. do_op creates a table with row "x" and single value n and then applies the operation to column "x" and returns the result.

@GregoryTravis GregoryTravis added the CI: Ready to merge This PR is eligible for automatic merge label Aug 16, 2023
@mergify mergify bot merged commit c9d7c5c into develop Aug 16, 2023
@mergify mergify bot deleted the wip/gmt/7400-col-round branch August 16, 2023 14:45
@@ -9,6 +9,7 @@ import project.Function.Function
import project.Internal.Rounding_Helpers
import project.Nothing.Nothing
import project.Panic.Panic
import project.IO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this import used?

@@ -0,0 +1,288 @@
from Standard.Base import all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place to have this.

Standard.Test is a test library, like JUnit. We should not distribute our tests with it.

IMO we should just move this test altogether to Table_Tests if we want to run it for both scalars and Columns. I don't think there is much harm in moving the scalar case from Tests to Table_Tests - IMO much less than distributing our internal test suites as part of a general and user-facing testing library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert in-memory Column.round to Java
4 participants