Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starlark: reuse positional array in native calls where possible #19708

Closed
wants to merge 2 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Oct 3, 2023

This is a resubmission of @stepancheg's #12782, which became stale, with this original description:

====

For certain calls like type(x) or str(x) or l.append(x) we can reuse positional array argument of fastcall to pass it to the MethodDescriptor.call and skip all runtime checks in BuiltinFunction.

This optimization cannot be applied if

  • a function has extra positional or named arguments
  • has arguments guarded by flag
  • accepts extra StarlarkThread parameter
  • has unfilled arguments with default values

So this optimation won't be used for calls list(x) or bool().

For type(1) called in loop, the result is 15% speedup:

A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)

For bool() (no argument) called in loop, it results in 1% slowdown:

A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)

====

Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes.

Benchmark results with JDK 21 and --seconds 5:

before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B

@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Oct 3, 2023
}

@Test
public void testNamedOnlyArgument() throws Exception {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test to document why I think that #12782 (comment) isn't a safe optimization, but I may have misunderstood the original comment.

@sgowroji sgowroji added the team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel label Oct 3, 2023
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 7, 2023

@brandjon Just curious: With progressing Starlarkification, is there a plan to reevaluate the "exectree" representation of Starlark modules at some point?

@fmeum
Copy link
Collaborator Author

fmeum commented Dec 28, 2023

@brandjon Friendly ping

tetromino

This comment was marked as duplicate.

@tetromino
Copy link
Contributor

Based on a naive reading of the code, we will now skip the special-cased StringModule behavior below (lines 175-179 in the original version of BuiltinFunction.java), and therefore Starlark string object methods that don't take a Starlark thread (e.g. "foo".upper()`) may misbehave. Can you verify whether or not this is the case, and if it's not, add a comment why things nevertheless work? Do we have sufficient test coverage for this?

stepancheg and others added 2 commits January 24, 2024 21:38
This is a resubmission of bazelbuild#12782, with this description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we
can reuse `positional` array argument of `fastcall` to pass it to
the `MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing
a review comment and small adaptions to the implementation to make it
work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Co-authored-by: Fabian Meumertzheim <[email protected]>
@fmeum fmeum requested a review from tetromino January 24, 2024 20:49
Copy link
Contributor

@tetromino tetromino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. Verified that the benchmark seems to show a small improvement.

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 30, 2024

@bazel-io fork 7.1.0

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Jan 30, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Jan 30, 2024
This is a resubmission of @stepancheg's bazelbuild#12782, which became stale, with this original description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can reuse `positional` array argument of `fastcall` to pass it to the `MethodDescriptor.call` and skip all runtime checks in `BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing a review comment and small adaptions to the implementation to make it work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Closes bazelbuild#19708.

PiperOrigin-RevId: 602821608
Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec
github-merge-queue bot pushed a commit that referenced this pull request Jan 30, 2024
…le (#21144)

This is a resubmission of @stepancheg's #12782, which became stale, with
this original description:

====

For certain calls like `type(x)` or `str(x)` or `l.append(x)` we can
reuse `positional` array argument of `fastcall` to pass it to the
`MethodDescriptor.call` and skip all runtime checks in
`BuiltinFunction`.

This optimization cannot be applied if
* a function has extra positional or named arguments
* has arguments guarded by flag
* accepts extra `StarlarkThread` parameter
* has unfilled arguments with default values

So this optimation won't be used for calls `list(x)` or `bool()`.

For `type(1)` called in loop, the result is 15% speedup:

```
A: n=30 mean=5.705 std=0.387 se=0.071 min=5.131 med=5.827
B: n=30 mean=4.860 std=0.345 se=0.064 min=4.396 med=4.946
B/A: 0.852 0.820..0.884 (95% conf)
```

For `bool()` (no argument) called in loop, it results in 1% slowdown:

```
A: n=29 mean=9.045 std=0.603 se=0.113 min=8.096 med=9.400
B: n=29 mean=9.134 std=0.520 se=0.098 min=8.248 med=9.448
B/A: 1.010 0.976..1.045 (95% conf)
```

====

Beyond the original PR, this includes benchmarks, a test case addressing
a review comment and small adaptions to the implementation to make it
work with recent changes.

Benchmark results with JDK 21 and `--seconds 5`:

```
before:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      159ns      158ns          5       143B
bench_call_dict_get_none    33554431      180ns      179ns          6       143B
bench_call_list_append      33554431      170ns      169ns          5       207B
bench_call_type             67108863      139ns      138ns          4       111B

after:
File src/test/java/net/starlark/java/eval/testdata/bench_call.star:
benchmark                        ops     cpu/op    wall/op   steps/op   alloc/op
bench_call_bool             67108863      100ns      100ns          3        87B
bench_call_dict_get         33554431      155ns      154ns          5       143B
bench_call_dict_get_none    33554431      174ns      174ns          6       143B
bench_call_list_append      67108863      141ns      141ns          5       183B
bench_call_type             67108863      115ns      114ns          4        87B
```

Closes #19708.

Commit
aded06b

PiperOrigin-RevId: 602821608
Change-Id: If40e2eb1e09ec43d56b36ba13987aa5312fd47ec

Co-authored-by: Fabian Meumertzheim <[email protected]>
@fmeum fmeum deleted the reuse-resubmit branch January 30, 2024 23:14
@iancha1992
Copy link
Member

The changes in this PR have been included in Bazel 7.1.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants