Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shareable array implementation #1739

Merged
merged 10 commits into from
Jan 11, 2021
Merged

shareable array implementation #1739

merged 10 commits into from
Jan 11, 2021

Conversation

mstoykov
Copy link
Contributor

This is part of #532

The reason behind this not redefing ... how to open a file or load data
from one is that, as in the majority of the cases after that there is
some amount of ... transformation of the data, so that won't help much.

The original idea was to have a shareable JSON ... but the more concrete
array implementation is faster (in some cases by a lot) and also seems
like a better fix for the first iteration of this, as that will be the
majority of the data type. So it is very likely that the non array parts
will be dropped in favor of doing them later and this getting merged
faster.

There are still more things to be done and depending on whether or not
we want to support iterating (with for-of) it can probably be done
without some of the js code.

This is part of #532

The reason behind this not redefing ... how to open a file or load data
from one is that, as in the majority of the cases after that there is
some amount of ... transformation of the data, so that won't help much.

The original idea was to have a shareable JSON ... but the more concrete
array implementation is faster (in some cases by a lot) and also seems
like a better fix for the first iteration of this, as that will be the
majority of the data type. So it is very likely that the non array parts
will be dropped in favor of doing them later and this getting merged
faster.

There are still more things to be done and depending on whether or not
we want to support iterating (with for-of) it can probably be done
without some of the js code.
@mstoykov mstoykov force-pushed the readOnlyShareableObject branch from 4cf1ba8 to 7e43c20 Compare November 25, 2020 16:26
Comment on lines 172 to 173
case Symbol.iterator:
return function() {return target.iterator()}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately can't be done through goja.ProxyTrapConifg So if we want to have support for for-of We need to use js proxy (we can use both, but I don't think that will help with anything :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why can't we use goja.ProxyTrapConifg? Because of Symbol.iterator? If so, can't we just get the goja definition of the iterator symbol from the runtime and compare against that in the Get property of the goja.ProxyTrapConifg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,
I am gonna copy you the relevant part(which is most of it) of documentation that I linked ;)

Note that the Proxy may not have Symbol properties when using this as a handler because property keys are passed as strings. get() and getOwnPropertyDescriptor() for Symbol properties will always return undefined; has() and deleteProperty() for Symbol properties will always return false; set() and defineProperty() for Symbol properties will throw a TypeError. If you need Symbol properties implement the handler in JavaScript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably take some breaking of the API to be fixed and also the standard Symbols will need to be exported by goja .. which they currently aren't.

I doubt this will actually make much of a performance difference to be honest ... But it will be kind of nice to not have to execute a script to wrap it I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although i guess it is worth an issue to ask why the name wasn't a goja.Value to begin with instead of string

Copy link
Member

@na-- na-- Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't access the symbol from the Go side, but can't we get it from the JS side by just getting Symbol.iterator from the runtime, even now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nevermind, somehow missed the first of your 3 comments...

@codecov-io
Copy link

codecov-io commented Nov 25, 2020

Codecov Report

Merging #1739 (cd385a6) into master (212cc57) will decrease coverage by 0.06%.
The diff coverage is 69.49%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1739      +/-   ##
==========================================
- Coverage   71.52%   71.45%   -0.07%     
==========================================
  Files         178      180       +2     
  Lines       13727    13833     +106     
==========================================
+ Hits         9818     9885      +67     
- Misses       3297     3326      +29     
- Partials      612      622      +10     
Flag Coverage Δ
ubuntu 71.39% <69.49%> (-0.08%) ⬇️
windows 70.02% <69.49%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
js/modules/k6/data/share.go 50.00% <50.00%> (ø)
js/modules/k6/data/data.go 62.50% <62.50%> (ø)
js/bundle.go 90.60% <100.00%> (+0.06%) ⬆️
js/common/initenv.go 100.00% <100.00%> (ø)
js/initcontext.go 92.30% <100.00%> (+0.17%) ⬆️
ui/form.go 58.33% <0.00%> (-5.31%) ⬇️
cmd/root.go 8.75% <0.00%> (-1.59%) ⬇️
js/modules/k6/http/request.go 80.00% <0.00%> (-0.65%) ⬇️
js/modules/k6/grpc/client.go 78.82% <0.00%> (-0.36%) ⬇️
cmd/common.go 36.36% <0.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 212cc57...cd385a6. Read the comment docs.

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! It mostly LGTM, save for that wrapping script that uses Proxy. It would be great if that could be simplified/avoided, but I'm not familiar with goja enough to know if that would eventually be feasible with ProxyTrapConfig or with some other approach.

Strong 👍 for supporting for...of, and I think we should also support shareable objects even if it has worse performance than arrays. That will likely be easier to use in a lot of cases, but am also not sure how difficult it is to implement or if we'd be OK with postponing it for a second iteration of this, though probably yes. Maybe @sniku can chime in on this.

@mstoykov mstoykov requested review from imiric and na-- December 9, 2020 15:37
imiric
imiric previously approved these changes Dec 10, 2020
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nitpicks. I guess it's OK if we leave shareable objects for later.

js/share.go Outdated Show resolved Hide resolved
js/initcontext.go Outdated Show resolved Hide resolved
js/share_test.go Outdated Show resolved Hide resolved
js/share_test.go Outdated Show resolved Hide resolved
js/share.go Outdated
Comment on lines 34 to 37
// TODO rename
// TODO check how it works with setup data
// TODO check how it works if logged
type sharedArray struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging it will do:

      TypeError: Could not convert &{Object 0xc000dfa4b0 0xc0008a4750 false map[] [] 0 0 <nil>} to primitive
        at github.com/loadimpact/k6/js/common.Bind.func1 (native)

But I don't know ... what should be logged so 🤷
And passing it by setup ... doesn't work .. but it also doesn't panic - it passes some Object that seems ... empty - this can probably be fixed and made to work, but it will be harder with the cloud/distributed in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it work from setup() would be good, but I don't think it's required for this PoC.

That logging error looks gnarly though... We could log the first and last N elements stringified, if that's easily doable.

@mstoykov mstoykov force-pushed the readOnlyShareableObject branch from 52cd423 to d453b82 Compare December 10, 2020 11:06
@mstoykov mstoykov force-pushed the readOnlyShareableObject branch from d453b82 to 7cc4418 Compare December 10, 2020 11:23
@mstoykov mstoykov changed the title PoC shareable array implementation shareable array implementation Dec 10, 2020
i.sharesLock.Lock()
defer i.sharesLock.Unlock()
value, ok := i.shares[name]
if !ok { //nolint:nestif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit excessive, why not have an early return here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predominantly because earlier the code at the end was way longer ... but the nolint is no longer necessary actually .. so maybe the linter checking the unneeded nolints ... has a bug 😱

// XShare is a constructor returning a shareable read-only array (general objects coming)
// indentified by the name and having their contents be whatever the call returns - again only
// arrays currently supported.
func (i *InitContext) XShare(name string, call goja.Callable) goja.Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in slack, I'd really like not to have new k6-specific superglobal names like open(), __VU, etc. Instead, this should be accessible via k6/data/SharedArray (so, import {SharedArray} from "k6/data";) or something like that. The k6/data package will probably be reused by #1539 for other things like segmented iterators and maybe XML/JSON/etc. streaming parsers and segmentation (though those probably will be sub-packages)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw now it should be easy to move this outside of the InitContext, you can just have the shares registry in the InitEnvironment that was added in #1706


arr := make([][]byte, len(tmpArr))
for index := range arr {
arr[index], err = json.Marshal(tmpArr[index])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good enough for the first version, but I wonder if we couldn't figure out something better. For example, gob (or some other binary encoding/decoding) is probably faster.

Or maybe no encoding and decoding is necessary at all, and we can share the raw Go objects and use Object.freeze() to make it read-only and prevent data races? goja seems to support it, and it even seems to work:

export default function () {
    const obj = [1, 2, 3];
    obj[1] = 22;
    Object.freeze(obj); // comment this for no error
    console.log(obj);
    obj[2] = 33; // this throws a TypeError if frozen
    console.log(obj);
}

Even if it's not perfect and there's some obscure way to modify the object, we'll document that the shared data is read-only, so it'd be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is that this freezes only the top object not any object that is property of the top object. Also, goja does sometimes have an internal state for even simple types that change without any locking or stuff ... because JS is single-threaded by definition.

So IMO this will break horribly, but was one of the initial things I thought about ... also this means that I get an object from 1 VM and put it in another, without any kind of checking that it doesn't for example have functions ... that modify the original VMs properties or something ... as a whole I think while this approach will probably be "faster", unless goja has support for this it will probably be way too much work to actually properly check that things won't blow up, and any change to goja will likely necessitate refactoring.

Maybe gob will be a better fit - maybe we should try it for a second iteration ... I mostly went directly for JSON as we already do this for the setupData ... and also ... JS(ON) ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm didn't realize Object.freeze() only touches the first layer, but even that's easy to work around. Copying the public domain deepFreeze() from here (or, originally, here) seems to work well enough:

function deepFreeze(o) {
    Object.freeze(o);
    if (o === undefined) {
        return o;
    }

    Object.getOwnPropertyNames(o).forEach(function (prop) {
        if (o[prop] !== null
            && (typeof o[prop] === "object" || typeof o[prop] === "function")
            && !Object.isFrozen(o[prop])) {
            deepFreeze(o[prop]);
        }
    });

    return o;
};

export default function () {
    const obj = [1, [1, 2], { c: 3, d: 4 }];
    deepFreeze(obj); // comment this for no error
    console.log(JSON.stringify(obj));
    obj[1].push(3); // this throws a TypeError if frozen
    obj[2].f = 5; // and this also
    console.log(JSON.stringify(obj));
}

And we already have a JS wrapper anyway, why not go all the way? 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, goja does sometimes have an internal state for even simple types that change without any locking or stuff ... because JS is single-threaded by definition.

So I don't think this is a good idea - we can explore it in a second iteration ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, goja does sometimes have an internal state for even simple types that change without any locking or stuff ... because JS is single-threaded by definition.

Sorry, I didn't address this properly. I know about the internal state, when you pass a goja Runtime some Go value, it wraps it in its own internal types. And that should be fine, since it doesn't touch the original value, as far as I know. At most it'll make a copy of a string to convert it to utf-16 or something like that...

So I think it's worth it to try using Object.freeze() (or, rather, deepFreeze()) even now, in the first iteration. It's going to be fine, even if it doesn't work at 100% and some JS guru can get around it and still modify something. They are doing it on their own risk of a data race... Besides, it's not like we haven't had issues with transforming objects to JSON and back before - we've had different key names and Go strangeness with that as well. And we do it for setup() because we have to ship the data between instances in the cloud. Here the requirements are much different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't completely understand everything Dmitry says, but it seems like this won't be feasible... 😞

One benefit of recursively freezing objects instead of copying them (i.e. marshaling and unmarshaling), besides improved performance, would have been the better UX. With the current solution, each runtime will get its own copy of the array item, and would seeming be able to make changes in it. But if it accesses the same array item the next time, the changes it made on the previous iteration will be gone, since it'd receive a fresh copy of the data... Not the end of the world, but unless we document it very well, someone is liable to spend a bit of time pulling their hair. So, maybe it's worth recursively Object.freeze() things even when we're copying the data on the Go side?

js/initcontext.go Outdated Show resolved Hide resolved
This meant that I needed to JSON.parse inside goja so it creates the
goja objects as otherwise it can't freeze stuff from golang.

At this point I decided that it is better to use JSON.stringify in the
first place as well
@mstoykov
Copy link
Contributor Author

Some benchmarks with the latest code

@mstoykov mstoykov requested review from na-- and imiric December 15, 2020 12:44
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just some minor suggestions.

Also:

  • I'm out of my depths with the Object.freeze discussion, but shouldn't we take Dmitry's advice and use the Go API rather than wrap it in JS? It would probably complicate things a bit, but it should be worth it if it considerably increases performance.

  • The benchmark in that gist shows the share test actually using more memory than the no-share test(?). I can understand the CPU increase, but isn't the whole point of this to reduce memory usage?

js/modules/k6/data/data.go Show resolved Hide resolved
js/modules/k6/data/data.go Show resolved Hide resolved
throw "the previous line should've errored";
} catch(e) {
if (!e.toString().includes("Host object field 1 cannot be made configurable")) {
throw "wrong error " + e.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see separate Go testcases for each scenario, rather than this large JS script. It would be easier to run them individually and also to modify or add new ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ... maybe, it used to be fairly short. On the other hand that will increase the amount of time the test takes(goja initialization isn't free even without the babel and corejs), while changes to this test seem ... unlikely unless we add functionality and I don't think we will be adding all that much as I doubt a lot of users will want additional Array like functionality.

I am even starting to question if enabling for-of was worth it ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer legibility/usability over a slightly longer run time, but it's fine if you don't think it's worth it.

I saw your comment about the 100-200x performance penalty with the iterator, and now I'm not so sure if we should keep it either 😄 I would assume access by index will be more common, and looping by index would work as well, but some users might prefer the for-of notation and consider the performance loss acceptable(?). We need documentation to explain this either way, so I'm leaning towards leaving it in as is.

js/share_test.go Show resolved Hide resolved
@mstoykov
Copy link
Contributor Author

@imiric :

  • I thought about this some more, and unless I drop the for-of part ... I still need to go through javascript code, at which point I am pretty sure the difference he is showing will be even less noticeable. The benchmark I posted was mostly to compare the current solution to the "native" performance(of just a JS array) and see if there is enough of a difference, and I don't think there is. You can read my long comment on the current performance characteristics and my opinion on whether optimizing it .. matters. I still think that it is better to have this as is and come back instead of continuing to tinker with it up until we need to release v0.30.0.

  • the benchmark I linked was about access times, not memory ... I specifically don't load the whole array in the first case as If I did it will take around 90-100mb per VU as I explained here and I don't have that kind of memory. Sorry for the misunderstanding - the non shared test has 10 elements the shared one has 100k elements in the array ;)

@mstoykov
Copy link
Contributor Author

An interesting use case for this that I just thought of so I am just writing here - we can move it to issue if this gets merged and we think there is merit:
It should be possible to generate a shareable object in setup using the response from a request (and not only) and then being able to use it from each VU. This will need some additional work ... due to the way this works in the cloud.

An alternative (that will likely be better) is to make it possible for setup to return SharedArray and for it to ... keep being such, which again will need some work for cloud/distributed execution. This will also probably need to work if it returns an object/array that has SharedArray(s) in it.

}
arr := make([]string, obj.Get("length").ToInteger())

cal, err := rt.RunString(`(function(input, output) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain (and maybe add as a code comment here) why we're encoding the data to JSON on the JS side? JSON.stringify() does something magical that json.Marshal() doesn't? Or does this save us from utf8<->utf16 conversions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit comment explains it :P. Using JSON.parse was the easiest way to get a goja object thing that can then be frozen as otherwise json.Decode gets you a golang map which ... just isn't a goja object. You can make it into a value but it still the golang map underneath and doesn't let itself be frozen (the goja author actually commented on that).
So at that point, I had the question - do I use two different libraries to do the encode/decode or not, and went with "not" ;).

They have some ... interesting behavior differences around stuff like functions and other strange types.

But IMO it is much better to be as JS like as possible - explaining that what we do is JSON.parse/JSON.stringify a value instead of json.Encode/Decode in golang and that is why your strange value doesn't work will be again easier for users. Also possibly make it possible for them to make it work.

Another benefit is this way I iterate only once here instead of twice, or once but use string indexes and Object.Get 😱.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a code comment with that, since I doubt anyone will read the commit description in the future and might try to "optimize" it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem .. people don't read git commits 😭

na--
na-- previously approved these changes Jan 8, 2021
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't played with this since mid-December, but the code LGTM in general.

Having the test in the same module will be better, but if that's annoying to do because of import loops or something like that, maybe you can move just the small try/catch tests in there, as discrete unit tests for error handling and maybe a small one for sanity check. But leave the big test where it is, as an integration test of sorts?

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Noticed a couple of typos, but it's not a big deal.

The goheader linter failures are because they're new files? 😕

@mstoykov mstoykov merged commit b427501 into master Jan 11, 2021
@mstoykov mstoykov deleted the readOnlyShareableObject branch January 11, 2021 16:43
@mstoykov
Copy link
Contributor Author

The goheader linter failures are because they're new files?

No idea ... I had some golangci-lint's cache problems locally, but they seemed to go away. Maybe we are hitting something similar in github actions ...

@na-- na-- added this to the v0.30.0 milestone Jan 19, 2021
@imiric imiric mentioned this pull request Jan 20, 2021
salem84 pushed a commit to salem84/k6 that referenced this pull request Feb 3, 2021
This is part of grafana#532

This only implements a shareable array that is generated through a callback.
This way any additional processing can be done by any js code once and then the result will be shared between VUs in a readonly fashion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants