-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for shared memory file using the 'r' flag on open. #2303
Conversation
b03bebb
to
816f512
Compare
I will try to look at the code in detail and address the rest of the points by the end of the week, but I wanted to chime in about the user-facing API and answer these questions first. I am not sure if we should start with a modification to Somewhat prompted by #2298, I realized that if we have a generic const myBinaryFile = new ShareArrayBuffer('myfile', function(){
return open('myfile', 'r');
}); However, we can't do the reverse and use |
Hey @na--, Thanks a lot for the feedback, that's really useful 🙏🏻 I actually started with having a A few comments come to my mind:
Regardless, I'm going to refactor my PR to move towards your recommendation 🐻 |
👍, or at least I can't think of a reason to do anything differently - it's similar to
True, and technically users should probably be able to do whatever they want in the closure/callback. For example, they might want to I'd say that we can even allow the creation of
Sorry, I am not sure I understand exactly what you mean here. That if we have |
I also haven't looked in depth in this but would like to point out (as it seems that it hasn't been noticed or noted) that SharedArrayBuffer is an actual ECMAScript type which is modifiable through Atomics. I am kind of against our type to also be called that, as in some future moment we might want to support what is in the specification in some variant. This has implications for distributed execution and will likely need some further discussion. So I feel like maybe we should go in a different direction from what is discussed here and instead of going for something more general to go for a more concrete solution and only have |
Ah... 😞 Any ideas for alternative names?
Ignoring the naming issue, with what part of my arguments above do you disagree with? |
Thanks for the pointer @mstoykov on SharedArrayBuffer and Atomic. Having close to zero experience in JavaScript still, I would have missed that 🙇🏻. Regarding the implementation you describe @mstoykov that's also, I think, exactly what the current implementation is doing. When a user calls Regarding either one or the other implementation, I don't have enough experience with neither our JS scripting nor with JS itself to have strong opinions yet. So I'll be relying on your experience and intuitions to decide which way "feels" the best 🍿 |
if err != nil { | ||
return nil, err | ||
} | ||
|
||
if len(args) > 0 && args[0] == "b" { | ||
ab := i.runtime.NewArrayBuffer(data) | ||
if len(args) > 0 && contains(args[0], 'b') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a particular reason why we changed string to rune? Or maybe we can stick with the string and use the strings
package function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right: I had completely forgotten about the strings.Contains*
methods. I'll use that instead indeed :)
i.mu.RLock() | ||
array, exists := i.data[filename] | ||
i.mu.RUnlock() | ||
if !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a conception of the early returns, which I found pretty helpful because it also reduces nesting and improves the reading of the methods.
For example:
if exists {
return array
}
Do you think we can apply it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out. I really just copied it from sharedArray
's implementation. In principle, I would have prefered to keep it as a "negative" branching in this specific scenario, for (what feels to me as) clarity. But, in this specific scenario, I think it might actually lead to better performance to indeed check for existence instead. Considering the use case of this, I would expect most of the calls to this function to result in the data being already present in the slice. Hence, I assume (big assumption, I haven't verified that) it would make branch prediction's life easier if we were checking on a value that has 90% chances of being right, rather than the opposite.
Thanks for pointing it out 🐻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay :)
Maybe clarify a bit. I'm not sure if we can gain some performance (since the variable value is calculated) here, but I was saying more from the readability perspective.
For example:
if exists {
return array
}
// ..some logic..
if another {
...
}
return array
is more straightforward (and easy to read) than
if ! exists {
// ..some logic..
if another {
// ...
array = bar()
}
}
return array
My issue with the second option is that since there is a solo exit from the function, we have to keep several conditions in mind (continue reading) before recognizing what is returning. But for sure, it's subjective, so just considering this as the sharing opinion.
// SharedArrayBuffer: sab, | ||
// rt: rt, | ||
// }) | ||
return runtime.NewArrayBuffer(sab.arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This in no way prevents the underlying []byte
from being changes as shown by this
import { sleep } from "k6";
var b = open("./sab.js", "rb")
export default function() {
var v = new Int8Array(b);
v[0] = __VU + __ITER
sleep(1)
console.log(v[0]);
}
(name it sab.js
for easier)
running with go run -race . run -u 20 - i 30 sab.js
I get
WARNING: DATA RACE
Write at 0x00c000546840 by goroutine 45:
github.com/dop251/goja.(*int8Array).set()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
github.com/dop251/goja.(*typedArrayObject)._putIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
github.com/dop251/goja.(*Object).setOwn()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
github.com/dop251/goja._setElemStrictP.exec()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
github.com/dop251/goja.(*_setElemStrictP).exec()
<autogenerated>:1 +0x44
github.com/dop251/goja.(*vm).run()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:410 +0x174
github.com/dop251/goja.(*baseJsFuncObject)._call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:193 +0x8b3
github.com/dop251/goja.(*baseJsFuncObject).call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:203 +0x125
github.com/dop251/goja.(*baseJsFuncObject).Call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:156 +0x65
github.com/dop251/goja.(*baseJsFuncObject).Call-fm()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:155 +0xae
github.com/dop251/goja.AssertFunction.func1.2()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2267 +0x136
github.com/dop251/goja.(*vm).try()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:547 +0x355
github.com/dop251/goja.AssertFunction.func1()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2266 +0x1c6
go.k6.io/k6/js.(*VU).runFn()
/home/mstoykov/work/k6io/k6/js/runner.go:786 +0x46d
go.k6.io/k6/js.(*ActiveVU).RunOnce()
/home/mstoykov/work/k6io/k6/js/runner.go:728 +0x887
go.k6.io/k6/lib/executor.getIterationRunner.func1()
/home/mstoykov/work/k6io/k6/lib/executor/helpers.go:145 +0x7d
go.k6.io/k6/lib/executor.SharedIterations.Run.func4()
/home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:271 +0x5a7
go.k6.io/k6/lib/executor.SharedIterations.Run·dwrap·31()
/home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:283 +0x58
Previous write at 0x00c000546840 by goroutine 43:
github.com/dop251/goja.(*int8Array).set()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
github.com/dop251/goja.(*typedArrayObject)._putIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
github.com/dop251/goja.(*Object).setOwn()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
github.com/dop251/goja._setElemStrictP.exec()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
WARNING: DATA RACE
Write at 0x00c000546840 by goroutine 45:
github.com/dop251/goja.(*int8Array).set()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
github.com/dop251/goja.(*typedArrayObject)._putIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
github.com/dop251/goja.(*Object).setOwn()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
github.com/dop251/goja._setElemStrictP.exec()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
github.com/dop251/goja.(*_setElemStrictP).exec()
<autogenerated>:1 +0x44
github.com/dop251/goja.(*vm).run()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:410 +0x174
github.com/dop251/goja.(*baseJsFuncObject)._call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:193 +0x8b3
github.com/dop251/goja.(*baseJsFuncObject).call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:203 +0x125
github.com/dop251/goja.(*baseJsFuncObject).Call()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:156 +0x65
github.com/dop251/goja.(*baseJsFuncObject).Call-fm()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:155 +0xae
github.com/dop251/goja.AssertFunction.func1.2()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2267 +0x136
github.com/dop251/goja.(*vm).try()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:547 +0x355
github.com/dop251/goja.AssertFunction.func1()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2266 +0x1c6
go.k6.io/k6/js.(*VU).runFn()
/home/mstoykov/work/k6io/k6/js/runner.go:786 +0x46d
go.k6.io/k6/js.(*ActiveVU).RunOnce()
/home/mstoykov/work/k6io/k6/js/runner.go:728 +0x887
go.k6.io/k6/lib/executor.getIterationRunner.func1()
/home/mstoykov/work/k6io/k6/lib/executor/helpers.go:145 +0x7d
go.k6.io/k6/lib/executor.SharedIterations.Run.func4()
/home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:271 +0x5a7
go.k6.io/k6/lib/executor.SharedIterations.Run·dwrap·31()
/home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:283 +0x58
Previous write at 0x00c000546840 by goroutine 43:
github.com/dop251/goja.(*int8Array).set()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
github.com/dop251/goja.(*typedArrayObject)._putIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
github.com/dop251/goja.(*Object).setOwn()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
github.com/dop251/goja._setElemStrictP.exec()
/home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
I don't know how exactly this needs to be fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I must admit I can't think of any solution either at the moment. The main thing I see, is that your example builds upon the data being an ArrayBuffer
hence, copiable to a typed array object such as Int8Array
. Which leads me to think that I'd be better off with a dedicated object type under the hood, instead of relying on ArrayBuffer
indeed. Cheers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned earlier ... arguably the thing I imagined and is kind of discussed (IMO) in #1931 is that we will have an immutable ArrayBuffer.
Which at least I will expect to be put in Int8Array but it will throw an exception on trying to change its value, but not on reading it. So it will act exactly as one all the way through, except for writing data in it. (arguably the exception should only happen in strict mode but k6 is always in strict mode and I doubt we are changing that.)
We are revisiting our strategy on that front. Closing. Feel free to reopen in the future if it's judged useful. |
What this PR does
This PR is an attempt at providing some resolution to #1931. It introduces the
r
option to the JS init contextopen()
function. When used against a binary file as such:open('myfile', 'br')
it returns anArrayBuffer
instance holding the file bytes, that's uniquely mapped and stored in the process. When a file is opened the first time, its content is added to afilename->content
mapping. Opening the same file multiple times would result in storing its content in memory only once.It allows to do something along the lines of:
As of the opening of this PR, this mapping is done directly in the
data
module, and is done against filenames. For a given filename, there exists at any point in time a single copy of its content stored in thesharedArrayBuffer
attribute of the dataRootModule
.My understanding is that
ArrayBuffer
isn't stricto sensu immutable, but it's not designed for being modified either. It felt like an acceptable compromise. Let me know if you disagree and/or see issues with this assumption.Scope
This PR doesn't have tests yet. I preferred to wait for your input on the design before adding any. I will add some as soon as we reach a satisfactory 🏗️ agreement on the design.
See Feedback requests and Open Question for more scope related questions.
Alternatives
An alternative I considered was using
mmap
instead. When a file is opened with thebr
flags, we would open it in read only mode, andmmap
it in read-only mode. As a reminder,mmap
is (posix?) a system call that allows a process to map the content of a file directly into its virtual address space: any further operations done to the mmaped file are done in memory, and the OS makes sure to sync those to the underlying file storage whenever it judges necessary.This would have been very convenient, considering that it would be "shared" by design, anyone having the *mmaped file descriptor inside the process can access it, just like it would with any other file. Plus, it supports a similar set of flags as the OS
open
syscall and would easily enforce the "read-only" property.However, there are in my opinion two main limitations to using
mmap
:Expected feedback and Open Questions
initcontext.go
'sOpen
function call in thedata
module directly like this:dataModule.(*data.RootModule).GetOrCreateSharedArrayBuffer(filename, fileData)
would you say this is acceptable? If not, what would be an alternative?ArrayBuffer
and not aDynamicArray
with paniquing setters, was mostly due to the fact that theDynamicArray
interface doesn't allow access to its underlying bytes, makingcommon.ToBytes
ineffective. If you have a better idea, please elaborate 🙏🏻How to review this PR
There really isn't so much code in this PR as of today. I would recommend starting from the entry point: the modifications made to the
Open
method of theinitcontext.go
file, though. Then it might be interesting to follow whatGetOrCreateSharedArrayBuffer
indata/data.go
does under the hood.