Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for shared memory file using the 'r' flag on open. #2303

Closed
wants to merge 1 commit into from

Conversation

oleiade
Copy link
Member

@oleiade oleiade commented Dec 17, 2021

What this PR does

This PR is an attempt at providing some resolution to #1931. It introduces the r option to the JS init context open() function. When used against a binary file as such: open('myfile', 'br') it returns an ArrayBuffer instance holding the file bytes, that's uniquely mapped and stored in the process. When a file is opened the first time, its content is added to a filename->content mapping. Opening the same file multiple times would result in storing its content in memory only once.

It allows to do something along the lines of:

// The file is opened, read, and its content's bytes are stored
// and mapped to the filename. It returns its content as an
// ArrayBuffer instance.
const myBinaryFile = open('myfile', 'br');

export default function () {
  const data = {
    field: 'this is a standard form field',
    file: http.file(myBinaryFile, 'myfile'),
  };
  
  http.post('https://example.com/upload', data);
}

As of the opening of this PR, this mapping is done directly in the data module, and is done against filenames. For a given filename, there exists at any point in time a single copy of its content stored in the sharedArrayBuffer attribute of the data RootModule.

My understanding is that ArrayBuffer isn't stricto sensu immutable, but it's not designed for being modified either. It felt like an acceptable compromise. Let me know if you disagree and/or see issues with this assumption.

Scope

This PR doesn't have tests yet. I preferred to wait for your input on the design before adding any. I will add some as soon as we reach a satisfactory 🏗️ agreement on the design.

See Feedback requests and Open Question for more scope related questions.

Alternatives

An alternative I considered was using mmap instead. When a file is opened with the br flags, we would open it in read only mode, and mmap it in read-only mode. As a reminder, mmap is (posix?) a system call that allows a process to map the content of a file directly into its virtual address space: any further operations done to the mmaped file are done in memory, and the OS makes sure to sync those to the underlying file storage whenever it judges necessary.

This would have been very convenient, considering that it would be "shared" by design, anyone having the *mmaped file descriptor inside the process can access it, just like it would with any other file. Plus, it supports a similar set of flags as the OS open syscall and would easily enforce the "read-only" property.

However, there are in my opinion two main limitations to using mmap:

Expected feedback and Open Questions

  • Note: I have experimented with a bunch of alternatives and ended up cutting most of what I've done to limit it to this absolute minimum implementation. The reason is: I really had a hard time gaining any sort of certainty regarding the design. Hence, it would be really useful to get feedback from you folks on that front.
  • Question: this PR makes the initcontext.go's Open function call in the data module directly like this: dataModule.(*data.RootModule).GetOrCreateSharedArrayBuffer(filename, fileData) would you say this is acceptable? If not, what would be an alternative?
  • Feedback needed: Does this address the Shared-memory read-only files #1931 feature request in your opinion? If not, what's missing?
  • Feedback needed: Would you have designed this feature in another way? If so, how?
  • Feedback needed: The main reason for using an ArrayBuffer and not a DynamicArray with paniquing setters, was mostly due to the fact that the DynamicArray interface doesn't allow access to its underlying bytes, making common.ToBytes ineffective. If you have a better idea, please elaborate 🙏🏻

How to review this PR

There really isn't so much code in this PR as of today. I would recommend starting from the entry point: the modifications made to the Open method of the initcontext.go file, though. Then it might be interesting to follow what GetOrCreateSharedArrayBuffer in data/data.go does under the hood.

@oleiade oleiade self-assigned this Dec 17, 2021
@oleiade oleiade force-pushed the fix/1931_shared_memory_read_only_files branch from b03bebb to 816f512 Compare December 17, 2021 15:42
@na-- na-- requested review from codebien and mstoykov December 20, 2021 13:55
@na--
Copy link
Member

na-- commented Dec 20, 2021

  • Feedback needed: Does this address the Shared-memory read-only files #1931 feature request in your opinion? If not, what's missing?

  • Feedback needed: Would you have designed this feature in another way? If so, how?

I will try to look at the code in detail and address the rest of the points by the end of the week, but I wanted to chime in about the user-facing API and answer these questions first.

I am not sure if we should start with a modification to open() here. It is probably a good idea to have it, but it should probably come after we have an explicitly constructed SharedArray alternative for binary data, e.g. SharedArrayBuffer or something like that.

Somewhat prompted by #2298, I realized that if we have a generic SharedArrayBuffer, we can easily use it to implement the equivalent of open('filename', 'br') like this:

const myBinaryFile = new ShareArrayBuffer('myfile', function(){
    return open('myfile', 'r');
});

However, we can't do the reverse and use open(file, 'rb') to implement a generic SharedArrayBuffer, e.g. construct a complex request body in-script, like the user in #2298 requires... 😞 Ergo, I think we should probably start with SharedArrayBuffer first, and optionally have open() modifier that is a shorthand for it.

@oleiade
Copy link
Member Author

oleiade commented Dec 20, 2021

Hey @na--,

Thanks a lot for the feedback, that's really useful 🙏🏻 I actually started with having a SharedArrayBuffer type and API looking like what you propose, but we discussed it with @codebien, and we were unsure what the use case would be outside the context of open. In hindsight, and looking at the example you posted, I see it would make sense indeed.

A few comments come to my mind:

  • the constructor prototype would look like this: new SharedArrayBuffer(name, closure), with name being the underlying unique name of the buffer, and closure being a function returning an ArrayBuffer.
  • In your example: new SharedArrayBuffer(name, function() { return open('myfile', 'r'); } we have no control on the open statement, and can't ensure which option is being passed to it, right? I assume the options are not really important there (considering that it doesn't matter as long as the SharedArrayBuffer receives some data).
  • With this API, I would find something along the lines of const sharedArray = open('./myfile', 'br') somewhat incompatible with your example. I mean, it feels like it should be either. What do you think?

Regardless, I'm going to refactor my PR to move towards your recommendation 🐻

@na--
Copy link
Member

na-- commented Dec 20, 2021

  • the constructor prototype would look like this: new SharedArrayBuffer(name, closure), with name being the underlying unique name of the buffer, and closure being a function returning an ArrayBuffer.

👍, or at least I can't think of a reason to do anything differently - it's similar to SharedArray and seems to do everything we need, right?

  • In your example: new SharedArrayBuffer(name, function() { return open('myfile', 'r'); } we have no control on the open statement, and can't ensure which option is being passed to it, right? I assume the options are not really important there (considering that it doesn't matter as long as the SharedArrayBuffer receives some data).

True, and technically users should probably be able to do whatever they want in the closure/callback. For example, they might want to open() multiple files and build a request body out of them. As long as they return a normal ArrayBuffer, which we should validate, we shouldn't really care about what they do inside the callback.

I'd say that we can even allow the creation of SharedArrayBuffer() in VU contexts as well, not only in the init context. open() will still be limited only to the init context, but there are probably valid use cases for having SharedArrayBuffer() in VU contexts that don't require the opening of files, and I don't think there is a technical reason to restrict if from our side 🤷‍♂️

  • With this API, I would find something along the lines of const sharedArray = open('./myfile', 'br') somewhat incompatible with your example. I mean, it feels like it should be either. What do you think?

Sorry, I am not sure I understand exactly what you mean here. That if we have SharedArrayBuffer, we shouldn't add the r flag to open()? If I understood you correctly, I am fine either way - we can add both now, or we can skip the open() enhancement initially and add it in the future, once we see there's demand for it. I'm probably slightly leaning towards changing open() in the future though, it will allow us to wait and see how users use SharedArrayBuffer and we can make changes based on that feedback.

@mstoykov
Copy link
Contributor

I also haven't looked in depth in this but would like to point out (as it seems that it hasn't been noticed or noted) that SharedArrayBuffer is an actual ECMAScript type which is modifiable through Atomics.

I am kind of against our type to also be called that, as in some future moment we might want to support what is in the specification in some variant. This has implications for distributed execution and will likely need some further discussion.

So I feel like maybe we should go in a different direction from what is discussed here and instead of going for something more general to go for a more concrete solution and only have open("./file.jpg", "rb") be returning a ArrayBuffer like type.

@na--
Copy link
Member

na-- commented Dec 21, 2021

I also haven't looked in depth in this but would like to point out (as it seems that it hasn't been noticed or noted) that SharedArrayBuffer is an actual ECMAScript type which is modifiable through Atomics.

I am kind of against our type to also be called that, as in some future moment we might want to support what is in the specification in some variant. This has implications for distributed execution and will likely need some further discussion.

Ah... 😞 Any ideas for alternative names? SharedBinaryData?

So I feel like maybe we should go in a different direction from what is discussed here and instead of going for something more general to go for a more concrete solution and only have open("./file.jpg", "rb") be returning a ArrayBuffer like type.

Ignoring the naming issue, with what part of my arguments above do you disagree with?

@oleiade
Copy link
Member Author

oleiade commented Dec 21, 2021

Thanks for the pointer @mstoykov on SharedArrayBuffer and Atomic. Having close to zero experience in JavaScript still, I would have missed that 🙇🏻.

Regarding the implementation you describe @mstoykov that's also, I think, exactly what the current implementation is doing. When a user calls open('myfile', 'br'), it returns an ArrayBuffer whose data are under the hood mapped to the filename. When that array buffer is addressed later on, the data are simply fetched back from that mapping. This comes with assumptions/limitations as explained in the PR description.

Regarding either one or the other implementation, I don't have enough experience with neither our JS scripting nor with JS itself to have strong opinions yet. So I'll be relying on your experience and intuitions to decide which way "feels" the best 🍿

if err != nil {
return nil, err
}

if len(args) > 0 && args[0] == "b" {
ab := i.runtime.NewArrayBuffer(data)
if len(args) > 0 && contains(args[0], 'b') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a particular reason why we changed string to rune? Or maybe we can stick with the string and use the strings package function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right: I had completely forgotten about the strings.Contains* methods. I'll use that instead indeed :)

i.mu.RLock()
array, exists := i.data[filename]
i.mu.RUnlock()
if !exists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a conception of the early returns, which I found pretty helpful because it also reduces nesting and improves the reading of the methods.

For example:

if exists {
   return array
}

Do you think we can apply it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out. I really just copied it from sharedArray's implementation. In principle, I would have prefered to keep it as a "negative" branching in this specific scenario, for (what feels to me as) clarity. But, in this specific scenario, I think it might actually lead to better performance to indeed check for existence instead. Considering the use case of this, I would expect most of the calls to this function to result in the data being already present in the slice. Hence, I assume (big assumption, I haven't verified that) it would make branch prediction's life easier if we were checking on a value that has 90% chances of being right, rather than the opposite.

Thanks for pointing it out 🐻

Copy link
Contributor

@olegbespalov olegbespalov Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay :)

Maybe clarify a bit. I'm not sure if we can gain some performance (since the variable value is calculated) here, but I was saying more from the readability perspective.

For example:

if exists {
   return array
}

// ..some logic..

if another {
  ...
}

return array

is more straightforward (and easy to read) than

if ! exists {
   // ..some logic..

   if another {
       //  ...
       array = bar()
    }
}

return array

My issue with the second option is that since there is a solo exit from the function, we have to keep several conditions in mind (continue reading) before recognizing what is returning. But for sure, it's subjective, so just considering this as the sharing opinion.

// SharedArrayBuffer: sab,
// rt: rt,
// })
return runtime.NewArrayBuffer(sab.arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in no way prevents the underlying []byte from being changes as shown by this

import { sleep } from "k6";
var b = open("./sab.js", "rb")

export default function() {
	var v = new Int8Array(b);
	v[0] = __VU + __ITER
	sleep(1)
	console.log(v[0]);
}

(name it sab.js for easier)
running with go run -race . run -u 20 - i 30 sab.js I get

WARNING: DATA RACE
Write at 0x00c000546840 by goroutine 45:
  github.com/dop251/goja.(*int8Array).set()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
  github.com/dop251/goja.(*typedArrayObject)._putIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
  github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
  github.com/dop251/goja.(*Object).setOwn()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
  github.com/dop251/goja._setElemStrictP.exec()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
  github.com/dop251/goja.(*_setElemStrictP).exec()
      <autogenerated>:1 +0x44
  github.com/dop251/goja.(*vm).run()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:410 +0x174
  github.com/dop251/goja.(*baseJsFuncObject)._call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:193 +0x8b3
  github.com/dop251/goja.(*baseJsFuncObject).call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:203 +0x125
  github.com/dop251/goja.(*baseJsFuncObject).Call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:156 +0x65
  github.com/dop251/goja.(*baseJsFuncObject).Call-fm()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:155 +0xae
  github.com/dop251/goja.AssertFunction.func1.2()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2267 +0x136
  github.com/dop251/goja.(*vm).try()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:547 +0x355
  github.com/dop251/goja.AssertFunction.func1()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2266 +0x1c6
  go.k6.io/k6/js.(*VU).runFn()
      /home/mstoykov/work/k6io/k6/js/runner.go:786 +0x46d
  go.k6.io/k6/js.(*ActiveVU).RunOnce()
      /home/mstoykov/work/k6io/k6/js/runner.go:728 +0x887
  go.k6.io/k6/lib/executor.getIterationRunner.func1()
      /home/mstoykov/work/k6io/k6/lib/executor/helpers.go:145 +0x7d
  go.k6.io/k6/lib/executor.SharedIterations.Run.func4()
      /home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:271 +0x5a7
  go.k6.io/k6/lib/executor.SharedIterations.Run·dwrap·31()
      /home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:283 +0x58

Previous write at 0x00c000546840 by goroutine 43:
  github.com/dop251/goja.(*int8Array).set()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
  github.com/dop251/goja.(*typedArrayObject)._putIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
  github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
  github.com/dop251/goja.(*Object).setOwn()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
  github.com/dop251/goja._setElemStrictP.exec()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
      WARNING: DATA RACE
Write at 0x00c000546840 by goroutine 45:
  github.com/dop251/goja.(*int8Array).set()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
  github.com/dop251/goja.(*typedArrayObject)._putIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
  github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
  github.com/dop251/goja.(*Object).setOwn()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
  github.com/dop251/goja._setElemStrictP.exec()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235
  github.com/dop251/goja.(*_setElemStrictP).exec()
      <autogenerated>:1 +0x44
  github.com/dop251/goja.(*vm).run()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:410 +0x174
  github.com/dop251/goja.(*baseJsFuncObject)._call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:193 +0x8b3
  github.com/dop251/goja.(*baseJsFuncObject).call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:203 +0x125
  github.com/dop251/goja.(*baseJsFuncObject).Call()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:156 +0x65
  github.com/dop251/goja.(*baseJsFuncObject).Call-fm()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/func.go:155 +0xae
  github.com/dop251/goja.AssertFunction.func1.2()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2267 +0x136
  github.com/dop251/goja.(*vm).try()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:547 +0x355
  github.com/dop251/goja.AssertFunction.func1()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/runtime.go:2266 +0x1c6
  go.k6.io/k6/js.(*VU).runFn()
      /home/mstoykov/work/k6io/k6/js/runner.go:786 +0x46d
  go.k6.io/k6/js.(*ActiveVU).RunOnce()
      /home/mstoykov/work/k6io/k6/js/runner.go:728 +0x887
  go.k6.io/k6/lib/executor.getIterationRunner.func1()
      /home/mstoykov/work/k6io/k6/lib/executor/helpers.go:145 +0x7d
  go.k6.io/k6/lib/executor.SharedIterations.Run.func4()
      /home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:271 +0x5a7
  go.k6.io/k6/lib/executor.SharedIterations.Run·dwrap·31()
      /home/mstoykov/work/k6io/k6/lib/executor/shared_iterations.go:283 +0x58

Previous write at 0x00c000546840 by goroutine 43:
  github.com/dop251/goja.(*int8Array).set()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:198 +0x7e
  github.com/dop251/goja.(*typedArrayObject)._putIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:520 +0x108
  github.com/dop251/goja.(*typedArrayObject).setOwnIdx()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/typedarrays.go:542 +0x4a
  github.com/dop251/goja.(*Object).setOwn()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/object.go:1403 +0x16c
  github.com/dop251/goja._setElemStrictP.exec()
      /home/mstoykov/work/k6io/k6/vendor/github.com/dop251/goja/vm.go:1403 +0x235

I don't know how exactly this needs to be fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I must admit I can't think of any solution either at the moment. The main thing I see, is that your example builds upon the data being an ArrayBuffer hence, copiable to a typed array object such as Int8Array. Which leads me to think that I'd be better off with a dedicated object type under the hood, instead of relying on ArrayBuffer indeed. Cheers!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned earlier ... arguably the thing I imagined and is kind of discussed (IMO) in #1931 is that we will have an immutable ArrayBuffer.

Which at least I will expect to be put in Int8Array but it will throw an exception on trying to change its value, but not on reading it. So it will act exactly as one all the way through, except for writing data in it. (arguably the exception should only happen in strict mode but k6 is always in strict mode and I doubt we are changing that.)

@na-- na-- added this to the v0.37.0 milestone Jan 5, 2022
@na-- na-- modified the milestones: v0.37.0, v0.38.0 Feb 28, 2022
@na-- na-- removed their request for review March 22, 2022 15:07
@oleiade
Copy link
Member Author

oleiade commented Apr 5, 2022

We are revisiting our strategy on that front. Closing. Feel free to reopen in the future if it's judged useful.

@oleiade oleiade closed this Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants