-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite script/files loading to be be url based #1059
Rewrite script/files loading to be be url based #1059
Conversation
This includes also change to the archive file structure. Now instead of separating files by whether they are scripts or no, they are separated based on whether their URI scheme. Through the (majority) of k6 now instead of simple strings a url.URL is used to idenitify files(scripts or otherwise). This also means that all imports or `open` can have schemes. Previously remote modules were supported specifically without scheme. With this change we specifically prefer if they are with a scheme. The old variant is supported but logs a warning. Additionally if remote module requires a relative/absolute path that doesn't have a scheme it is relative/absolute given the remote module url. Because of some of the changes, now caching is done through a afero.Fs instead of additional map. This also fixes not laoding remotely imported files from an archive, but instead requesting them again. fixes #1037, closes #838, fixes #887 and fixes #1051
Things that need discussion/more work, maybe:
|
Codecov Report
@@ Coverage Diff @@
## master #1059 +/- ##
=========================================
- Coverage 72.78% 72.58% -0.2%
=========================================
Files 133 133
Lines 9884 9911 +27
=========================================
Hits 7194 7194
- Misses 2272 2291 +19
- Partials 418 426 +8
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #1059 +/- ##
==========================================
+ Coverage 72.79% 72.98% +0.18%
==========================================
Files 133 138 +5
Lines 9890 10152 +262
==========================================
+ Hits 7199 7409 +210
- Misses 2276 2305 +29
- Partials 415 438 +23
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach for using url.URL
structs to record the paths, but I'm starting to think that this may be an implementation detail that has been unnecessarily exposed too "high" in the k6 stack in this pull request... 😕
It seems to me that it makes more sense to treat the paths in the user-facing parts of k6 (k6 run <whatever>
, import X from "<whatever>"
, require("whatever")
, open("whatever")
, etc.) as simple strings - they will just use exactly what the user passed. This would make those parts of k6 dumb to the actual mechanics of what happens under the hood, but will allow us to have sensible error messages to the user, displaying exactly the whatever
string path they specified. All the while, the actual file loading code underneath can deal with the complexities in all of the many different execution scenarios.
Basically, if we have a Loader
interface (for lack of a better term, since it would mostly replace the current one) with the methods LoadModule(string) (Something?, error)
(for import
/ require()
) and LoadFile(string) (io.ReadCloser, error)
(or something else, like GetFS()
- for open()
and any future file APIs), the user-facing k6 parts should be happy. But underneath, two or more implementations of that interface (one for archives and one for the rest?) deal with the complexities of actually resolving and loading whatever was requested:
- using
url.URL
everywhere, or only for some things? - using different layer cakes of
afero.Fs
(or whatever) implementations that deal with whatever is needed - caching, slash conversion for windows archives, etc. - resolving things properly - either via the afero FSes, or via the github/cdnjs shortcuts, etc.
- ?? (I'm definitely missing something...)
js/bundle.go
Outdated
// NewBundle creates a new bundle from a source file and a filesystem. | ||
func NewBundle(src *lib.SourceData, fs afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) { | ||
func NewBundle(src *lib.SourceData, fileFS afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this rename? Are we sure that this file system will always be a fileFS
, not something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my current reworking due to windows ... I now provide the whole map[string]afero.Fs
, for reasons that will become apparent :)
But yeah the idea here was that this would be the fileFs that was used to load the original script from ... Obviously this doesn't work very well as
- We won't cache that one when we first load it in the
cmd/run.go
or elsewhere - Won't work if we load (and again not cache it) if it was from a remote script ... aka https Fs :)
@@ -58,11 +62,11 @@ func TestNewBundle(t *testing.T) { | |||
}) | |||
t.Run("Invalid", func(t *testing.T) { | |||
_, err := getSimpleBundle("/script.js", "\x00") | |||
assert.Contains(t, err.Error(), "SyntaxError: /script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n") | |||
assert.Contains(t, err.Error(), "SyntaxError: file:///script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems strange and slightly wrong to me - why should the user see the file
schema in the error messages if they didn't explicitly specify the schema themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should print both the original thing that the user wrote and what we resolved it to as in loader.Load
I find the resolved final url to be even more useful as otherwise you don't actually know what we resolved whatever you provided us with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the resolved final url to be even more useful as otherwise you don't actually know what we resolved whatever you provided us with
I don't understand what you mean by this, sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have two scripts which have require("./scripts.js")
but are in different directories and our message only tells you ./script.js ... had an error ...
you will have trouble to find out which ./script.js
it is. This gets even worse if you have a https
import which imports ./script.js
or the more likely ./utils.js
.
But yeah I agree that having the actual name of what the user provided is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have two scripts which have require("./scripts.js") but are in different directories and our message only tells you ./script.js ... had an error ... you will have trouble to find out which ./script.js it is.
In my head, the clear way to solve this is that both the originating file and the problem should be part of the log message, something like this:
/dir1/script1:line1234: cannot open "whatever the user wanted to open"
/dir2/script2:line3456: cannot open "whatever the user wanted to open"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, wrong resolve... in both cases, whatever the user wanted to open
should be exactly the string the user specified though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so... is this fixable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider the new one an improvement on the previous error. And what you propose to be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how this improves and fixes a ton of things, but I'm honestly starting to get scared by the scope of it and by merging this before what is supposed to be a bugfix release...
cmd/archive.go
Outdated
fs := afero.NewOsFs() | ||
src, err := readSource(filename, pwd, fs, os.Stdin) | ||
filesystems := createFilesystems() | ||
src, err := readSource(filename, pwd, filesystems, os.Stdin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as cmd/run.go
... I think you can currently re-bundle archives, which is a somewhat nice feature, for example as a way to change their options: k6 archive -u 20 -d 2m oldArchiveWithOtherOptions.tar
. So, passing the caching filesystem probably isn't for the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome .. I was intending on having this as feature that upgrades the archive to a new version :). This is not actually a problem because after the archive is read it's filesystems are used not the ones that are provided here. Those are used if the file is not an archive but a script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm but then it means that we have a potentially multi-megabyte tar archive saved in memory in that cachedFS that won't be garbage-collected. Not a huge issue for k6 archive
, unless the original tar is huge, but a minor issue for k6 run
and k6 cloud
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it not be garbage-collected ? We don't save any reference to anything of the fs in ReadSource and newRunner completely ignores it if it's an archive. I don't see why it won't be collected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the root object of that memory tree would be filesystems
here (and in the other commands), which persists for the whole duration of the command execution. So, the tar archive will continue to be present there in its files
, due to the fsext.CacheOnReadFs
code, I think
…chiveLoadingAndSupportSchemesInFiles
The error message is not perfect, but this will be a very strange case either way. Mostly for coverage ;)
cmd/run.go
Outdated
} else { | ||
srcLocalPath = filepath.Join(pwd, src) | ||
} | ||
srcLocalPath = filepath.Clean(afero.FilePathSeparator + srcLocalPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this seems strange if we're running on Windows - why is it ok to prepend the FilePathSeparator
to an absolute path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are stripping it in the TrimFilePathSeparatorFs
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right... This whole thing feels like it's way, way more complicated than it needs to be... Unfortunately, I don't have a very clear idea how to simplify it... 😞
Case in point, I very strongly think that code in cmd
should have absolutely no idea that TrimFilePathSeparatorFs
exists, much less trying to accommodate it in such a way... 😕 To have such a seemingly low-level implementation detail exposed here is another sign that we need to rethink the abstractions... Can you at least add a comment here, so other people (or us, in the future) aren't confused by this like I was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, why is this function and createFilesystems()
here, instead of in loader/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No better reason than: nobody moved it and this is the only place where it used ... so ...
IMO we(I) am going to refactor this at some point and possibly have an "object" that is basically the filesystems with methods such as Resolve and Load.
But given that it is only used here I am somewhat against moving it somewhere else just because this package should probably not have it ... at least for now.
I am adding some readSource tests though as I am sick of fixing and breaking things between windows and linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 for tests of this code... it would probably be easier to test if it's not in cmd
though 😉 But even if you're leaving it here, please at least add a comment that explains this seemingly strange line...
Regarding any future refactoring, I think I sent you a proto-proposal on Slack? Basically, I think we should scrap the loader
and just have an interface called Environment
or an ExecutionEnvironment
or something like that, with methods for import
/require()
and open()
. And have 2-3 different implementations of that interface:
Native
/System
/OS
/ whatever - initialized once with the current working directory (so we don't pass it around everywhere, append slashes, etc. strange things we currently do 😑 ), and then used to read (and cache) files from the local file system and actual network.Archive
- initialized from archives, never does any actual network connections or touches the real FS.Test
/Dummy
- for simple tests, maybe just aNative
environment with the OS FS swapped out with a in-memory one.
The rest of the code, especially in the higher-level parts in cmd/
and js/
, should have absolutely no idea about any of the implementation details in these environments - Windows/Linux file slashes, remote HTTPS scripts or imports, etc. But all of that, or something better (since this probably isn't enough...) could wait until a future refactoring of the code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any other immediate issues with this, besides the minor ones I noted above, but I'm not super happy with it... It fixes a ton of bugs and issues, hopefully without adding others, so we should merge it after a bit more testing, but there definitely seems to be enough work for a second more abstraction-oriented refactoring in the future...
The current architecture definitely feels off, the handling of URLs for sure seems painful in a lot of places, especially when it comes to the use of Opaque, which I didn't even know was a thing before this PR, and I doubt we use for its intended purpose... 😕 It seems like this PR adds a lot of complexity, and I'm hopeful that we could reduce it, or at get a lower ratio between the accidental/essential parts of it with another pass in the future.
Codecov Report
@@ Coverage Diff @@
## master #1059 +/- ##
=========================================
Coverage ? 72.98%
=========================================
Files ? 138
Lines ? 10153
Branches ? 0
=========================================
Hits ? 7410
Misses ? 2305
Partials ? 438
Continue to review full report at Codecov.
|
…chiveLoadingAndSupportSchemesInFiles
This includes also change to the archive file structure. Now instead of
separating files by whether they are scripts or no, they are separated
based on whether their URI scheme.
Through the (majority) of k6 now instead of simple strings a url.URL is
used to identify files(scripts or otherwise).
This also means that all imports or
open
can have schemes.Previously remote modules were supported specifically without scheme.
With this change we specifically prefer if they are with a scheme. The
old variant is supported but logs a warning.
Additionally if remote module requires a relative/absolute path that doesn't have a
scheme it is relative/absolute given the remote module url.
Because of some of the changes, now caching is done through a afero.Fs
instead of additional map. This also fixes not loаding remotely imported
files from an archive, but instead requesting them again.
fixes #1037, closes #838, fixes #887 and fixes #1051