Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite lib.archive and others for better support of everything #1051

Closed
mstoykov opened this issue Jun 14, 2019 · 0 comments · Fixed by #1059
Closed

Rewrite lib.archive and others for better support of everything #1051

mstoykov opened this issue Jun 14, 2019 · 0 comments · Fixed by #1059

Comments

@mstoykov
Copy link
Contributor

mstoykov commented Jun 14, 2019

The current state

When you make an archive, k6 creates two folders /scripts/ and /files/, where it puts files loaded with import (i.e. require()) and with open() respectfully. As far as I know, there is absolutely no reason for this distinction. This also means that, in the unlikely event of opening and requiring the same file, you will get two copies in the archive.

k6 also writes /metadata.json with a bunch of metadata for the archive, including the consolidated options and the environment variables for the script. It also includes what the actual path to the main script is, because that file is not in /scripts/, instead it is saved as /data. Again, there's absolutely no reason why this is done this way. The only upside to this is that it is easy to find.

Also, again for absolutely no apparent reason, all files are not just under /scripts/ or /files/, but under /scripts/_/ and /files/_/.

As a side effect of the fact that, currently, remote files are imported without a scheme (e.g. import x from "mysite.com/scripts/x.js"), they are written inside of /scripts/ but not /scripts/_/. So maybe this is the reason for the _ to distinguish between local and remote files? I find it unlikely, but... I would say a lucky coincidence, since it could help with #887.

Another thing that ... is not quite right is that, if you import cdnjs.com/libraries/Faker, the cdnjs-specific loader will download a version from some URL like https://cdnjs.cloudflare.com/ajax/libs/Faker/3.1.0/faker.js, but it will write it in cdnjs.com/libraries/Faker. So, if in one place you load it one way and the other in another, it will load it twice...

The things that that we want:

  1. Simpler and easier to understand archives that don't have some strange separation of the FS and strange symbols inside.
  2. A way to distinguish reliably between files from the local fs and files from remote loading.
  3. Not loading or storing the same file multiple times because it was loaded by a slightly different name.

To this end I propose that I extend #1046 (or make new PR) where

  1. All moduleSpecifiers are resolved to a full URI (except k6 specific ones) with scheme and everything and that is used to load the file from "everywhere".
  2. The archive contains:
    2.1. The file metadata.json as currently is.
    2.2 A folder for each scheme that was loaded/required with a full tree of the files below. Here we should probably urlencode urls? There is also the problem that you can currently specify query parameters and they are respected which also need a way of loading them.
    2.3. Nothing else - the current /data is just a file in one with a scheme (usually file).
    2.3.1. Maybe I will leave /data, as it is useful to know which the entry file is ...easily ... maybe not.
  3. After that we drop the Scripts and Files fields as per Remove lib.Archive.Files and lib.Archive.Scripts #838.
  4. When running from an archive, everything (including https files) are loaded from the archive (Running from archive still downloads scripts from the internet #887).
  5. When running from local machine, we currently do CacheOnRead from the FS - we can probably do something similar for URLs.
  6. Write some compatibility layer that translates the old archive format to the new one on loading. This might turn out to even require us to set some flag somewhere that we need to try to load it remotely if no file is found. Here we will also need to... magically figure out what cdnjs.com/libraries/Faker means... which will be hard given that this takes the latest version at that time... but we can probably just shotgun it and get the first thing from the archives which looks close enough.
@na-- na-- changed the title Rewrite lib.archive and others for better support of everything. Rewrite lib.archive and others for better support of everything Jun 14, 2019
mstoykov added a commit that referenced this issue Jun 26, 2019
This includes also change to the archive file structure. Now instead of
separating files by whether they are scripts or no, they are separated
based on whether their URI scheme.

Through the (majority) of k6 now instead of simple strings a url.URL is
used to idenitify files(scripts or otherwise).

This also means that all imports or `open` can have schemes.
Previously remote modules were supported specifically without scheme.
With this change we specifically prefer if they are with a scheme. The
old variant is supported but logs a warning.
Additionally if remote module requires a relative/absolute path that doesn't have a
scheme it is relative/absolute given the remote module url.

Because of some of the changes, now caching is done through a afero.Fs
instead of additional map. This also fixes not laoding remotely imported
files from an archive, but instead requesting them again.

fixes #1037, closes #838, fixes #887 and fixes #1051
mstoykov added a commit that referenced this issue Jul 31, 2019
This also includes a change to the archive file structure. Now instead of
separating files by whether they are scripts or not, they are separated
based on their URI scheme.

Throughout the (majority) of k6 now, instead of simple strings, a url.URL is
used to identify files (scripts or otherwise).
This also means that all imports can have schemes. Previously remote modules
were supported specifically without a scheme. With this change k6 prefers if
they are with a scheme. The old variant is supported, but logs a warning.
Additionally if remote module requires a relative/absolute path that doesn't
have a scheme, it is relative/absolute given the remote module url.

Because of some of the changes, now caching is done through a afero.Fs
instead of additional maps. This also fixes the remotely imported files not 
properly loading from an archive, but instead requesting them again. This is
supported with old archives and for github/cdnjs's shortcut loaders. 

For future-proof reasons the archive now also records the GOOS value of the 
k6 that generated it. 

Anonymize case insensitively as windows paths are (usually) case insensitive. 

fixes #1037, closes #838, fixes #887 and fixes #1051
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant