
Relax computed importstr ? #196

Open
ant31 opened this issue May 26, 2016 · 34 comments
Comments

@ant31 commented May 26, 2016

Do you think it would be possible to use importstr with a variable?

e.g.:

{
  files: [
    {
      file: filepath,
      content: importstr filepath,
    }
    for filepath in ['a.txt', 'b.txt', 'c.txt']
  ],
}

some_function(file):: (
  local content = importstr file;
  # process the content
)
@ant31 (Author) commented Dec 9, 2016

Any technical reason for importstr not to allow a variable?
I understand the restriction for import, but I think importstr is a little different: it's equivalent to doing a file.open(filepath) in other languages, and those mostly all accept a variable.

@sparkprime (Contributor)

Widening it to a variable is just as "bad" as widening it to an arbitrary expression, so if we were going to do anything it would be that.

The property that we gain by having the restriction is that you can do a simple static analysis and determine which files are dependencies, and then determine whether or not the code needs re-executing. This is potentially useful although I'm not sure anyone has made use of it yet. However if we lift the restriction and lose the property, we won't be getting it back again. So it needs careful thought.

There is one difference between Jsonnet and other programming languages, and that's that everybody writes programs that terminate in a relatively short period of time, so one answer is that if you want to know what files affect the evaluation of a config, you should execute the config. Or, in the case of deciding whether or not a cached JSON value needs recomputing, save not just the JSON output but the list of files used to compute it.
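The static analysis sparkprime describes is possible precisely because import paths must be literals. A rough sketch of such a dependency scan (a toy illustration, not how any real Jsonnet tool is implemented; a real tool would use the parser, and this regex would also match inside comments and strings):

```python
import re

# Matches `import "path"` / `importstr "path"` with a *literal* string
# argument -- the only form Jsonnet allows today.
IMPORT_RE = re.compile(r"""\b(import|importstr)\s+(['"])(.+?)\2""")

def scan_imports(source):
    """List the files a Jsonnet source depends on, without evaluating it."""
    return [m.group(3) for m in IMPORT_RE.finditer(source)]

deps = scan_imports('local lib = import "lib.jsonnet"; { a: importstr "a.txt" }')
# deps == ["lib.jsonnet", "a.txt"]
```

If the scanned list is unchanged and none of the listed files changed, a cached output can be reused, which is the caching scheme discussed above.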

@ant31 (Author) commented Dec 10, 2016

Thank you for the detailed explanation.

we won't be getting it back again

What about an explicit keyword or function to express it?
Something like importFrom(expression); that way users explicitly choose to lose the static-analysis property.

@ant31 (Author) commented Dec 17, 2016

I'm closing this; I've been able to do it easily via a native extension.

Thank you.

ant31 closed this as completed Dec 17, 2016
@sparkprime (Contributor)

I suppose native extensions broke static analysis already :) Same as they do in Java etc.

@nikolay commented Dec 15, 2017

@sparkprime The stdlib is so limited and so opinionated that, basically, we're forced to write native extensions, which kind of defeats the purpose of using Jsonnet.

@sbarzowski (Collaborator)

@nikolay
It's not about the stdlib, it's about the language. Hermeticity is a core principle of Jsonnet: the output shouldn't be affected by the environment, and Jsonnet code shouldn't jump around the filesystem reading files. The import mechanism is designed for importing Jsonnet libraries and the like, not for arbitrary IO.

This guarantees that it will work the same on every machine and makes it much easier to reason about the code. But of course it restricts the structure of the solution - any input data must be prepared beforehand and passed to Jsonnet.

So it's sort of a "it's not a bug it's a feature" situation. That said, it would always be nice to make passing data to Jsonnet easier (and more obvious).

we're forced to write native extensions

Native extensions are not a proper solution here. They are a way to bypass the restrictions of the language, but there are reasons for those restrictions. Also, native functions were supposed to be pure functions, for cases in which Jsonnet is not fast enough or a native library is used. Using them to perform IO is an abuse of the mechanism, IMO. It's good it worked for @ant31, but I don't think it's the recommended way.

(Related #422)

@sparkprime (Contributor)

If you were forced to write native functions to do things that are 1) pure and 2) relatively simple (i.e., not regex matching or gzip compression or the like), then there's a case for adding them to the stdlib instead.

Things that are pure and complex but well-specified, with pure implementations available in all languages (like gzip), could be added as builtins.

Things that have "writing" side effects should not be added as native functions by anyone. Things that have "reading" side effects could be added, as long as you're OK with it maybe reading them more times than you expect (or not at all). Typically this is OK for things like reading from the same dir (you will want to do an import cache to make it completely robust), but less OK when reading remotely.

Not having computed imports is somewhat orthogonal to all of this. It's essentially about knowing what production systems may be impacted by config refactorings. You can see this in Bazel as well - you have to predeclare what you might read. The idea is then it can always know what to rebuild.

When people ask for computed imports, I wonder what they're trying to do. We could definitely strengthen the way Jsonnet can take "input" data from outside. There was a proposal at one point to make it easier to do pipe-like functionality, e.g.

jsonnet pipe -e 'J + {x: 3}'

could be a syntax sugar for

jsonnet -e 'local J = import "/dev/stdin"; J + {x: 3}'

That gives you a jq-like functionality but it's also not that big a difference from the status quo.

@nikolay commented Dec 18, 2017

@sparkprime Well, that above is nice, but it can't be done with importstr if you have more than one external dynamic config file you want to import.

@sparkprime (Contributor) commented Dec 23, 2017

So you have more than one dynamic config file, and you also need their filenames to be parameterized from outside of Jsonnet?

@nikolay commented Dec 23, 2017

Yes, @sparkprime. There are many legit use cases for it.

@sparkprime (Contributor)

What if there was a way to give a list of files on the commandline, and have them all accessible internally as a dict keyed by filename? You can do this now using --code-str but it'd be a clunky bash one-liner.
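That bash one-liner can be scripted instead. A sketch of the idea, using the real --ext-code flag to pass all files as one JSON object keyed by filename (build_file_args is a hypothetical helper name, not an existing tool):

```python
import json
from pathlib import Path

def build_file_args(var, paths):
    """Build jsonnet CLI arguments that expose the given files as a single
    object keyed by filename, readable in code via std.extVar(var)."""
    contents = {Path(p).name: Path(p).read_text() for p in paths}
    return ["--ext-code", var + "=" + json.dumps(contents)]

# Usage, roughly:
#   jsonnet <args...> -e "std.extVar('files')['a.txt']"
```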

sparkprime reopened this Jan 9, 2018
@nikolay commented Jan 9, 2018

@sparkprime That would definitely be an improvement over the status quo if you want to preserve the immutability at any price.

What I'm trying to accomplish is something along the lines of Kapitan and ksonnet, but using as much Jsonnet as possible and as little Bash as possible, too.

This approach would also allow the Bash script to concatenate/template-expand those configs and pass them to the Jsonnet compiler.

@sparkprime (Contributor)

@ant31 would that have helped for your original issue?

@hausdorff what do you think?

@dgarstang you once said you were continually bitten by this, what do you think?

@ant31 (Author) commented Jan 11, 2018

@sparkprime the approach of giving the files on the command line?
It would not solve my initial use case.

I'm packaging Kubernetes resources together: I import the json/yaml and perform transformations with jsonnet:

manifest.jsonnet
resources/deployment.json
resources/svc.json
data/config.ini

I want to abstract away as much of jsonnet's internals as possible so that end-users feel like they're writing 'plain' json.
Inside the manifest I have something like:

expand(
  resources=[
    {
      file: 'resources/deployment.yaml',
      name: 'toto-app',
    },
    {
      file: 'resources/svc.yaml',
      name: 'toto-app',
    },
  ],
  configs=[{
    file: 'data/config.ini',
  }],
)

The structure for the end-user is easy and doesn't require any knowledge of jsonnet except the call to the 'expand' function.
The work is done internally:

expand(resources, configs)::
  [createK8sResource(importstr(x.file)) for x in resources] +
  [createConfigMap(importstr(x.file)) for x in configs]

@sparkprime (Contributor)

@ant31 In your case only a complete computed import string would do then.

@sbarzowski (Collaborator)

Unless I'm missing something, providing an object with directory contents (possibly recursive), with the imported directories specified on the command line, would also work and be a bit easier to control. It would require handling the paths 100% on the jsonnet side (to get resources/svc.yaml you would write std.imported_dirs["resources"]["svc.yaml"]).

This is as strong as computed imports, but it doesn't let jsonnet jump around the whole filesystem (only the specified part), hides the absolute paths, and discourages libraries from silently depending on /dev/urandom and doing other un-jsonnety stuff.

Maybe that could even be integrated with the current extVar/tla mechanism.

What do you think?

@ant31 (Author) commented Jan 12, 2018

but doesn't let jsonnet jump around the whole filesystem

How would this be different in terms of hermeticity?

@sbarzowski (Collaborator) commented Jan 13, 2018

but doesn't let jsonnet jump around the whole filesystem

How would this be different in terms of hermeticity?

The caller of jsonnet command has control over what data gets passed to jsonnet. This changes a lot.

Consequences somewhat related to hermeticity:

  • It provides an "upper bound" on what a jsonnet program depends on, without running the program. You can be sure that unless the contents of the directory change, you can zip it, send it somewhere, run jsonnet on a different machine with the unpacked zip, and the result will be exactly the same.
  • Libraries cannot sneakily depend on global stuff (/etc/, /proc/). It's easiest for library authors to take any data they need as arguments, which is the right thing. Or at the very least they need to explicitly state what they depend on.
  • Even if some jsonnet program expects to get the whole /, nobody can prevent you from passing something else (e.g. a mounted volume or archived contents from some other machine). It's no longer the environment in which jsonnet runs; it's just some data you pass to it.
  • You can actually see the directories you pass as a fancy way to represent json data.
  • The same data can be provided from sources other than the actual filesystem (e.g. a json file) and jsonnet won't be able to tell the difference.

It's a fairly safe feature to add:

  • It doesn't require any changes to jsonnet language semantics, only to the way data is loaded.
  • The behavior of jsonnet command stays exactly the same, unless you pass an additional option.

Proof of concept

In the previous section I said that:

  • You can actually see the directories you pass as a fancy way to represent json data.

So let's do just that. I wrote a little Go program: https://github.com/sbarzowski/dir2json

You can use it to easily provide directory contents to jsonnet:

dir2json dir_you_want_to_pass_to_jsonnet > /tmp/dir.json # likely too big for a command-line argument, so a temporary file is needed
jsonnet --tla-code-file directory=/tmp/dir.json -e 'function(directory) directory["my_file.txt"]'

The downside is that it reads the whole directory eagerly and stores it in a temporary file. And it's not exactly convenient.
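The eager core of such a converter is small. A Python sketch of what a dir2json-style tool does (dir_to_obj is a hypothetical name for illustration, not the linked Go tool itself):

```python
import json
import os

def dir_to_obj(root):
    """Recursively turn a directory into a nested dict: subdirectories
    become objects and files become their contents as strings."""
    obj = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            obj[name] = dir_to_obj(path)
        else:
            with open(path) as f:
                obj[name] = f.read()
    return obj

# print(json.dumps(dir_to_obj("some_dir")))  # redirect into /tmp/dir.json
```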

More precise proposition

Let's add --ext-dir <var>=<path> and --tla-dir <var>=<path> which create an object out of the directory and pass it as extVar or top level argument respectively. Exactly like dir2json above, but it would load the file contents lazily.

So the example from the PoC would become:

jsonnet --tla-dir directory=dir_you_want_to_pass_to_jsonnet -e 'function(directory) directory["my_file.txt"]'

EDIT: probably std.parseJson and maybe std.parseYaml would be needed for this feature to reach its full potential.

@jwise commented Mar 28, 2018

I mentioned a use for this in #479, as automatically noted. Another use that I found recently was importing a set of variables. Consider:

local DmConfig(p) = {
  local defaults = {
    dm_path:: error "must override dm_path",

    y_coeffs: import (self.dm_path + "/y_coeffs.json"),
    u_coeffs: import (self.dm_path + "/u_coeffs.json"),
    v_coeffs: import (self.dm_path + "/v_coeffs.json"),
  },

  "out": defaults + p,
};

{ ... DmConfig({dm_path: "test1/dm"}) ... }

(The reason I specify defaults + p here, rather than directly overriding things here, is #217, which I forgot about until I picked up this project again... but it is just as illustrative if you strip off the function generation there.)

Without a computed import, we have to repeat the path to the coefficient includes three times.

mbarbero added a commit to mbarbero/jsonnet that referenced this issue May 8, 2020
Adds a new builtin function (importstrDyn) that accepts a computed string.

I did not modify the importstr logic itself, as I don't understand well enough how the parser works.
@alexei-matveev commented Jun 17, 2021

I like the idea of "hermeticity" and "explicitness" of imports too.
Still, it felt somewhat like a waste of resources to prepare a "directory index" for a list of Grafana dashboards I had to wrap in k8s Objects.

Did you consider a variation of "dir2json" or a "builtin" to make "filesystem snapshots" that are lazy? Somewhat like this, still very explicit:

#!/usr/bin/env jsonnet
local fs = {
  // Try s/importstr/import/ here! Hint: it won't terminate.
  "fs.jsonnet": importstr "fs.jsonnet",
  "a.txt": importstr "a.txt",
  "b.txt": importstr "b.txt",
  "sub": {
    "c.txt": importstr "sub/c.txt",
    "d.txt": importstr "sub/d.txt",
  },
};

fs

Maybe there could be a "builtin" to import a static list or a tree like the one above, to reduce boilerplate? Also, as this example shows, it is usually the "eval" that is "evil" for analysis. As long as the imported strings are not interpreted (parsed into Jsonnet code?), it is just data in bytes after all.

@sbarzowski (Collaborator)

As long as the imported strings are not interpreted (parsed into Jsonnet Code?) it is just data in bytes after all.

I agree. I would be on board with importdir which imports a directory structure with each directory being an object and each file being a string. It would be lazy, at least with regard to loading the actual files.

So you could do something like: importdir "my_directory/" and it would load everything from there.

This would require extending the import abstractions slightly to allow producing directory listings, and it could be a bit annoying to implement. It also opens questions about the handling of hidden files.

Did you consider a variation of "dir2json" or a "builtin" to make "filesystem snapshots" that are lazy?

Hmm, not sure I understand. What would this tool do exactly? Is it the same as importdir? Or do you suggest creating files like those in your example automatically? It would be very easy to create an external tool to do that.
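Such an external generator could be as small as this sketch, which emits a Jsonnet object mapping each file to an importstr of its relative path (gen_fs_libsonnet is a hypothetical name; hidden files and escaping of unusual filenames are ignored for brevity):

```python
import os

def gen_fs_libsonnet(root, prefix=""):
    """Emit a Jsonnet object literal whose fields mirror the directory
    tree under `root`, with every file turned into an importstr."""
    parts = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Subdirectory: recurse, building paths relative to `root`.
            parts.append('"%s": %s' % (name, gen_fs_libsonnet(path, prefix + name + "/")))
        else:
            parts.append('"%s": importstr "%s"' % (name, prefix + name))
    return "{ " + ", ".join(parts) + " }"

# Write the output to fs.libsonnet, then `import "fs.libsonnet"` as usual;
# rerun the generator whenever the directory contents change.
```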

@alexei-matveev

suggest creating files like in your example automatically

An easy/standard way to create "data structures" like the one returned in the example should probably be the goal. Files with textual representations are overrated.

Yes, I think with "importdir" you just described the kind of builtin most commenters, including me, would be happy with. And yes, without such an "importdir" many will likely end up with ad-hoc tools creating files/structures like the example above. I am undecided which is better when I think of "explicit is better than implicit", though.

Disclaimer: I have been a happy user of jsonnet for just a few months. Thank you for caring about it, btw!

@Jezza commented Jul 22, 2021

importdir could be as simple as producing a map from filename to content:

importdir './dir/' => {
    "file.txt": "content",
    "config.toml": "value = 2",
}

@sbarzowski (Collaborator)

@Jezza Yes, that's basically the idea. There are still some complications, though:

  1. There might be further directories, so it should probably be recursive.
  2. We support importing from things which are not traditional filesystems. We have an abstract interface for that. We would need to extend the interface so that listing things from a directory is possible. Right now, we don't even have a concept of a directory. We'll need to figure out the guarantees, e.g. (importdir 'a/b')['foo'] == importstr 'a/b/foo' – but we don't know what separator to use, so....
  3. We need to implement it in all interpreters.
  4. We'll need to support it ~forever, so a stupid design mistake can be very costly.

Because of that, a file-generation approach would take at least 10x less work, IMO. It can be done with very little commitment, as a single, separate tool, and it doesn't require any knowledge of the internals. Of course, it will only work with actual file systems, not with the abstract importers.

@jacksongoode

Any further interest in carrying this issue forward? It seems like a desirable feature, and one without many alternatives other than creating a mapping of all file paths yourself.

@sparkprime (Contributor)

How do people feel about a separate commandline tool for turning directory contents into jsonnet import manifests? We could provide this like jsonnetfmt -- a separate tool but part of the release and maintained.

@alexei-matveev

It's not about the stdlib, it's about the language.

commandline tool for turning directory contents into jsonnet import manifests?

Another possibility to keep it out of the core language would be a command-line flag like

jsonnet --ext-dir lazyDir=./fs/ ...

or a variation thereof making lazyDir accessible in the code.
It is probably not much different from this use case of a hypothetical tool for generating "import manifests":

jsonnet --ext-code lazyDir="$(jsonnet-codegen-import-manifests ./fs/)" ...

I must add that the whole issue no longer seems like a big deal to me; it is far less pressing than it seemed originally.

@sparkprime (Contributor)

@alexei-matveev I'd actually be very fascinated to know why your attitude changed, as it would give insight into how important this feature is.

@alexei-matveev commented Nov 22, 2022

Not much to report, actually. The issue did not prove to be a "cost center" when manually maintaining the catalog of those Grafana dashboards, all sitting in the same directory. Adding or removing an entry from such a catalog was not a big deal compared to all the rest, and we have managed to stay with a single catalog so far. We are not quite "web scale" yet :-)

Off-topic: I spent much more time converting YAML with comments to Jsonnet while preserving the comments.
Maybe jsonnetfmt could learn that trick?

@sparkprime (Contributor)

If there's a YAML parser out there that preserves comments, it should be pretty easy (and there must be, because YAML reformatters exist).

@sbarzowski (Collaborator)

Off-topic: I spent much more time converting YAML with comments to Jsonnet while preserving the comments.
Maybe jsonnetfmt could learn that trick?

A tool like this already exists: https://github.com/waisbrot/yaml2jsonnet, created by @waisbrot. There was a discussion about it in #765.

@jacksongoode

I'm still not entirely sure whether the solution proposed by @sbarzowski with dir2json is the current standard way of addressing this issue. I guess there is some inherent limitation in jsonnet attempting to load static resources?

@vkaul11 commented Jun 20, 2023

I am also bogged down by this issue. Can we get an update on this, @sparkprime?
