Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Use StringDict-like return value for Thread.Load #463

Open
ashi009 opened this issue Mar 22, 2023 · 9 comments · May be fixed by #464
Open

FR: Use StringDict-like return value for Thread.Load #463

ashi009 opened this issue Mar 22, 2023 · 9 comments · May be fixed by #464

Comments

@ashi009
Copy link

ashi009 commented Mar 22, 2023

I'm implementing a stardoc-like program (extracting docs from bzl files). In such a program, we don't really need to know the concrete types of foreign symbols. For now, the Thread.Load returns a StringDict, which means I need to resolve the loading bzl path, and exec that file just to make the Exec work.

I'm proposing that let Thread.Load returns a StringDict-like interface with Get and Keys methods instead. Which will allow users to mock a load.

The Thread.Load is used as follow:

dict, err2 := thread.Load(thread, module)
thread.beginProfSpan()
if err2 != nil {
err = wrappedError{
msg: fmt.Sprintf("cannot load %s: %v", module, err2),
cause: err2,
}
break loop
}
for i := 0; i < n; i++ {
from := string(stack[sp-1-i].(String))
v, ok := dict[from]
if !ok {
err = fmt.Errorf("load: name %s not found in module %s", from, module)
if n := spell.Nearest(from, dict.Keys()); n != "" {
err = fmt.Errorf("%s (did you mean %s?)", err, n)
}
break loop
}
stack[sp-1-i] = v
}

@adonovan
Copy link
Collaborator

I'm implementing a stardoc-like program (extracting docs from bzl files). In such a program, we don't really need to know the concrete types of foreign symbols.

That's not true in general. If you intend to produce the documentation for all bindings in a module as a dynamic analysis--i.e. by loading the module, causing the execution of all its top-level statements including possible side effects on the host application, or even nontermination--then the exact definitions of everything loaded do potentially matter. (I think the Bazel documentation tools made a design mistake by taking the dynamic route.)

Why not produce documentation statically, that is, by parsing the source file and looking for the statements that define top-level variables? This route is guaranteed to produce its answer deterministically, efficiently, and robustly.

Loosely related:

@ashi009
Copy link
Author

ashi009 commented Mar 27, 2023

I'm implementing a stardoc-like program (extracting docs from bzl files). In such a program, we don't really need to know the concrete types of foreign symbols.

That's not true in general. If you intend to produce the documentation for all bindings in a module as a dynamic analysis--i.e. by loading the module, causing the execution of all its top-level statements including possible side effects on the host application, or even nontermination--then the exact definitions of everything loaded do potentially matter. (I think the Bazel documentation tools made a design mistake by taking the dynamic route.)

Correct, but good enough in terms of doc generation.

Why not produce documentation statically, that is, by parsing the source file and looking for the statements that define top-level variables? This route is guaranteed to produce its answer deterministically, efficiently, and robustly.

Although Starlark is designed to replace the way-too-flexible python used in the early versions of Bazel, the core concepts like rule, aspect, and provider remain dynamically defined at runtime.

Static analysis won't work for those dynamically defined constructors (and they are not rare!) That's why Stardoc is considering building the doc gen into bazel (https://github.com/bazelbuild/stardoc/blob/master/docs/future_plans.md). But I don't think that will come any time soon.

Until then, I want to have a zero-config replacement for Stardoc which requires lots of instruments to get even the most basic docs.

@adonovan
Copy link
Collaborator

the core concepts like rule, aspect, and provider remain dynamically defined at runtime.

Fair point about the dynamism of the language being a limit on what you can achieve using the static approach. (I mistakenly thought you were making an analogy with .bzl as opposed to meaning it literally.)

Correct, but good enough in terms of doc generation.

I'm glad you linked the Starlark future plans doc, because I think it is evidence that the "dynamic fake" approach is actually not good enough for doc generation, and that really it's better to get the build tool to execute the initialization of the .bzl files using 100% accurate semantics and then dump an easily parseable representation of the dynamic effects (e.g. rules, providers, aspects) from which you can generate documentation. (I argued for this approach while I was working on the Java implementation of Starlark used by Bazel, as Stardoc was the reason for a large number of hacky intrusions into the intepreter that would not have been necessary had Stardoc called "bazel query" as a subprocess.)

@ashi009
Copy link
Author

ashi009 commented Mar 27, 2023

the core concepts like rule, aspect, and provider remain dynamically defined at runtime.

Fair point about the dynamism of the language being a limit on what you can achieve using the static approach. (I mistakenly thought you were making an analogy with .bzl as opposed to meaning it literally.)

Correct, but good enough in terms of doc generation.

I'm glad you linked the Starlark future plans doc, because I think it is evidence that the "dynamic fake" approach is actually not good enough for doc generation, and that really it's better to get the build tool to execute the initialization of the .bzl files using 100% accurate semantics and then dump an easily parseable representation of the dynamic effects (e.g. rules, providers, aspects) from which you can generate documentation. (I argued for this approach while I was working on the Java implementation of Starlark used by Bazel, as Stardoc was the reason for a large number of hacky intrusions into the intepreter that would not have been necessary had Stardoc called "bazel query" as a subprocess.)

I'd love to have this built into the bazel as a subprocess. But I'm a little skeptical about when it will be ready (almost 2 years overdue already). My somewhat production-ready experiment with the patched starlark-go is about 1000-loc, and already delivering better results compared to stardoc, moreover, it doesn't need bzl_library at all.

From this perspective, I really want to have this patch to be merged (after some revising on naming and doc).

@adonovan
Copy link
Collaborator

From this perspective, I really want to have this patch to be merged (after some revising on naming and doc).
I'm proposing that let Thread.Load returns a StringDict-like interface with Get and Keys methods instead.

Thread.Load cannot be changed without breaking API compatibility. We could add a "Thread.Load2" hook that takes precedence, but I still don't really understand how StringDictLike is materially different from StringDict.
Both data types have operations to get the keys, test key membership, or look up a value, but in order to implement any of those operations you need to have executed the module and found its set of top-level key/value pairs. A load statement will always perform at least one Get operation, and the execution of the statements after the load statement will require the correct value to have been "got". So how does it help?

@ashi009
Copy link
Author

ashi009 commented Mar 28, 2023

Here is how this trick plays out. Instead of loading top-level variables, I can pretend that they are loaded -- just unknown. In Bazel context, this won't hurt much -- as a rule rarely need a symbol to complete rule definition -- they might need a foreign symbol to complete its implementation.

func (ld *Loader) load(t *starlark.Thread, module string) (starlark.StringDictLike, error) {
	l, err := label.Parse(module)
	if err != nil {
		return nil, err
	}
	fsys, ok := ld.moduleRoots[l.Repo]
	if !ok {
		return unknownModule{l.String()}, nil
	}
...
}

type unknownModule struct{ name string }

func (m unknownModule) Keys() []string { return nil }
func (m unknownModule) Has(name string) bool {
	glog.Warningf("Unresolved symbol %q from %q", name, m.name)
	return true
}
func (m unknownModule) Get(name string) starlark.Value {
	return UnknownSymbol{}
}

I'm open to the idea of introducing Load2, but I'd argue that starlark-go is not hitting v1.0 yet, and perhaps a cleaner API is preferable?

@adonovan
Copy link
Collaborator

I still don't understand. If the point of the trick is to allow you to execute the file containing the load statement without knowing anything about the module that it loads, then what will happen when execution encounters a statement that causes Get to be called? It will create an UnknownSymbol value and then execution will go wrong. How is that useful?

If you don't care about the actual values in the module it's easy to fake one by creating a real StringDict whose keys are the names mentioned in the load statement and whose values are all UnknownSymbol.

@ashi009
Copy link
Author

ashi009 commented Mar 28, 2023

I still don't understand. If the point of the trick is to allow you to execute the file containing the load statement without knowing anything about the module that it loads, then what will happen when execution encounters a statement that causes Get to be called? It will create an UnknownSymbol value and then execution will go wrong. How is that useful?

Frist of all the UnknownSymbol is both callable and acts as a dict and will always return another UnknownSymbol for further invocations.

type UnknownSymbol struct{}

var (
	_ starlark.HasAttrs = UnknownSymbol{}
	_ starlark.Callable = UnknownSymbol{}
)

Second, generating docs only requires executing the part of code that declares rule, provider, and aspect. A ruleset rarely needs symbols from another repo to complete its rule definition. As long as you are not executing the rule's implementation function, nothing will fail.

This statement holds against a dozen of top rulesets.

If you don't care about the actual values in the module it's easy to fake one by creating a real StringDict whose keys are the names mentioned in the load statement and whose values are all UnknownSymbol.

Well, this is not easy to pull off with the existing high-level APIs:

The alternative is to use:

However, this requires no API change, and I'm willing to give it a shot.

@adonovan
Copy link
Collaborator

The alternative is to use:
https://pkg.go.dev/go.starlark.net/syntax#Parse to parse the source
https://pkg.go.dev/go.starlark.net/syntax#Walk to find the load statements and extract those names and create a StringDict
https://pkg.go.dev/go.starlark.net/starlark#FileProgram to load the parsed syntax.File
However, this requires no API change, and I'm willing to give it a shot.

Right, that's what I had in mind. I don't think you need Walk, BTW, since LoadStmts are all at top level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants