Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: new import syntax, and first class modules #111

Closed
alandonovan opened this issue Oct 2, 2020 · 8 comments
Closed

spec: new import syntax, and first class modules #111

alandonovan opened this issue Oct 2, 2020 · 8 comments

Comments

@alandonovan
Copy link
Contributor

alandonovan commented Oct 2, 2020

There are several minor problems with Starlark's current syntax for importing another module:

load("module.star", "name1", ..., local="name2")
  1. load is a reserved word, unnecessarily. Starlark already reserves import.
  2. The parens are unnecessary.
  3. The imported names must be quoted, even though they function as binding identifiers. (Neither of the two implementations of Starlark checks statically that the strings contain valid identifiers.)
  4. It is unfamiliar to Python programmers.

The motivation for all these suboptimal choices was to permit Starlark programs to be executed by an unmodified Python interpreter, in which 'load' has been defined as a function. I don't think this is a compelling reason to keep it.

I propose we add a new from/import syntax to Starlark---essentially that of Python, but with quotes around the module name---and gradually eliminate the 'load' syntax. These forms would be temporarily be equivalent:

load("module.star", "name1", "name2", name4="name3")
from "module.star" import name1, name2, name3 as name4

Furthermore, I propose that we support modules as first-class values. That is, both of these statements would import the module of file "dir/name.v2.star" and bind it to the local name "name".

import "dir/name.v2.star"
import "dir/name.v2.star" as name

In the first case, the local name is derived from the last segment of the module name ("name.v2.star"), minus any dotted suffixes (".v2.star").

Imports would be remain subject to the same restriction as load statements of appearing only at top-level, and would continue to bind names in the file-local lexical block.

Migration considerations

  • Adding support for both syntaxes is straightforward. Some refactoring tools would need to be taught to look for both variants.
  • Google's legacy Python3-based testing tool would need to statically transform import statements into some valid Python syntax (e.g. today's load statements) before providing the result to the interpreter.
  • Buildifier could replace loads with imports when it encounters them.
  • Google's code base would require a large-scale cleanup.
@stepancheg
Copy link
Contributor

The motivation for all these suboptimal choices was to permit Starlark programs to be executed by an unmodified Python interpreter, in which 'load' has been defined as a function. I don't think this is a compelling reason to keep it.

Compatibility with Python is quite a useful feature.

  • For example, once we did a large scale migration of BUCK files following some change in Buck, and we used some Python-AST library to automate it (we could use buildifier, but for some reasons, Python library was more convenient)
  • Another application is syntax highlighting (in editors, highlighting libraries) currently can be simply done with python tools, and with this proposal, the separate syntax handler is required (e. g. pygments)

@murali42
Copy link

murali42 commented Oct 7, 2020

Google's legacy Python3-based testing tool would need to statically transform import statements into some valid Python syntax (e.g. today's load statements) before providing the result to the interpreter.

That would mean starlark is no longer a syntactic subset of python, which would be a big deviation from status quo?

@alandonovan
Copy link
Contributor Author

One could easily write a Python function that scans a file for import "foo" and translates it to load("foo") so that existing Python-based syntax tools continue to work. I imagine most highlighting tools would do the right thing without modification, if they treat both dotted.path and "dotted.path" as ordinary expressions.

@maxmcd
Copy link

maxmcd commented Oct 21, 2020

I have always liked the load() syntax because it communicates that the import behavior is a departure from python's import behavior. It's also very nice that the first argument is a string, as situations like this have always seemed kind of silly: https://stackoverflow.com/questions/8350853/how-to-import-module-when-module-name-has-a-dash-or-hyphen-in-it

I wonder what you do with import "dir/other-name.v2.star" though? Throw an error and require an explicit alias?

I was personally motivated to move towards the "only import modules" functionality to limit the import search space so that there's room for a tool that's similar to goimports, so it's nice to see this discussion and the possible change.

@murali42
Copy link

Translating import "foo" into load("foo") would require a tricky and complex regex

  • need to deal with different types of quotes
  • might need to ignore these if they appear inside multi-line string literals or unbalanced string literals inside comments...

More importantly, the benefits don't seem to justify deviating from the current invariant "all starlark code is syntactically valid python3"

@pcj
Copy link
Member

pcj commented Oct 26, 2020

Personally I prefer the proposed import, it always seemed a bit strange that the identifiers were strings instead of symbols. I'm OK with starlark diverging from being a strict subset of python, because ultimately, it is quite different. My 0.02.

@emcfarlane
Copy link

emcfarlane commented Aug 8, 2022

Could this proposal be split into a smaller scope by first making the changes to the load statement to support first class module loading?

Define the local namespaced module module based on filepath name:

load("module.star")

Define the module name based on the kwarg name.

load(name = "module.star")

Using kwargs could avoid issues with translating filenames to identifiers and gives the ability to rename it.

Another option to support building top level modules would be to return the module from the load statement:

module = load("module.star")

which would define a global scoped module module.

@nemith
Copy link

nemith commented Dec 13, 2022

My largest gripe with load has been able to load the entire module. I have found a bit of a workaround that works at least in the Go implementation of starlark with the existing load statement syntax.

In addition to the to the normal values being exported via the load statement you could export a custom well-known identifier to represent the module itself with a special starlark value where all the identifiers in the modules are actually attributes. (i.e just how starlarkstruct.Modulealready works in starlark-go)

For naming this well known identifier I have chosen @module. The @ prefix means that it is not importable without specifying a name to assign it to. It also makes sure it will never collide with real identifiers in the loaded file/module.

So you end up with statements like

load("mymodule.star", mymodule="@module")
mymodule.foo()

A quick implementation for using built-ins could be

func loadModule(module *starlarkstruct.Module) starlark.StringDict {
	new := starlark.StringDict{
		"@module": module,
	}
	for k, v := range module.Members {
		new[k] = v
	}
	return new
}

You could do something similar for globals coming from execed files as well (wrapping them in a starlarkstruct.Module first for the special @module identifier).

However being that with the @ this isn't a valid in an identifier this approach may fail on other implementations.

CC: @carlverge

stepancheg added a commit to stepancheg/starlark that referenced this issue Jul 25, 2023
State that Starlark syntax (but not semantic) is a strict subset
of Python, and this property will be maintained in the future
iterations of Starlark.

Context: users have asked what tools they can use to work with
starlark sources, to implementing codemods, linters, code formatters,
syntax highlighters etc.

One possible option is to recommend Python tools when AST is needed,
for example, Python's builtin `ast.parse`. However, to make that
recommendation correct, we should guarantee that Starlark syntax
(but not semantics) will stay strict subset of Python: so each valid
Starlark program could be parsed as Python, and tools won't need
to be heaviy rewritten when Starlark language changes.

This PR automatically closes
bazelbuild#111
as it contravenes this newly established requirement.
stepancheg added a commit to stepancheg/starlark that referenced this issue Jul 25, 2023
State that Starlark syntax (but not semantic) is a strict subset
of Python, and this property will be maintained in the future
iterations of Starlark.

Context: users have asked what tools they can use to work with
starlark sources to implement codemods, linters, code formatters,
syntax highlighters, documentation generators etc.

One possible option is to recommend Python tools when AST is needed,
for example, Python's builtin `ast.parse`. However, to make that
recommendation correct, we should guarantee that Starlark syntax
(but not semantics) will stay strict subset of Python: so each valid
Starlark program could be parsed as Python, and tools won't need
to be heaviy rewritten when Starlark language changes.

This PR automatically closes
bazelbuild#111
as it contravenes this newly established requirement.
stepancheg added a commit to stepancheg/starlark that referenced this issue Jul 26, 2023
State that Starlark syntax (but not semantic) is a strict subset
of Python, and this property will be maintained in the future
iterations of Starlark.

Context: users have asked what tools they can use to work with
starlark sources to implement codemods, linters, code formatters,
syntax highlighters, documentation generators etc.

One possible option is to recommend Python tools when AST is needed,
for example, Python's builtin `ast.parse`. However, to make that
recommendation correct, we should guarantee that Starlark syntax
(but not semantics) will stay strict subset of Python: so each valid
Starlark program could be parsed as Python, and tools won't need
to be heaviy rewritten when Starlark language changes.

This PR automatically closes
bazelbuild#111
as it contravenes this newly established requirement.
laurentlb pushed a commit to stepancheg/starlark that referenced this issue Aug 7, 2023
State that Starlark syntax (but not semantic) is a strict subset
of Python, and this property will be maintained in the future
iterations of Starlark.

Context: users have asked what tools they can use to work with
starlark sources to implement codemods, linters, code formatters,
syntax highlighters, documentation generators etc.

One possible option is to recommend Python tools when AST is needed,
for example, Python's builtin `ast.parse`. However, to make that
recommendation correct, we should guarantee that Starlark syntax
(but not semantics) will stay strict subset of Python: so each valid
Starlark program could be parsed as Python, and tools won't need
to be heaviy rewritten when Starlark language changes.

This PR automatically closes
bazelbuild#111
as it contravenes this newly established requirement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants