internal/embedlite: a low level alternative to embed which can be imported by any std package #51463
Comments
I should note that, in theory, any runtime or syscall related package should be able to import this
cc @rsc as the owner of
I forgot to say: I'm happy to do the implementation as a separate CL if this sounds like a good idea. I imagine it shouldn't be too difficult, in terms of teaching the compiler about the new special package.
This would also be very useful outside the compiler. Sometimes a string or byte array is all we need...
@beoran not sure I understand what you mean; my use case is for If you mean it would also be useful outside the standard library, to not pull in
Also, to throw an alternative idea out there: we could relax the rule that the I can't say whether this would be better or worse than
Relaxing the requirement for the embed import, also outside of the standard library, is a great idea. Many older tools similar to //go:embed would just generate strings or byte arrays and not require any special package imports. That is the use case I am thinking of.
#43217 was declined at the time due to the concerns expressed in #43217 (comment). But, might it be worth revisiting that decision soon-ish, when as of 1.18 all supported Go versions support
@antichris thanks for that link, I hadn't noticed that thread last year. I tend to agree with @bcmills's argument that requiring an import does solve multiple issues with old Go versions. I don't think we can call that edge case solved once enough Go versions have passed; there will always be some people trying to use ancient versions of Go, and in the case that there are no other build errors, we don't want So, to reiterate, my proposal is to either:
Either way, this proposal is not about third party packages using
The more I think about it, the more I lean towards option 2 in my last comment: relax the
Change https://go.dev/cl/391455 mentions this issue:
Isn't the check for importing embed also used to skip some work if no embed is used in the file? Does option 2 have any performance impact on compiling the standard library or is the effect negligible?
That's a good question, and one that I pondered as well. If you look at the CL above, you'll notice that the change to the compiler itself is just a boolean check, so it should have no impact. The other change is The worst case scenario for this second change is a Go file belonging to a standard library package which does not import If we are worried about the impact, the alternative is
I'm actually running into So I'm thinking that my reasoning above with regards to performance is wrong, and we should go back to the original idea of
I've now implemented
Is there some other way we can do this? It bothers me that any analyzer that wants to understand embed (or at least understand package time) now has to know that "internal/embedlite" is the same as "embed". That's not a very internal detail at that point.
Continuing my previous comment: For example, if staticcheck knew about embed, it would also need to know about internal/embedlite.
For what it's worth, it's not as clear to me. Today we say "some std packages can't import fmt", because we value having an acyclic dependency graph more than having fmt everywhere. It seems OK to say "some std packages can't import embed" for the same reason. The specific set of (non-internal) std packages is: errors, io, io/fs, math/bits, path, runtime, sort, sync, sync/atomic, syscall, time, unicode/utf8, unsafe. Essentially none of these should need large embedded data. The only (partial) exception is time. Working around the problem in time seems better than creating a new name for embed that every analyzer needs to know. The current tzdata generator makes short strings and +'s them together, resulting in a parsed AST of depth 7,000. It is probably better to just use one very long string literal. It's not like diffs will be meaningful with shorter lines. @bcmills is going to look into that. (If the long literal is problematic, another option is to keep the +'ed together shorter strings but use parentheses to make the AST a balanced binary tree.) Worst case we could special case the timetzdata build tag in some way in cmd/go. That would still be less overall impact to analyzers than internal/embedlite.
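The AST depth point above can be illustrated with go/parser. This is only a minimal sketch, not the actual tzdata generator: chainExpr, balancedExpr, and binaryDepth are hypothetical helpers, and 1024 tiny literals stand in for the real data.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"strings"
)

// chainExpr returns n string literals joined with "+", with no parentheses.
// The parser treats "+" as left-associative, so the tree leans fully left.
func chainExpr(n int) string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = `"s"`
	}
	return strings.Join(parts, "+")
}

// balancedExpr returns the same n literals, parenthesized so that each "+"
// splits its operands roughly in half.
func balancedExpr(n int) string {
	if n == 1 {
		return `"s"`
	}
	half := n / 2
	return "(" + balancedExpr(half) + "+" + balancedExpr(n-half) + ")"
}

// binaryDepth parses src as an expression and reports the maximum number of
// nested binary expressions on any path from the root to a literal.
func binaryDepth(src string) (int, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return 0, err
	}
	return depth(expr), nil
}

func depth(e ast.Expr) int {
	switch e := e.(type) {
	case *ast.ParenExpr:
		return depth(e.X) // parentheses group operands but add no "+" level
	case *ast.BinaryExpr:
		l, r := depth(e.X), depth(e.Y)
		if r > l {
			l = r
		}
		return 1 + l
	}
	return 0
}

func main() {
	for _, c := range []struct{ name, src string }{
		{"chain", chainExpr(1024)},
		{"balanced", balancedExpr(1024)},
	} {
		d, err := binaryDepth(c.src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s depth: %d\n", c.name, d)
	}
}
```

With 1024 literals the plain chain nests 1023 levels deep, while the balanced form nests 10, which is why tools that walk the tree recursively cope much better with the second shape.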
If it's just for that small set, can it use
Just for kicks I tried encoding the zipfile as a UTF-8 string with the bare minimum of escaping for a valid Go source file — specifically, encoding So that's not much of an advantage over just using
@seankhliao, the problem is that the user code only imports
Note that although the string constant in zipdata.go is in the form of a zip file, it is not compressed. So, yes, it has a lot of zero bytes. We could only compress the data if we wanted to implement some sort of decompression in
And, to be clear, the string constant doesn't have to be in the form of a zip file at all. That was just convenient because we are starting with a zip file ($GOROOT/lib/time/zoneinfo.zip). But generate_zipdata.go could parse that zip file and produce any format we like. I think the main constraint is that the data has to take the form of a string constant, so that importing time/tzdata as a backup data source does not increase program startup time or memory usage.
The nice thing about the uncompressed zip file is time not depending on a compression algorithm. :-)
Another thing you could do is make cmd/dist build it as one of the generated z-files. Now that it's just 'printf %q' that seems entirely reasonable.
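As a rough illustration of the 'printf %q' approach, the sketch below renders arbitrary bytes as a Go file holding one long string constant and then parses the result back to check that nothing was lost. It is not the cmd/dist or generate_zipdata.go code; genSource, constValue, and the package and constant names are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// genSource renders a Go source file that declares a single string constant
// holding data. The %q verb escapes the bytes so the literal is valid Go.
func genSource(pkg, name string, data []byte) []byte {
	var buf bytes.Buffer
	fmt.Fprintf(&buf, "// Code generated by a hypothetical generator. DO NOT EDIT.\n\n")
	fmt.Fprintf(&buf, "package %s\n\n", pkg)
	fmt.Fprintf(&buf, "const %s = %q\n", name, data)
	return buf.Bytes()
}

// constValue parses src and returns the value of the named constant,
// requiring it to be one plain string literal (no "+" chain).
func constValue(src []byte, name string) (string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), "zipdata.go", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range f.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.CONST {
			continue
		}
		for _, spec := range gd.Specs {
			vs := spec.(*ast.ValueSpec)
			for i, id := range vs.Names {
				if id.Name != name || i >= len(vs.Values) {
					continue
				}
				lit, ok := vs.Values[i].(*ast.BasicLit)
				if !ok || lit.Kind != token.STRING {
					return "", fmt.Errorf("%s is not a single string literal", name)
				}
				return strconv.Unquote(lit.Value)
			}
		}
	}
	return "", fmt.Errorf("constant %s not found", name)
}

func main() {
	// Hypothetical binary payload with zero bytes, a quote, and invalid UTF-8.
	data := []byte("PK\x03\x04\x00\x00\"zone\"\xff\n")
	src := genSource("tzdata", "zipdata", data)
	fmt.Printf("%s", src)

	got, err := constValue(src, "zipdata")
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", got == string(data))
}
```

Because the output is one literal, the parsed file has a single BasicLit for the data rather than a deep chain of concatenations.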
You're right, I hadn't thought about that. I guess that also applies to my original idea to lift the restriction on importing
Funnily enough, that is exactly the approach I took for a different package in https://go-review.googlesource.com/c/go/+/380474. The difference there is that we want a I think Bryan's solution in https://go-review.googlesource.com/c/go/+/404435 is similarly OK as a fix for the build/reformat/tooling slowness. But quoting Bryan from the CL, it still leaves us with the two other wrinkles:
That seems like the best solution overall, then. There is still a code generation step of sorts, but it's not done manually by the developer, and its result is not committed into git, meaning there isn't duplication in the source distribution (even if there is still duplication in binary archives).
Per my comment above, I'd be OK with either using a single long string and checking it into VCS (Bryan's suggestion) or teaching cmd/dist to generate that single long string (Russ's suggestion). In any case, I no longer think
See https://go-review.googlesource.com/c/go/+/389834; this change uses go:embed in time/tzdata to remove thousands of lines of code, which is great, but it also runs into an import cycle in the form of time/tzdata -> embed -> io/fs -> time -> time/tzdata.
I think it's clear that it would be a bad outcome to say "some std packages can't embed files". In this particular case it's a net benefit, and at the end of the day I just want to embed a file into a string var, so it's doable in practice. The old code did it by hand, anyway. The tzdata package doesn't need to import io/fs in practice.
I propose that we add a package like internal/embedlite which allows using //go:embed directives with []byte and string, and is hard-coded by the toolchain to be allowed for use in std, but it forbids the use of embedding into io/fs.FS - for those use cases, just use plain old embed.
The package itself would be practically a placeholder, with no exposed API. It would also have no dependencies other than the bare minimum, the runtime. The toolchain would learn about internal/embedlite, just like it knows about embed, allowing the use of embedding into bytes or strings if either package is imported. So then, time/tzdata could import internal/embedlite and embed its data file into a string.