proposal: spec: Go 2: allow manual control over imported package initialization #48174
Comments
Would this cascade to all dependency modules imported as

As a library author, I'm not sure I want to field support requests from people whose bugs turn out to be their own fault, for not allowing my

As an application author, I appreciate this problem completely, to the point of finding the worst offenders (in my use cases) and getting those libraries to switch away from init() to having the functions that depend on initialization call that initialization themselves. But this is a game of whack-a-mole, and it depends on those dependencies being willing to change, so I can appreciate the proposal. I just don't know whether to 👍 or 👎.
Were this proposal to be accepted, a key feature would need to be that the guarantees around
I think it would need to, in order to be useful.
In short, yes. Currently, when initialising a package, we have some internal init function that does 3 things, in this order:
I'm just suggesting that we omit any
I see no reason why it should be.
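For reference, the Go specification initialises a package by first initialising the packages it imports, then its package-level variables in dependency order, and then its init() functions in the order they appear; main runs only after all of that has finished. The within-package part of that order can be observed with a small single-file program. The names record and trace are illustrative only and are not part of the proposal.

```go
package main

import "fmt"

// trace records the order in which initialisation steps run.
var trace []string

// record appends a step name to trace and returns it, so it can be
// used as a package-level variable initialiser.
func record(step string) string {
	trace = append(trace, step)
	return step
}

// A package-level variable whose initialiser calls a function is
// initialised before any init() function in the same package runs.
var configured = record("package-level var")

func init() {
	record("first init()")
}

// Multiple init() functions run in the order they appear in the file.
func init() {
	record("second init()")
}

func main() {
	record("main")
	fmt.Println(trace)
}
```

Running it prints [package-level var first init() second init() main].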
As I've described it here, the packages must be imported using
Definitely 👍
While I do like the idea, I noticed the

Another possibility, if we did want this as a language change (i.e. if this is going to be commonly used, which it doesn't seem like it will be), is that we could use the
I tagged it as a language change because package initialization order is part of the spec, and I didn't see enough room for implementations to defer running init. I don't think the guarantees are strong enough: init() can have side effects, such as writing a file or registering with a common package, and this will break those.
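To illustrate the registration side effect mentioned here: a common pattern is for a package's init() to register itself with a shared registry (database/sql drivers and image format decoders work this way), so that other code can find it by name without referring to the package directly. The single-file sketch below uses a hypothetical registry standing in for the "common package", and a plain function standing in for the provider's init(), so that deferring it can be simulated. Register, Lookup and registerProvider are illustrative names, not a real library API.

```go
package main

import (
	"fmt"
	"sync"
)

// registry stands in for a shared package that providers register with.
var (
	mu       sync.Mutex
	registry = map[string]func() string{}
)

// Register is what a provider package would call from its init().
func Register(name string, f func() string) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
}

// Lookup is what a consumer calls; it only succeeds once the
// provider's registration has run.
func Lookup(name string) (func() string, bool) {
	mu.Lock()
	defer mu.Unlock()
	f, ok := registry[name]
	return f, ok
}

// registerProvider is the body a provider would normally put in init().
// It is an ordinary function here so that deferring it can be simulated.
func registerProvider() {
	Register("example", func() string { return "hello from example" })
}

func main() {
	// Simulate a deferred init: the consumer looks the provider up
	// before its registration has run, and finds nothing.
	if _, ok := Lookup("example"); !ok {
		fmt.Println("not registered yet")
	}

	// Once the "init" has run, the same lookup succeeds.
	registerProvider()
	if f, ok := Lookup("example"); ok {
		fmt.Println(f())
	}
}
```

Nothing in the consumer's code mentions the provider, so nothing in it would signal that the provider had not been initialised yet.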
This seems like treating the symptom rather than the cause, i.e. libraries abusing init(), and it will give authors of these low-quality libraries an excuse to continue doing so.
@fzipp Indeed. If a library has to do expensive initialization, then in most cases it is perfectly possible to provide an Init for the package, which the user then calls themselves before using the library. init is only strictly necessary for the automatically generated init() that initializes package-level variables, and for ensuring that initialization runs on the program's main thread for GUI or game libraries. All other libraries should provide an Init or, better, a NewXyzzy for the Xyzzy they provide.
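A minimal sketch of the NewXyzzy alternative, assuming a hypothetical library that would otherwise build an expensive lookup table at init time: the table lives in a value returned by a constructor rather than in a package-level variable, so only callers that actually use the library pay for it. Xyzzy, NewXyzzy, buildTable and Index are illustrative names.

```go
package main

import "fmt"

// Xyzzy owns state that would otherwise be built by init() or by a
// package-level variable initialiser.
type Xyzzy struct {
	table map[string]int
}

// buildTable stands in for an expensive computation (parsing embedded
// data, generating a large map, and so on).
func buildTable() map[string]int {
	t := make(map[string]int, 3)
	for i, name := range []string{"alpha", "beta", "gamma"} {
		t[name] = i
	}
	return t
}

// NewXyzzy does the expensive work when the caller asks for it,
// instead of at program start-up.
func NewXyzzy() *Xyzzy {
	return &Xyzzy{table: buildTable()}
}

// Index reports the position of name and whether it is known.
func (x *Xyzzy) Index(name string) (int, bool) {
	i, ok := x.table[name]
	return i, ok
}

func main() {
	// Nothing expensive has happened before this call.
	x := NewXyzzy()
	i, ok := x.Index("beta")
	fmt.Println(i, ok) // 1 true
}
```

Because the state hangs off a value, there is no "uninitialised package" to misuse: code that has not called NewXyzzy simply has nothing to call methods on.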
I don't think this is as easy as tagging them as "low quality" libraries. Some of the worst offenders are high-profile things like Prometheus, gRPC and generated protobuf code, which I generally think are good libraries; they obviously just haven't optimised for this (to be fair, I imagine the majority of their use cases don't treat them as optional and aren't so worried about time-to-main). I'm also a bit dubious about the ability to get them all to change on this basis (and not regress again later). A nice property of this proposal is that it's up to the consumer of the library whether they want to pay the price upfront or later.
Sure, though that's a breaking change for any of these cases that don't currently have or require those functions.
Indeed, the biggest contributor in my case is the initialisation of maps in generated protobuf packages, not low-quality libraries with expensive
This seems to me like fixing the problem in the wrong place. If the problem is slow initialization of maps in generated protobuf packages, then let's fix that. If the problem is packages that have slow

Providing a mechanism for deferred initialization of some packages, and requiring the program to explicitly initialize them at runtime, seems to me like a recipe for subtle bugs. If the code logic is slightly wrong, then the package will be used without having been initialized. This could be bad in any number of ways, but there is no simple way to avoid it.
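One general way to fix slow map initialization in place, without changing a package's API, is to replace an eagerly built package-level map with one built on first use. This is a sketch of that technique using sync.Once; it does not describe how the protobuf runtime or its generated code actually work, and byName, buildByName and lookupByName are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	// byNameOnce guards the one-time construction of byName.
	byNameOnce sync.Once
	// byName stays nil at start-up instead of being built by a
	// package-level initialiser.
	byName map[string]int32
)

// buildByName stands in for the map construction that would otherwise
// run during package initialisation.
func buildByName() map[string]int32 {
	return map[string]int32{
		"UNKNOWN": 0,
		"STARTED": 1,
		"DONE":    2,
	}
}

// lookupByName builds the map on its first call; later calls, including
// concurrent ones, wait for and reuse the same map.
func lookupByName(name string) (int32, bool) {
	byNameOnce.Do(func() { byName = buildByName() })
	v, ok := byName[name]
	return v, ok
}

func main() {
	v, ok := lookupByName("DONE")
	fmt.Println(v, ok) // 2 true
}
```

On Go 1.21 and later, sync.OnceValue expresses the same idea more compactly. The cost moves from start-up to the first call, and it only helps if every reader goes through the accessor rather than the variable.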
This is clearly the better option, but it seems infeasible to me. Perhaps I'm aiming for unrealistic startup performance, but I don't think we can optimise package initialisation to the point where it's no longer a blocker for use cases like the one I've described. At some scale, there are simply too many packages. As it stands, my import graph takes around 15 ms ± 5 ms to initialise and allocates 2.2 MB of data. When I spin these processes up to try and sandbox my build tasks, the terminal gets very choppy, presumably as each process contends for resources.
I was hoping that problem had already been solved for the plugin API. That seems to load in packages from
See also #38450.
I don't quite see the analogy to plugins. When you open a plugin, all the packages in the plugin are initialized at that point. There is no deferred initialization. If you want to carry that idea over to this one, then I think we would need some sort of static plugin, in which the plugin opens an import that is already in the program. That would take the place of the normal explicit import. That is, you wouldn't write

I'm not sure that is a great idea, but it would at least be safe, albeit hard to use. This proposal is not safe. You should take a look at #38450 and see whether that will solve your problem.
Thanks Ian. Appreciate you hearing me out.
Thanks, that's probably my ignorance. The plugin API would allow me to "import" a package based on some condition. This is what I meant by deferred: packages are initialised after
Yeah, okay. I think what you've described there is similar to what I had in mind with this proposal. I assume there's some reason we can't just treat a normal package as a "local plugin", and avoid additional flags or build modes to
Perhaps we could look at the imports in the program, and see if a package is imported with
If I understand correctly, no. This only works statically. My condition is based on args passed to the program, so it can't be optimised out by the compiler.
Based on the discussion above, this is a likely decline. Leaving open for four weeks for final comments.
No further comments.
The proposal
As a Go program grows, it can hit a critical point where the number of packages causes package initialisation to become increasingly expensive. As most of these packages tend to be third-party, authors have little control over this. At some point, package initialisation can become prohibitive for use cases that are sensitive to startup performance. In short, I propose that we introduce a mechanism to defer initialisation of imported packages until they are needed. Borrowing from the plugin design, perhaps something like this could work:
In this example, the initialisation of the core and util packages is deferred until we actually intend to use those packages. The packages must be imported with the blank identifier _ to avoid using an uninitialised package.**

Libraries like gocloud.dev that import various SDKs might find this feature especially useful. For example, importing gocloud.dev/blob has some quite expensive initialisation:

If we could defer this until we call some code that actually needs tracing, we could improve startup performance a lot.
** This might not be ideal. We might want to refer to symbols, e.g. types, in the package even if it's not initialised yet; otherwise we won't be able to cast the symbols returned from Lookup(). Client-side interfaces could certainly help here, though.

Feasibility
The //go:deferred comment here tells the runtime not to initialise that package on import. Because we still import the package, this shouldn't cause any problems for the build.Import() algorithm or for the compiler/linker.

The runtime.InitDeferredImport(string) function can then be used to initialise that package and anything it imports. It seems we already have the ability to defer package initialisation for plugins. This function would likely follow the same code path as plugin.Open(), minus the dlopen() step, and return something akin to a *plugin.Plugin that can be used to look up the package's symbols.

It feels like there's a lot of prior art indicating this is possible, but perhaps somebody from the core team has a better idea?
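Neither //go:deferred nor runtime.InitDeferredImport exists today, but the semantics described above (initialise a named package and everything it imports, exactly once, then look up its symbols) can be approximated in ordinary Go when the packages cooperate. The sketch below is that library-level approximation, not the proposed runtime mechanism: each hypothetical package registers an initialiser and its dependencies instead of doing work in init(), and initDeferred runs dependencies depth-first before the package itself. Symbols come back as any and are asserted to a client-side interface, as the footnote suggests. All names (pkg, packages, initDeferred, Greeter, the example.com paths) are hypothetical.

```go
package main

import "fmt"

// pkg describes a package whose initialisation has been deferred.
type pkg struct {
	deps    []string              // import paths this package depends on
	initFn  func() map[string]any // does the work init() would have done
	symbols map[string]any        // exported symbols, set once initialised
	state   int                   // 0 pending, 1 in progress, 2 done
}

// packages is a hypothetical registry keyed by import path.
var packages = map[string]*pkg{}

// initDeferred initialises path after everything it depends on, runs each
// initialiser at most once, and returns the package's symbols.
func initDeferred(path string) (map[string]any, error) {
	p, ok := packages[path]
	if !ok {
		return nil, fmt.Errorf("unknown package %q", path)
	}
	switch p.state {
	case 2:
		return p.symbols, nil
	case 1:
		return nil, fmt.Errorf("initialisation cycle at %q", path)
	}
	p.state = 1
	for _, dep := range p.deps {
		if _, err := initDeferred(dep); err != nil {
			p.state = 0 // leave the package pending so the error is repeatable
			return nil, err
		}
	}
	p.symbols = p.initFn()
	p.state = 2
	return p.symbols, nil
}

// Greeter is a client-side interface: the consumer uses a looked-up
// symbol without naming the provider's concrete type.
type Greeter interface{ Greet() string }

type coreGreeter struct{}

func (coreGreeter) Greet() string { return "hello from core" }

// order records the sequence in which initialisers ran.
var order []string

// registerExamples sets up two hypothetical packages: core imports util.
func registerExamples() {
	order = nil
	packages["example.com/util"] = &pkg{initFn: func() map[string]any {
		order = append(order, "util")
		return map[string]any{}
	}}
	packages["example.com/core"] = &pkg{
		deps: []string{"example.com/util"},
		initFn: func() map[string]any {
			order = append(order, "core")
			return map[string]any{"Greeter": coreGreeter{}}
		},
	}
}

func main() {
	registerExamples()
	// Only here, e.g. after inspecting os.Args, is the cost of
	// initialising core and its dependencies paid.
	syms, err := initDeferred("example.com/core")
	if err != nil {
		fmt.Println(err)
		return
	}
	if g, ok := syms["Greeter"].(Greeter); ok {
		fmt.Println(g.Greet())
	}
	fmt.Println(order) // [util core]
}
```

This also shows the risk raised in the discussion: any code that reads core's state before initDeferred has run sees nothing, and the compiler cannot catch that.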
Further motivation
At a certain scale, initialisation can seriously hinder startup performance. There are some efforts to incrementally improve this by making changes to the SDK and other libraries; however, these only mitigate the problem. At some point, without a way to defer imports, library authors and Go developers have limited ability to manage the initialisation cost as the package graph balloons in size.
My particular use case is motivated by setting up Linux namespaces to improve isolation for the sub-processes we spawn in our build system. We spawn processes rapidly and in parallel, so startup performance is very important. Ideally, we'd be able to just clone() to a new thread, set up our network and mounts, and then exec the sub-process. I understand uncontrolled forking like this isn't safe within the Go runtime, so we've had to think of a different approach.

We've taken a page out of Docker's book and are trying to re-exec the current binary to do some setup in the new namespace, before finally fork/exec'ing once more to create the sub-process. As it stands, package initialisation is prohibitively expensive for this approach to work. We allocate about 2.1 MB of data on startup and spend tens of milliseconds doing so. Because this all happens in parallel, we get a lot of resource contention from these allocations, and the impact is big enough that this approach isn't feasible.
The actual code that sets up the namespace has no dependencies, so it needs to initialise only a very small set of packages. Because there's no mechanism to defer this initialisation, we actually end up initialising hundreds of packages.
Additionally, the most expensive initialisation comes from the remote execution API, which is only used when building remotely. Ideally, we'd only initialise this when configured to do so. Only around 100 KB of this allocation comes from our code; the rest comes from third-party libraries over which we have little control.
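The re-exec approach described above typically has the shape sketched below: main checks a marker before doing anything else and, when running as the helper, performs the small setup step and exits. The structure makes the problem visible: every package-level variable and init() in the binary's import graph has already run by the time main's first line executes, whichever branch is taken. The environment variable name, isHelper, runHelper and spawnHelper are hypothetical, and the actual namespace setup (clone flags, mounts, the final exec) is omitted.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// reexecEnv is a hypothetical marker telling the binary it was
// re-executed to act as the setup helper.
const reexecEnv = "EXAMPLE_REEXEC_HELPER"

// isHelper reports whether this process was started as the helper.
func isHelper(getenv func(string) string) bool {
	return getenv(reexecEnv) == "1"
}

// runHelper stands in for the namespace setup and the final exec of
// the real sub-process; here it only prints a message.
func runHelper() {
	fmt.Println("helper: would set up namespaces and exec the task here")
}

// spawnHelper re-executes the current binary with the marker set.
func spawnHelper() error {
	self, err := os.Executable()
	if err != nil {
		return err
	}
	cmd := exec.Command(self)
	cmd.Env = append(os.Environ(), reexecEnv+"=1")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// All package initialisation in the import graph has already
	// happened here, even though the helper branch needs almost none of it.
	if isHelper(os.Getenv) {
		runHelper()
		return
	}
	if err := spawnHelper(); err != nil {
		fmt.Fprintln(os.Stderr, "spawn failed:", err)
		os.Exit(1)
	}
}
```

Because the branch depends on the environment at run time, this is the kind of condition that, as noted in the discussion, cannot be resolved statically by the compiler.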