Use an intermediate representation format for GDScript #8605
Comments
Great idea! I think this would also be really useful in debugging issues when working on the engine. For example, while working on GDExtension issues with ptrcalls, I really wished I could have seen the GDScript bytecode in my test scripts, so I could tell which function calls were actually being emitted as ptrcalls and which weren't, because it was sometimes difficult to tell just from looking at the source code. (Side note: GDScript no longer makes ptrcalls, but I'm sure some other similar issue could come up in the future.)
Cool idea :) My main concern is this point you highlighted:
It would be nice if the IR was involved in the normal compilation pipeline so that it would be impossible for the representation to be unfaithful. I'm not sure how that would work exactly, but I know there are compilers out there that transform to an IR as a step before generating the final machine code. |
I have to admit AOT compilation is quite exciting... |
A good balance between IR and AOT would be important to me. I feel there's a risk that if we rely too much on AOT for performance in exported projects, debugging and scene testing become difficult and laborious. With optimization steps done on the IR ahead of AOT (or without it, for example when running the project in the editor), you still gain some performance. With just AOT and simple optimizations, you can get a major difference between performance in testing and in export, forcing projects that push the boundaries on performance to re-export every time they want to test, even if they're not specifically interested in export-level performance.

I don't see IR and AOT as mutually exclusive. Quite the opposite: I find IR a good step toward improving AOT. For one, having optimization on the IR lets us rely less on competent optimization in the AOT step, so we can use a far simpler, bare-bones compiler that we can even bundle in the engine. That would greatly help users who are daunted by setting up a compiling environment, especially a cross-compiling one. We can then allow an external, more competent compiler for those who set one up.
This would also allow us to filter out blocks like |
@AThousandShips IR itself does not imply any kind of code optimization. While I do think GDScript would benefit greatly from optimization passes, IR is not a requirement for that. |
I agree, and I didn't say it does, but it's a useful tool for it. It allows more manageable optimization than machine code, and it allows doing the optimization on the exported code, so the optimized code is persistent and doesn't have to be redone every time the source is parsed. Machine code also makes things a lot harder to grasp, with jumps and similar, as opposed to a structured data format that is more coherent and more manageably mutable. Unlike AOT, this also gives runtime improvements when running from the editor, etc. So yes, I'm well aware, thank you 🙃, and I thought the aspects specific to IR vs. the non-persistent machine code were obvious as the point of my comments.
@AThousandShips oh if we go as far as to treat that metaprogrammy, i'd rather not rely on what looks like a runtime function call tho. how about some fancy decorators like eg this:

```gdscript
@tool
extends Node

@runtime var a = 5

@runtime func _ready():
    print("Hello, World!")

func _process(_delta):
    @editor:
        _handle_my_gizmos()
    @runtime:
        a += 1

@editor func _some_callback_for_a_plugin():
    # do some expensive stuff that doesn't need to go into the final product
```

where the editor would load this:

```gdscript
extends Node

func _process(_delta):
    _handle_my_gizmos()

func _some_callback_for_a_plugin():
    # do some expensive stuff that doesn't need to go into the final product
```

while the runtime would load this:

```gdscript
extends Node

var a = 5

func _ready():
    print("Hello, World!")

func _process(_delta):
    a += 1
```

but that feels more like a discussion for an additional/followup proposal. just wanted to give my 2 cents before gdscript learns to magically remove anything mentioning that engine hint :P unless of course you are talking about introducing
What you're suggesting is entirely unrelated to this proposal, but it has been similarly proposed in the past already: |
This technique also opens up another way to wire up simpler interface registrations exclusively for any scripting languages that don't need string->address methods to recognise native interfaces (notably, GDExtension) and GDScript will be a great candidate. This also helps greatly for export binaries that don't need them, thus helps in their size significantly especially in platforms where binary size matters, such as HTML5. String labels still exist in the editor and GDExtension because without them it's impossible for GDScript language server to recognise and compile them, but in the GDScript-only release they will be removed. |
@SysError99 not sure what "string labels" you're referring to. If it's about class and function names, this wouldn't be able to remove them. The simplest example to show why they are necessary is any dynamic call:

```gdscript
extends Node

func _ready():
    $SomeNode.rotate(PI)
```

In this case the
50% related and 50% unrelated question: This script is attached to a certain node and this script is referencing a different node using a path.

The initial statement is still correct, a better way to illustrate it though:

```gdscript
extends Node

func foo() -> void:
    get_child(0).rotate(PI)
```

Here it's 100% impossible to know the type of the first child node.

Also, as someone who doesn't know a lot about inner workings of gdscript compiler + VM I genuinely wonder:
No, because the script does not know which scene it's attached to, and the same script could be attached to multiple scenes with different trees.
The thing is that it still needs to know what to call. To do so it needs to request the function from the ClassDB, which is done via string. This is cached when the GDScript is compiled if it is known, so it doesn't need to request at every call, but since it's a pointer it cannot be serialized. This will require the IR to still keep the names and request the pointers when compiling to proper bytecode, meaning the export template still needs the names.
Again no, for the same reasons as the previous point.

We could potentially remove strings by replacing them with indices, putting the information in an array instead of a map, assuming those indices are known at compile time. This would require an overall refactor of core code and would break all GDExtensions. The main issue with this is making sure that the functions are never reordered, as this would break compatibility (there might be ways to validate this automatically, but it's one extra burden for contributors).

This cannot be done effectively because GDScript is still mainly a dynamically typed language. It can't really know the index in advance in most cases, so it has to request via strings, and those would have to be present in the export template anyway.

Also note that the engine is not compiled on export; export templates are distributed pre-compiled. So we cannot strip the strings from this compiled binary, even if we were to extract the subset of the used types. It would require recompilation of the template itself.
@vnen Essentially, it's not a direct reference to those calls as in statically typed programming languages, but rather a much shorter form of label (in this case, a number) instead of a string. In this implementation, all known strings will have a central index that acts like a string map, instead of using the full bytes of the string at compile time. Let's put these in the editor's executable. We have four common strings that are used in the GDScript language server:
After it is being converted during compile time, these become just an index. We will use
When the script gets "transpiled" for release, it will use these indexes instead of strings, which is why strings are not required in the release. The serious limitation of this implementation is that we still need "some" strings on Godot's side, because it's virtually impossible to remap strings back to the much shorter form (an index number). Plus, this implementation breaks all string-based wirings in the script, and many functions need them, given that native calls aren't
Nothing has been done. It's all theoretical right now. |
Describe the project you are working on
The GDScript implementation.
Describe the problem or limitation you are having in your project
GDScript currently is compiled when loaded, even in a release build. There are a few problems with this approach:
Describe the feature / enhancement and how it helps to overcome the problem or limitation
An intermediate representation (IR for short) can help solve those issues.
It also makes it possible to build an export template without the GDScript compiler, which can reduce its size and avoid potential exploits. This is optional, so people who use the compiler at release for dynamic scripts and modding support can still have it the way it is now (or a mix of the two).
There are a few potential drawbacks from this as well:
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
Currently, GDScript is compiled to a bytecode which is later executed by the VM. This bytecode is not suitable for serialization, primarily because it contains a lot of pointers. Since the memory layout will likely be different when the executable runs again (especially on different machines), those pointers cannot be stored.
The plan is to include in the IR named references that can be reconstructed into the pointers. This includes global classes and function pointers which are used in the GDScript VM for fast access.
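
As a minimal sketch of that idea (in Python rather than engine code, purely for illustration), the IR keeps only names and the loader turns them back into direct references. `REGISTRY` is a hypothetical stand-in for the engine's method registry, and `resolve_references` is an invented helper, not an existing Godot API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the engine's method registry (ClassDB in Godot).
# Keys are (class name, method name); values are the live callables.
REGISTRY: Dict[Tuple[str, str], Callable[..., object]] = {
    ("Math", "abs"): abs,
    ("Math", "max"): max,
}


def resolve_references(named_refs: List[Tuple[str, str]]) -> List[Callable[..., object]]:
    """Turn serialized (class, method) names into live callables."""
    resolved = []
    for class_name, method_name in named_refs:
        try:
            resolved.append(REGISTRY[(class_name, method_name)])
        except KeyError:
            raise LookupError(f"unknown method {class_name}.{method_name}") from None
    return resolved


# What the IR would store: names only, which are safe to write to disk.
serialized_refs = [("Math", "abs"), ("Math", "max")]

# What the VM would use after loading: direct references that are only
# valid inside the current process.
pointers = resolve_references(serialized_refs)
```

The lookup by name happens once at load time, so calls made afterwards can still go through the resolved reference directly.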
For each script when the project is exported, the process will go as follows:
- […] `.gdc` extension, in the same place the `.gd` file is.
- […] `.gd` file and the remap will find the `.gdc`.
- […] `.gdc` file will be exported, the `.gd` will not.

For loading, the `.gdc` file will be read and passed to another code generator. This one will be very simple, as it will just convert instructions from the IR into bytecode (which will follow a similar structure), including resolving all the pointers.

IR format
While I haven't yet fleshed out the format exactly (as I believe it's easier to do while implementing it), it will be somewhat like this:
- […] `GDIR`).
- […] `_ready` code for the `@onready` feature).

Things that can be accessed via index (like own properties, local variables, and function arguments) won't have a name stored in the data section and will use the index directly.
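
Since the format is not fleshed out, the following is only a rough Python sketch of how a header and a data section of names could be laid out. It assumes `GDIR` is a magic marker at the start of the file; the version field, the length-prefixed UTF-8 strings, and both function names are invented for illustration.

```python
import struct
from typing import List, Tuple

MAGIC = b"GDIR"      # assumed to be the magic marker mentioned in the proposal
FORMAT_VERSION = 1   # assumption: the proposal does not specify a version field


def write_header_and_data(names: List[str]) -> bytes:
    """Serialize the magic marker, a version, and a table of names."""
    out = bytearray()
    out += MAGIC
    out += struct.pack("<II", FORMAT_VERSION, len(names))
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack("<I", len(raw))  # length prefix for each name
        out += raw
    return bytes(out)


def read_header_and_data(blob: bytes) -> Tuple[int, List[str]]:
    """Parse what write_header_and_data produced."""
    if blob[:4] != MAGIC:
        raise ValueError("not a GDIR file")
    version, count = struct.unpack_from("<II", blob, 4)
    offset = 12  # 4 bytes of magic + two 32-bit integers
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        names.append(blob[offset:offset + length].decode("utf-8"))
        offset += length
    return version, names
```

Only entries that must be resolved by name would go into such a table. Index-addressable items like local variables would never appear in it.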
The instructions will have a similar structure to the bytecode. They'll have an opcode and a number of arguments. The arguments are encoded as "addresses", which can be either the regular bytecode addresses or special ones for the IR (such as getting the value of a constant or a function pointer). There is no break between instructions since they will have a predictable length. All of this is stored as bytes which, if opened in a text editor or even a hex editor, won't have anything recognizable beyond the data section.
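
To make the "no break between instructions" point concrete, here is a small Python sketch of one possible encoding. The opcodes, their argument counts, the address kinds, and the 8-bit/24-bit split of each address are all assumptions made for this example, not the actual GDScript bytecode or IR layout.

```python
import struct
from typing import List, Tuple

# Hypothetical opcodes, each with a fixed number of arguments.
OP_ASSIGN, OP_CALL, OP_RETURN = 0, 1, 2
ARG_COUNT = {OP_ASSIGN: 2, OP_CALL: 3, OP_RETURN: 1}

# Hypothetical address kinds, stored in the top 8 bits of a 32-bit address.
ADDR_STACK, ADDR_CONSTANT, ADDR_FUNCTION = 0, 1, 2
ADDR_BITS = 24
ADDR_MASK = (1 << ADDR_BITS) - 1


def make_address(kind: int, index: int) -> int:
    """Pack an address kind and an index into one 32-bit value."""
    if not 0 <= index <= ADDR_MASK:
        raise ValueError("index out of range")
    return (kind << ADDR_BITS) | index


def split_address(address: int) -> Tuple[int, int]:
    """Recover (kind, index) from a packed address."""
    return address >> ADDR_BITS, address & ADDR_MASK


def encode(instructions: List[Tuple[int, List[int]]]) -> bytes:
    """Write opcodes and arguments back to back, with no separators."""
    words: List[int] = []
    for opcode, args in instructions:
        if len(args) != ARG_COUNT[opcode]:
            raise ValueError("wrong argument count for opcode")
        words.append(opcode)
        words.extend(args)
    return struct.pack(f"<{len(words)}I", *words)


def decode(blob: bytes) -> List[Tuple[int, List[int]]]:
    """Walk the stream, using the opcode to know how many words to read."""
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    result = []
    pos = 0
    while pos < len(words):
        opcode = words[pos]
        count = ARG_COUNT[opcode]
        result.append((opcode, list(words[pos + 1:pos + 1 + count])))
        pos += 1 + count
    return result
```

Because the opcode determines the argument count, the decoder always knows where the next instruction starts. A loader could resolve the constant and function addresses into real pointers during that same walk.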
If this enhancement will not be used often, can it be worked around with a few lines of script?
It will be used in almost every exported project, as it brings benefits to pretty much all of the cases.
Is there a reason why this should be core and not an add-on in the asset library?
It is a core part of GDScript and is not project specific, since it will be used by pretty much all projects.