Excessive size of generated Swift code #1204

Open
BalestraPatrick opened this issue Jan 11, 2022 · 8 comments

Comments

@BalestraPatrick

Hello!

Many parts of our codebase use SwiftProtobuf. We recently started tracking app size more accurately and noticed a trend that is pretty worrying for us: generated Swift proto code increases our app size a lot. For example, we removed a single proto file of about 400 LoC that contained about 70 message definitions (including the various transitive imports), for which the generated code was about 5 KLoC. 304 KB of our app size was attributed to symbols coming from the generated Swift Protobuf code.

We are building with SWIFT_OPTIMIZATION_LEVEL = -Osize in release mode but I wonder if there are other ways to reduce the size of the generated Swift code.

I can't exactly share my full proto, but I was wondering if this is a known issue with Swift or maybe there are ways to reduce the impact of the generated code. Does anyone have experience with this particular issue?

@tbkka
Collaborator

tbkka commented Jan 11, 2022

Code size is a common concern with code-generated approaches such as this. Protobuf implementations for some other languages rely heavily on reflection which makes them smaller but significantly slower.

If you're only using the binary encoding, it should be easy to strip out the field names and other content that's only there to support JSON and TextFormat encoding. Right now, I think this would require a small change to the code generator, but I've long been interested in emitting that content as separate .swift sources that contain only those extensions. It would then be easy to delete those files. (Alternately, we could consider splitting the JSON and TextFormat support into a separate generator.) You could also look critically at whether there are other parts of the generated code that you might omit: For example, the generated == implementations are somewhat bulky and may not be needed in your application.

@thomasvl
Collaborator

fyi - #18 is open for tracking splitting out the textual support.

@dflems
Contributor

dflems commented Jan 12, 2022

I wrote a little wrapper that patches the generated Swift source code to remove the conformance to SwiftProtobuf._ProtoNameProviding, which seems to have shaved about 10% off the total binary size of the generated Swift protobuf code in our app (according to the linkmap). It would be nice for this to be an option in the generator for sure!

I briefly looked into removing == as well but _MessageImplementationBase is Hashable so it needs an implementation of it or a change to the runtime.

update: Turns out we're using JSON encoding/decoding a little bit in the codebase and can't merge this, sadly
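
A minimal sketch of the kind of source patch described above, assuming the generated extension header spells the conformance exactly as `, SwiftProtobuf._ProtoNameProviding`. The helper name `stripNameProviding` and the sample input are hypothetical; the actual wrapper is not shown in this thread, and this sketch only drops the conformance from the header without touching any other generated members.

```swift
import Foundation

// Hypothetical helper: removes the `_ProtoNameProviding` conformance from
// generated Swift source text. Assumes the conformance is written exactly
// as in `conformance` below; any other spelling is left untouched.
func stripNameProviding(from source: String) -> String {
    let conformance = ", SwiftProtobuf._ProtoNameProviding"
    return source.replacingOccurrences(of: conformance, with: "")
}

// Illustrative input only, not real generator output.
let generated = """
extension Example: SwiftProtobuf.Message, SwiftProtobuf._MessageImplementationBase, SwiftProtobuf._ProtoNameProviding {
}
"""

let patched = stripNameProviding(from: generated)
print(patched)
```

As noted in the update above, this only makes sense for code that never uses the JSON or TextFormat encodings.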

@allevato
Collaborator

allevato commented Jan 12, 2022

Another improvement I wanted to look at in this area to reduce the amount of code generation was to make serialization and other related functionality (hashing, equatability) table-driven. Unfortunately, the only way to get static arrays of constant data into a data segment is through a SIL transform that only runs on optimized builds, and even then, whether that transform applies is very unpredictable. If it isn't applied, then we'd end up generating code that heap-allocates those arrays and populates them element-by-element, and that code would run the first time a particular message is serialized, parsed, equality-tested, or hashed, which would make client code performance unpredictable in ways that we should probably avoid*.

* To be fair, this is already happening with the name tables we generate for text/JSON serialization, but that's restricted to a much smaller set of serialization operations that are expected to be less efficient than binary format.
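
To illustrate the contrast between per-field generated code and a table-driven approach, here is a minimal sketch. All names (`Person`, `FieldEntry`, `field`, `tableDrivenEquals`) are hypothetical and are not part of SwiftProtobuf. The table here holds closures, so it is not the constant data that could be placed in a data segment; it only shows how a single shared loop can replace code emitted per message.

```swift
// Hypothetical message type; not SwiftProtobuf generated code.
struct Person {
    var name: String = ""
    var id: Int32 = 0
}

// Generated-code style: one unrolled comparison per field, per message type.
func generatedStyleEquals(_ a: Person, _ b: Person) -> Bool {
    if a.name != b.name { return false }
    if a.id != b.id { return false }
    return true
}

// Table-driven style: each field becomes one table entry.
struct FieldEntry<M> {
    let number: Int
    let isEqual: (M, M) -> Bool
}

// Builds a table entry that compares one field through its key path.
func field<M, V: Equatable>(_ number: Int, _ path: KeyPath<M, V>) -> FieldEntry<M> {
    return FieldEntry(number: number, isEqual: { $0[keyPath: path] == $1[keyPath: path] })
}

// This array is constructed at run time, which is the cost discussed above.
let personFields: [FieldEntry<Person>] = [
    field(1, \Person.name),
    field(2, \Person.id),
]

// One shared loop serves every message type that supplies a table.
func tableDrivenEquals<M>(_ a: M, _ b: M, fields: [FieldEntry<M>]) -> Bool {
    return fields.allSatisfy { $0.isEqual(a, b) }
}

let p = Person(name: "Ada", id: 1)
let q = Person(name: "Ada", id: 2)
print(tableDrivenEquals(p, q, fields: personFields))
```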

thomasvl added a commit to thomasvl/swift-protobuf that referenced this issue Mar 7, 2022
The presence can be checked with `isEmpty` on the Array, and they can be cleared
by assigning to `[]`.

Fixes apple#944
Helps with apple#1204
thomasvl added a commit that referenced this issue Apr 5, 2022
The presence can be checked with `isEmpty` on the Array, and they can be cleared
by assigning to `[]`.

Fixes #944
Helps with #1204
@cprovatas

What if there was an option to opt in to only one serialization mechanism? Say a client only needs binary encoding/decoding. Would that make any difference in the size of the generated code?

@tbkka
Collaborator

tbkka commented Jun 1, 2022

The idea of having an opt-in is a good one, and it's something we've discussed on many occasions. It would certainly make some difference, though someone would have to actually try it and measure to figure out how much savings. But the detailed design is tricky:

  • Is there a good way to factor/subset the support library?
  • This could be done with generation options, but we'd need a good way to test every combination to make sure everything still works.
  • We've also considered the possibility of emitting different serialization support as Swift "extensions" in separate source files (E.g., "MyProto-TextFormatSupport.swift") so people can simply delete capabilities that don't interest them by deleting the associated files.
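
The separate-files idea in the last bullet can be sketched as follows. This is an illustration under stated assumptions: the protocols `BinaryEncodable` and `TextNameProviding`, the type `MyProto`, and the function `textFormat` are hypothetical stand-ins, not the SwiftProtobuf runtime API. The point is that the second extension could live in its own file and be deleted without affecting the first.

```swift
// Hypothetical stand-ins for runtime protocols; not SwiftProtobuf API.
protocol BinaryEncodable {
    func binaryFields() -> [(number: Int, value: String)]
}
protocol TextNameProviding {
    static var fieldNames: [Int: String] { get }
}

// Would live in the base generated file (binary support only).
struct MyProto {
    var title = ""
    var body = ""
}
extension MyProto: BinaryEncodable {
    func binaryFields() -> [(number: Int, value: String)] {
        return [(number: 1, value: title), (number: 2, value: body)]
    }
}

// Would live in a separate file such as "MyProto-TextFormatSupport.swift".
// Deleting that file removes the name table and the conformance together.
extension MyProto: TextNameProviding {
    static let fieldNames: [Int: String] = [1: "title", 2: "body"]
}

// Text rendering is only available to types that kept the name table.
func textFormat<M: BinaryEncodable & TextNameProviding>(_ message: M) -> String {
    return message.binaryFields().map { entry -> String in
        let name = M.fieldNames[entry.number] ?? String(entry.number)
        return name + ": " + entry.value
    }.joined(separator: "\n")
}

print(textFormat(MyProto(title: "hi", body: "there")))
```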

At this point, I would say that we have lots of good ideas; we really need some folks to actually try implementing some of these ideas and see how well they work out.

@thomasvl
Collaborator

thomasvl commented Jun 7, 2022

#1240 has a draft of some work I did to split the generated code into what is needed for just the binary format, and then extra files needed for the textual formats.

Since the library uses a Visitor/Decoder pattern, there isn't a lot of code specific to the formats. At the moment, the field numbers and binary encoding information are part of the base generated code, as that's a very small amount of data. The textual support then layers on the needed mapping between field numbers and names. Since the JSON names can mostly be derived from the TextFormat names, in most cases we need just one string and a marker saying we can derive the other one. Splitting that into two completely separate things could result in even larger code when folks need both, since we'd potentially be more verbose instead of allowing things to be derived.
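
A minimal sketch of the "one string plus a marker" idea, assuming the usual proto convention that a JSON name is the lowerCamelCase form of the field name. The `NameEntry` type and both functions are hypothetical illustrations, not the library's actual name-map types.

```swift
// Hypothetical name-table entry: stores one string where possible, plus a
// marker describing how to obtain the JSON name.
enum NameEntry {
    case same(proto: String)                  // JSON name equals the proto name
    case standard(proto: String)              // JSON name derived by lowerCamelCase
    case unique(proto: String, json: String)  // JSON name cannot be derived
}

// Converts snake_case to lowerCamelCase by dropping each underscore and
// uppercasing the character that follows it.
func lowerCamelCase(_ protoName: String) -> String {
    var result = ""
    var upperNext = false
    for ch in protoName {
        if ch == "_" {
            upperNext = true
        } else if upperNext {
            result += String(ch).uppercased()
            upperNext = false
        } else {
            result.append(ch)
        }
    }
    return result
}

// Resolves the JSON name, deriving it whenever the marker allows.
func jsonName(for entry: NameEntry) -> String {
    switch entry {
    case .same(let proto): return proto
    case .standard(let proto): return lowerCamelCase(proto)
    case .unique(_, let json): return json
    }
}

print(jsonName(for: .standard(proto: "user_id")))
```

Only the `unique` case carries two strings, which is why storing separate, complete TextFormat and JSON tables could end up larger when both formats are needed.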

One thing #1240 doesn't yet take on is splitting up the core runtime library so that if you don't need the textual formats, you don't have to link that backing code. No effort has been made to see how much that might save. Using that PR as a starting point would likely make sense for getting more clarity on what the potential savings would be.

@acecilia

acecilia commented Mar 19, 2024

👋 Related to the size of the generated code, the size of the SwiftProtobuf SDK itself is also considerable: 1.4 MB for the latest version, 1.25.2 (this is the size of the binary built statically inside a production app, measured using the linkmap).

Adding this comment here with the size information just for context

Screenshot 2024-03-19 at 22 34 31
