Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve build times #96

Closed
wants to merge 5 commits into from
Closed

Improve build times #96

wants to merge 5 commits into from

Conversation

MahdiBM
Copy link
Contributor

@MahdiBM MahdiBM commented Jul 1, 2023

Motivation

We in Vapor want to replace the old Github release bot we have with a new AWS lambda function for better integrations with the Discord server and generally cleaning things up, etc... (The PR related to this PR: vapor/penny-bot#61)

To make this happen I took a look and luckily found out Github does provide OpenAPI spec files, which although not perfectly in line with the actual API, they can still help a lot.

Next step was to get this package aka swift-openapi-generator working ... which wasn't hard to set up thanks to the useful error messages that the package shows e.g. "you need to have a openapi.yml/json/yaml file", etc..

But things went downhill from there ...
First of all the Github spec file is massive ... 7/8MB of yaml or 35MB of json ... I thought I could probably get away with letting the generator generate everything, and then just use what i actually need, but turned out the process takes 8+ minutes, which is much more than what i'm willing to wait for the generator.

So I tried to only let the generator generate what I actually need, and commented out 97% of the Github spec file.
This did result in much better build times. The build time went from ~8 minutes to ~68s, which is much better.
Though this is still decently slower than what i would have liked ...

(One other problem unrelated to this package is that it seems SPM build plugins are still not that "clean" basically ... I've for once in a loong time had to resort to the good ol' way of deleting DerivedData to get the builds rolling, which luckily i know why those build errors happen so i can at least try not to trigger them.)

After looking at Xcode's build timeline, i noticed there is a lot of time spent for compiling SwiftSyntax:
build-worst
(All builds are clean builds)

So long story short, i removed the 1 little place that used SwiftFormat (which uses SwiftSyntax), and got rid of SwiftSyntax and SwiftFormat altogether. The build time decreased by ~20s:
build-mid

For the reference, this is a normal build with all types manually generated, which only takes ~28s (to be fair, the generator probably does generate like 30% more types than what i manually generated, which is my fault):
build-best

Modifications

What did it cost?

Basically only an extension on top of String, which tries to format a string (String.swiftFormatted throws).
(And also a few other functions/properties that used it)

I think that extension is not worth this amount of build time and if we really want it, we can implement something so we format the string ourselves.

Result

Build time decreased from 68s -> 47s.
Without oapi-generator and the manually generated types, I can see the build time only taking ~25s, In which case we can assume the generator-related build time went from ~42s -> ~22s, which is a ~50% build time improvement for this package.

P.S. swift-syntax was declared as a dependency of this package, but it wasn't directly used by the package anywhere ... only SwiftFormat was indirectly using it, so i don't think we needed to have it as a dependency in Package.swift anyway.

@czechboy0
Copy link
Contributor

Hi @MahdiBM,

both @simonjbeaumont and I are currently on vacation, so let us get back to you on this once we're back.

I just wanted to reply now to make you aware of an alternative way to use the generator. It's more manual, but allows you to build the generator and generate your code once and check it into your repo. Combined with the reduced OpenAPI doc, I believe the resulting compile times should be very reasonable.

https://swiftpackageindex.com/apple/swift-openapi-generator/0.1.3/documentation/swift-openapi-generator/manually-invoking-the-generator-cli

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Jul 1, 2023

Thanks @czechboy0 take your time :)

What instantly crossed my mind is that the CLI tool can be a command plugin for us to manually trigger ... I'll try to play around with that.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Jul 4, 2023

Playing more with the library, I realize swift-format was being used in tests as well, which i wasn't paying attention to.
I could try to re-enable tests to use swift-format if that's desired.
There is the problem that it'll pull swift-format/swift-syntax, which i'm not sure if it's a good idea considering they'll only be used in tests.

@czechboy0
Copy link
Contributor

czechboy0 commented Jul 11, 2023

Hi @MahdiBM – swift-format and swift-syntax are pretty important building blocks of Swift OpenAPI Generator, so I don't think it's likely we'd want to drop them as dependencies.

However, you bring up a great point - the overall compile time, which includes (1) building the generator, (2) running the generator to generate the user code, and then (3) building the generated user code. I think these three are worth discussing separately, and to improve each, we might need to employ different solutions.

In your case, which of the three is the most important one affecting your workflow? I ask because, for example, locally, when incremental builds are enabled, even if the initial build of the generator takes a bit long, it's only done once, so your main issue could be the time it takes the generator to execute on a large OpenAPI doc. In turn, in CI, even on a small OpenAPI doc, you might dislike the build time of the generator itself, because CI generally performs clean builds.

Before I break this out into individual issues, I want to understand which of the above you would want to see fixed first.

Thanks!

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Jul 11, 2023

(1) building the generator, (2) running the generator to generate the user code, and then (3) building the generated user code.
I think all are very important. If i were to rank them, i would say 2, 3, 1 in descending order.

2 is important because it is more closely related to the developer experience.

Then 3, because the generated code makes the CI realllly slow (imagine 2 min -> 24 min compile time with -c release). A few hours ago even @gwynne made an issue about it: swiftlang/swift#67214. This does seem to be a Swift problem on linux, but it's still slow enough even on an M1 mac, and i see a lot of build-time optimization opportunities in the generated code.
For example there are a lot of redundant no-use empty structs like "Headers", and generally a lot of the generated code could be generated in a factored-out manner, meaning if a lot of places are using the same exact struct, then maybe that struct should only be generated once under a "Shared" namespace, then all places could use that. I can also think of better/less usage of enums, but that one could be just my preference (which probably does make the generator compile faster).

1 is less important because it won't scale much like with 2 and 3. The generator might take like 20 more seconds to build, but that's it.

Still i think all 3 are valid problems for the library and should be optimized more.

@gwynne
Copy link

gwynne commented Jul 11, 2023

swift-format and swift-syntax are pretty important building blocks of Swift OpenAPI Generator, so I don't think it's likely we'd want to drop them as dependencies.

@czechboy0 Can you elaborate on why swift-format is considered critical? Notwithstanding Swift macros (which are more metaprogramming than codegen anyhow), it's been my experience that "easily readable output" ends up somewhere near the bottom of the list of features considered "critical" in a code generator. (To cite a few examples, there's m4(1), cpp(1), lex/flex(1), yacc/bison(1), protobuf's protoc, Swift's own gyb...) Considering a generated API can be browsed through its .swiftinterface, or for that matter as a DocC-generated API reference, I would have expected formatting to be a very minor concern relative to performance.

(P.S.: Obviously I've run with the assumption that swift-format's use in this package is to make the generated code readable; if that's not the case, I apologize for belaboring the point 😅. Either way I'm very interested in the reasoning behind it!)

@czechboy0
Copy link
Contributor

Can you elaborate on why swift-format is considered critical?

It's important, possibly not critical, but the reason is getting deterministic Swift output that passes a linter (swift-format, in this particular example), and produces reasonable diffs when it changes.

The generator deliberately does no work on adding the appropriate amount of whitespace, and we completely rely on the formatter to output reasonably looking code. If we remove swift-format, we'll have to bloat the generator logic again just to maintain some reasonable diff-ability and readability. Plus, our tests also rely on output diffing for assertions, so not being able to perform integration tests that way, and random whitespace changes breaking our tests is not a place we want to be in.

So the TL;DR: to properly decouple generating Swift code from prettifying Swift code, we need the former to be able to not care about whitespace or readability, and all that is handled in the latter phase. How the latter phase is implemented, whether using swift-format or some home-grown formatter, is a separate debate (that we could have). But that decoupling is important for how the generator project is structured and tested.

@simonjbeaumont Might also have some thoughts to add here.

@simonjbeaumont
Copy link
Collaborator

As @czechboy0 describes, we currently make use of swift-format within the generator pipeline to simplify the generation, punting all effort to make it readable to another tool. This is so that the code we produce is readable.

it's been my experience that "easily readable output" ends up somewhere near the bottom of the list of features considered "critical" in a code generator.

I'm sympathetic to us challenging whether it's critical. @gwynne in your opinion, what would be the minimum bar for the generated code w.r.t. to formatting?

We currently make use of it in the tests to normalise some generated code vs expected code before testing for equality to get a sense of code equivalence. If we wanted to retain this in tests only, we could consider extracting this suite of tests into an outer package that has the swift-format dependency.

Leaving aside the utility for maintainers, and thinking solely about the adopters for a minute, they fall in one of two camps:

  1. Those that use the plugin, for whom the generated code is a build-time byproduct, buried in .build/ or DerivedData.
  2. Those that do ahead-of-time generation and check the code into their repository.

For (1), I guess @gwynne's point is that folks might not need look at the actual code if the tools present them with a good way of looking at the interface.

For (2) they are in full control and I suppose they could run swift-format themselves (or any other formatting tool) if they so desire.

@gwynne
Copy link

gwynne commented Jul 12, 2023

For (1), I guess @gwynne's point is that folks might not need look at the actual code if the tools present them with a good way of looking at the interface.

@simonjbeaumont Yep, 100% correct - looking at the actual generated code (versus the interface and/or docs) in the plugin use case isn't something I'd expect most people to do.

Ironically, I have spent a great deal of time dredging through the output of code-generating tools (including a particularly ghastly and ancient pair that make protoc's output look like idiomatic Swift by compare...). I tried using clang-format (and, longer ago, similar less powerful tools) on the generated code now and then, but I always ended up deciding it wasn't worth bothering. Either:

  • There was so much generated code that formatting it didn't make much practical difference (boilerplate is still boilerplate in any code style), or
  • The generated code was so obfuscated (intentionally or otherwise) that formatting just made it look like the output of a assembly decompiler.

The former is definitely true in the use case that first motivated this issue. The nicely formatted OpenAPI models are only slightly more useful at best than they'd be in "minimized" form (at least to me); it's an IDE's autocomplete that makes them practical to use.

As to the one-off generator use case, at this point it's very straightforward in such a setup to add on a separate formatting step if desired.

I'm sympathetic to us challenging whether it's critical. @gwynne in your opinion, what would be the minimum bar for the generated code w.r.t. to formatting?

Basically, just about anything above the level of what "minimized JavaScript" looks like, with the one-letter function and variable names and no newlines at all, meets the bar for me. Something like "one physical line for each declaration not containing nested method/property/type declarations" (off the top of my head) would be plenty IMO.

@czechboy0
Copy link
Contributor

Yep, 100% correct - looking at the actual generated code (versus the interface and/or docs) in the plugin use case isn't something I'd expect most people to do.

One use case of looking at the generated code that I expect most people to encounter at some point: debugging, and jumping to a frame in the generated code, inspecting variables, and contributing improvements. We actually tried to keep the generated code as direct and top-down readable for this reason, instead of jumping through tons of levels of indirection.

@gwynne
Copy link

gwynne commented Jul 13, 2023

One use case of looking at the generated code that I expect most people to encounter at some point: debugging, and jumping to a frame in the generated code, inspecting variables, and contributing improvements. We actually tried to keep the generated code as direct and top-down readable for this reason, instead of jumping through tons of levels of indirection.

Ooh, good point! I tend to think of generated code as an implementation detail, especially given #sourceLocation() (or similar) annotations. But given the compilation speed concerns and given how often major compile-time regressions show up in the compiler (see also swiftlang/swift#58380 and swiftlang/swift#61948 - swiftlang/swift#67214 is my third major performance regression bug, and there are others I never ended up filing), I think even with this use case in mind it's better to cover it with the ability to add a separate formatting pass. It should definitely be made as painless as possible to add said formatting pass, of course.

@czechboy0
Copy link
Contributor

I realize that in addition to the three parts of the build process, which this PR aims to speed up:

  1. building the generator
  2. running the generator
  3. building the generated code

There are also two use cases that are hit by the slowness in different ways, and might prefer different solutions:
A. build plugin adopters
B. adopters manually running the CLI and checking in their code

Some of the discussion above touches on different stages (1/2/3) and use cases (A/B), that I feel that it's tricky to really nail down how this PR addresses the issue, and at what cost.

To that end, I think it's unlikely we'll take this PR as-is, as it removes pretty important pieces of the generator, and would cause issues elsewhere. I think it's fair to discuss in the future whether formatting should be included or not, but I think that should be its own issue/PR, where we can discuss it on its own merits, instead of being motivated by build times of swift-format.

Let me file three separate issues and split the conversation into them, and then I'd propose we close this PR (but potentially bring it back later if we feel it's justified to do this and really solves one of the problems well).

@czechboy0
Copy link
Contributor

Ok, filed #115, #116, and #117, to properly separate the discussion of the individual phases. Please chime in there, especially if you want to propose a path forward 🙂

@MahdiBM, if you agree with this approach, feel free to close this PR and let's continue the discussion in the new issues.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Jul 13, 2023

Thanks for creating the issues 🙂

@MahdiBM MahdiBM closed this Jul 13, 2023
@MahdiBM MahdiBM deleted the main branch July 13, 2023 08:56
czechboy0 added a commit that referenced this pull request Jul 21, 2023
### Motivation

This PR adds the option to use the package as a Command plugin instead
of a BuildTool plugin.

This benefits those who use heavy OpenAPI documents, and prefer not to
have to wait for an extra round of OpenAPI code generation which can be
accidentally triggered at times, for example if you clean your build
folder.

The whole idea of creating this Command plugin came after @czechboy0 's
comment here:
#96 (comment)

### Modifications

Generally, add a Command plugin target to the package, plus modifying
the functions etc... to match/allow this addition.

### Result

There is a new Command plugin, and users can choose between the Command
plugin and the BuildTool plugin at will.

### Test Plan

As visible in the PR discussions below, we've done enough manual-testing
of the Command plugin.

---------

Co-authored-by: Honza Dvorsky <[email protected]>
Co-authored-by: Honza Dvorsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants