Speedup Ninja backend with many extract_objects or targets #13879

Open
bonzini wants to merge 10 commits into master from the speedups branch
Conversation

Contributor

@bonzini bonzini commented Nov 6, 2024

This speeds up the target generation of QEMU, which is very slow.

  • Before: 41 seconds (total time 174 seconds)
  • After: 21 seconds, down from 29 in an earlier revision (total time 134 seconds, down from 150)

Comment on lines 479 to 486

-    proj_dir_to_build_root: str) -> T.Tuple[T.List[str], T.List[build.BuildTargetTypes]]:
-        obj_list: T.List[str] = []
-        deps: T.List[build.BuildTargetTypes] = []
+    proj_dir_to_build_root: str,
+    obj_list: T.List[str], deps: T.List[build.BuildTargetTypes]) -> None:
Member

@dcbaker I suspect you may have some opinions about this.

Contributor Author

--verbose? :)

mesonbuild/compilers/compilers.py (review thread, outdated, resolved)
bonzini force-pushed the speedups branch 4 times, most recently from 47d8542 to 8c0faca on November 7, 2024 at 09:35
bonzini changed the title from "Speedup Ninja backend with many extract_objects" to "Speedup Ninja backend with many extract_objects or targets" on November 7, 2024
bonzini force-pushed the speedups branch 2 times, most recently from 234adf6 to bc660cb on November 7, 2024 at 12:13
@eli-schwartz
Member

Pushed 1b15bd0 and 58d1efb for now. The backend changes look very interesting, though also take quite a bit more effort to understand what the code is doing both before and after. :D

is_source() is called almost 900000 times in a QEMU setup.  Together with
the previously added caching, this basically removes _determine_ext_objs()
from the profile when building QEMU.

Signed-off-by: Paolo Bonzini <[email protected]>
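
As an illustration of the pattern described above, and not the actual Meson code: a check like is_source() boils down to a suffix lookup, so a cache turns hundreds of thousands of repeated string manipulations into dictionary hits. A minimal sketch, with a made-up suffix set:

    import functools
    import os

    # Hypothetical suffix set; the real list is longer and language-aware.
    SOURCE_SUFFIXES = {'c', 'cc', 'cpp', 'cxx', 'm', 'mm', 's', 'asm'}

    @functools.lru_cache(maxsize=None)
    def is_source(fname: str) -> bool:
        # Only the suffix matters, so repeated queries for the same file
        # name hit the cache instead of re-splitting the string.
        suffix = os.path.splitext(fname)[1][1:].lower()
        return suffix in SOURCE_SUFFIXES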
get_target_generated_sources often calls File.from_built_relative on
the same file, if it is used by many sources.  This is a somewhat
expensive call both CPU- and memory-wise, so cache the creation
of build-directory files as well.

Signed-off-by: Paolo Bonzini <[email protected]>
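
A minimal sketch of what "cache the creation of build-directory files" can look like, using a hypothetical BuiltFile stand-in rather than Meson's real File class:

    import functools
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class BuiltFile:
        # Hypothetical stand-in for a build-directory file object;
        # frozen so instances are hashable and safe to share.
        relpath: str

    @functools.lru_cache(maxsize=None)
    def from_built_relative(relpath: str) -> BuiltFile:
        # Constructing the real object involves path splitting and
        # normalisation; with the cache, a generated header referenced by
        # thousands of sources is only ever built once.
        return BuiltFile(relpath)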
Do not reinvent get_target_generated_sources() in
NinjaBackend.determine_ext_objs(); using the common helper picks up the
recently added caching of the results of File.from_built_relative().

Signed-off-by: Paolo Bonzini <[email protected]>
bonzini force-pushed the speedups branch 3 times, most recently from 14150bd to 84b08ff on November 20, 2024 at 07:16
@bonzini
Contributor Author

bonzini commented Nov 20, 2024

> The backend changes look very interesting, though also take quite a bit more effort to understand what the code is doing both before and after. :D

Fair enough! I've split the changes more finely, which should make them easier both to review and to apply individually. If you prefer multiple PRs, let me know.

Regexes can be surprisingly slow.  This small change brings
ninja_quote() from 12 to 3 seconds when building QEMU.
Before:

   ncalls  tottime  percall  cumtime  percall
  3734443    4.872    0.000   11.944    0.000

After:

   ncalls  tottime  percall  cumtime  percall
  3595590    3.193    0.000    3.196    0.000

Signed-off-by: Paolo Bonzini <[email protected]>
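
A sketch of the general technique (not the literal patch): bail out with a plain membership test before doing any escaping at all, since most strings contain nothing special. The exact character set here is an assumption about Ninja's quoting rules:

    def ninja_quote(text: str, is_build_line: bool = False) -> str:
        # Assumed escaping rules: '$' and ' ' always, plus ':' on build lines.
        special = '$ :' if is_build_line else '$ '
        if not any(c in text for c in special):
            # Fast path: return untouched strings without invoking any
            # regex or replacement machinery.
            return text
        for char in special:
            text = text.replace(char, '$' + char)
        return text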
"Inline" CompilerArgs.__iter__() into CompilerArgs.__init__(), so that
replace list(Iterable) is replaced by the much faster list(List).

Before:

   ncalls  tottime  cumtime
    19268    0.163    3.586 arglist.py:97(__init__)

After:

   ncalls  tottime  cumtime
    18674    0.211    3.442 arglist.py:97(__init__)

Signed-off-by: Paolo Bonzini <[email protected]>
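
A sketch of the idea under simplified assumptions; the Args class below is a stand-in, not the real CompilerArgs:

    import typing as T

    class Args:
        # Minimal stand-in for an argument container whose __iter__
        # flattens three internal lists (pre, main container, post).
        def __init__(self, iterable: T.Optional[T.Iterable[str]] = None) -> None:
            self.pre: T.List[str] = []
            self.post: T.List[str] = []
            if isinstance(iterable, Args):
                # "Inline" the other instance's __iter__: build the
                # flattened list directly from its internal lists, so a
                # fast list(List)-style copy replaces walking a
                # Python-level iterator element by element.
                self._args = iterable.pre + iterable._args + iterable.post
            else:
                self._args = list(iterable) if iterable is not None else []

        def __iter__(self) -> T.Iterator[str]:
            # Generic path used everywhere else.
            return iter(self.pre + self._args + self.post)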
Unless an argument is marked as Dedup.OVERRIDDEN, pre_flush_set and
post_flush_set will always be empty and the loops in flush_pre_post()
will not be doing anything interesting:

        for a in self.pre:
            dedup = self._can_dedup(a)
            if a not in pre_flush_set:
                # This just makes new a copy of self.pre
                new.append(a)
                if dedup is Dedup.OVERRIDDEN:
                    # this never happens
                    pre_flush_set.add(a)

        for a in reversed(self.post):
            dedup = self._can_dedup(a)
            if a not in post_flush_set:
                # Here self.post is reversed twice
                post_flush.appendleft(a)
                if dedup is Dedup.OVERRIDDEN:
                    # this never happens
                    post_flush_set.add(a)
        new.extend(post_flush)

In this case it is possible to avoid the expensive calls and loops,
relying as much as possible on Python builtins instead.  Track whether
any options have that flag; if none do, just concatenate pre,
_container and post.

Before:

   ncalls  tottime  cumtime
    45127    0.251    4.530 arglist.py:142(__iter__)
    81866    3.623    5.013 arglist.py:108(flush_pre_post)
    76618    3.793    5.338 arglist.py:273(__iadd__)

After:

   ncalls  tottime  cumtime
    35647    0.156    0.627 arglist.py:160(__iter__)
    78998    2.627    3.603 arglist.py:116(flush_pre_post)
    73774    3.605    5.049 arglist.py:292(__iadd__)

The time in __iadd__ is reduced because it calls __iter__, which flushes
pre and post.

Signed-off-by: Paolo Bonzini <[email protected]>
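
A minimal sketch of the fast path described above, assuming a needs_override_check flag that is set whenever an appended argument deduplicates as Dedup.OVERRIDDEN; the slow path is the dedup-aware loop quoted earlier and is elided:

    import typing as T

    class Args:
        # Illustrative container with deferred pre/post queues.
        def __init__(self) -> None:
            self.pre: T.List[str] = []
            self.post: T.List[str] = []
            self._container: T.List[str] = []
            # Flipped to True the first time an argument that can override
            # earlier ones (Dedup.OVERRIDDEN) is appended.
            self.needs_override_check = False

        def flush_pre_post(self) -> None:
            if not self.needs_override_check:
                # Fast path: nothing can be overridden, so flushing is a
                # plain concatenation handled entirely by list builtins.
                self._container = self.pre + self._container + self.post
                self.pre.clear()
                self.post.clear()
                return
            # Slow path: the dedup-aware loops from the quoted snippet
            # would run here (omitted in this sketch).
            ...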
self.post is only ever appended to on the right-hand side.  However,
it is then reversed twice in flush_pre_post(), by iterating with "for a
in reversed(self.post)" and calling appendleft() within the loop.  It
would be tempting to use appendleft() in __iadd__ to avoid the call to
reversed(), but that is not a good idea, because the loop in
flush_pre_post() is part of a slow path.  It is more important to use a
fast extend-with-a-list-argument in the fast path, where
needs_override_check is False.

For clarity, and to remove the temptation, make "post" a list instead
of a deque.

Signed-off-by: Paolo Bonzini <[email protected]>
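
Not from the PR, but a rough micro-benchmark shape of why an extend-with-a-list-argument beats rebuilding the sequence with reversed() plus appendleft():

    import timeit
    from collections import deque

    post = ['-Dopt%d' % i for i in range(1000)]

    def fast_path() -> list:
        # Extending with a real list is a single C-level bulk copy.
        out: list = []
        out.extend(post)
        return out

    def slow_path() -> deque:
        # reversed() plus appendleft() walks the items one by one at
        # Python speed, only to end up in the original order again.
        out: deque = deque()
        for a in reversed(post):
            out.appendleft(a)
        return out

    print(timeit.timeit(fast_path, number=10_000))
    print(timeit.timeit(slow_path, number=10_000))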
Accumulate into lists that are passed by the caller, thus avoiding
allocations and calls to extend() on recursive extract_objects().

Signed-off-by: Paolo Bonzini <[email protected]>
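
A sketch of the before/after shape of this accumulator change, using a hypothetical Target stand-in rather than Meson's build classes:

    import typing as T
    from dataclasses import dataclass, field

    @dataclass
    class Target:
        # Hypothetical stand-in for a target with extracted sub-objects.
        objects: T.List[str] = field(default_factory=list)
        extracted: T.List['Target'] = field(default_factory=list)

    # Before (shape only): every recursion level allocates two lists that
    # the caller then merges with extend().
    def collect_returning(t: Target) -> T.Tuple[T.List[str], T.List[Target]]:
        obj_list: T.List[str] = []
        deps: T.List[Target] = []
        for sub in t.extracted:
            objs, subdeps = collect_returning(sub)
            obj_list.extend(objs)
            deps.extend(subdeps)
        obj_list.extend(t.objects)
        deps.append(t)
        return obj_list, deps

    # After (shape only): the caller owns the accumulators and every level
    # appends into them directly, so no intermediate lists are allocated.
    def collect_accumulating(t: Target, obj_list: T.List[str],
                             deps: T.List[Target]) -> None:
        for sub in t.extracted:
            collect_accumulating(sub, obj_list, deps)
        obj_list.extend(t.objects)
        deps.append(t)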
The proj_dir_to_build_root argument of determine_ext_objs() is always
empty, so remove it.

Signed-off-by: Paolo Bonzini <[email protected]>
proj_dir_to_build_root is empty by default, and in fact it is always
empty except in some cases of the VS2010 backend.

Add it after the fact in flatten_object_list(), which reduces the
number of os.path.join() calls.

Signed-off-by: Paolo Bonzini <[email protected]>
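
A sketch of the "add it after the fact" idea, with hypothetical simplified signatures rather than the real backend API:

    import os
    import typing as T

    def determine_ext_objs(objects: T.List[str]) -> T.List[str]:
        # Stand-in for the backend helper; it now produces prefix-free paths.
        return list(objects)

    def flatten_object_list(objects: T.List[str],
                            proj_dir_to_build_root: str = '') -> T.List[str]:
        obj_list = determine_ext_objs(objects)
        if proj_dir_to_build_root:
            # The prefix is usually empty, so no os.path.join() happens at
            # all; the VS2010-style callers pay for the joins only here.
            obj_list = [os.path.join(proj_dir_to_build_root, o)
                        for o in obj_list]
        return obj_list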