Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider logging ProjectStarted more sparingly #5316

Closed
KirillOsenkov opened this issue Apr 26, 2020 · 13 comments
Closed

Consider logging ProjectStarted more sparingly #5316

KirillOsenkov opened this issue Apr 26, 2020 · 13 comments
Assignees
Labels
Area: Logging Performance-Scenario-Build This issue affects build performance. triaged
Milestone

Comments

@KirillOsenkov
Copy link
Member

ProjectStartedEventArgs is usually very heavyweight, with GlobalProperties, Properties and Items.

For a 130 MB binlog you get 6844 events totalling ~0.5 GB, where the most massive event starts around 5 MB, with 29 GlobalProperties (including CurrentSolutionConfigurationContents of 40 KB), 8517 items (most with tons of metadata) and 1289 properties.

We see projects being called multiple times with the same properties and items. Maybe if those are immutable collections we should see if we've logged this exact snapshot before, and if yes, avoid logging it again? Feels like something along these lines is already in place, since some projects don't get properties and items logged already (and I don't understand when they are or are not logged).

Basically worth looking into. ProjectStarted is the second most heavy of all args types, topped only by BuildMessage totalling ~ 1 GB.

@KirillOsenkov KirillOsenkov added Area: Logging Performance-Scenario-Build This issue affects build performance. labels Apr 26, 2020
@rainersigwald rainersigwald added this to the Backlog milestone Apr 29, 2020
@KirillOsenkov
Copy link
Member Author

From @mmitche:

here's a thought on something i noticed. In that huge log there are a TON of project started nodes, with a ton of total properties and itemgroups that get logged. But in quite a few cases, the target that is being invoked doesn't actually exist...
 
Should msbuild just be skipping the logging?

@KirillOsenkov
Copy link
Member Author

See related: #3616

@KirillOsenkov
Copy link
Member Author

I think I have an idea of how to deduplicate these.

  1. in the BinaryLogger, when we are about to write a list of properties, or a list of items, we first hash. If we saw the hash before, just write a stub with the hash. Otherwise add the hash to the global hashset and write out the full thing, including the hash.
  2. in the viewer, when we read a property list or item list, we see if it's a hash or a full thing. If it's the full thing, read it and associate the hash with this list, otherwise we must have seen it before so just retrieve from the cache.

We can do it for a list of properties and a list of items.

@KirillOsenkov
Copy link
Member Author

@mmitche suggests it would be even better if we didn't send the duplicate properties and items across nodes at all. This would cut down on the chattiness for all loggers.

@mmitche
Copy link
Member

mmitche commented May 12, 2020

My experiments indicate that only about 10% of the overall project invocations have unique input item + property sets. Overall these properties are gigantic (e.g. aspnetcore had about 3 gigs of data) so the wins could be significant.

@KirillOsenkov
Copy link
Member Author

Avoid logging ProjectStarted items and properties if the target doesn't exist or is empty

@KirillOsenkov
Copy link
Member Author

I think we should log properties and items at the ProjectEvaluationFinished, not at ProjectStarted. This way they will be logged only once if multiple projects start with the same evaluation. This will reduce the amount of data we log that has to flow through the logging system.

And we can make an escape hatch to only turn on the new mode for BinaryLogger.

@KirillOsenkov
Copy link
Member Author

What I've learned is that properties and items on ProjectStartedEventArgs aren't sent across nodes at all. This explains why only some projects have properties and items - they were built on the in-proc node.

See here:

// UNDONE: (Serialization.) Rather than storing the properties directly in this class, we could
// grab them from the BuildRequestConfiguration associated with this projectId (which is now poorly
// named because it is actually the BuildRequestConfigurationId.) For central loggers in the
// multi-proc case, this could pull up just the global properties used to start the project. For
// distributed loggers in the multi-proc case and all loggers in the single-proc case, this could pull
// up the live list of properties from the loaded project, which is stored in the configuration as well.
// By doing this, we no longer need to transmit properties using this message because they've already
// been transmitted as part of the BuildRequestConfiguration.

@KirillOsenkov
Copy link
Member Author

LoggingService.SerializeAllProperties is interesting

@KirillOsenkov
Copy link
Member Author

MSBUILDFORWARDALLPROPERTIESFROMCHILD if set to one will enable sending all properties from nodes to central node (but not items since there's no logic to serialize items in WriteToStream)

@KirillOsenkov
Copy link
Member Author

_buildParameters.LogInitialPropertiesAndItems is equivalent

@KirillOsenkov
Copy link
Member Author

LoggingNodeConfiguration.LogTaskInputs is a place where it feels appropriate to configure whether to send properties and items on ProjectEvaluationFinished instead of ProjectStarted

@KirillOsenkov
Copy link
Member Author

Yes looks like LoggingNodeConfiguration is the proper way to tell the node what to do. Setting a Trait from BinaryLogger will only impact the central node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Logging Performance-Scenario-Build This issue affects build performance. triaged
Projects
None yet
Development

No branches or pull requests

4 participants