-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to json serialization without adding support for precomputed caches #5983
Conversation
This gives RAR the ability to save off a modified state file and load it for later use. Still to do: 1) Verify that the precomputed cache read in is real. 2) Change logging such that it will log messages instead of errors or warnings if there are problems with deserializing the precomputed cache 3) Comments at relevant points 4) Switch serialization mode to another form (json?) 5) Validation + performance tests 6) Have SDK opt in
It was previously serializable via BinaryFormatter. This makes it serializable via System.Text.Json's JsonConverter.
Add support for serializing SystemState using System.Text.Json rather than the older BinaryFormatter.
… just when serializing and deserializing the precomputed cache
I have not yet measured how long it takes to read a MVID vs. reading and parsing its references, nor have I made unit tests.
...although they don't work yet
…son-serialization
…son-serialization
The version of Arcade MSBuild currently uses uses Roslyn version 3.3.1. This updates that, but it should be removed when we've updated to a more modern version of arcade.
…into json-serialization
…son-serialization
… into serialization-only
c45c079
to
fcbeb81
Compare
eng/Packages.props
Outdated
<PropertyGroup> | ||
<NuGetPackageVersion>5.9.0-preview.1.6870</NuGetPackageVersion> | ||
<NuGetBuildTasksVersion Condition="'$(NuGetBuildTasksVersion)' == ''">$(NuGetPackageVersion)</NuGetBuildTasksVersion> | ||
<NuGetCommandsVersion Condition="'$(NuGetCommandsVersion)' == ''">$(NuGetPackageVersion)</NuGetCommandsVersion> | ||
<NuGetProtocolVersion Condition="'$(NuGetProtocolVersion)' == ''">$(NuGetPackageVersion)</NuGetProtocolVersion> | ||
</PropertyGroup> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bad merge that's overwriting NuGet updates.
@@ -986,13 +986,15 @@ | |||
</ItemGroup> | |||
<ItemGroup> | |||
<PackageReference Include="System.Collections.Immutable" /> | |||
<PackageReference Include="System.Reflection.Metadata" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using this in full framework now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Among the things that (presumably) was useful at some stage in serialization, though possibly just for BinaryFormatter-serialized caches.
<PackageReference Include="System.Resources.Extensions" /> | ||
</ItemGroup> | ||
|
||
<!-- Tasks need to mimic redistributing the compilers, so add references to both full framework and .net core --> | ||
<ItemGroup> | ||
<!-- Reference this package to get binaries at runtime even when Arcade is not adding compiler references --> | ||
<PackageReference Include="Microsoft.Net.Compilers.Toolset" ExcludeAssets="all" Condition="'$(UsingToolMicrosoftNetCompilers)' == 'false'" /> | ||
<PackageReference Include="System.Text.Json" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a normal reference it should go in the PackageReference
group above. The comments don't apply to normal references.
{ | ||
if (!string.IsNullOrEmpty(_stateFile) && _cache.IsDirty) | ||
{ | ||
_cache.SerializeCache(_stateFile, Log); | ||
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So . . . the Unsafe
part of this name sure stands out to me. Is it necessary? Is it safe? If so, comment an explanation of why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, and I think so. If I remember correctly, this has to do with whether characters are escaped, and this specifies they don't need escaping, but almost everything in the cache is ultimately a number, a string, a bool, or a version, so other random characters should be flagged as "wrong type" outside the string, which should accept them without running them as code or immediately reject them depending on the circumstance. I don't remember which characters were being serialized incorrectly, but something was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want to know this for sure, and with comments in the source explaining.
_cache.SerializeCache(_stateFile, Log); | ||
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping }; | ||
deserializeOptions.Converters.Add(new SystemState.Converter()); | ||
File.WriteAllText(_stateFile, JsonSerializer.Serialize<SystemState>(_cache, deserializeOptions)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call the overload that takes a stream so we don't have to materialize the possibly-LOH full string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, no. Not available in .NET Framework. If you want to switch to .net 5...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not available in .NET Framework.
What specifically isn't? I see
public static ValueTask<TValue?> DeserializeAsync<TValue>(Stream utf8Json, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default(CancellationToken))
in the net461 decompilation.
If you want to switch to .net 5
We are doing so in #5515.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -64,9 +64,10 @@ internal virtual void SerializeCache(string stateFile, TaskLoggingHelper log) | |||
/// <summary> | |||
/// Reads the specified file from disk into a StateFileBase derived object. | |||
/// </summary> | |||
internal static StateFileBase DeserializeCache(string stateFile, TaskLoggingHelper log, Type requiredReturnType) | |||
internal static T DeserializeCache<T>(string stateFile, TaskLoggingHelper log, bool logWarnings = true) where T : StateFileBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this method look good but I don't understand why they're being made. Aren't we getting rid of BinaryFormatter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, yes, but StateFileBase is used for other classes, too. I did some cleanup while I was looking at it.
/// Construct. | ||
/// </summary> | ||
internal SystemState() | ||
internal sealed class Converter : JsonConverter<SystemState> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you go into a fair amount of detail on why this fairly-custom converter is here? Is it required for functionality? Is it a perf improvement? Will it be necessary after dotnet/runtime#1568 lands? Why choose the explicit-converter approach over the JsonConverter
attribute approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: haven't looked at this part because waiting on 👆🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started out with a not-customized implementation, had some difficulty getting everything to be serialized, and asked for help. You told me to write a customer serializer. For the first part of that, see 8b3b613.
I think you suggested this is more secure, serializes everything, and hopefully runs faster than using a default converter.
/// <summary> | ||
/// Flag that indicates | ||
/// </summary> | ||
/// <value></value> | ||
internal bool IsDirty | ||
{ | ||
get { return isDirty; } | ||
set { isDirty = value; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a setter here feels wrong. Generally dirtiness is a result of mutating something and not directly manipulable. Is that not true here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true, but this is very useful in the test I deleted. (It's among the "not actually necessary but present" changes I mentioned.) I'd prefer not to bother with deleting and readding them.
src/Tasks/SystemState.cs
Outdated
@@ -540,7 +669,7 @@ out fileState.frameworkName | |||
|
|||
dependencies = fileState.dependencies; | |||
scatterFiles = fileState.scatterFiles; | |||
frameworkName = fileState.frameworkName; | |||
frameworkName = fileState.FrameworkNameAttribute; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why go through the getter when we have direct access?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter? The getter just returns it.
I think I tried to eliminate the field but found it was necessary and put it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put it all the way back. Unnecessary churn like this makes reviews harder and generally hurts understandability of the codebase.
{ | ||
if (!string.IsNullOrEmpty(_stateFile) && _cache.IsDirty) | ||
{ | ||
_cache.SerializeCache(_stateFile, Log); | ||
var deserializeOptions = new JsonSerializerOptions() { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want to know this for sure, and with comments in the source explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review
{ | ||
log.LogMessageFromResources("General.CouldNotReadStateFileMessage", stateFile, log.FormatResourceString("General.IncompatibleStateFileType")); | ||
retVal = null; | ||
log.LogMessageFromResources("General.CouldNotReadStateFile", stateFile, log.FormatResourceString("General.IncompatibleStateFileType")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a slightly different codepath now? The older version checked retVal._serializedVersion != CurrentSerializationVersion
before logging this, should we check the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see what you mean? It looks to me like the retVal._serializedVersion != CurrentSerializationVersion check is after logging a warning (always) in the old code. The changes here are:
- The conditions were mutually exclusive --> switch to else if
- Make function parameterized instead of explicitly taking a type as an argument
- Give the option of not logging warnings
throw new JsonException(); | ||
} | ||
string parameter = reader.GetString(); | ||
reader.Read(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this capturing exactly? Could you comment the first use of reader.Read() in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably does deserve a comment, since it's kinda overloaded.
The Json reader is set up such that every thing (technical term) of interest has its own type. In particular, when specifying that a particular property has a particular value, it has the "PropertyName" token followed by the token(s) representing the property. If that property is not null, it will start with a "StartObject" token if it's an object or a "StartArray" token if it's an array, or just have a value we read below. If it's null, it just starts with null, and there isn't anything else.
In this case, I read the first token. If it's null, that means that the json has something like "propertyName"=null, which is valid though avoided unless you modify it by hand. Having the null check you commented on means that we don't keep reading the object if it happens to be null. GetString() would have returned a valid property name on the ANE that wasn't set.
If the property isn't null, it skips over the "StartObject", etc. and lets us start reading the interesting stuff immediately or parses the token as appropriate. Does that make sense?
} | ||
string parameter = reader.GetString(); | ||
reader.Read(); | ||
if (reader.TokenType == JsonTokenType.Null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean when we see null here? What would we have just read from GetString()
if we fall into this case?
We need to move off BinaryFormatter.
Test of basic functionality (creates simple cache in json and reads it) passed.