Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No way to serialize an object of type System.Random #27429

Closed
panesofglass opened this issue Sep 19, 2018 · 32 comments
Closed

No way to serialize an object of type System.Random #27429

panesofglass opened this issue Sep 19, 2018 · 32 comments
Labels
area-System.Runtime question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@panesofglass
Copy link

Code snippet to reproduce:

namespace Repro
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var rand = new System.Random();
            var filename = @"C:\rng.bin";
            var binRngFormater = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
            using (var fileWriteStream = new System.IO.FileStream(filename,System.IO.FileMode.Create))
            {
                binRngFormater.Serialize(fileWriteStream,rand);
                fileWriteStream.Close();
            }
        }
    }
}

Error message:

System.Runtime.Serialization.SerializationException: Type 'System.Random' in Assembly 'System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e' is not marked as serializable.

Is it possible to add System.Random to the list of binary serializable types? We use this to save the state of a random generator while processing long-running ML tasks.

@panesofglass
Copy link
Author

cc @SteliosFly

@GrabYourPitchforks
Copy link
Member

IMO a better design would be to allow System.Random to serialize its own state to a byte[] independent of BinaryFormatter or any other specific serializer. Here are some strawman APIs that could be used to accomplish this.

public class Random {
    public byte[] SerializeState();
    public static Random DeserializeFrom(byte[] serializedState);
}

@danmoseley
Copy link
Member

danmoseley commented Sep 20, 2018

I agree there is no need to bring binary formatter into this unless it is important for compatibility. @kcwalina do we have an established API pattern for serialization by custom means (eg into a byte array). I suppose not. Unless it's ISerializable which has too much binary formatter baggage.

@panesofglass
Copy link
Author

Does anyone know of a workaround that would allow us to serialize the state such that we could continue generating a consistent set of random numbers?

@danmoseley
Copy link
Member

cc @stephentoub , this came up today talking to ML folks. They have their own random generators, which they plan to made derived classes of Random, in which case the proposed API by @GrabYourPitchforks would work -- assuming it's virtual.

@GrabYourPitchforks
Copy link
Member

@danmosemsft My proposal would not automatically work with derived types. That's by design.

With my proposal the way you'd have to do it is through method hiding. DerivedRandom would also have a non-virtual serialize instance method and a static deserialize method. If you know that you have an instance of DerivedRandom rather than Random, you'd need to call the appropriate static deserialization method.

Again, this is intentional, because it avoids deserialization exploits.

@Sebazzz
Copy link

Sebazzz commented Nov 2, 2018

@panesofglass The only options are not ugly, but really plain wrong. Either you manually extract and set the private fields of the randomizer or you need to wrap it with a class that holds the seed of the random, and on deserialization "replays" the calls on the Random instance so you end up with the same state.

In any case, the random algorithm is not part of the contract I think so it may change at any time.

@danmoseley danmoseley changed the title Binary serializing a type of System.Random fails with SerializationException No way to serialize an object of type System.Random Mar 11, 2019
@danmoseley
Copy link
Member

Clarified the title. I agree that there is no need to bring binary serialization into this.

@madewokherd
Copy link
Contributor

Is compatibility with .NET a consideration for this? A quick comparison suggests corefx uses the same algorithm as referencesource and thus .NET (which does have a serializable Random), so potentially Random instances could be compatibly serialized and deserialized.

@GrabYourPitchforks
Copy link
Member

In general, we're loathe to mark types as [Serializable] because each new type marked as such opens the window to introduce new deserialization exploits into the Framework. We're trying to get people off of BinaryFormatter as much as possible.

@danmoseley
Copy link
Member

If folks reading this issue would find @GrabYourPitchforks suggested API valuable, please thumbs up that post above.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@layomia layomia removed their assignment Mar 19, 2020
@Ozzard
Copy link

Ozzard commented Jun 15, 2020

... damn. Back to .Net Framework I go, then :-(. Or I suppose I grab the source and make a serializable version. We've had working software for 14 years that uses this property of Random.

Should I assume that binary serialization will no longer be supported in some future version of .Net Core? It'd be good to know now, so that we can move our core product entirely away from .Net towards some system such as Smalltalk with fast state load/save.

@GrabYourPitchforks
Copy link
Member

@Ozzard Is there any reason the proposal at #27429 (comment) wouldn't work for your scenarios?

@Ozzard
Copy link

Ozzard commented Jun 15, 2020

One and a half, neither of which should necessarily stop it moving forward:

  • Elapsed time! This is not triaged, and is at best likely to arrive in .Net 6. I need a solution within days, or move our development back to .Net Framework. I don't want to do that; Framework has a lot of baggage I really don't want to carry. And, apparently, some baggage that I do want to carry!
  • API coherence. Special-casing System.Random doesn't make much sense to me; either Core moves to support efficient, unsafe serialization or it doesn't. Leaving gaps in support - especially for a class that is purely managed code with no external dependencies - feels really strange. Why should I have access to most - but not all - of a remarkably useful deserialization framework?

My use case is fast state dumps of simulations with up to about half a gigabyte of data, mostly small (10-200 byte) data objects, so that the simulations can be restarted from known points. The data forms a directed graph, neither tree-structured nor acyclic. It has to be fast because we're allowing users to interact with these things, and while we're dumping state we're unable to interact. 30 seconds is about our realistic upper limit. I couldn't care less about deserialization exploits; we have full control over the byte stream and the code in use at all points. I selected binary formatting because it did everything we needed without trying to satisfy ultimately unsatisfiable goals for fast state save such as security.

@panesofglass
Copy link
Author

My use case is fast state dumps of simulations with up to about half a gigabyte of data, mostly small (10-200 byte) data objects, so that the simulations can be restarted from known points.

This was my use case, as well.

@danmoseley
Copy link
Member

It's not impossible this API could get into 5.0 if its fairly clear it's the appropriate design, etc, someone championed it in API review, and there was a community volunteer to implement and get it merged into master all by July 15th.

Or I suppose I grab the source and make a serializable version.

Seems a pretty reasonable interim solution if it's the only thing holding you on .NET Framework.

@danmoseley
Copy link
Member

As mentioned above, we will not add [Serializable] to more types as that makes it visible to BinaryFormatter, which is a dead-end. (Your point taken that in your scenario the data is fully trusted - unfortunately there is never end to the number of cases in which such data is trusted when it should not be.)

@GrabYourPitchforks in the general case of an arbitrary serializer, what is our story? For our Json serializer, I guess our current story would be to write a converter explicitly, but ideally a type could support some recognized pattern. Our nearest thing to an API for "serialize/deserialize" is ISerializable. Can a serializer consider use of this (as opposed to field-wise deserialization) to be safe, or are there too many existing implementations of it that are known to not be safe?

@GrabYourPitchforks
Copy link
Member

@danmosemsft There were some proposals floating around to allow serialization of arbitrary objects, but they never really took off. ISerializable could be used to pull this off, as it's not intrinsically tied to dangerous serializers like BinaryFormatter. But in practice it's not ever used outside of that context because the API surface isn't quite tuned for non-BinaryFormatter scenarios.

Generally speaking, for standard serialization the best thing is to stick to POCOs and primitives, plus simple collections like T[] and List<T>. Anything else necessarily involves types exposing their implementation details to the serializer, which means those types are essentially locked for all time and can no longer evolve with the ecosystem.

For types like Random where the behavior is effectively locked for all time anyway, it's ok for us to expose these implementation details. Hence my earlier suggestion of static Serialize or Deserialize methods which worked with byte arrays.

@danmoseley
Copy link
Member

@GrabYourPitchforks thanks, that is clear. Let's say we added those methods to Random that deal with byte arrays, then the consumer would have to ensure there was a converter for Random appropriate for their serializer of choice, right?

@steveharter just curious, taking JSON serializer as the test case, do we imagine we would offer converters for types like this, or they would be externally supplied by app or some library it depends on? Just wondering how this ecosystem fits together.

@GrabYourPitchforks
Copy link
Member

It'd be hooked up in the same way that any type not natively understood by the serializer gets hooked up. Write a custom converter, or define a separate property on your POCO which handles the translation for you:

public class MySerializableClass
{
    private Random _rnd;
    public byte[] RngBytes
    {
        get => _rng.Serialize();
        set => { _rng = Random.Deserialize(value); }
    }
}

@danmoseley
Copy link
Member

OK. From discussing offline, it might also be interesting to consider whether we should have a general pattern that a type should implement to implement if it wants to serialize to/from bytes. This would not be for graphs - probably the type would contain only primitive fields, as Random does. @stephentoub we have no such pattern today do we?

@stephentoub
Copy link
Member

we have no such pattern today do we?

We have some types that accept bytes into constructors and expose bytes from properties/methods, e.g. IPAddress.TryWriteBytes and IPAddress(ReadOnlySpan<byte>, long), BigInteger.TryWritesBytes and BigInteger(ReadOnlySpan<byte> value, bool isUnsigned = false, bool isBigEndian = false), etc.

@jpapp05
Copy link

jpapp05 commented Jun 17, 2020

This probably needs to be in its own topic, but it is related to the binary serializer and its current incarnation in .NET Core, so apologies up front for miss-posting.

The changes/restrictions to the binary serializer is also preventing us from porting our commercial applications over to core; though, I completely agree with the overall sentiment and security concerns.

However, in our case (and I expect others too), have used the binary serializer not so much for serialization, but as a deep cloner. In our app it is used for the following:

  1. Undo/Redo
  2. Generating audit logs (i.e. change from one state of the model to another)
  3. Easy OK/Cancel semantics in complex modal dialogs

In all these cases, the internal representation of the serialization was never exposed, cared about, or saved externally.

It would be very cool if .NET Core officially supported some standard deep cloning facility. We looked at many libraries, but none of them honored the binary serializer's attributes, special constructors, and callback interfaces which we need. Others used internal implementation details or had varying solutions depending on the version of .NET core being used (emit, expression trees, etc.), and/or carried a bunch of .NET Framework legacy code.

We are currently in the process of writing this functionality, which we hope will unblock us and allow us to continue our transition to core.

I’ve seen a bunch of these issues raised about or around the legacy binary serializer and just wanted to share some use cases that you might not be aware of that may be impacting others in conversion from the legacy .NET framework. Anyway, something to think about...

Thanks much,
James

@GrabYourPitchforks
Copy link
Member

@jpapp05 Do you have a scenario for cloning Random instances specifically, or are you discussing deep object graph cloning in general terms?

@jpapp05
Copy link

jpapp05 commented Jun 17, 2020

The later. For us at least, the issue with Random is just on one of many similar obstacles between the legacy framework and core when it comes to differences in the scope of the binary serializer. For example, "serializing" delegates being another one.

Since we are working around the issue there is no real ask here; just an observation from our own path to .NET core and the struggles we had with converting a large commercial application; ditching the binary serializer has by far been one of our biggest hurdles. Some type of deep cloning service might be a way for others to be unblocked, even if in this particular use case it does not really apply.

@Ozzard
Copy link

Ozzard commented Jun 22, 2020

@danmosemsft Is there, anywhere, a razor to decide between the following two options?

  • Performance or functionality is sufficiently important for a class of users to expose unsafe functionality despite the dangers (for example the vector intrinsics);
  • Performance or functionality is sufficiently unimportant prevent a class of users from using .Net Core for their use case due to security concerns (for example this issue).

Also, can some kind soul point me to any expected replacement support for efficient object graph (de)serialization? I don't mind if it's a package - in fact I think it's probably better outside of .Net Core itself to make it unavailable to as many attackers as possible. But I'd be concerned if there are plans to drop this capability with no replacement.

@danmoseley
Copy link
Member

@GrabYourPitchforks is a better person than me to speak to the various serializers and their pros/cons.

From the core libraries, you could look at DataContractSerializer

@Ozzard
Copy link

Ozzard commented Jun 22, 2020

DataContractSerializer is almost, but not quite, useful. Here's why:

  • In order for it to serialize an object like a Random in an object graph, that object's class must be specially marked (by the author of the class - it can't be later marked by a developer). It can be marked either by specifying a DataContract or by SerializableAttribute.
  • The new system classes such as Random are not marked with data contracts;
  • The new system classes such as Random are being actively unmarked with SerializableAttribute where they were so marked in Framework.

The net effect of this is that the effort to secure BinaryFormatter also makes its closest replacement, DataContractSerializer, unusable.

How likely is it that the team would accept DataContract markers for system classes such as Random?

@Ozzard
Copy link

Ozzard commented Jun 22, 2020

@danmosemsft Just to check, there was also the first part of my comment - any statement or documentation around a razor that can be applied consistently to deal with decisions between performance/functionality vs. security. Who would be well placed to comment on that, please, or is there anything already existing to which you could point me? Architecturally and commercially, I reckon it's critically important to understand a candidate technology's maintainers' views on that perennial nasty trade-off.

@danmoseley
Copy link
Member

Fair question and I'm looking to @GrabYourPitchforks to share that guidance.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jun 27, 2020

I am working on some public-facing guidance on this for serializer writers. But in a nutshell, there are three things that make BinaryFormatter dangerous, and other serializers (like NetDataContractSerializer, not to be confused with DataContractSerializer) which follow these principles are likewise vulnerable.

In these examples I'll talk about BinaryFormatter specifically. But it's more useful to think about the behavior that each bullet describes. Serializers that exhibit these behaviors would be cause for concern.

  1. The serializer allows unrestricted polymorphic deserialization and embeds full type information within the payload itself. That is, it embeds the fully-qualified type name "Contoso.Models.Employee, Contoso.Library, Version=1.0.0.0, ..." within the payload with the intent of calling FormatterServices.GetUninitializedObject(Type.GetType(<that-string>)). This is distinct from a binder which allows restricted polymorphic deserialization. A restricted polymorphic deserializing binder might allow you to specify the limited strings "Employee", "Manager", or "CEO" in the incoming payload, and on the deserialization side those would map back to the well-known types typeof(Employee), ..., with no calls to Type.GetType anywhere. This latter method is how frameworks like OData support polymorphism.

Some developers use a SerializationBinder to try to work around these issues. It helps with (1) only to a very limited extent, and even that isn't really guaranteed / 100% supported. And it certainly doesn't help with (2) or (3).

  1. The serializer relies heavily on private reflection. When an instance of the type is deserialized, the normal constructor validation is bypassed. This means that the type's internal data structures might not meet the type's own invariants, which is a type safety violation. It's not safe to use BinaryFormatter to deserialize seemingly innocuous types like Exception or List<T> or Dictionary<TKey, TValue>, for instance. Additionally, since the serializer relies heavily on private reflection, the internal implementation details of the target class effectively become a part of the public contract. This makes it very difficult to version or improve the class without introducing significant breaking changes.

  2. The serializer wasn't created with any type of safety in mind. It's not even that safety was an afterthought. It wasn't a part of the equation at all. Trying to bolt "safe" behaviors onto such a thing after the fact is a losing proposition regardless of how you slice it. It would require essentially a complete rewrite, and certain scenarios would break. A "safe" version of BinaryFormatter wouldn't allow serializing or deserializing cycles, for instance, as allowing cycles by default is generally considered unsafe. (Yes, I know many deserializers allow this by default, but it is sometimes a cause of security vulnerabilities.) This also dovetails with (2) above, since the BinaryFormatter payload format itself depends on the fact that the serializer utilizes private reflection under the covers. You see this reflected in the data structures used for parsing the serialized payload.

Like I said, more should be coming on this over the next few weeks.

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 2, 2020
@GrabYourPitchforks
Copy link
Member

We're winding down usage of BinaryFormatter within the framework and do not plan on attributing any new or existing types with [Serializable]. Additionally, the API proposal at #27429 (comment) received only 3 upvotes, indicating little interest.

Closing this issue for now. If there is interest after all in adding a serializable API surface for Random we can investigate it in a different issue.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests