Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only clone Fields, instead of trying to be smart about cloning properties and collections. #28

Open
MarcinJuraszek opened this issue Dec 4, 2017 · 11 comments

Comments

@MarcinJuraszek
Copy link
Owner

Inspired by https://github.com/ReubenBond/DeepCopy/ from Reuben I think about making a somehow breaking change (v2.0 I think) and move from trying to clone Properties, Fields and Collection Items separately to always cloning fields instead. That would allow us to remove the custom initialization dictionary, CloningFlags and more.

Fields are the only way to store data, and at the end of the day it's just about the data (fields) and not how it's exposed (properties, methods, indexers, etc.).

And as @deipax has already proven, it's possible to read/write from/to private fields using Expression Trees, as long as the right overload it used. So there shouldn't be any issues with doing that.

@deipax - I'd love to get your opinion on that before I start working on this.

@deipax
Copy link
Contributor

deipax commented Dec 6, 2017

I think the cloning of fields should work just fine ... but is that the reason for the speed increase, or is that due to the fact he is using IL? And his memory footprint is smaller as well ... it looks like I will be playing with this code as well.

@deipax
Copy link
Contributor

deipax commented Dec 6, 2017

|                                       Method |            Mean |         Error |         StdDev |          Median |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|--------------------------------------------- |----------------:|--------------:|---------------:|----------------:|---------:|---------:|---------:|----------:|
|                         SimpleClass_CloneExt |       187.98 ns |      3.762 ns |      10.733 ns |       186.71 ns |   0.0443 |        - |        - |     280 B |
|                          ListOfInts_CloneExt |     4,437.00 ns |    140.315 ns |     413.722 ns |     4,349.16 ns |   6.3629 |   0.6332 |        - |   40280 B |
|       ListOfSimpleClassSameInstance_CloneExt |   641,737.59 ns | 14,144.575 ns |  41,483.589 ns |   640,145.04 ns |  11.7188 |   1.9531 |        - |   80384 B |
| ListOfSimpleClassDifferentInstances_CloneExt | 2,298,926.49 ns | 51,314.469 ns | 150,496.453 ns | 2,304,192.21 ns | 218.7500 | 218.7500 | 218.7500 | 1662112 B |
|                        ListOfStruct_CloneExt |   560,255.44 ns | 12,436.372 ns |  36,473.726 ns |   546,894.77 ns |  49.8047 |  49.8047 |  49.8047 |  160328 B |
|                         SimpleClass_DeepCopy |        36.69 ns |      1.011 ns |       2.980 ns |        36.11 ns |   0.0101 |        - |        - |      64 B |
|                          ListOfInts_DeepCopy |     4,719.32 ns |    136.703 ns |     403.073 ns |     4,714.02 ns |   6.3629 |   0.6332 |        - |   40064 B |
|       ListOfSimpleClassSameInstance_DeepCopy |   225,383.07 ns |  5,061.858 ns |  14,845.554 ns |   219,899.19 ns |  12.4512 |   1.9531 |        - |   80128 B |
| ListOfSimpleClassDifferentInstances_DeepCopy |   783,483.72 ns | 16,732.513 ns |  49,073.564 ns |   784,389.49 ns | 113.2813 |  41.9922 |        - |  720064 B |
|                        ListOfStruct_DeepCopy |    69,918.36 ns |  1,651.983 ns |   4,818.913 ns |    69,814.28 ns |  49.9268 |  49.9268 |  49.9268 |  160064 B |

@deipax
Copy link
Contributor

deipax commented Dec 7, 2017

I have a sneaking suspicion that part of the reason for the speed increase and memory footprint is due to the CopyContext. When the dictionary starts to fill up, it must resize itself which takes time ... the copy context holds onto it dictionary on a per thread basis. Once it is sized up, it does not have to do it again on subsequent calls. That is my guess ...

*Edit ... yes, this is true.

@deipax
Copy link
Contributor

deipax commented Dec 10, 2017

Another observation:

DeepCopy looks at the inheritance tree to determine if a class "NeedsTracking". If a class "NeedsTracking" the CopyContext is checked for its existence and is also updated with it existence once initialized. This is an attempt to prevent unnecessary calls to CopyContext.RecordCopy and CopyContext.TryGetCopy ... after all if a class does not have any cyclical references why bother tracking it?

This functionality mostly works ... but the check for "NeedsTracking" only checks down the inheritance tree, as a result this tests fails for DeepCopy:

       [TestMethod]
        public void Primitives_DoubleSimple()
        {
            SimpleClass x = new SimpleClass();

            DoubleSimple source = new DoubleSimple()
            {
                One = x,
                Two = x
            };

            var target = source.GetClone();

            Assert.AreSame(target.One, target.Two);
        }

SimpleClass is from DeepCopy tests. When the cloning of SimpleClass comes around, it is determined to not be in need of tracking because its inheritance tree (below it) has no cyclical references.

This bug could be fixed by first climbling to the top of the inheritance tree first and then running the NeedsTracking logic on that Type ...

Currently, CloneExtensions always adds/records reference types to the dictionary ... I do not think that it would be terribly difficult to add this conditional logic to CloneExtensions.

Edit - Since DeepClone, for a single instance of a SimpleClass, does not check the Context for existence or Records is, this explains the speed difference in the SimpleClass_* tests above.

@deipax
Copy link
Contributor

deipax commented Dec 10, 2017

In the CopierGenerator class there is this:

private static readonly ConcurrentDictionary<Type, DeepCopyDelegate<T>> Copiers = new ConcurrentDictionary<Type, DeepCopyDelegate<T>>();
...
private static readonly Func<Type, DeepCopyDelegate<T>> GenerateCopier = CreateCopier;
...
        public static T Copy(T original, CopyContext context)
        {
            // ReSharper disable once ExpressionIsAlwaysNull
            if (original == null) return original;

            var type = original.GetType();
            if (type == GenericType) return MatchingTypeCopier(original, context);

            var result = Copiers.GetOrAdd(type, GenerateCopier);
            return result(original, context);
        }

Why bother with the field of "GenerateCopier" with the value factory function for the ConcurrentDictionary? Why not just use "CreateCopier" directly?

I am not sure exactly what is going on, but if you do use "CreateCopier" directly in the GetOrAdd call, the time to perform the GetOrAdd close to triples.

@deipax
Copy link
Contributor

deipax commented Dec 10, 2017

For the ListOfStruct_* tests above ...

The SimpleStruct is considered Immutable.. that is, it is a value type itself and all of its field members are also immutables. Since that is the case, instead of cloning each element from the source array and copying it to the target array the DeepCopy logic calls the Array.Copy method to perform is work.

Edit - In DeepCopy code "Immutables" are the equivalent of what CloneExtensions refers to "IsPrimitiveCloneLogic". That is, the cloning work is taken care of by the runtime.

@MarcinJuraszek
Copy link
Owner Author

This is all very interesting. I think if we unify on copying just fields it will be easier to figure out if a struct is immutable or not. Currently, because we're using properties and indexers it's quite hard. With fields only we could assume a struct/class is immutable if all the fields are readonly and their type is considered immutable.

@deipax
Copy link
Contributor

deipax commented Dec 11, 2017

Understood ... I am currently looking at the performance differences and trying to understand them. As I move along, I am just making my observations. Here is another instance where DeepCopy has issues:

        public struct ComplexStruct
        {
            public int Int { get; set; }
            public SimpleClass One { get; set; }
            public SimpleClass Two { get; set; }
        }

The "NeedsTracking" logic determines this struct needs to be tracked ... as a result, the IL code attempts to add the struct to the CopyContex. Since the IL uses the address of the struct when adding it to the context, this actually throws a runtime error.

@MarcinJuraszek
Copy link
Owner Author

You should log those as issues on DeepCopy repo, if you haven't already :)

@deipax
Copy link
Contributor

deipax commented Jan 1, 2018

I finished my appraisal of DeepCopy and integrated my findings with my source. As suggested, I wrote up my observations for Reuben as an Issue for him.

Part of the work I did was to make my integration tests (and Benchmarks) easily adaptable so that I can test other frameworks with little effort. Check out the comparisons of my clone work vs Reubans here:

https://github.com/deipax/Deipax/tree/master/Source/Benchmarks.Cloning/BenchmarkDotNet.Artifacts/results/DeepCopy
https://github.com/deipax/Deipax/tree/master/Source/Benchmarks.Cloning/BenchmarkDotNet.Artifacts/results/Deipax

If you have any questions or comments, of if you would like my assistance in some way, just let me know.

Thank you,
-Jeff

p.s. - One thing I did in my branch was to allow users to supply their own Delegate Factory. It allows them to extend functionality and is easier to maintain. Here is the class:

https://github.com/deipax/Deipax/blob/master/Source/Deipax.Cloning/Common/CloneDelConfig.cs

@ghost
Copy link

ghost commented Sep 18, 2019

@deipax Did your CloneDelConfig.cs file move or get deleted? I don't see it in master anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants