C# Pure Function Keyword to Mark No Side Effects Or External Dependencies #7561

Closed
wcrowther opened this issue Dec 17, 2015 · 32 comments

Comments

@wcrowther

Be able to mark a method as having no side effects or external dependencies, i.e., it does not change any state outside its inputs or outputs. Any code in such a method that attempts to do this would throw an exception. My thought was that the keyword could be "functional", "pure" (as in the "pure functions" mentioned in some MSDN documentation), "purefunction", or even "nosideffects".

See https://msdn.microsoft.com/en-us/library/bb669139.aspx for some current naming conventions and reasons for this feature.

@wcrowther wcrowther changed the title C# Pure Function Keyword to Mark No Side Effects and External Dependencies C# Pure Function Keyword to Mark No Side Effects Or External Dependencies Dec 17, 2015
@sharwell
Member

❓ Considering we already have [Pure], which is very short and doesn't require new keywords be added, what additional benefit(s) would this proposal bring?

📝 Note that I'm not necessarily against this proposal. I'm just trying to get some more context. 😄

@HaloFour

If the compiler were to start enforcing method purity through the addition of new warnings or errors, then a new keyword might be necessary in order to avoid breaking changes.

private int x;

public void Unpure() {
    x++;
}

[Pure]
public void Pure1() {
    Unpure(); // legal, no change to existing code
}

public pure void Pure2() {
    x++; // compiler error, side effects
    Unpure(); // compiler error, method not marked as pure
    Pure1(); // legal, method marked as pure (even if it might not be)
}

An analyzer that could issue warnings based on PureAttribute would probably be a good start, though.

@alrz
Member

alrz commented Dec 17, 2015

Doesn't pure imply "always evaluating the same result value given the same argument value"? I think C++ const would be more familiar, e.g. void M() const { }, or whatever dotnet/csharplang#2543 would use.

@orthoxerox
Contributor

I like this idea, but how do we verify that a function has no side-effects, recursively? Does memoizing count as a side-effect? If not, how can that be verified? Even if #49 is implemented so we can encapsulate that ConcurrentDictionary instance inside the function, we cannot mark GetOrAdd() as [Pure], because it isn't.

@alrz
Member

alrz commented Dec 18, 2015

@orthoxerox I think it needs an immutable map, and yet map itself is a mutating state, probably needs a state monad or something? Then recursion is off the table, I guess.

@orthoxerox
Contributor

@alrz One option would be to add Memoize as a new parameter to [Pure]. This new parameter would force the compiler to rewrite the function into something like this if the original function was verifiably pure.

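// What the developer writes:
[Pure(Memoize=true)]
modifiers return_type name(args)
{
    body;
}

// What the compiler would rewrite it into (pseudocode: `let` and
// static locals are not existing C# syntax):
[Pure(SkipVerification=true)]
modifiers return_type name(args)
{
    // The original body, moved into a local function
    return_type mangled_name(args) {
        body;
    }
    static let memos = new ConcurrentDictionary<ValueTuple<args_types>, return_type>();
    static let locks = new ConcurrentDictionary<ValueTuple<args_types>, object>();

    return_type result;
    let a = ValueTuple.Create(args);

    if (!memos.TryGetValue(a, out result)) {
        // One lock object per argument tuple, so the body runs once per key
        var l = locks.GetOrAdd(a, new object());
        lock (l) {
            // The value factory receives the key; forward the original args
            result = memos.GetOrAdd(a, _ => mangled_name(args));
        }
        locks.TryRemove(a, out l);
    }
    return result;
}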
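
For readers who want to try the idea in today's C#, here is a minimal runnable sketch. It is not the compiler rewrite proposed above: it is a library-level approximation, `Memo.Memoize` is a hypothetical helper name, and it swaps the per-key lock objects for `Lazy<T>` values so that the wrapped function runs at most once per argument. It only makes sense if the wrapped function really is pure, which nothing here verifies.

```csharp
using System;
using System.Collections.Concurrent;

public static class Memo
{
    // Wraps f so that each distinct argument is evaluated at most once.
    // Assumption: f is pure, so caching its result is observably safe.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> f)
        where TArg : notnull
    {
        var cache = new ConcurrentDictionary<TArg, Lazy<TResult>>();

        // GetOrAdd may build more than one Lazy under contention, but only
        // the one stored in the dictionary is ever asked for its Value.
        return arg => cache.GetOrAdd(arg, a => new Lazy<TResult>(() => f(a))).Value;
    }
}
```

Calling `Memo.Memoize<int, int>(x => x * x)` returns a delegate that computes each square once and serves later calls from the cache.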

@leppie
Contributor

leppie commented Dec 18, 2015

Just a comment.

If a method returning void is marked as pure, the compiler should be able to remove it as it has no side-effects.

@sharwell
Member

@leppie Common cases where you may want code that doesn't affect program state but don't need to return a value:

  • Benchmarking
  • Unit tests - here exceptions could be considered a meaningful return value, which means even pure methods marked as void can have a return value of sorts

Removing the following would probably not be desirable, yet it's arguably a pure method:

public class Requires
{
  [Pure]
  public static void NotNull<T>(T value, string paramName)
    where T : class
  {
    if (value == null)
      throw new ArgumentNullException(paramName);
  }
}

@leppie
Contributor

leppie commented Dec 18, 2015

@sharwell The presence of a possible throw hardly makes it 'pure' :) But I get what you're saying. Perhaps pure is not the best word here.

@alrz
Member

alrz commented Dec 18, 2015

@orthoxerox Memoization wouldn't make it to the language. (already proposed in #5205).

@MgSam

MgSam commented Dec 18, 2015

The existing [Pure] attribute is not particularly helpful, as it doesn't even produce a squiggle if you write to fields. I'm all in favor of a better way to mark methods that shouldn't have side effects. Right now I am often forced to try to use static methods for this purpose, but that only goes so far because sometimes you need static fields and there's nothing to stop you from writing to them.
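
To illustrate the point, here is a minimal sketch. The `Counter` class and its members are hypothetical, not from the thread. As far as I know, the stock C# compiler accepts a `[Pure]` method that writes to a static field without any diagnostic, because `System.Diagnostics.Contracts.PureAttribute` is only a hint for external tools.

```csharp
using System.Diagnostics.Contracts;

public static class Counter
{
    private static int _calls;

    // Marked [Pure], yet it mutates static state. The attribute is
    // not enforced by the compiler, so this builds without complaint.
    [Pure]
    public static int Add(int x, int y)
    {
        _calls++; // side effect hidden behind a "pure" signature
        return x + y;
    }

    // Exposes the hidden state so the side effect can be observed.
    public static int Calls => _calls;
}
```

After a single call to `Counter.Add`, `Counter.Calls` has changed, which is exactly the kind of write a purity check would be expected to flag.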

@sharwell
Member

> The existing [Pure] attribute is not particularly helpful, as it doesn't even produce a squiggle if you write to fields.

This could be implemented as an analyzer. However, it's a bit complicated.

  • Writing to non-static fields of instances directly or indirectly created by the pure method would probably be allowed. This means a pure method can in some cases call a non-pure method, without removing its effective purity.
  • Writing to instance fields of a struct which is passed as a parameter would probably be allowed, unless the parameter uses ref. Writing to a struct parameter with ref can be fine as long as the reference points to a stack-allocated struct in a caller.
  • Creating instances is generally OK, as long as the constructor is also pure. Unlike other methods, pure constructors can write to their own instance fields.
    • Constructors of types which have user-defined finalizers (including in a base type) cannot be pure unless the finalizer is also pure.

@alrz
Member

alrz commented Dec 18, 2015

Does a so-called "pure function" as a sole feature really help in a non-immutable type? C++ allowed this and instead disallowed it for static methods. Makes sense that way, but with immutable types I doubt pure functions as a distinct entity would make the world any better. I mean, having partially-immutable types might be confusing, yet the C++ way of "purity" might be a better approach: purity at the function level and immutability at the variable declaration (type usage) level, instead of type declaration level. This would allow declaring variables like "immutable arrays", e.g. int immutable[] arr = {3,4}; which I think even dotnet/csharplang#2543 couldn't address very well (via immutable unsafe).

@ashmind
Contributor

ashmind commented Dec 22, 2015

The concept of "pure" does not have a single clear definition across languages, so it might be better to use some alternative terminology.

E.g. when I researched this last time, here's what I ended with:

  1. In D the only limitation is non-mutation of global state. "Pure" functions can mutate their arguments.
  2. In GCC there are two types of "pure": pure (no side effects, but can read global state) and const (strictly pure as per the Wikipedia definition).
  3. In C#, [Pure] is defined as "does not make any visible state changes" (whatever that is).
  4. Haskell follows the Wikipedia definition (deterministic + no side effects).

http://cs.stackexchange.com/questions/24406/is-there-a-canonical-definition-of-pure-function

That's not even starting on how exceptions should behave.

I think each limitation we could apply to "pure" has its own uses, e.g. determinism excludes reading mutable state -- good for concurrency. So maybe some more complex state attribute(s) are needed.


And if we look just at side effects, there is another question -- is this pure?

public string M() {
    var builder = new StringBuilder();
    builder.Append("Hello world!");
    return builder.ToString();
}

It can only be verifiably pure if StringBuilder.Append is marked with some variant of a mutability attribute that specifies self-mutation but not outside-mutation. Which again brings up the need for more complex mutability descriptions.

@alrz
Member

alrz commented Dec 24, 2015

@ashmind How about isolated for StringBuilder.Append or the whole StringBuilder class?

@leppie
Contributor

leppie commented Dec 24, 2015

Local mutation within a method whose variables are not captured (free) would not be impure to me.

@ashmind
Contributor

ashmind commented Dec 24, 2015

@alrz
I think at least the following qualities are needed (I'm not suggesting the keywords, just trying to see the whole picture).

| Function quality | Description |
| --- | --- |
| CanChangeExternalState | Non pure, default behavior |
| CanChangeArguments (including this) | Non pure, but can be used as pure if the arguments don't come from external state. E.g. combination of new StringBuilder and any number of StringBuilder.Append is side-effect-free and deterministic. |
| CanReadExternalState | Pure by some definitions, but might not be safe for concurrency |

That also raises a question of ownership -- let's say we have a class X that has an internal StringBuilder in a field. If we can demonstrate that this StringBuilder is owned by the class, then we can prove that changing it is changing this and not external state. So some kind of [Owns] annotation would be useful.

@alrz
Member

alrz commented Dec 25, 2015

@ashmind I didn't understand: given isolated or "CanChangeArguments" methods (only able to change internal state), what is the need for ownership qualifiers? By "internal" we mean "not leaking outside of the class", so they must be private, right? I mean, doesn't private state imply that it belongs to the enclosing type? And can you please elaborate on "CanReadExternalState": what are its use cases?

@ashmind
Contributor

ashmind commented Dec 25, 2015

> I didn't understand, having said isolated or "CanChangeArguments" methods (only able to change internal states) what is the need for ownership qualifiers? by internal we mean not leaking outside of the class, so they must be private right? I mean a private state doesn't imply it belongs to the enclosing type?

Example:

public class Changer {
    private readonly Changeable _inner;

    public Changer(Changeable inner) {
        _inner = inner;
    }

    public void Change() {
        _inner.X = "Y";
    }
}

Is this class changing external state or only state it owns? It's uncertain and depends on whether inner is owned by this instance, or whether it might be shared. One example where this is already important is disposal -- e.g. Autofac has Owned<T> references that specify that an instance is owned and will be disposed by the owner.
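
A small sketch of why ownership matters here. The `Changeable` class with a string property `X` is an assumption (the comment does not show its definition), and `AliasingDemo` is a hypothetical helper. When the caller keeps its own reference to the object it passed in, `Change()` is visible outside the `Changer`, so it mutates external state.

```csharp
public class Changeable
{
    // Assumed shape: the thread only shows that X is assignable from a string.
    public string X { get; set; } = "";
}

public class Changer
{
    private readonly Changeable _inner;

    public Changer(Changeable inner) { _inner = inner; }

    public void Change() { _inner.X = "Y"; }
}

public static class AliasingDemo
{
    // Returns what the caller sees through its own reference after Change().
    public static string ObserveSharedMutation()
    {
        var shared = new Changeable { X = "before" };
        var changer = new Changer(shared); // the caller still holds 'shared'
        changer.Change();
        return shared.X; // the mutation leaked out of Changer
    }
}
```

If `Changer` created the `Changeable` itself and never exposed it, the same `Change()` body would only touch state the instance owns; nothing in the type system currently distinguishes the two cases.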

> and can you please elaborate on "CanReadExternalState" what are its use cases?

Reading external state makes a function potentially unsafe for threading, and unsafe for permanent memoization. On the other hand, it would mean that the function does not change external state, so it is safe to call automatically in debugging, for example.

@alrz
Member

alrz commented Dec 25, 2015

@ashmind (1) ok, assuming that _inner belongs to the Changer class, how would you know that argument passed to the constructor is not shared? (2) I'm thinking in #7626, so "CanReadExternalState" doesn't provide anything useful for immutability enforcement, right?

PS: I think the answer to the number (1) is in dotnet/csharplang#6611. Perhaps, a type qualifier would be better than move I guess.

@HaloFour

Considering that PureAttribute already exists and it has been applied to some percentage of the BCL, assuming (and this is a big assumption) that this has been done using a somewhat consistent set of rules, I think that any direct support for pure functions in the C# compiler should adhere to those same rules.

If that's not the case I think that the C# compiler should pick a set of rules and run with it. Trying to adopt many flavors of pure from many different languages seems like a recipe for disaster. However, I could see value in offering that level of information within the syntax tree available to analyzers.

@alrz
Member

alrz commented Dec 25, 2015

@HaloFour Not from different languages; these are just concepts tied to immutability. If you want a safe environment to code in, I think it's good to take advantage of these concepts: it encourages you to actually think about isolation and immutability in your design and prevents you from shooting yourself in the foot.

@HaloFour

@alrz What other languages consider "pure" methods was mentioned by @ashmind. I understand that there are different concepts around immutability, but it doesn't make sense to try to take one concept like "pure" functions and to attempt to accomplish all of them when they differ in design. My point is that the CLR already has "pure" functions, as decorated by the existing attribute, and it makes the most sense for C# to adhere to the same conventions already applied rather than trying to forge its own path, or worse, trying to define some crazy matrix of degrees-of-purity.

@alrz
Member

alrz commented Dec 25, 2015

@HaloFour There are two paths that can be taken for immutability enforcement in a language. F# does this by making mutability a special case, e.g. the mutable keyword, but for C# this is not applicable because everything is mutable by default. Deep immutability (#7626) on the other hand, as Eric said, "has strict requirements that may be more than we need, or more than is practical to achieve." Two scenarios in which this becomes a problem are initialization (like #229) and iteration. I can imagine that "isolation" would be helpful in these situations, while it doesn't affect "purity" as far as immutability is concerned.
For example, if you want to use property initializers or iterate over an immutable list, I think it makes sense if you annotate property setters like const int P { get; isolated set; }. Also, to be able to use foreach, GetEnumerator should be annotated as such, because MoveNext is not pure by definition.

@Richiban

Richiban commented Feb 8, 2016

There's another benefit to having purity enforced by the compiler (or, at least, to have the compiler reasonably confident about purity) -- some of the artificial constraints around covariance would be lifted. For example:

If we define a very simple ISet interface

interface ISet<T> : IEnumerable<T>
{
    bool Contains(T item);
}

Unfortunately we can't declare our Set interface as ISet<out T> because the Contains method uses T in an input position; something the language disallows to prevent inserting a Banana into a list of Fruit that is actually a list of Apples.

But! In a set you should be able to safely check whether it contains a given item even though the collection is covariant. Why? Because the contains function is pure. So the following could be allowed by the compiler:

interface ISet<out T> : IEnumerable<T>
{
    [Pure] bool Contains(T item);
}

Being pure means:

  • Not calling any unpure methods (that includes properties)
  • No assigning to fields, properties etc (basically anything that isn't a local variable)
  • Not accessing any fields unless they are declared readonly

That should cover most of the basics. In theory, if you can't access any unpure data (e.g. via properties or methods) then your function kind of has to be deterministic as well...
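
One way to see the argument with today's C#: a read-only membership test is already type-safe through a covariant interface, as long as it is written outside the interface. In the sketch below, the `Fruit`, `Apple` and `Banana` types and the `CovarianceDemo` helper are hypothetical; the only library call is LINQ's `Enumerable.Contains`.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Fruit { }
public class Apple : Fruit { }
public class Banana : Fruit { }

public static class CovarianceDemo
{
    // IEnumerable<out T> is covariant, so a List<Apple> can be passed here.
    // Contains only reads from the sequence, so asking about a Banana is
    // harmless: the answer is simply false.
    public static bool ContainsFruit(IEnumerable<Fruit> fruits, Fruit candidate)
        => fruits.Contains(candidate);
}
```

Passing a `List<Apple>` and a `Banana` compiles and returns false, which is the behavior the proposed `[Pure] bool Contains(T item)` on `ISet<out T>` would make expressible inside the interface itself.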

@jpierson

I was just getting ready to propose this exact feature. My assumption was that pure members could only call other pure members. This would be an improvement in cases where I've created static methods just to narrow the reach available to the statements within the method. Could having such a pure keyword assist the compiler in optimizations as well? Pure methods should obviously be able to be inferred by the compiler in order to make the optimizations, so I suppose the use of the keyword would be more about making the contract (for lack of better words) more explicit.

I see this being useful for code readability and developer experience in an IDE or code editor. An example would be hovering over a method call in a body of code: it could indicate that the method is pure, which gives me an immediate assurance that it isn't modifying any state. Another option would be to more subtly change the code syntax colorization to make pure calls distinct.

A possible extension of this could be to allow for a sort of local purity scoped by a class where only fields on the local instance or other members also marked as local pure could be invoked. This would allow class implementations that could guarantee that it doesn't reach out to any global singletons or anything like that. The keyword that would make the most sense here to me would be isolated. Both the proposed pure keyword and a hypothetical isolated keyword seem to be sort of inversely related to the normal access modifiers (public, private, protected, ...). I think it would be crucial to make sure that if introduced that they have an obvious separation in the syntax.

public:pure int add(int x, int y);
public class Person
{
    int Weight { get; private set; }

    public:isolated void eat(Food item)
    {
        this.Weight += item.Weight;
    }

    public void shout(string phrase)
    {
        Console.WriteLine(phrase.ToUpper());
    }
}

In the example above a class could be marked isolated which would enforce that a class could only contain members that are themselves isolated.

public:isolated class Person
{
    public void shout(string phrase)   // Compiler error, shout is not isolated
    {
        Console.WriteLine(phrase.ToUpper());
    }
}

Perhaps the same may make sense for the pure keyword in that it could be used at the class level to ensure that all members are pure members.

@be5invis

So... should we use a F*-style effect marker that forms a lattice?
We have Total at the lowermost position (purely functional, guaranteed return), then we have Divergence, State and Exception. And the effect marker of “general” C# functions is CSharp...

cc. @nikswamy @catalin-hritcu @simonpj

@jcouv
Member

jcouv commented Jul 29, 2017

Issue moved to dotnet/csharplang dotnet/roslyn#776 via ZenHub

@jcouv jcouv closed this as completed Jul 29, 2017
@jcouv
Member

jcouv commented Jul 29, 2017

Moved the issue over to csharplang repo to continue the discussion there. Thanks

@AustinBryan

> Doesn't pure imply "always evaluating the same result value given the same argument value"

Actually, Unreal Engine 4 (their visual scripting, not C++) uses pure functions to denote functions that "promise" not to have side effects. I put promise in quotes because you can still modify things, but the nature of them and the visual scripting meant that they basically had to be used as the equivalent of calling them only as an argument to another function, i.e., using them only for the return value.

If it wasn't used for that, it had no way of being called. So, pure is very familiar to me and I think it makes sense. const in C++ made me think it meant it dealt with function pointers and const functions couldn't be set or something else.

@jpierson

jpierson commented Mar 30, 2019

@AustinBryan, coincidentally perhaps, it looks like in Rust they have opted for the const keyword to implement something very similar to what is being asked for here.

https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html#const-fn

@aguzev

aguzev commented Oct 4, 2019

Surely benchmarking and unit tests have side effects: they produce reports.
