Consider allow pointers as generic type arguments #13627

Open
tannergooding opened this issue Oct 23, 2019 · 49 comments
Labels
area-TypeSystem-coreclr enhancement Product code improvement that does NOT require public API changes/additions

@tannergooding
Member

In .NET and the latest C# language versions, it is currently allowed to have MyStruct<T>* and T* (.NET allows this more generally, but C# allows it only for "unmanaged" types).

However, it is not possible to have MyStruct<T*>. This can be "problematic" for certain types of generic code, such as being able to have a Span<int*> x or Lazy<ID3D12Device*> since you instead must instantiate it as IntPtr and cast at the relevant usage sites.
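For illustration, a minimal sketch of the workaround described above, assuming an unsafe-enabled project. The variable names are made up for the example; the pointers are stored as `nint` and cast back at each use site because `Span<int*>` is rejected by the compiler today.

```csharp
using System;

int a = 1, b = 2;
int sum = 0;

unsafe
{
    // Span<int*> does not compile today (pointer types are rejected as type
    // arguments), so each pointer is stored as nint and cast back on use.
    Span<nint> pointers = stackalloc nint[2];
    pointers[0] = (nint)(&a);
    pointers[1] = (nint)(&b);

    foreach (nint p in pointers)
    {
        sum += *(int*)p; // the cast the proposal would make unnecessary
    }
}

Console.WriteLine(sum); // 3
```

Nothing ties the `nint` back to `int*`, so a wrong cast at any use site compiles without complaint.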

Going through ECMA-335, the rationale given in II.9.4 Instantiating generic types is:

[Rationale: Unmanaged pointers are disallowed because as currently specified unmanaged pointers are not technically subclasses of System.Object. This restriction can be lifted, but currently the runtime enforces this restriction and this spec reflects that. ]

I would propose that this restriction be lifted and pointers be allowed as generic type arguments. If that is possible, I would open a corresponding proposal on dotnet/csharplang suggesting that the language allow the same.

@john-h-k
Contributor

I can see GetHashCode/ToString/Equals simply using the IntPtr methods, but how would GetType work here? Would the actual pointer types be given that method?

@tannergooding
Member Author

tannergooding commented Oct 23, 2019

I would imagine it works the same as typeof(int*), which returns System.Int32*:
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBLANgHwAEAmARgFgAoQgZgAIS6BhOgbyrs4foBUBPAA4w6AcRgZ+QgJIA7DDWIAFCNjkwoACgCUdALwA+OhkEwIAMw2qMAKi0BuKgF8gA==
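A small sketch of what typeof(int*) yields through reflection today, in case the SharpLab link is unavailable. It assumes an unsafe-enabled project and says nothing about how GetType() would behave on a pointer-typed generic argument.

```csharp
using System;

Type pointerType;

unsafe
{
    // Pointer types can already be named in typeof.
    pointerType = typeof(int*);
}

Console.WriteLine(pointerType);                  // System.Int32*
Console.WriteLine(pointerType.IsPointer);        // True
Console.WriteLine(pointerType.GetElementType()); // System.Int32
```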

Edit: Noting that this is the same codegen as is already done for an arbitrary T: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBLANgHwAEAmARgFgAoQgZgAIS6BhOgbyrs4foBUBPAA4w6AcRgZ+QgPIAzHgB4eAPgAUASjoBeJXQyCYEGSp5qA3FQC+QA=

@john-h-k
Contributor

I meant something more like this
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBLANgHwAEAmARgFgAoQgZgAIS6BhOgbyrs4foBUBPAA4w6AcRgZ+QgPIAzHgB4eAPgAUASjoBeJXQAmMGQEMArrgwqeagHRiJgmOoDcVAL5A===
where in the IL it is calling constrained. !!T System.Object.GetType(), which obviously isn't an actual method on pointer types so I guess the JIT would need to recognise it and just insert the type handle there instead?

@tannergooding
Member Author

That would likely require some work (either on the language side or the runtime side) to ensure things don't blow up 😄. I imagine there will be a few places like that if the relaxation was allowed.

@MichalStrehovsky
Member

The main restriction is that generic parameters can be boxed (one can do box !T in IL). So this would be about figuring out how the boxing happens for pointer types.

Making pointers derive from System.Object (or System.ValueType... or a new System.Pointer?) would be the first step. Then there would be questions about whether we want them to box like Nullable<T> (i.e. boxing a null pointer produces a null reference; unboxing a null reference produces a null pointer). How do we define equality of these boxed objects (just dispatch to IntPtr.Equals and ignore the type, or include the type of the pointer?).
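To make the comparison concrete, here is a hedged sketch of the two existing behaviours being alluded to: how Nullable<T> boxes today, and the System.Reflection.Pointer helper that reflection already uses to carry a pointer plus its type as an object. It does not show how boxing a pointer-typed generic argument would work, since that is exactly the open question.

```csharp
using System;
using System.Reflection;

// Nullable<T> boxing: a null value boxes to a null reference,
// and a non-null value boxes as the underlying type.
int? none = null;
object boxedNone = none;
Console.WriteLine(boxedNone is null);    // True

int? some = 5;
object boxedSome = some;
Console.WriteLine(boxedSome.GetType());  // System.Int32

int roundTrippedValue;
unsafe
{
    int value = 42;

    // Existing reflection helper: wraps the address together with the pointer type.
    object boxedPointer = Pointer.Box(&value, typeof(int*));
    int* roundTripped = (int*)Pointer.Unbox(boxedPointer);
    roundTrippedValue = *roundTripped;
}

Console.WriteLine(roundTrippedValue);    // 42
```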

I expect there would be a pretty long bug tail after that.

@tannergooding
Member Author

Having a special-cased Pointer<T> type might be feasible. It could internally just be private T* _value and then in generic instantiations would just be Span<Pointer<T>>. Everything else is just syntax sugar/specialized handling after that (much like Nullable<T> like you called out).
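A rough user-level sketch of that idea, compilable today in an unsafe-enabled project. The `Pointer<T>` type here is hypothetical (it is not a BCL type), the `unmanaged` constraint is an assumption needed to form `T*` in current C#, and none of the runtime special-casing or syntax sugar mentioned above is modelled.

```csharp
using System;

int a = 10, b = 20;
int sum = 0;

unsafe
{
    // The wrapper is an ordinary struct, so it is a valid generic argument.
    Span<Pointer<int>> pointers = stackalloc Pointer<int>[2];
    pointers[0] = new Pointer<int>(&a);
    pointers[1] = new Pointer<int>(&b);

    foreach (Pointer<int> p in pointers)
    {
        sum += *p.Value;
    }
}

Console.WriteLine(sum); // 30

// Hypothetical wrapper: a single pointer-sized field and nothing else.
public readonly unsafe struct Pointer<T> where T : unmanaged
{
    private readonly T* _value;

    public Pointer(T* value) => _value = value;

    public T* Value => _value;
}
```

Unlike the IntPtr workaround, the pointee type stays visible in the signature, at the cost of wrapping and unwrapping at the boundaries.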

@4creators
Contributor

IMHO this issue belongs to https://github.com/dotnet/csharplang/ repo.

Otherwise, I fully support the idea. It's a huge simplification for multiple coding patterns.

@tannergooding
Member Author

IMHO this issue belongs to https://github.com/dotnet/csharplang/ repo.

The C# language can't reliably do this without the restriction first being lifted in the runtime. Now it's also reasonable that the runtime wouldn't do this without the language also agreeing to expose said support, but ensuring it is feasible for the runtime to handle is the first step (thus this issue and the note "If that is possible, I would open a corresponding proposal on dotnet/csharplang suggesting that the language allow the same.").

@4creators
Contributor

4creators commented Oct 23, 2019

ensuring it is feasible for the runtime to handle is the first step

thus this issue and the note "If that is possible, I would open a corresponding proposal on dotnet/csharplang suggesting that the language allow the same."

IMHO this is a C# language feature which requires runtime support. Reading between the lines, I understand you want to avoid pushback in the csharplang repo due to the requirement for runtime support. But this doesn't change the fact that the issue should go to dotnet/csharplang.

@alexrp
Contributor

alexrp commented Jul 22, 2022

Assuming that this work would enable function pointers as type arguments as well, then FWIW, I have a function hooking API in one of my projects that would benefit considerably from this. (Apologies in advance for the somewhat lengthy comment.)

The API shape currently looks like this (simplified):

public sealed unsafe class FunctionHook
{
    public void* Target { get; }
    public void* Hook { get; }
    public void* Trampoline { get; }
    public object? State { get; }

    public static FunctionHook Current { get; }

    public static FunctionHook Create(void* target, void* hook, object? state = null);
}

Usage looks like this:

[UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvStdcall) })]
static int SetThreadDescriptionHook(nint hThread, char* lpThreadDescription)
{
    Console.WriteLine(new string(lpThreadDescription) + FunctionHook.Current.State);

    return 0;
}

var kernel32 = NativeLibrary.Load("kernel32.dll");
// GetExport takes the library handle as its first argument.
var setThreadDescription = (delegate* unmanaged[Stdcall]<nint, char*, int>)NativeLibrary.GetExport(kernel32, "SetThreadDescription");

using (var hook = FunctionHook.Create(setThreadDescription, (delegate* unmanaged[Stdcall]<nint, char*, int>)&SetThreadDescriptionHook, "bar"))
    fixed (char* p = "foo")
        _ = setThreadDescription(-1, p); // Prints "foobar".

I would much rather have an API shape like this:

public sealed unsafe class FunctionHook<T>
    where T : delegate* /* or simply : unmanaged */
{
    public T Target { get; }
    public T Hook { get; }
    public T Trampoline { get; }
    public object? State { get; }

    public static FunctionHook Current { get; }
}

public sealed unsafe class FunctionHook
{
    // (target is typed as void* since you almost always obtain it as that and forcing an extra cast doesn't actually add value.)
    // (Throws if !typeof(T).IsFunctionPointer.)
    public static FunctionHook Create<T>(void* target, T hook, object? state = null)
        where T : delegate* /* or simply : unmanaged */;
}

This new API shape doesn't change the above example much - it just has to be changed to pass the function pointer type as a generic argument and remove the cast from &SetThreadDescriptionHook. Importantly, though, the properties are now typed properly so they can be easily called without needing highly error-prone casts.

An even more significant benefit is the type information that is now available to the FunctionHook implementation (once function pointer introspection lands, anyway). I need entry/exit to/from the hook function to go through a 'hook gate' - that gate guards against some common pitfalls when hooking native APIs, and also takes care of making the FunctionHook.Current property work correctly. Unfortunately, this means doing some really nasty hacks that involve hijacking the return address currently on the stack and tail calling the hook function. (I have to do so because I have no idea what the prototype of the target/hook functions is, so I am not allowed to push another return address onto the stack.) This hack is very unfortunate because, while the basic calling convention is satisfied, other aspects of the ABI such as unwindability certainly are not.

If I could know from typeof(T) exactly what the calling convention, return type, and parameter types are, I would be able to drop this hack completely and instead generate correct code that forwards parameters and the return value when doing a normal call to the hook function, removing the need for the return address hijacking. Of course, I could take a Type parameter in my Create method -- likely what I'll end up doing in the short term -- but then I would still lose out on type safety for the properties, and it would be easy for the typeof(delegate* ...) argument to get out of sync with the actual function type without getting a compiler error.
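As a hedged sketch of the introspection being hoped for here: on runtimes that ship the function pointer reflection APIs (.NET 8 and later, as far as I know), a function pointer Type exposes its return and parameter types. The example only uses typeof on a concrete function pointer type, since a generic T cannot be one today, and it deliberately makes no assumption about whether the calling convention can be recovered from typeof alone.

```csharp
using System;

Type hookType;

unsafe
{
    hookType = typeof(delegate* unmanaged[Stdcall]<nint, char*, int>);
}

// On older runtimes typeof(delegate* ...) is reported as IntPtr and this is False.
Console.WriteLine(hookType.IsFunctionPointer);

if (hookType.IsFunctionPointer)
{
    Console.WriteLine(hookType.GetFunctionPointerReturnType());

    foreach (Type parameter in hookType.GetFunctionPointerParameterTypes())
    {
        Console.WriteLine(parameter);
    }
}
```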

So, being able to pass a function pointer type as a generic argument here would help me solve a bunch of API safety, usability, and implementation problems all at once.

@alexrp
Contributor

alexrp commented Aug 8, 2023

Another reason we need this: The shiny new inline arrays (#61135) are completely unusable with pointers and function pointers, presumably due to conversion to Span<T>, severely hampering their usefulness for interop bindings.
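A minimal sketch of the kind of workaround this forces, assuming C# 12 / .NET 8 and an unsafe-enabled project. `PointerBuffer2` is a made-up name; the element type has to be `nint` rather than `int*`, with a cast on every store and load.

```csharp
using System;
using System.Runtime.CompilerServices;

int a = 1, b = 2;
int sum = 0;

unsafe
{
    PointerBuffer2 buffer = default;
    buffer[0] = (nint)(&a); // stored as nint because int* cannot be the element type
    buffer[1] = (nint)(&b);

    for (int i = 0; i < 2; i++)
    {
        sum += *(int*)buffer[i];
    }
}

Console.WriteLine(sum); // 3

// Hypothetical interop buffer holding two pointers.
[InlineArray(2)]
public struct PointerBuffer2
{
    private nint _element0;
}
```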

@rickbrew
Contributor

rickbrew commented Aug 8, 2023

Another reason we need this: The shiny new inline arrays (#61135) are completely unusable with pointers and function pointers, presumably due to conversion to Span<T>, severely hampering their usefulness for interop bindings.

As an example of this, these can be useful for methods that take an array of COM interface pointers. Here are a few used in my project:

ID2D1Factory::CreateGeometryGroup https://learn.microsoft.com/en-us/windows/win32/api/d2d1/nf-d2d1-id2d1factory-creategeometrygroup

ID2D1GeometryGroup::GetSourceGeometries https://learn.microsoft.com/en-us/windows/win32/api/d2d1/nf-d2d1-id2d1geometrygroup-getsourcegeometries

IWICBitmapEncoder::SetColorContexts https://learn.microsoft.com/en-us/windows/win32/api/wincodec/nf-wincodec-iwicbitmapencoder-setcolorcontexts

@Marlax0

Marlax0 commented Oct 16, 2023

Another reason we need this: The shiny new inline arrays (#61135) are completely unusable with pointers and function pointers, presumably due to conversion to Span<T>, severely hampering their usefulness for interop bindings.

This alone should be enough of a reason to seriously consider lifting this limitation.

There are always workarounds, but they feel unnecessary when generics exist.
I feel like a dope casting Thing* to nint and back to Thing* all over the place, just to use generic helpers.
In an odd way it makes unsafe C# even more unsafe than usual.

@jgcodes2020

This issue's been up for a while now, what's happening?

@davidfowl
Member

It's in the "Future" milestone, that means it's not being actively worked on.

@Korporal

A similar argument suggests we should support <ref T> as well as <T*>. Is this being actively pursued by the team?

@Neme12

Neme12 commented Mar 16, 2024

A similar argument suggests we should support <ref T> as well as <T*>. Is this being actively pursued by the team?

ref is not even part of the type in C# (that's why you have to keep saying ref to keep the refness, and why you can't have ref of ref), so I'm not really sure how this would work.

@john-h-k
Contributor

ref is not even part of the type in C# (that's why you have to keep saying ref to keep the refness), so I'm not really sure how this would work.

That's not really true. It's a syntactic decision by the language that makes it feel like it is implicit, but ref T is still a fundamentally different type from T. Type.IsByRef wouldn't make any sense if ref wasn't part of the type
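For reference, a short sketch of how by-ref types surface through reflection. It only shows the runtime's view (which is the point being made about Type.IsByRef) and does not settle how the C# language classifies ref.

```csharp
using System;
using System.Reflection;

Type byRefInt = typeof(int).MakeByRefType();
Console.WriteLine(byRefInt);                 // System.Int32&
Console.WriteLine(byRefInt.IsByRef);         // True
Console.WriteLine(byRefInt == typeof(int));  // False

// int.TryParse(string, out int): the out parameter is a by-ref type in metadata.
MethodInfo tryParse = typeof(int).GetMethod("TryParse", new[] { typeof(string), byRefInt })
    ?? throw new InvalidOperationException("int.TryParse(string, out int) not found");
Type resultParameterType = tryParse.GetParameters()[1].ParameterType;
Console.WriteLine(resultParameterType.IsByRef); // True
```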

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

ref T is still a fundamentally different type from T.

In the runtime yes, but in C# it's not a type.

I agree it is not a type in the sense "the C# spec does not mention it under the category Types", but I don't see what meaningful distinction there is that makes it not a type. Yes it has implicit dereference, but that in no way makes it "not a type".

It's why there's no syntax (like there is for pointers) for dereferencing a ref. If you made generics support ref, it would have to work differently than elsewhere in the language, and you'd have to introduce a syntax for dereferencing the T inside that generic method.

The reason there is no syntax for dereferencing a ref is because that was the decision made. Classes are implicitly dereferenced, and there is no syntax for dereferencing them, so surely they are not types by this logic?

If you made generics support ref, it would have to work differently than elsewhere in the language, and you'd have to introduce a syntax for dereferencing the T inside that generic method.

Why would you need to do that? A generic method that takes a T never needs to explicitly dereference that T - it can't because it doesn't know if it is a ref or not. If the generic method does need to dereference it, then it must guarantee T is a ref, and that can already be achieved by parameterising on the underlying type and taking a ref T instead.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

You'd also have to allow ref of ref. Let's say you have two generic parameters T and U. You want to set T to ref U. But what if U is a ref? You'd then have a ref of ref. You'd have to completely redo how ref works in C#.

With pointers, this isn't an issue - there is syntax for dereferencing, the "pointerness" doesn't automatically constantly decay like ref does, and you can have pointers to pointers.

I agree this becomes messy (syntactically), but there are two clear routes:

  • Support ref ref Foo
  • Support compiling nested refs in the context of generics, but don't bring nested refs to the rest of the language, for simplicity

@john-h-k
Contributor

"the C# spec does not mention it under the category Types"

So you agree it's not a type in C# :D you just said it.

I agree that the C# 8 Draft specification says it is not a type. But A) that's pretty old, and B) I think it is irrelevant. There is nothing about ref T that makes it fundamentally "not a type"

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

Why would you need to do that? A generic method that takes a T never needs to explicitly dereference that T - it can't because it doesn't know if it is a ref or not.

To support the scenario in a truly generic way - have a type parameter T that can be ref but also might not be, you'd have to have a syntax for dereferencing. If you don't, how do you set T to another ref? By saying ref x? But now you made T require being a ref and it's not an open generic anymore. And at this point, you don't need ref as type parameters at all - if it's always a ref, then you can just use ref T already today without the type parameter itself being ref.

In a generic method such as this

public static void F<T>(T a)
{
    T b = a;
}

It would just be decided that if T is a ref, then this is a ref assignment. This makes sense, because given an open T, the only possible operation to do here is an exact assignment which does not dereference.

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

It would just be decided that if T is a ref, then this is a ref assignment

Great. But then how do you do "regular" assign (assign to what T points to)? You can't, without a syntax for dereferencing.

Of course you can't, because that is impossible in an open generic. To do a dereferencing assignment like that, you need to restrict T to be a ref, which, as we both said, is possible by just making T a generic parameter and taking ref T instead.

If you want an open-generic function over T that can take any type (int and ref int for example), you CANNOT dereference. It doesn't make sense.

If you want a function over T that can dereference T, then you must take a ref T instead, and then the normal syntax works.
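A small sketch of that existing pattern, where the method is generic over the underlying type and takes ref T. `Overwrite` is a name invented for the example; it works in current C# without any of the changes being debated.

```csharp
using System;

int number = 1;
Overwrite(ref number, 5);
Console.WriteLine(number); // 5

string text = "old";
Overwrite(ref text, "new");
Console.WriteLine(text);   // new

// Generic over the underlying type; the ref-ness is stated in the signature.
static void Overwrite<T>(ref T target, T value)
{
    ref T alias = ref target; // ref local: a second name for the same storage
    alias = value;            // ordinary assignment writes through the ref
}
```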

You are trying to do operations specific to ref-types within the context of a generic that accepts both ref and non-ref types. That will not work

@john-h-k
Contributor

But A) that's pretty old, and B) I think it is irrelevant

It doesn't matter that it's old, the fact that it's not a type in C# hasn't changed at all. And of course it's relevant. You'd have to make this work in C# if you want to use it in C#, and you can't make it work if you don't change how ref works and make ref a type.

But by this definition ("ref T is not a type because it is not in the spec"), making ref T a type would simply entail putting it in the spec, which is not too hard. If there is something fundamentally difficult about making ref T a type (which I believe it effectively already is, albeit restricted and with some odd syntactical constructs), then that would be a different story.

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

Of course you can't - because that is impossible in an open generic?

OK you're right. But you still couldn't do this without completely changing the language. It's not just a matter of "putting it in the spec". Today, if a method returns ref X, you can just do M() = x. If you say that a method returning by ref like that is actually returning a type called ref X, then assigning to that would be a ref assignment, which it's not today.

You are equating my view that "ref is a type" with "ref shouldn't automatically dereference". That's not what I am saying.
M() = x works fine with ref as a type. It returns the type ref X, and a ref X on the left-hand side of an assignment is automatically dereferenced, just as it always has been.

The only change I am suggesting is how refs interact in generic contexts. Nothing else

@john-h-k
Contributor

john-h-k commented Mar 16, 2024

Classes are implicitly dereferenced, and there is no syntax for dereferencing them, so surely they are not types by this logic?

They are types by the language, unlike ref. And classes work with generics without an issue because they're not implicitly dereferenced the same way that ref is. If you have a class C and variables C x; and C y; assign x = y;, you are assigning the reference itself. You are not assigning to what C points to. There's only automatic dereference by the . operator.

Again, "types by the language" just seems to mean "in the spec". Refs being added as a type in the spec wouldn't change how the language worked, but then they would be considered types.
The argument about ref not being types needs to have a fundamental difference between ref T and T/T?/T* etc. If it's just a technicality (being in the spec), then it doesn't matter.

And classes work with generics without an issue because they're not implicitly dereferenced the same way that ref is. If you have a class C and variables C x; and C y; assign x = y;, you are assigning the reference itself. You are not assigning to what C points to. There's only automatic dereference by the . operator.

I do see the difference around assignment there and that is a big difference. But that doesn't feel like it is the "reason" they work with generics. They work with generics because generics were written to support classes. Generics could be written to support refs too (with the rules I mentioned earlier). The only caveat is that it leaves some unintuitive parts of the syntax (the fact that {Some Type} a = b is legal if and only if {Some Type} is not a concrete ref-type).

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

I don't think you could make this work without a huge breaking change in the language. If you have ref int x; and assign x = 5; and say that x is of type ref int and a ref type of the left-hand side of an assignment is automatically dereferenced, then ok, but what if you do x = ref y;? You said that x is automatically dereferenced, so now you can't ref-assign it (you'd have to do ref x = ref y). If you say x is automatically dereferenced when the right-hand side isn't of a ref type, then what if the right-hand side is a function call and that function returns a ref X? That still isn't a ref assignment today. You'd have to make the ref keyword somehow special and not just simply part of the type (which is what it is today).

I don't understand why considering ref T a type makes you think the rules about how it works have to change. Just as it used to, x = ref y is a ref assignment because that is literally the meaning of = ref. It is called the Ref Assignment Operator. So to clarify the rules:

  • A ref on the left-hand-side of an assignment will dereference
  • A ref on the left-hand-side of a ref assignment will not dereference

These are the EXACT same rules as now. No changes

And even if you could make this somehow work by adding special rules on top of special rules, the most unintuitive part about it would be that if you have a generic method, call it with type ref X and then decide to make the function not generic and substitute T for ref X, you'd have completely different semantics.

I agree, this would be unintuitive. But I don't see any other way to achieve it.

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

john-h-k commented Mar 16, 2024

Just as it used to, x = ref y is a ref assignment because that is literally the meaning of = ref.

But then... you agree that what you have there is the = ref operator and ref isn't part of y and the type of the right-hand side is NOT ref Y. So ref is not part of the type, and we're back to square one. You can't "just make it part of the type" without changing the language.

ref isn't part of y

ref is not part of y in the sense that the = ref operator's "ref" token is not part of it. But ref must be a part of y, because for x = ref y to be legal y must have been declared as ref T!

We don't write integer assignments like this

x = int 10;

but 10 still has type int. Just as y has type ref T in x = ref y.

the type of the right-hand side is NOT ref Y.

This does not follow from "= ref is an operator".

  1. x is a local variable of type ref int
  2. y is a local variable of type ref int
  3. x = y is an assignment operation
  4. Ref-type variables on the LHS of an assignment dereference
  5. Ref-type variables on the RHS of an assignment dereference
  6. The assignment works as expected

And for = ref:

  1. x is a local variable of type ref int
  2. y is a local variable of type ref int
  3. x = ref y is a ref assignment operation
  4. Ref-type variables on the LHS of a ref assignment do not dereference
  5. Ref-type variables on the RHS of a ref assignment do not dereference
  6. The assignment works as expected
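The two cases listed above can be checked against current C# with a short sketch. The variable names follow the lists; `a` and `b` are added here only as backing storage.

```csharp
using System;

int a = 1, b = 2;
ref int x = ref a; // x aliases a
ref int y = ref b; // y aliases b

// Plain assignment: both sides dereference, so the value of b is copied into a.
x = y;
Console.WriteLine($"{a} {b}"); // 2 2

// Ref assignment: neither side dereferences, so x is rebound to alias b.
x = ref y;
x = 10; // writes through x, which now refers to b
Console.WriteLine($"{a} {b}"); // 2 10
```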

You can't "just make it part of the type" without changing the language.

I am suggesting it is already effectively a type and should be treated as such. I don't know what definition of type you have, but I still haven't seen a reason why ref T is not a type, and if it isn't, what changes would "make it a type". If the answer is "well, it isn't in the spec", then you can just add it to the spec as a type, and that makes it a type without changing the language.

@john-h-k
Contributor

you'd have to do ref x = ref y

Though I do wish this was how it worked so that the rules could be more consistent with ref being a type, and you'd have a clear difference between regular assignment and a ref assignment, and not just only depending on the right-hand side. But that's not where we are.

But this would be ref being an operator almost, rather than a type. x = ref y is unambiguous because = ref MUST mean ref assignment. You know purely from seeing = ref that x is a ref variable

@Neme12

This comment was marked as off-topic.

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

But ref must be a part of y, because for x = ref y to be legal y must have been declared as ref T!

That's not true. It could be a normal variable. And you are assigning a reference to that variable.

Ah yes I realise what you mean. But I stand by the rest of my points

@john-h-k
Contributor

john-h-k commented Mar 16, 2024

But ref must be a part of y, because for x = ref y to be legal y must have been declared as ref T!

That's not true. It could be a normal variable. And you are assigning a reference to that variable.

No, that is illegal. Sharplab

That's an error because the left-hand side isn't a ref variable. The right-hand side can be a regular variable.

Yeah, realised what you meant just after. I read "And you are assigning a reference to that variable" to mean "assigning a reference [RHS] to that variable [normal variable on LHS]" instead of "assigning a reference [of LHS] to that variable [normal variable on RHS]", my bad

@Neme12

This comment was marked as off-topic.

@john-h-k
Contributor

It's irrelevant whether y was declared as ref or not for x = ref y to work

Yep

So if ref Y is a type, it's irrelevant whether y is ref Y or Y

I do not in any way see how this follows. Here is an argument using the same reasoning as this one:

Instead of the two types being ref Y x and Y y/ref Y y, use double x and float y/double y. Of course, x = y is valid in both scenarios, just as x = ref y was valid in both prior scenarios.

It's irrelevant whether y was declared as double or float for x = y to work. So if float is a type, it's irrelevant whether y is float or double. So it's not a meaningful part of the type system. You can call the type float instead of double if you wish, but it's not actually a different type from double itself.

It's the exact same argument. Both are the compiler allowing multiple types on the RHS. Suggesting that something is not a "meaningful part of the type system" because there are SOME scenarios where it behaves as something else just doesn't make sense. If in EVERY scenario, ref Y and Y were indistinguishable, then yes they would be the same type. But that isn't true, because ref Y can be used in entirely different ways to Y

@alexrp
Contributor

alexrp commented Mar 16, 2024

A similar argument suggests we should support <ref T> as well as <T*>. Is this being actively pursued by the team?

It would be a completely separate feature from what this issue is proposing. Pointers in generics are relatively 'easy' to implement, while refs in generics would have far-reaching effects on all parts of the runtime.

Can we please take the debate about refs in generics to a separate issue/discussion? This is getting a little bit noisy and off topic.

@Neme12

Neme12 commented Mar 16, 2024

You're right, I'm sorry. I'll hide my comments.
