Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbounded string fields should be declared with [] instead of [0...0] bounds #912

Closed
timsneath opened this issue May 1, 2022 · 22 comments
Closed
Assignees

Comments

@timsneath
Copy link
Contributor

timsneath commented May 1, 2022

A number of structs in SetupAPI.h contain string fields of arbitrary size, for example:

typedef struct _SP_DEVICE_INTERFACE_DETAIL_DATA_W {
    DWORD  cbSize;
    WCHAR  DevicePath[ANYSIZE_ARRAY];
} SP_DEVICE_INTERFACE_DETAIL_DATA_W, *PSP_DEVICE_INTERFACE_DETAIL_DATA_W;

While this is projected "correctly", since ANYSIZE_ARRAY is defined to 1, as the documentation notes, this is merely a limitation of the ability for C to accurately declare an indeterminately-sized array. See https://docs.microsoft.com/en-us/windows/win32/api/setupapi/ns-setupapi-sp_device_interface_detail_data_w.

There are a few other structs with a similar formulation, at least in SetupAPI.h: SP_INF_INFORMATION and SP_DRVINFO_DETAIL_DATA_W are two examples.

Per §I.8.9.1 "Array types" of ECMA-335, these can simply be declared as unbounded. That is, instead of:

.field public char[0...0] DevicePath

we should declare this field as:

.field public char[] DevicePath

The distinction is important, since otherwise a language projection cannot distinguish between an unbounded array and a single-item array.

@taufikrh
Copy link

taufikrh commented May 2, 2022

Unfortunately, metadata only storing rank and size of array. I think, such declaration should be done in projections.
See
II.23.2.12 Type
II.23.2.13 ArrayShape

@mikebattista
Copy link
Contributor

Unfortunately, metadata only storing rank and size of array. I think, such declaration should be done in projections. See II.23.2.12 Type II.23.2.13 ArrayShape

@kennykerr @AArnott @sotteson1 for their thoughts.

@kennykerr
Copy link
Contributor

I agree with @timsneath that we should be able to distinguish between unbounded and fixed size array fields.

@mikebattista
Copy link
Contributor

This is what ClangSharp emits.

    public unsafe partial struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
    {
        [NativeTypeName("DWORD")]
        public uint cbSize;

        [NativeTypeName("WCHAR[1]")]
        public fixed ushort DevicePath[1];
    }

Is that enough information to identify this case?

@AArnott
Copy link
Member

AArnott commented Mar 14, 2023

CsWin32 plans to interpret 0 and 1 length arrays as variable length. It would be great if the metadata were consistent on the length (0 vs 1), but CsWin32 can certainly be made to detect either one.

@tim-weis
Copy link
Contributor

Is that enough information to identify this case?

Not strictly, no. The one true discriminator is the ANYSIZE_ARRAY name, but that doesn't survive preprocessing. A good heuristic might be to check for an array of size one, where that array is also the final field in a structure.

As a workaround to preserving the additional semantics of ANYSIZE_ARRAY (which is #define ANYSIZE_ARRAY 1'd in winnt.h) could be to #define ANYSIZE_ARRAY 0 while preprocessing, and transforming resulting zero-length arrays back into unbounded arrays in a subsequent processing stage (it is my understanding that zero-length arrays aren't legal even in C99, so there shouldn't be any in the headers).

Though, honestly, given the reach and impact of this repo's artifacts, I'd much like to see it move closer to tooling that better persists vital information from the SDK header files.

@ChrisDenton
Copy link
Contributor

Zero length arrays are not legal in ISO C but they are allowed in MSVC C (and gcc) at the end of a struct. Still, I'd be surprised if they appear in the SDK headers.

@tim-weis
Copy link
Contributor

I'd be surprised if they appear in the SDK headers.

That was my thought, too. And then I went looking... There are, in d3dkmthk.h, mstcpip.h, nvme.h, and others. So that's that then.

@riverar
Copy link
Collaborator

riverar commented Mar 16, 2023

Zero length arrays are legal in C99+ as a flexible array member (pg. 103, ¶16) at the end of a structure, such as:

typedef struct _D3DKMT_DISPLAYMODELIST
{
    D3DDDI_VIDEO_PRESENT_SOURCE_ID  VidPnSourceId;
    UINT                            ModeCount;
    D3DKMT_DISPLAYMODE              pModeList[0];
} D3DKMT_DISPLAYMODELIST;

@tim-weis
Copy link
Contributor

@riverar I'm pretty sure the syntax for flexible array members is [], not [0]. Not that it makes a difference with the SDK headers making use of the non-standard [0] extension.

@mikebattista
Copy link
Contributor

I can get these to output [0...-1] if I change the C# definitions from public fixed ushort DevicePath[1]; to public ushort DevicePath[0];. Will that work?

C#

public struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
{
	public uint cbSize;

	public char[] DevicePath;
}

IL

.class public sequential ansi sealed beforefieldinit Windows.Win32.Devices.DeviceAndDriverInstallation.SP_DEVICE_INTERFACE_DETAIL_DATA_W
	extends [netstandard]System.ValueType
{
	.custom instance void [Windows.Win32.winmd]Windows.Win32.Foundation.Metadata.StructSizeFieldAttribute::.ctor(string) = (
		01 00 06 63 62 53 69 7a 65 00 00
	)
	.pack 1
	.size 0

	// Fields
	.field public uint32 cbSize
	.field public char[0...-1] DevicePath

} // end of class Windows.Win32.Devices.DeviceAndDriverInstallation.SP_DEVICE_INTERFACE_DETAIL_DATA_W

If I try to change the C# to public ushort DevicePath[]; I get a ? for the type of the field and IL parse errors for InvalidCompressedInteger.

C#

public struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
{
	public uint cbSize;

	public ? DevicePath;
}

IL

System.BadImageFormatException: Invalid compressed integer.
   at System.Reflection.Throw.InvalidCompressedInteger()
   at System.Reflection.Metadata.BlobReader.ReadCompressedInteger()
   at System.Reflection.Metadata.Ecma335.SignatureDecoder`2.DecodeType(BlobReader& blobReader, Boolean allowTypeSpecifications)
   at System.Reflection.Metadata.FieldDefinition.DecodeSignature[TType,TGenericContext](ISignatureTypeProvider`2 provider, TGenericContext genericContext)
   at ICSharpCode.Decompiler.Disassembler.ReflectionDisassembler.DisassembleFieldHeaderInternal(PEFile module, FieldDefinitionHandle handle, MetadataReader metadata, FieldDefinition fieldDefinition) in /_/ICSharpCode.Decompiler/Disassembler/ReflectionDisassembler.cs:line 1327
   at ICSharpCode.Decompiler.Disassembler.ReflectionDisassembler.DisassembleField(PEFile module, FieldDefinitionHandle handle) in /_/ICSharpCode.Decompiler/Disassembler/ReflectionDisassembler.cs:line 1252
   at ICSharpCode.ILSpy.TextView.DecompilerTextView.DecompileNodes(DecompilationContext context, ITextOutput textOutput)
   at ICSharpCode.ILSpy.TextView.DecompilerTextView.<>c__DisplayClass51_0.<DecompileAsync>b__0()

A simplified repro branch is at https://github.com/microsoft/win32metadata/tree/mikebattista/flexiblearrays.

Below is where we can rewrite the C# if we detect fields with [0] or [1]. For now I'm manually removing fixed from the generated C#. We'd need to add that logic here if it's correct. With fixed, the compiler fails expecting an explicit length.

// Assume [0] or [1] (ANYSIZE_ARRAY) indicates flexible arrays and convert to [].
if (node.Declaration.Variables.First().ArgumentList is BracketedArgumentListSyntax bracketedArgumentList)
{
if (bracketedArgumentList.Arguments.ToString() == "1" || bracketedArgumentList.Arguments.ToString() == "0")
{
var variable = node.Declaration.Variables.First().WithArgumentList(SyntaxFactory.ParseBracketedArgumentList("[]"));
node = node.WithDeclaration(node.Declaration.WithVariables(SyntaxFactory.SingletonSeparatedList(variable)));
return node;
}
}

We also have an opportunity to rewrite the array field signature when writing to the winmd. We can do that here. The knobs are provided by https://learn.microsoft.com/dotnet/api/system.reflection.metadata.ecma335.arrayshapeencoder.shape.

Everything I've tried so far that compiles emits [0...-1]. Let me know if you have ideas to emit something different.

@kennykerr
Copy link
Contributor

While I agree that we should ideally be able to identify these in metadata, I don't think actually changing the definition is appropriate. Add an attribute if needed to indicate these may be unbounded, but if the original struct has a field with a ANYSIZE_ARRAY array size then it must remain that size. That's important since that affects the overall size of the structure, which many APIs and coding patterns depend on for stack packing, padding, and alignment, as well as explicit API version checks.

@mikebattista
Copy link
Contributor

We could add an attribute to all fields that have a NativeTypeName that ends with [0].

I'm not sure how to reliably automate [1] since there are plenty of [1] fields that behave like ANYSIZE_ARRAY but hardcode 1 rather than use ANYSIZE_ARRAY, and there are legitimate cases of 1-length arrays. Unless we want to assume [1] equals a flexible array and then manually remove the attribute from cases where it shouldn't apply based on feedback. I'm not sure how pervasive legitimate 1-length arrays are but I'd hope that's much less pervasive than flexible arrays.

@mikebattista mikebattista self-assigned this Apr 17, 2023
@mikebattista
Copy link
Contributor

I implemented a fix that seems to be pretty reliable. We now apply [FlexibleArray] to the last field of a struct if its NativeTypeName ends with [0] or [1]. If it's over including anywhere we can manually remove the attribute.

[StructSizeField("cbSize")]
[SupportedArchitecture(Architecture.X64 | Architecture.Arm64)]
public struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
{
	public uint cbSize;

	[FlexibleArray]
	public char[] DevicePath;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct USB_NODE_CONNECTION_INFORMATION_EX
{
	public uint ConnectionIndex;

	public USB_DEVICE_DESCRIPTOR DeviceDescriptor;

	public byte CurrentConfigurationValue;

	public byte Speed;

	public BOOLEAN DeviceIsHub;

	public ushort DeviceAddress;

	public uint NumberOfOpenPipes;

	public USB_CONNECTION_STATUS ConnectionStatus;

	[FlexibleArray]
	public USB_PIPE_INFO[] PipeList;
}

@timsneath
Copy link
Contributor Author

This is great.

@tim-weis
Copy link
Contributor

That seems to ignore the concern raised here:

If the original struct has a field with a ANYSIZE_ARRAY array size then it must remain that size. That's important since that affects the overall size of the structure, which many APIs and coding patterns depend on for stack packing, padding, and alignment, as well as explicit API version checks.

SP_DEVICE_INTERFACE_DETAIL_DATA_W is defined as

typedef struct _SP_DEVICE_INTERFACE_DETAIL_DATA_W {
  DWORD cbSize;
  WCHAR DevicePath[ANYSIZE_ARRAY];
} SP_DEVICE_INTERFACE_DETAIL_DATA_W, *PSP_DEVICE_INTERFACE_DETAIL_DATA_W;

but apparently drops the size information in the generated metadata:

[StructSizeField("cbSize")]
[SupportedArchitecture(Architecture.X64 | Architecture.Arm64)]
public struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
{
	public uint cbSize;

	[FlexibleArray]
	public char[] DevicePath;
}

Is this an oversight?

@mikebattista
Copy link
Contributor

No we haven't changed the size.

@tim-weis
Copy link
Contributor

tim-weis commented May 1, 2023

I'm confused.

I just looked at the metadata 50.0.71-preview published 3 days ago, and it describes SP_DEVICE_INTERFACE_DETAIL_DATA_W as

public struct SP_DEVICE_INTERFACE_DETAIL_DATA_W
{
	public uint cbSize;

	[FlexibleArray]
	public char[] DevicePath;
}

There's no mention of the ANYSIZE_ARRAY constant that evaluates to 1 anywhere. A (hypothetical) code generator using win32metadata to generate C headers would produce a struct whose sizeof differs from sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA_W) when applied to the SDK headers.

This must not happen, as explained here.

Am I just blind in not seeing where the size information is persisted, or is this a bug?

@AArnott
Copy link
Member

AArnott commented May 24, 2023

@tim-weis You're not blind. You just have your C# glasses on. :)

If you decompile the winmd to IL instead, you see the length of the array is still there:

	.field public char[0...0] DevicePath

See, in C# one cannot express an array type with a fixed length for an inline array except in very restrictive conditions. But IL can express it. So a C# decompiler just drops the detail. But it's there in the metadata for your projection.

@tim-weis
Copy link
Contributor

@AArnott That makes sense, thank you. The specific pair of C# glasses I was wearing go by the name ILSpy. Ildasm.exe did in fact produce output similar to the IL snippet you posted.

So SP_DEVICE_INTERFACE_DETAIL_DATA_W looks fine. This doesn't seem to be the case for structures that end in a literal zero-length array (such as SOCKET_SECURITY_SETTINGS_IPSEC).

Executing

ildasm /text /item:Windows.Win32.Networking.WinSock.SOCKET_SECURITY_SETTINGS_IPSEC Windows.Win32.winmd

produces the following IL:

.class public sequential ansi sealed beforefieldinit Windows.Win32.Networking.WinSock.SOCKET_SECURITY_SETTINGS_IPSEC
       extends [netstandard]System.ValueType
{
  .field public valuetype Windows.Win32.Networking.WinSock.SOCKET_SECURITY_PROTOCOL SecurityProtocol
  .field public uint32 SecurityFlags
  .field public uint32 IpsecFlags
  .field public valuetype [netstandard]System.Guid AuthipMMPolicyKey
  .field public valuetype [netstandard]System.Guid AuthipQMPolicyKey
  .field public valuetype [netstandard]System.Guid Reserved
  .field public uint64 Reserved2
  .field public uint32 UserNameStringLen
  .field public uint32 DomainNameStringLen
  .field public uint32 PasswordStringLen
  .field public char[1] AllStrings
  .custom instance void Windows.Win32.Foundation.Metadata.FlexibleArrayAttribute::.ctor() = ( 01 00 00 00 )
} // end of class Windows.Win32.Networking.WinSock.SOCKET_SECURITY_SETTINGS_IPSEC

The AllStrings field should be an array of length 0, but is described as an array of length 1. That doesn't look right, and would need to be addressed. For reference, I initially ran the above against metadata version 50.0.71-preview, but the output is identical for version 52.0.65-preview (modulo the DocumentationAttribute).

@AArnott
Copy link
Member

AArnott commented May 25, 2023

@tim-weis I use ILSpy too. It has a dropdown in the toolbar to switch from decompiling to C# or IL.
@mikebattista will have to respond to your latest observation.

@riverar
Copy link
Collaborator

riverar commented Mar 9, 2024

The AllStrings field should be an array of length 0, but is described as an array of length 1. That doesn't look right, and would need to be addressed.

Looks like this was already addressed, 59.x is showing:

.field /* 04039E80 */ public valuetype [Windows.Win32.winmd]Windows.Win32.Networking.WinSock.SOCKET_SECURITY_PROTOCOL SecurityProtocol
.field /* 04039E81 */ public uint32 SecurityFlags
.field /* 04039E82 */ public uint32 IpsecFlags
.field /* 04039E83 */ public valuetype [netstandard]System.Guid AuthipMMPolicyKey
.field /* 04039E84 */ public valuetype [netstandard]System.Guid AuthipQMPolicyKey
.field /* 04039E85 */ public valuetype [netstandard]System.Guid Reserved
.field /* 04039E86 */ public uint64 Reserved2
.field /* 04039E87 */ public uint32 UserNameStringLen
.field /* 04039E88 */ public uint32 DomainNameStringLen
.field /* 04039E89 */ public uint32 PasswordStringLen
+.field /* 04039E8A */ public char[0...0] AllStrings
.custom instance void [Windows.Win32.winmd]Windows.Win32.Foundation.Metadata.FlexibleArrayAttribute::.ctor() = (
01 00 00 00
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants