Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer ExactSpelling=true on [DllImport] for known APIs #35695

Closed
Tracked by #57797
elinor-fung opened this issue May 1, 2020 · 24 comments
Closed
Tracked by #57797

Prefer ExactSpelling=true on [DllImport] for known APIs #35695

elinor-fung opened this issue May 1, 2020 · 24 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices code-analyzer Marks an issue that suggests a Roslyn analyzer help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@elinor-fung
Copy link
Member

elinor-fung commented May 1, 2020

On Windows, setting ExactSpelling=true on the DllImport attribute prevents the runtime from looking for alternate function names (suffix based on CharSet), giving a slight performance benefit by avoiding this search.

There is a fair amount of nuance associated with the behaviour of ExactSpelling and under which scenarios there would actually be a performance benefit. In order to avoid noise, this rule can be scoped to check for a database of Win32 APIs that use the suffixes and provide a warning and fix for those cases where it matters.

// Flag ExactSpelling=false
[DllImport("kernel32.dll", CharSet=CharSet.Unicode)]
private static extern bool SetEnvironmentVariableW(string name, string? value);

// OK - not in database of APIs
[DllImport("MyLibrary.dll", CharSet=CharSet.Unicode)]
private static extern void Foo();

Recommend:

[DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling = true)]
private static extern bool SetEnvironmentVariableW(string name, string? value);

Category: Performance
Default: Enabled

cc @terrajobst @stephentoub @AaronRobinsonMSFT @jkoritzinsky

@elinor-fung elinor-fung added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.InteropServices code-analyzer Marks an issue that suggests a Roslyn analyzer labels May 1, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 1, 2020
@stephentoub
Copy link
Member

stephentoub commented May 1, 2020

Is there any concern here that someone following this guidance will introduce bugs, e.g. because they were actually relying on the behavior? Can an auto-fixer do more than just add "ExactSpelling = true", e.g. would it be able to update the function name appropriately in case adding ExactSpelling would be introducing a bug?

I expect we have a bunch of places where we don't follow this guidance in dotnet/runtime. What does a fix look like there, and how much overhead should we expect to see removed? If it's meaningful enough to add an analyzer for, it would more so seem to be something we should fix in dotnet/runtime for .NET 5.

@AaronRobinsonMSFT
Copy link
Member

Is there any concern here that someone following this guidance will introduce bugs, e.g. because they were actually relying on the behavior?

The biggest one is for Win32 APIs that use the A/W suffix. Detecting that might be possible, but the heuristic would be fragile. We could have an exhaustive Win32 API list where this applies. I haven't observed new Win32 APIs applying these suffixes anymore so it might actually be stable.

@EgorBo
Copy link
Member

EgorBo commented May 1, 2020

I believe there is an issue somewhere to turn off A/W probing for non-windows OSs so ExactSpelling=true will be just a noise for them.

@elinor-fung
Copy link
Member Author

Good call about non-Windows - certainly would not want that noise.

Looking more at how we look for an entry point, it does have a bit more subtlety than I had realized. On Windows, if ExactSpelling=false and:

  • CharSet=CharSet.Unicode, we always look for both the specified name and the name with the W suffix
    • Even if the specified name is found as an exact match to an entry point, we look for the W suffix. If found, the W suffix function is used.
    • It may not be clear to the user that they are actually relying on the suffix-appending behaviour, so ExactSpelling=true, would also require adding the W suffix,
  • CharSet=CharSet.Ansi, we look for the specified name (no suffix) first and only try the suffix if it is not found
    • If the specified name is already the exact entry point name, setting ExactSpelling=true wouldn't make any difference.
    • Setting ExactSpelling=true would only matter if the user is actually relying on the suffix-appending behaviour - and it would require updating to add the A suffix.

Given the above, I'm a little less convinced that this should warrant a rule/fixer.

Rule or not, I'd agree we should do a pass over dotnet/runtime. Aside from places that just don't need the suffix behaviour, I also see some places where functions explicitly have the W suffix in the entry point name (e.g. FooW), but don't set ExactSpelling=true, so the runtime always looks for a duplicated suffix (e.g. FooWW)...

@jkotas
Copy link
Member

jkotas commented May 5, 2020

Given the above, I'm a little less convinced that this should warrant a rule/fixer.

You build the database of OS APIs and do the rule/fixer for those only.

@elinor-fung elinor-fung changed the title Prefer ExactSpelling=true on [DllImport] Prefer ExactSpelling=true on [DllImport] for known APIs May 27, 2020
@AaronRobinsonMSFT AaronRobinsonMSFT removed the untriaged New issue has not been triaged by the area owner label Jun 8, 2020
@AaronRobinsonMSFT AaronRobinsonMSFT added this to the 5.0 milestone Jun 8, 2020
@elinor-fung elinor-fung added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 26, 2020
@carlossanlop carlossanlop added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review labels Jul 6, 2020
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 16, 2020
@terrajobst
Copy link
Member

terrajobst commented Jul 16, 2020

Video

Looks good as proposed.

@elinor-fung elinor-fung modified the milestones: 5.0.0, 6.0.0 Jul 28, 2020
@Mrnikbobjeff
Copy link

Which dll's would we want in that api? I generated a brief list with the dumpbin tool for user32 and kernel32-dll - are there any others which should be added for now?

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Sep 3, 2020

@Mrnikbobjeff advapi32.dll and perhaps kernelbase.dll too?

I think the easiest way to do this is leverage dumpbin /exports on all DLLs in System32 and SysWoW64 and find those similar APIs that end with both a A and W.

Cases:
XxxxW exists AND XxxxA exists, then add DLL and both exports to database.
XxxxW exists but XxxxA doesn't exist, ignore.
XxxxA exists but XxxxW doesn't exist, ignore.

@jkotas
Copy link
Member

jkotas commented Sep 3, 2020

I would think we want to have all entry-points in the database. Otherwise, we cannot tell the difference between an API that does not have W suffix and API that is missing in the database (e.g. available in new OS version only).

@AaronRobinsonMSFT
Copy link
Member

I would think we want to have all entry-points in the database.

As in, only have Xxxx and not XxxxA and XxxxW?

@jkotas
Copy link
Member

jkotas commented Sep 3, 2020

I think the database should have the full snapshot of the exports, ie all Xxxx, XxxxA, XxxxW if they are present.

When we see "Xxxx" in the entrypoint, the ExactSpelling autofixer needs to decide whether to suggest:

  • "XxxxW" + ExactSpelling=true. This should be suggested when we see "XxxxW" in the database.
  • "Xxxx" + ExactSpelling=true. This should be suggested when we see "Xxxx" and no "XxxxW" in the database
  • Leave the API alone. This should be the case when neither "Xxxx" nor "XxxxW" is in the database

@AaronRobinsonMSFT
Copy link
Member

@jkotas I agree with all of this. I wasn't trying to dictate design merely indicating a possible collection of exports for how to think about what the analyzer needs to "know". I believe the logic you laid out is properly described in the issue.

@Mrnikbobjeff Please let us know if something is unclear with @jkotas's statement and with the issue details.

@Mrnikbobjeff
Copy link

Mrnikbobjeff commented Sep 3, 2020

@AaronRobinsonMSFT thank you for the details. The last thing unclear to me is where to store the resulting database. Keeping it in memory seems kind of wasteful as the symbol list contains approximately 10k entries for the four explicitly listed dll's alone. Is there infrastructure already in place for these kind of databases?

@AaronRobinsonMSFT
Copy link
Member

@Mrnikbobjeff I think generating a series of constant data structures in C# will work just fine. The term "database" here wasn't to imply we have a real SQL-esque or SQLite database file somewhere. The solution could also be a simple text file that ships with the analyzer - choose the format (JSON, XML, etc).

@Mrnikbobjeff
Copy link

Mrnikbobjeff commented Sep 4, 2020

I thought about it a little. At first I wrote a little utility to do what @jkotas wanted to scan all dll files in System32 which resulted in 120k methods. I tried storing it in a Dictionary<string (dll), HashSet (methods)> dictionary generated by the new generator feature, but the compile time for the resulting file is abysmal, I stopped the process after it took thrice the time required to compile the entire solution on my machine without completing. I settled on initializing the Dictionary on first access, which performs okay. I also thought it would be wise to include the entrypoint parameter into the analysis if it is present. Perhaps this is also the place to edit the Entrypoint name in case we change it?
Edit: There are only 650 methods for which this applies, perhaps we could reconsider not adding all the clutter?

@jkotas
Copy link
Member

jkotas commented Sep 4, 2020

scan all dll files in System32 which resulted in 120k methods.

There are many private APIs in System32 .dlls. I agree that having database of everything in System32 does not sound right.

There are only 650 methods for which this applies, perhaps we could reconsider not adding all the clutter?

Do you mean that there are only 650 methods with W or A suffixes? If we had just these methods in the database, the analyzer may recommend false positives if there are new Windows APIs with W or A suffixes introduced. Maybe that's ok.

@Mrnikbobjeff
Copy link

Mrnikbobjeff commented Sep 4, 2020

There are only 650 methods which have a similar method with the same name + "W" in the same assembly. There are many more instances where there are methods which have both a W and A suffix but no base method without the suffix. These are currently not counted and included

@riverar
Copy link
Contributor

riverar commented Feb 5, 2021

@Mrnikbobjeff Are you using https://github.com/microsoft/win32metadata as opposed to manually scanning PE files?

@Mrnikbobjeff
Copy link

@riverar I used a small utility exe I wrote to parse dumpbin output for all dll files in the System32 directory. I uploaded the gist but it really is just a hacked together script and nothing production ready :)

@riverar
Copy link
Contributor

riverar commented Feb 26, 2021

@Mrnikbobjeff Thanks, I think my concern when I originally asked was that we should be using win32metadata as the authoritative source and not any ad-hoc scans. Scanning System32 is not a great idea due to windows componentization, you'll definitely miss something.

@jkoritzinsky
Copy link
Member

jkoritzinsky commented Jul 13, 2021

@Mrnikbobjeff how are you doing with this work? Do you need any assistance?

@carlossanlop
Copy link
Member

@Mrnikbobjeff I'll unassign this issue and move it to 8.0. If you still want to work on it, let us know.

@carlossanlop carlossanlop added the help wanted [up-for-grabs] Good issue for external contributors label Jul 6, 2022
@carlossanlop carlossanlop modified the milestones: 7.0.0, 8.0.0 Jul 6, 2022
@jkotas
Copy link
Member

jkotas commented Jul 6, 2022

This was superseded by the interop source generator and LibraryImportAttribute that sets ExactSpelling=true by default.

@AaronRobinsonMSFT @jkoritzinsky Should we close this?

@AaronRobinsonMSFT
Copy link
Member

I think so. I'm of the opinion that for .NET 7+, any analyzers for DllImport, unless security sensitive, should be retargeted to LibraryImport.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices code-analyzer Marks an issue that suggests a Roslyn analyzer help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests