Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using SortedList instead of Dictionary in HttpHeaders #62846

Closed
geoffkizer opened this issue Dec 15, 2021 · 15 comments · Fixed by #62981
Closed

Consider using SortedList instead of Dictionary in HttpHeaders #62846

geoffkizer opened this issue Dec 15, 2021 · 15 comments · Fixed by #62981
Assignees
Labels
area-System.Net.Http enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Milestone

Comments

@geoffkizer
Copy link
Contributor

The Dictionary we allocate in HttpHeaders for the headers table seems to use a significant amount of memory. Since the headers table itself is typically not that large, it seems reasonable to use SortedList instead and trade off reduced memory usage for a small amount of added CPU cost on update and retrieval -- I doubt this added CPU cost is significant, whereas the memory savings may be significant.

@geoffkizer geoffkizer added this to the Future milestone Dec 15, 2021
@ghost
Copy link

ghost commented Dec 15, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

The Dictionary we allocate in HttpHeaders for the headers table seems to use a significant amount of memory. Since the headers table itself is typically not that large, it seems reasonable to use SortedList instead and trade off reduced memory usage for a small amount of added CPU cost on update and retrieval -- I doubt this added CPU cost is significant, whereas the memory savings may be significant.

Author: geoffkizer
Assignees: -
Labels:

area-System.Net.Http

Milestone: Future

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 15, 2021
@MihaZupan
Copy link
Member

This may require a custom SortedList implementation (or improvements to the existing collection).
Using the existing implementation as-is would likely be worse (for example it boxes the enumerator)

@geoffkizer
Copy link
Contributor Author

Well that's unfortunate. Aside from boxing the enumerator, are there other known issues?

@MihaZupan
Copy link
Member

An insertion has O(n) runtime, so I think that disqualifies this type of collection from a DOS security perspective as well.

@geoffkizer
Copy link
Contributor Author

An insertion has O(n) runtime, so I think that disqualifies this type of collection from a DOS security perspective as well.

I'm not sure it matters. We limit the size of the response headers that we will accept, so there's a bound on the total number of headers we will add. While O(n^2) isn't great, it's not necessarily a DOS as long as n is smallish.

@geoffkizer
Copy link
Contributor Author

Another alternative here might be to simply have an unsorted list of key value pairs. This has cheap insertion but makes both lookup and removal expensive. That might not be a bad tradeoff though. And if we did it this way, we could actually preserve the original header ordering, which seems like a nice benefit.

For stuff like YARP we would want to just enumerate the raw headers and not worry about grouping by header name, which means that "raw enumeration" for these sorts of scenarios would be cheap as well. Unfortunately we defined the NonValidated collection to group by header name as well...

@MihaZupan
Copy link
Member

While O(n^2) isn't great, it's not necessarily a DOS as long as n is smallish.

Right, but as of right now n could still reach a few thousand by default which isn't ideal.

@Clockwork-Muse
Copy link
Contributor

An insertion has O(n) runtime, so I think that disqualifies this type of collection from a DOS security perspective as well.

.... do we insert headers singly, or as a range?

@danmoseley
Copy link
Member

Using the existing implementation as-is would likely be worse (for example it boxes the enumerator)

Are you referring to the non generic SortedList? SortedList<K,V> should not do this.

@MihaZupan
Copy link
Member

The generic as well. The enumerator is a struct, but the type is not publicly exposed so the GetEnumerator returns an IEnumerator.

public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()

@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Dec 16, 2021
@karelz
Copy link
Member

karelz commented Dec 16, 2021

Triage:

  • Would be nice to save some memory if we don't hurt CPU too much
  • It may help with order of headers - we had few issues in the space

Further design discussion:

  • Idea: Could we have a list/array as backing field and create something like Dictionary for string lookups? (they are rare)
  • In extreme case we might want to add another API, which would return multiple headers (not happening today), unsorted
  • Or perhaps just add another IEnumerable<string, string>

@geoffkizer
Copy link
Contributor Author

Similarly, ConcurrentDictionary<K, V> boxes its enumerator: https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.getenumerator?view=net-6.0

I noticed this in the same trace. It's a small impact in this case because we only enumerate the concurrent dictionary when scavenging connections.

I'm not sure why these collections were implemented so that their enumerators must be boxed. Is this simply an oversight? Can we fix it? It's a breaking change, but a very small one...

@gfoidl
Copy link
Member

gfoidl commented Dec 16, 2021

For a dictionary approach maybe dotnet/aspnetcore#31360 is also of interest (the AdaptiveCapacityDictionary).

@stephentoub
Copy link
Member

It's a breaking change, but a very small one...

It's a binary breaking change. I don't think it's that small. Anyone enumerating one of these today would fail at run time.

Similarly, ConcurrentDictionary<K, V> boxes its enumerator:

#25448

@geoffkizer
Copy link
Contributor Author

It's a binary breaking change. I don't think it's that small. Anyone enumerating one of these today would fail at run time.

Makes sense, thanks for clarifying.

@MihaZupan MihaZupan self-assigned this Dec 18, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 18, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 19, 2022
@karelz karelz modified the milestones: Future, 7.0.0 Apr 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants