Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP2: Optimize huffman encoding static table initialization #45297

Closed
rokonec opened this issue Nov 28, 2020 · 13 comments
Closed

HTTP2: Optimize huffman encoding static table initialization #45297

rokonec opened this issue Nov 28, 2020 · 13 comments
Assignees
Labels
Milestone

Comments

@rokonec
Copy link
Contributor

rokonec commented Nov 28, 2020

Currently huffman encoding static table is compiled into ~5,5K IL bytes which is JITed into ~9.5K of ASM code.

Sharplab prove

It might be beneficial to somehow optimize above.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Http untriaged New issue has not been triaged by the area owner labels Nov 28, 2020
@ghost
Copy link

ghost commented Nov 28, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Currently huffman encoding static table is currently compiled into ~5,5K IL bytes which is JITed into ~9.5K of ASM code.

Sharplab prove

It might be beneficial to somehow optimize above.

Author: rokonec
Assignees: -
Labels:

area-System.Net.Http, untriaged

Milestone: -

@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

My recommendation is to create ReadonlySpan containing source data from which the encoding table will be initialized.

private static ReadOnlySpan<byte> s_encodingTableData => new byte[257*5]
{
            0b11111111, 0b11000000, 0b00000000, 0b00000000, 13,
            0b11111111, 0b11111111, 0b10110000, 0b00000000, 23,
            ....
};

private static readonly (uint code, int bitLength)[] s_encodingTable = GenerateEncodingTable();

private static (uint code, int bitLength)[] GenerateEncodingTable()
{
    // TODO: build if from s_encodingTableData bytes by converting 4 bytes into 'uint code' and 5th into 'int bitLength'
}

^ Sharplab

@scalablecory
Copy link
Contributor

@rokonec would you like to take this one?

@scalablecory
Copy link
Contributor

The strategy outlined looks good to me.

@stephentoub
Copy link
Member

stephentoub commented Nov 28, 2020

I'd suggest just starting with a private static readonly uint[] and a ReadOnlySpan<byte>, and then upgrading the former to a ReadOnlySpan<uint> once #24961 is implemented, hopefully later this release. Otherwise to do this most efficiently (e.g. reading 4 bytes at a time rather than 1) you'll likely need to complicate the code to manually handle endianness, do unsafe unaligned reads, etc.
cc: @VSadov, @jaredpar

@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

@scalablecory I'd like to take this one If I could. Please assign me to it if you feel its OK.

@stephentoub stephentoub added this to the 6.0.0 milestone Nov 28, 2020
@stephentoub stephentoub added tenet-performance Performance related issue and removed untriaged New issue has not been triaged by the area owner labels Nov 28, 2020
@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

@stephentoub I was thinking about something like this: https://sharplab.io/#gist:c13c371e129cb662d20d6c2d3ab0f312
No unsafe needed. Is this fragile against endianness?

Sharplab does support static constructors and initialization for Jit ASM view so the above sample is 'unstaticisied'

@stephentoub
Copy link
Member

stephentoub commented Nov 28, 2020

You can do that, but you're now incurring lots of additional bounds checks and division and the like.

[Benchmark]
public (uint code, int bitLength)[] Create1()
{
    var data = s_encodingTableData;
    var table = new (uint code, int bitLength)[data.Length / 5];

    for (int i = 0; i < s_encodingTableData.Length;)
    {
        table[i / 5] = ((uint)((data[i++] << 24) | (data[i++] << 16) | (data[i++] << 8) | data[i++]), data[i++]);
    }

    return table;
}

[Benchmark]
public (uint code, int bitLength)[] Create2()
{
    var uintData = s_encodingTableUintData;
    var byteData = EncodingTableByteData;
    var table = new (uint code, int bitLength)[uintData.Length];

    for (int i = 0; i < uintData.Length; i++)
    {
        table[i] = (uintData[i], byteData[i]);
    }

    return table;
}
Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Create1 258.2 ns 1.21 ns 1.01 ns 0.1054 - - 664 B
Create2 103.6 ns 1.00 ns 0.83 ns 0.1057 0.0002 - 664 B

That said, in retrospect this probably doesn't matter, since this code is ever going to be invoked only once. And with the uint[] approach, there will also be the initial cost to allocate and blit the contents of the array, but that would go away with #24961 .

That's all to say, what you proposed seems fine for now, but I do think we should change it when #24961 arrives. I'd even be curious to know if at that point it's necessary at all to build up this single table, rather than just indexing into each of the two spans to get the two distinct values needed by the call site. And, actually, is that necessary even now? What's the impact on perf of the call site if you just use the uint[] and ReadOnlySpan<byte> now and skip this table building entirely? i.e. if the call site just does:

uint code = s_encodingTableCodes[octet];
int bitLength = EncodingTableBitLengths[octet];

@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

As for now, encoding table is only needed to generate decoding table - only once - and in unit tests.

Maybe we can consider to skip that table building and just use those two arrays (uint[] and ReadOnlySpan) directly from generate decoding table. If we decide to support huffman encoding in our HTTP2 clients later, we can optimize for it at that time.

@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

I'll create PR with latest @stephentoub recommendation:

uint code = s_encodingTableCodes[octet];
int bitLength = EncodingTableBitLengths[octet];

We can iterate on it in the PR as those are really elementary changes.

@stephentoub
Copy link
Member

Great. Thanks.

@rokonec
Copy link
Contributor Author

rokonec commented Nov 28, 2020

PR #45303

@geoffkizer
Copy link
Contributor

Linked PR was merged, closing.

Please reopen/refile if there is something else to look at here.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants