Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c# feature: new method AsSpan() on RepeatedField<T> #6538

Closed
wants to merge 1 commit into from

Conversation

prat0088
Copy link
Contributor

Provides very efficient access to the underlying data at the expense of foregoing null checks.

#6217

Provides very efficient access to the underlying data at the expense of foregoing null checks.
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@prat0088
Copy link
Contributor Author

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@jtattermusch
Copy link
Contributor

This seems as an overkill. What is the use case where using the indexer is not sufficient? I don't thing we should add an extra API (that will need to be supported forever) unless this is a common use case that will be used by many.

@jtattermusch jtattermusch self-assigned this Aug 25, 2019
@jtattermusch jtattermusch self-requested a review August 25, 2019 13:51
@prat0088
Copy link
Contributor Author

prat0088 commented Aug 26, 2019

This seems as an overkill. What is the use case where using the indexer is not sufficient?

Bulk operations. My use case is adding lots of primitives to repeated fields to construct a response.
In the ballpark of 7.5 million values across 500 repeated fields. 15k per field.

RepeatedField's indexer is 12x slower than a raw array, and 4.5x slower than Span.

| Raw array | 1.3ms |
| Span | 3.3ms |
| RepeatedField Indexer | 15.5ms |

On a related note, I need to look further into whether or not this will actually work with C# 6. I ran into some errors creating the benchmark and needed to update the language to 7, which the maintainers aren't ready to do yet.

/// Creates a new span over the RepeatedField. This is the most
/// efficient way to read and write values to the RepeatedField because
/// there are no internal null or bounds checks. You are responsible
/// for ensuring no null values are written to the span.
Copy link
Contributor

@jtattermusch jtattermusch Aug 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the lack of null check is dangerous and relying on users to always do the right thing is also dangerous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ allows you to memcpy into RepeatedField. But that's C++. I suppose C# users expect the null checks and such.

/// for ensuring no null values are written to the span.
/// </summary>
/// <returns>Span representation of the RepeatedField</returns>
public Span<T> AsSpan()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say I wanted to populate the repeated field with 1 million values. How do I resize the empty RepeatedField to the right size using the public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another PR for that, but I will need to modify it slightly. It preallocates the array but doesn't increase count, so, as you point out, it isn't going to work. It will still be nice to have for reads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to first know what it the full API you're proposing. Isolated like this, it doesn't make much sense.
Btw, we just merged #6317, so this PR might be out of date.

@jtattermusch
Copy link
Contributor

This seems as an overkill. What is the use case where using the indexer is not sufficient?

Bulk operations. My use case is adding lots of primitives to repeated fields to construct a response.
In the ballpark of 7.5 million values across 500 repeated fields. 15k per field.

RepeatedField's indexer is 12x slower than a raw array, and 4.5x slower than Span.

| Raw array | 1.3ms |
| Span | 3.3ms |
| RepeatedField Indexer | 15.5ms |

On a related note, I need to look further into whether or not this will actually work with C# 6. I ran into some errors creating the benchmark and needed to update the language to 7, which the maintainers aren't ready to do yet.

Would an optimized version of AddRange() help you then? (have you tried using AddRange()?)

@Falco20019
Copy link
Contributor

I would also like to have support for Span<T>. I sometimes have to get ranges which is currently not easily possible with RepeatedField<T>. I understand that it should not be possible to manipulate it, so I would suggest offering ReadOnlySpan<T> instead.

public ReadOnlySpan<T> this[Range range]
{
	get
	{
		(int start, int length) = range.GetOffsetAndLength(array.Length);
		return new ReadOnlySpan<T>(array, start, length);
	}
}

@jtattermusch
Copy link
Contributor

This PR is out of date now that #7351 has been merged.

@jtattermusch
Copy link
Contributor

After a long time, there's still some unsolved issues with this PR (e.g. it cannot ensure that the values won't become null, there's no way of setting size when filling a repeated field, should we only allow read only access and provide a different way for populating the contents, etc.).
For these reason, I'm going to close this PR.
I'm not in principle opposed to providing a high-performance bulk accessor, but all the design questions need to be answered before we can add one. We cannot add a half-baked API.

@jtattermusch
Copy link
Contributor

jtattermusch commented May 18, 2020

Btw, a RepeatedField has a AddRange(IEnumerable method which can be used for efficiently bulk-filling a repeated field (if you know the size in advance).

Also note that exposing the data as read-only (ReadOnlySpan<T>) addresses some concerns over the safety, but there are still some potential issues - e.g. the returned span is bound to the current array, but if a repeated field gets resized, a new internal array is allocated, so the span will point to a dead array and all the changes will be lost.

One idea for safe span-based initialization:
We could use a similar trick that is used for Creating a string from a Span:
https://docs.microsoft.com/en-us/dotnet/api/system.string.create?view=netcore-3.1
e.g. with a "Populate" method like this: Populate(int length, TState state, System.BuffersSpanAction<T, TState> action). (when called, it sets the desired size internally and gives you temporary access to the contents - it still doesn't solve the null check problem though).
Similar trick could also work for bulk reading the contents of a repeated field, but doing that safely. (but then again, if you're doing a bulk read, why can't you just enumerate over the values? - GetEnumerator should be really fast)

@nietras
Copy link

nietras commented Mar 1, 2021

@jtattermusch has this PR stranded completely? What's the status?

I have issues with RepeatedField API design and perf too, and couldn't understand why this was not supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants