proposal: utf8.RuneIndexToByteIndex() #31879

MMulthaupt · 2019-05-07T09:28:55Z

What version of Go are you using (`go version`)?

1.12.4

Does this issue reproduce with the latest release?

Yes.

What did you expect to see?

A function in the utf8 package which, for a given string and rune index, returns the byte index of that rune in the string.

What did you see instead?

No such function.

What did you try to fix it?

Wrote my own implementation, like so:

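// RuneIndexToByteIndex returns the byte index at which the rune with index
// runeIndex starts in s. It returns len(s) if runeIndex equals the number of
// runes in s, and -1 if runeIndex is out of range.
func RuneIndexToByteIndex(s string, runeIndex int) int {
	currentRuneIndex := 0
	// Ranging over a string yields the starting byte index of each rune.
	for i := range s {
		if currentRuneIndex == runeIndex {
			return i
		}
		currentRuneIndex++
	}
	// One past the last rune maps to the end of the string.
	if currentRuneIndex == runeIndex {
		return len(s)
	}
	return -1
}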

Additional comments

YES, this IS a wasteful way to do it. However, it can be part of idiomatic code, e.g.:

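// ShortString keeps the first leadingCount and the last trailingCount runes
// of s and replaces everything in between with an ellipsis. Strings that
// would not get shorter (counted in runes) are returned unchanged.
func ShortString(s string, leadingCount int, trailingCount int) string {
	ellipsis := "..."
	maxLen := leadingCount + trailingCount + len([]rune(ellipsis))
	runeCount := len([]rune(s))
	if runeCount > maxLen {
		// Convert both rune positions to byte positions before slicing.
		firstByteIndex := RuneIndexToByteIndex(s, leadingCount)
		omitByteIndex := RuneIndexToByteIndex(s, runeCount-trailingCount)
		s = s[:firstByteIndex] + ellipsis + s[omitByteIndex:]
	}
	return s
}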

Other constructs are possible, such as retrieving a slice of indices. Would like to hear some thoughts on this.
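As a rough illustration of the "slice of indices" idea, here is a minimal sketch. The name RuneByteOffsets is hypothetical and does not exist in unicode/utf8; it is only one possible shape for such a helper. It scans the string once and records where each rune starts, so later lookups by rune index are simple slice accesses.

```go
package main

import "fmt"

// RuneByteOffsets is a hypothetical helper, not part of unicode/utf8.
// It returns the byte offset at which each rune of s starts, followed by a
// final entry equal to len(s). offsets[i] is the byte index of rune i.
func RuneByteOffsets(s string) []int {
	// len(s)+1 is an upper bound on the number of entries.
	offsets := make([]int, 0, len(s)+1)
	for i := range s {
		offsets = append(offsets, i)
	}
	return append(offsets, len(s))
}

func main() {
	s := "héllo, 世界"
	offsets := RuneByteOffsets(s)
	// Slice out runes 1 through 4 without rescanning the string.
	fmt.Println(s[offsets[1]:offsets[5]])
}
```

The trade-off is memory: the slice holds one int per rune, which may or may not be acceptable depending on the string sizes involved.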


mvdan · 2019-05-07T09:50:27Z

Usually, we only add functions to packages like utf8 if they're very commonly needed, or if they are tricky to implement correctly. Does this fall into either category?

I've never needed this function, and if I did, you yourself show that it can be implemented in under ten lines. So it seems to me like it's not necessary to add it to the standard library.

Also, any reason why we should have RuneIndexToByteIndex and not ByteIndexToRuneIndex?
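For comparison, the reverse direction can be sketched in a few lines on top of the existing utf8.RuneCountInString. ByteIndexToRuneIndex is a hypothetical name used only for illustration, and the sketch assumes byteIndex falls on a rune boundary; the -1 return for out-of-range input mirrors the convention in the implementation above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ByteIndexToRuneIndex is a hypothetical helper, not part of unicode/utf8.
// It returns the number of runes in s[:byteIndex], or -1 if byteIndex is
// out of range. It assumes byteIndex falls on a rune boundary; an index in
// the middle of a multi-byte rune counts the partial bytes as runes.
func ByteIndexToRuneIndex(s string, byteIndex int) int {
	if byteIndex < 0 || byteIndex > len(s) {
		return -1
	}
	return utf8.RuneCountInString(s[:byteIndex])
}

func main() {
	s := "世界!"
	// Each of the first two runes is three bytes long, so byte 6 is rune 2.
	fmt.Println(ByteIndexToRuneIndex(s, 6))
}
```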

MMulthaupt · 2019-05-07T12:53:14Z

You make a fair point. It is easy to implement correctly. But then again, omitting it supports the spread of misc packages and such in people's projects. (Which I am guilty of myself – part of a different, much larger issue)

I don't feel qualified to make a decision here, and would prefer to see some more people chew on this.

rsc · 2019-05-07T20:17:02Z

Although your example use is not in a loop, if this existed, inevitably people would use it inside loops processing the entire string. And in that context, the overall loop would then run in quadratic time, since there would be N calls (N = len(s)) and as you get further into the string each one would take longer and longer, requiring N/2 time on average. So overall you'd get a loop that runs in N^2 time. We work very hard to avoid making these kinds of accidents easy. They are already too easy in general. (See the excellent https://accidentallyquadratic.tumblr.com/ blog.)
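To make the concern concrete, here is a small sketch of the pattern being described, next to the single-pass alternative that ranging over a string already provides. The function names quadraticOffsets and linearOffsets are made up for this illustration; RuneIndexToByteIndex is the implementation from the original post.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneIndexToByteIndex is the implementation proposed in the issue.
func RuneIndexToByteIndex(s string, runeIndex int) int {
	currentRuneIndex := 0
	for i := range s {
		if currentRuneIndex == runeIndex {
			return i
		}
		currentRuneIndex++
	}
	if currentRuneIndex == runeIndex {
		return len(s)
	}
	return -1
}

// quadraticOffsets (hypothetical) looks up every rune by index. Each call
// rescans s from the start, so the loop as a whole is O(N^2).
func quadraticOffsets(s string) []int {
	n := utf8.RuneCountInString(s)
	out := make([]int, 0, n)
	for r := 0; r < n; r++ {
		out = append(out, RuneIndexToByteIndex(s, r))
	}
	return out
}

// linearOffsets (hypothetical) collects the same byte offsets in one pass,
// because range over a string already yields each rune's byte index.
func linearOffsets(s string) []int {
	out := make([]int, 0, len(s))
	for i := range s {
		out = append(out, i)
	}
	return out
}

func main() {
	s := "héllo, 世界"
	fmt.Println(quadraticOffsets(s))
	fmt.Println(linearOffsets(s))
}
```

Both functions produce the same offsets; only the amount of rescanning differs, which is easy to overlook when the helper call looks like a cheap lookup.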

gopherbot added this to the Proposal milestone May 7, 2019

gopherbot added the Proposal label May 7, 2019

rsc closed this as completed May 7, 2019

golang locked and limited conversation to collaborators May 6, 2020

gopherbot added the FrozenDueToAge label May 6, 2020
