proposal: utf8.RuneIndexToByteIndex() #31879

Closed
MMulthaupt opened this issue May 7, 2019 · 3 comments
@MMulthaupt

What version of Go are you using (go version)?

1.12.4

Does this issue reproduce with the latest release?

Yes.

What did you expect to see?

A function in the utf8 package which, for a given string and rune index, returns the byte index of that rune in the string.

What did you see instead?

No such function.

What did you try to fix it?

Write my own implementation, like so:

func RuneIndexToByteIndex(s string, runeIndex int) int {
	currentRuneIndex := 0
	for i := range s {
		if currentRuneIndex == runeIndex {
			return i
		}
		currentRuneIndex++
	}
	if currentRuneIndex == runeIndex {
		return len(s)
	}
	return -1
}

Additional comments

YES, this IS a wasteful way to do it. However, it can be part of idiomatic code. e.g.:

func ShortString(s string, leadingCount int, trailingCount int) string {
	ellipsis := "..."
	maxLen := leadingCount + trailingCount + len([]rune(ellipsis))
	runeCount := len([]rune(s))
	if runeCount > maxLen {
		firstByteIndex := RuneIndexToByteIndex(s, leadingCount)
		omitByteIndex := RuneIndexToByteIndex(s, runeCount-trailingCount)
		s = s[:firstByteIndex] + ellipsis + s[omitByteIndex:]
	}
	return s
}

Other constructs are possible, such as retrieving a slice of indices. Would like to hear some thoughts on this.
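As a rough illustration of the "slice of indices" idea mentioned above, the following sketch collects every rune's starting byte offset in a single pass. The name RuneByteOffsets is hypothetical and is not part of unicode/utf8; the trailing len(s) sentinel is a design choice made here so that the end of the string can be addressed like any other boundary.

```go
package main

// RuneByteOffsets is a hypothetical helper (it does not exist in unicode/utf8).
// It returns the byte offset at which each rune of s starts, followed by
// len(s) as a final sentinel. offsets[i] is therefore the byte index of
// rune i, and the last element marks the end of the string.
func RuneByteOffsets(s string) []int {
	// len(s)+1 is an upper bound on the number of entries; strings with
	// multi-byte runes use fewer.
	offsets := make([]int, 0, len(s)+1)
	for i := range s {
		// Ranging over a string yields the byte index of each rune start.
		offsets = append(offsets, i)
	}
	return append(offsets, len(s))
}
```

With offsets := RuneByteOffsets(s), the expression s[offsets[i]:offsets[j]] selects runes i up to (but not including) j, provided 0 <= i <= j <= the rune count. The string is scanned once, at the cost of allocating one int per rune.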

@gopherbot gopherbot added this to the Proposal milestone May 7, 2019
@mvdan
Member

mvdan commented May 7, 2019

Usually, we only add functions to packages like utf8 if they're very commonly needed, or if they are tricky to implement correctly. Does this fall into either category?

I've never needed this function, and if I did, you yourself show that it can be implemented in under ten lines. So it seems to me like it's not necessary to add it to the standard library.

Also, any reason why we should have RuneIndexToByteIndex and not ByteIndexToRuneIndex?
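For comparison, the reverse direction asked about above can be sketched in a few lines as well. ByteIndexToRuneIndex is a hypothetical name used only for illustration; the sketch assumes s is valid UTF-8 and relies on the existing utf8.RuneStart and utf8.RuneCountInString functions.

```go
package main

import "unicode/utf8"

// ByteIndexToRuneIndex is a hypothetical counterpart to the proposed
// function (it does not exist in unicode/utf8). It returns the number of
// runes that precede byteIndex in s. It returns -1 if byteIndex is out of
// range or points into the middle of a multi-byte encoding.
// Assumption: s is valid UTF-8.
func ByteIndexToRuneIndex(s string, byteIndex int) int {
	if byteIndex < 0 || byteIndex > len(s) {
		return -1
	}
	// byteIndex == len(s) is accepted as the position just past the last rune.
	if byteIndex < len(s) && !utf8.RuneStart(s[byteIndex]) {
		return -1 // continuation byte: not a rune boundary
	}
	return utf8.RuneCountInString(s[:byteIndex])
}
```

Like the forward direction, this is linear in the prefix length, so the same caution about calling it repeatedly inside a loop applies.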

@MMulthaupt
Author

You make a fair point. It is easy to implement correctly. But then again, omitting it encourages the spread of ad hoc "misc" packages in people's projects. (I am guilty of this myself; it is part of a different, much larger issue.)

I don't feel qualified to make a decision here, and would prefer to see some more people chew on this.

@rsc
Contributor

rsc commented May 7, 2019

Although your example use is not in a loop, if this existed, inevitably people would use it inside loops processing the entire string. And in that context, the overall loop would then run in quadratic time, since there would be N calls (N = len(s)) and as you get further into the string each one would take longer and longer, requiring N/2 time on average. So overall you'd get a loop that runs in N^2 time. We work very hard to avoid making these kinds of accidents easy. They are already too easy in general. (See the excellent https://accidentallyquadratic.tumblr.com/ blog.)
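To make the quadratic-time concern concrete, here is a minimal sketch contrasting the two loop shapes. The function names and the task (finding the byte offsets of a target rune) are made up for illustration; runeIndexToByteIndex mirrors the helper posted at the top of this issue.

```go
package main

import "unicode/utf8"

// runeIndexToByteIndex mirrors the helper proposed in this issue.
// Each call scans s from the beginning, so it costs O(runeIndex).
func runeIndexToByteIndex(s string, runeIndex int) int {
	current := 0
	for i := range s {
		if current == runeIndex {
			return i
		}
		current++
	}
	if current == runeIndex {
		return len(s)
	}
	return -1
}

// findQuadratic (hypothetical) returns the byte offsets of every occurrence
// of target by asking for each rune index in turn. N calls that each rescan
// the string from byte 0 add up to O(N^2) work.
func findQuadratic(s string, target rune) []int {
	var hits []int
	n := utf8.RuneCountInString(s)
	for r := 0; r < n; r++ {
		b := runeIndexToByteIndex(s, r) // rescans the prefix every time
		ch, _ := utf8.DecodeRuneInString(s[b:])
		if ch == target {
			hits = append(hits, b)
		}
	}
	return hits
}

// findLinear (hypothetical) produces the same result in one pass, because
// ranging over a string already yields each rune with its byte offset.
func findLinear(s string, target rune) []int {
	var hits []int
	for i, ch := range s {
		if ch == target {
			hits = append(hits, i)
		}
	}
	return hits
}
```

Both functions return the same offsets; only the amount of work differs. The linear form needs no index conversion at all, which is one reason the range loop is usually the better building block.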

@rsc rsc closed this as completed May 7, 2019
@golang golang locked and limited conversation to collaborators May 6, 2020