Issue with splice! on array of strings #25561
Comments
For some added fun, this actually inserts an
The behavior reported in the OP is confusing and I can't speak to whether it's intended (I suspect not), but I'm going to call this a bug, since if nothing else, inserting an |
This happens because |
What is the correct behavior then? Failing mutating operations can have pretty much arbitrary side effects. I don't think it is possible to enforce that any mutation must be resilient to any operation potentially failing. |
I think we had that discussion a few years ago; I think it was about when The problem is basically that once you've replaced some data, it's too late to recover it if the operation fails. With |
Relying on the state of your data after an operation to mutate it failed seems adjective, so I don't think it is worth spending too much effort thinking about it. |
It's mainly annoying at the REPL. Anyway, the problem isn't so much effort IMHO: if there was a way of handling this without a performance penalty, I think it would be worth it, but so far we haven't found a solution. |
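One way to avoid the half-mutated state discussed above is to do the fallible conversion before touching the array, at the cost of a temporary allocation (the kind of performance penalty mentioned). A minimal sketch, assuming a hypothetical helper named `checked_splice!` that is not part of Base:

```julia
# Hypothetical helper, not part of Base: convert the replacement to the
# array's element type *before* mutating, so a failed conversion leaves
# `a` untouched. The price is one temporary vector per call.
function checked_splice!(a::Vector{T}, i::Integer, ins) where {T}
    items = collect(T, ins)      # may throw; `a` has not been modified yet
    return splice!(a, i, items)  # items already have the right element type
end

a = ["a", "b", "c"]
try
    checked_splice!(a, 2, "zz")  # iterating "zz" yields Chars, not Strings
catch err
    println("rejected: ", typeof(err))
end
println(a)  # still the original three elements
```

This only guards against conversion failures in the replacement; it is an illustration of the trade-off, not a proposal for how Base should behave.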
I understand that this happens only with arrays of strings. In fact the following code works (taken from the manual):
IMHO it would be preferable for |
It already treats all types of arguments the same; it always iterates over the argument. We should not add a special case for one type. |
@lucatrv As I noted above, this works because numbers are iterable: they act as one-element collections. Non-number types other than strings also have the same problem, for example:
A = [:a, :b, :c]
splice!(A, 2, :e)
The fact that |
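To make the point about iteration concrete, here is a small sketch (assuming Julia 1.x; the arrays are illustrative). A bare number is accepted as a replacement because it iterates as a single element, while a Symbol has to be wrapped in a vector:

```julia
# A number behaves like a one-element collection.
@show length(5)
@show first(5)

A = [6, 5, 4, 3, 2, 1]
removed = splice!(A, 5, -1)   # replaces the element at index 5 with -1
@show removed
@show A

# Symbols do not define `iterate`, so the replacement is wrapped in a vector.
@show applicable(iterate, :e)
B = [:a, :b, :c]
splice!(B, 2, [:e])           # replaces :b with :e
@show B
```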
I've said a few times that I don't think The solution I've proposed before is to have a But maybe I've been thinking about this backwards. If
struct Text{T <: AbstractString} <: AbstractText
    s::T
end
I wonder, if all the (In many text processing systems, text is stored as a vector of word or phrase codes. |
In such a scheme, should the |
I guess so (with the possible exception of
Agreed
How often is byte-based character indexing useful? The stuff I've written lately using byte-based indexing is only concerned with finding 7-bit syntax characters in a UTF-8 stream; all the multi-byte characters are just passed over in this kind of code. That's kind of the beauty of UTF-8. There is a whole class of problems where you can blissfully pretend that the whole world is still ASCII. I suspect that if you're doing natural language stuff in LOTEs you'll probably be working in the I guess if indexing
That wasn't my intention, I'd find it least surprising for these to each be indexed in units of its item type. I'd also be happy if these were all efficient as iterators, even if random access indexing has some degenerate cases. You can always to
What about saying |
It's the only kind of indexing into variable-width encodings that can be done in constant time, so it's kind of essential.
In that case, you were doing byte-based indexing. If you were forced to use character-based indexing then every indexing operation would be O(n) so all your string code would be slow.
Yes, but all of that has to be defined in terms of some lower-level efficient interface... which has to be based on byte indexing since otherwise it would not be efficient.
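For readers following the indexing argument, here is a short sketch of the mixed mode under discussion: indices count code units (bytes), while indexing returns whole characters. It assumes Julia 1.x semantics (this thread predates 1.0), and the sample string is illustrative:

```julia
s = "aβc"                      # 'β' occupies two bytes in UTF-8

@show ncodeunits(s)            # number of code units (bytes)
@show length(s)                # number of characters; requires a scan
@show collect(eachindex(s))    # the valid character start indices
@show s[2]                     # direct lookup: index 2 is where 'β' starts
@show nextind(s, 2)            # index where the following character starts
@show codeunit(s, 3)           # raw second byte of 'β'
@show isvalid(s, 3)            # index 3 falls inside 'β', so it is not a valid index
```

Code that walks a string with `nextind` or `eachindex` never needs to know how wide each character is, which is why the index can be treated as mostly opaque.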
It seems very confusing that |
I think we're in agreement on the essential-ness of byte-based indexing. It's just that I think of what I was doing as byte-based indexing of bytes, not Chars (because all the characters I cared about also happen to be bytes). The new Re confusion: We could have Or, we could have Lets not lock .... or, we could have a more opaque |
This is an interesting discussion, but frankly, it's way too late. The basic thing people want from strings is to be able to search for and extract characters and substrings in them. There needs to be some kind of index type to represent locations in strings. For the most part, it is opaque: people don't care what index Even though the indices must be in terms of code units to be efficient, what people want to extract is not code units – it's characters and substrings. So that's how strings work by default: code unit indexing of characters and substrings. The logic is fairly inexorable. It's a somewhat mixed mode, but there's no getting around it: the mixed mode is the only thing that's efficient and does what people want. Certainly there are other modes one may want to interact with a string in:
We already have |
Obligatory cross-ref: #9297 |
Oh, one more point: yes, we could require writing |
Getting back to the original
(The error messages currently produced are still not ideal though, as they expose and depend on the internals of the function. They're less useful than they could be to the user in diagnosing the issue: trying to run |
The following code fails:
with message:
while
insert!(a, 2, "zz")
works. With splice! I need to write splice!(a, 2, ["zz"]).
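As a sketch of the two working forms mentioned in the report (the array contents are hypothetical, since the original snippet did not survive on this page): `insert!` takes a single item, while `splice!` takes a collection of replacement items, so a single string has to be wrapped in a vector.

```julia
a = ["a", "b", "c"]              # hypothetical data for illustration

insert!(a, 2, "zz")              # inserts one element before index 2
@show a

removed = splice!(a, 2, ["yy"])  # replaces element 2; returns the removed item
@show removed
@show a

splice!(a, 3:2, ["xx"])          # empty range: inserts before index 3, removes nothing
@show a
```

The empty-range form `n:n-1` is the documented way to use `splice!` as a pure insertion.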