Fixes constructor of SubString from SubString #22511
I think this should rather be an error, just like when trying to index a string with an invalid index range. See how |
I agree with @nalimilan, I think this should throw an error.
Before fixing this I have one question. Do you want also:
to throw an error except when annotated with I want to make sure as I believe that in the initial design of |
I would say yes, but where did you read that this was supposed to work?
In general I would recommend to have
|
I think that all makes sense. You could do a git blame to figure out who wrote these tests and ping that person; maybe there was a deeper rationale for it.
I have pushed an initial proposal how |
Actually @ScottPJones didn't write these tests, he split tests into several files. These tests were added in #7145, a PR which was about introducing a more efficient implementation of @StefanKarpinski and @stevengj would probably know better, but copying the behavior of |
base/strings/types.jl
Outdated
j > endof(s) && throw(BoundsError(s, j))

if !isvalid(s,i)
    throw(ArgumentError("invalid SubString index i"))
It would make sense to print "$i" rather than just "i".
I meant the symbol `i`, as it is the name of the first argument. An invalid second argument `j` is allowed (like in `getindex`).
After rethinking, I have changed it to `i=$i`.
base/strings/types.jl
Outdated
@@ -19,10 +19,10 @@ struct SubString{T<:AbstractString} <: AbstractString
function SubString{T}(s::T, i::Int, j::Int) where T<:AbstractString
    i > j && return new(s, 0, 0) # allow i > j as it is consistent with getindex
    i < 1 && throw(BoundsError(s, i))
    j > endof(s) && throw(BoundsError(s, j))
    j > s.len && throw(BoundsError(s, j))
It's not guaranteed that all `AbstractString` subtypes will have a `.len` field; that's even going to be removed from `String` soon.
@tkelman Changed to `sizeof` (I understand it will stay).
`sizeof` isn't correct here. For example, when indexing `"café"`:
julia> sizeof("café")
5
julia> endof("café")
4
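The point above can be sketched as follows. This is only an illustration, assuming a current Julia release, where `endof` has been renamed `lastindex` (under 0.6, substitute `endof`); the variable names are made up for the example.

```julia
s = "café"                  # 'é' occupies two bytes in UTF-8

nbytes = sizeof(s)          # number of bytes in the string
lasti  = lastindex(s)       # byte index at which the last character starts

println(nbytes, " ", lasti)

# The last byte of the string is a continuation byte of 'é',
# so it is not a valid character index even though it is in bounds.
println(isvalid(s, lasti))
println(isvalid(s, nbytes))
```

An upper-bound check written against `sizeof(s)` would therefore accept an end index that points into the middle of a character.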
base/strings/types.jl
Outdated
@@ -42,7 +42,7 @@ SubString(s::AbstractString, r::UnitRange{<:Integer}) = SubString(s, Int(first(r
function SubString(s::SubString, i::Int, j::Int)
    i > j && SubString(s.string, 1, 0) # allow i > j as it is consistent with getindex
    i < 1 && throw(BoundsError(s, i))
    j > endof(s) && throw(BoundsError(s, j))
    j >= nextind(s, endof(s)) && throw(BoundsError(s, j))
Won't this always return `endof(s) + 1`? Why is this needed?
base/strings/types.jl
Outdated
@@ -4,32 +4,47 @@

## substrings reference original strings ##

"""
    SubString(s::AbstractString, i::Integer, j::Integer)
    SubString(s::AbstractString, r::UnitRange{Integer})
<:Integer
base/strings/types.jl
Outdated
@@ -4,32 +4,47 @@

## substrings reference original strings ##

"""
    SubString(s::AbstractString, i::Integer, j::Integer)
`j::Integer=endof(s)`.
base/strings/types.jl
Outdated
function SubString(s::SubString, i::Int, j::Int)
    i > j && SubString(s.string, 1, 0) # allow i > j as it is consistent with getindex
    i < 1 && throw(BoundsError(s, i))
    j > sizeof(s) && throw(BoundsError(s, j))
Same here.
@@ -110,8 +125,6 @@ let s="lorem ipsum",
SubString(s,1,6)=>"lorem ",
SubString(s,1,0)=>"",
SubString(s,2,4)=>"ore",
SubString(s,2,16)=>"orem ipsum",
Maybe keep these examples, but with valid indices that still give the expected result?
Done.
@nalimilan Thank you for the code review. I will wait with fixing it till #22548 is decided, as apart from obvious corrections your comments relate to the fact that I want to make Eg. currently:
goes through without error and that is why I have used And in general for me having consistent behavior of indexing for any |
Now |
I have a few comments. Looks like you'll have to dig into the bootstrap process to fix the CI failures. That's relatively tedious work, since it's hard to find out where the failures come from. Here it seems to be in the string interpolation code. You'll probably have to review all places using `SubString` and check for invalid indices.
base/strings/types.jl
Outdated
i > j && return new(s, 0, 0) # allow i > j as it is consistent with getindex
isvalid(s, i) || throw(BoundsError(s, i))
isvalid(s, j) || throw(BoundsError(s, j))
o = i-1
`o` isn't really useful in the new code.
base/strings/types.jl
Outdated
SubString(s::AbstractString, r::UnitRange{<:Integer}) = SubString(s, first(r), last(r))

function SubString(s::SubString, i::Int, j::Int)
    i > j && SubString(s.string, 1, 0) # allow i > j as it is consistent with getindex
You don't need to check this here, as the inner constructor will do the same via `s.offset+i > s.offset+j`.
This is required as we allow any `i>j` (not necessarily valid) and return an empty `SubString` in that case. This is required for consistency with `getindex` for `String`.
What I meant is that you call the inner constructor below, which already checks the indices, so this is redundant (just try without it).
The following test (which should pass) fails if I remove this line:
@test SubString(SubString("123", 1, 2), -10, -20) == ""
But there was an error, as the line was missing `return` before `SubString`, so thank you for a detailed review 😄.
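The behavior being defended here can be sketched as below. This is a minimal illustration, assuming a current Julia release, not the code from this PR: an empty range (`i > j`) yields an empty result for both `getindex` and `SubString` regardless of where the endpoints lie, while a non-empty range must stay within bounds.

```julia
s = "123"

# An empty range is accepted by getindex even with out-of-bounds endpoints.
println(isempty(s[5:2]))
println(isempty(s[-10:-20]))

inner = SubString(s, 1, 2)               # view of "12"

# Same rule for a SubString built from a SubString.
empty_view = SubString(inner, -10, -20)
println(isempty(empty_view))

# With i <= j the end index has to lie inside the view.
threw = try
    SubString(inner, 1, 10)
    false
catch err
    err isa BoundsError
end
println(threw)
```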
base/strings/types.jl
Outdated
function SubString(s::SubString, i::Int, j::Int)
    i > j && SubString(s.string, 1, 0) # allow i > j as it is consistent with getindex
    isvalid(s, i) || throw(BoundsError(s, i))
`isvalid` is relatively expensive, and will be checked by the inner constructor too. So better drop it in favor of a less costly check that `i` and `j` are within the bounds of the substring. Then you can delegate to the inner constructor the question of whether they are also valid for the underlying string.
Yes, but I have to retain the earlier `i > j` check, as `SubString(SubString("fsfsdf", 1, 2), -10, -20)` should go through and return an empty substring.
test/strings/types.jl
Outdated
ss=SubString(str,1,length(str)) #match source string
ss=SubString(str,1,endof(str)) #match source string
@test length(ss)==length(str)
ss=SubString(str,1:length(str)) # works as str is all ASCII
Just remove this test, it's not valid to index with `length(str)`, and we don't really care that it works for ASCII strings.
test/strings/types.jl
Outdated
@test length(ss)==length(str)

ss=SubString(str,1,0) #empty SubString
@test length(ss)==0

Keep blank line for consistency.
test/strings/types.jl
Outdated
@test length(ss)==0

ss=SubString(str,10,16) #end indexed beyond source string length
@test length(ss)==3
str = "∀"
No need to store the string in `str`, it's clearer to repeat it (especially since you change the contents of `str`). Also, it would be useful to add a short comment explaining what is tested to new tests.
test/strings/types.jl
Outdated
    @test_throws BoundsError SubString(str, 4, idx) == str[4:idx]
end

@test_throws BoundsError SubString(str,14,20) #start indexed beyond source string length
"start indexed" -> "start index". Same below.
test/strings/types.jl
Outdated
@@ -72,20 +98,23 @@ b = IOBuffer()
write(b, u)
@test String(take!(b)) == ""

str = "føøbar"
u = SubString(str, 10, 10)
I'd rather keep this test, just changing the indices. It's not clear what this tested, but it must have triggered a bug in a corner case.
This comment still seems to apply.
The test is retained but moved to line 95.
test/strings/types.jl
Outdated
    @test SubString(u, 2:idx) == u[2:idx]
end
@test_throws BoundsError SubString(u, 1, 10)
@test_throws BoundsError SubString(u, 1:10)
Also test with start index out of bounds (both positive and negative).
Done below.
test/strings/types.jl
Outdated
for i in -1:length(s)+2
    if isvalid(s, i)
        ss=SubString(s,1,i)
        @test isvalid(ss,i)==isvalid(s,i)
`==isvalid(s,i)` is redundant since that's the condition. Also check that a `BoundsError` is raised when `!isvalid(s,i)`.
@nalimilan I have included your comments and am working through the bootstrap.
test/strings/types.jl
Outdated
@test length(ss)==0

ss=SubString(str,10,16) #end indexed beyond source string length
@test length(ss)==3
# tests for SubString of as single multibyte `Char` string
of a single
test/strings/types.jl
Outdated
for i in -1:length(s)+2
    if isvalid(s, i)
        ss=SubString(s,1,i)
        # make sure isvalid give equivalent resutls for SubString and String
gives equivalent results
Fixed typos. Seems that some errors in Documenter.jl have to be fixed before this one is merged.
base/strings/types.jl
Outdated
o = i-1
new(s, o, max(0, j-o))
end
i > j && return new(s, 0, 0) # allow i > j as it is consistent with getindex
You could continue to return `new(s, i-1, 0)` unless you've found a reason not to. I don't know what was the original justification for this behavior.
test/strings/types.jl
Outdated
@test length(ss)==0
# tests for SubString of more than one multibyte `Char` string
# we are consistent with `getindex` for `String`
for idx in 0:1
Could also include `idx = 4` here rather than in a separate line.
test/strings/types.jl
Outdated
@test SubString("∀∀", 4, 4) == "∀∀"[4:4]

for idx in 5:8
Would be nice to have a comment explaining what is tested here. Same below, where some lines are commented and not others (a global comment for multiple lines is fine).
test/strings/types.jl
Outdated
    @test_throws BoundsError SubString("∀∀", 4, idx)
end

@test_throws BoundsError SubString(str,14,20) #start indexing beyond source string length
It's weird to reuse `str` here after tests on a different string. Maybe put these tests above those on `"∀"`?
test/strings/types.jl
Outdated
@test_throws BoundsError SubString(str,2:4)

str2=""
Would be clearer to get rid of `str2` and use `""` everywhere.
Merge conflicts were introduced by #22572. I will resolve them.
I would like to ask for a merge or discard this PR before I finish implementation of #23765 as it will be using |
Sorry, there are two details I hadn't seen (or were introduced in the rebase).
@StefanKarpinski @stevengj Good to merge after this?
base/regex.jl
Outdated
cap = Union{Void,SubString{String}}[
ovec[2i+1] == PCRE.UNSET ? nothing : SubString(str, ovec[2i+1]+1, ovec[2i+2]) for i=1:n ]
ovec[2i+1] == PCRE.UNSET ? nothing : SubString(str, ovec[2i+1]+1,
Indentation should be kept as this is inside an array comprehension.
fixed
base/strings/types.jl
Outdated
    SubString(s::AbstractString, i::Integer, j::Integer=endof(s))
    SubString(s::AbstractString, r::UnitRange{<:Integer})

Like [`getindex`](@ref), but returns a view into the parent AbstractString `s`
Missing backquotes around `AbstractString`, or better say "string".
fixed
@nalimilan Thank you for the review, as the rebase was not very simple.
base/regex.jl
Outdated
ovec[2i+1] == PCRE.UNSET ? nothing : SubString(str, ovec[2i+1]+1,
prevind(str, ovec[2i+2]+1)) for i=1:n ]
ovec[2i+1] == PCRE.UNSET ? nothing : SubString(str, ovec[2i+1]+1,
prevind(str, ovec[2i+2]+1)) for i=1:n ]
This one needs to be aligned with `str`. Or maybe better break after `:` or `?`.
I hope now it is OK (I thought I have seen the layout I have used earlier somewhere in the code base).
76f3966 to 98a1b14
CI failures seem to be unrelated.
I have performed another rebase to resolve conflicts introduced in the meantime.
Does this need a NEWS entry?
@KristofferC I have proposed a news entry.
NEWS.md
Outdated
@@ -238,6 +238,9 @@ Library improvements

* The functions `strip`, `lstrip` and `rstrip` now return `SubString` ([#22496]).

* The constructor of `SubString` now checks if the requsted view range
Shouldn't this be listed under "breaking changes", since it can raise an error for previously working cases?
moved, although if someone used an index outside a valid range it was probably a bug
Sure, but that can still break code since no error was thrown before. Anyway, looks good to me now.
@@ -65,7 +65,7 @@ function shell_parse(str::AbstractString, interpolate::Bool=true;
j = k
end
elseif interpolate && !in_single_quotes && c == '$'
update_arg(s[i:j-1]); i = k; j = k
update_arg(s[i:prevind(s, j)]); i = k; j = k
I don't understand why these changes are being made as a part of this PR. What does this have to do with `SubString`?
If `j` is the index of `c`, then `j-1` is actually correct for `s::String` because `c=='$'` is ASCII, by the way.
There are two reasons:
- If `j` is the index of `c` and `c=='$'`, then `j-1` does not have to be correct. Consider `s="∀\$"`: then `j=4` and `j-1` is an incorrect index (`s[j-1]` throws `UnicodeError` even under stable 0.6 before the changes; the changes only make it also throw an error when range indexing is used, e.g. `s[i:j-1]`).
- It is implemented in the `SubString` PR because `s` is a `SubString`: it is defined as `s = lstrip(str)` in line 13, and `lstrip` returns `SubString`.
@stevengj Is this justification OK?
Right, I was thinking of `j+1`, not `j-1`.
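For readers following the exchange above, a small sketch of why `prevind(s, j)` is used instead of `j-1`. It assumes a current Julia release and uses the same example string as in the discussion; it is not the `shell_parse` code itself.

```julia
s = "∀\$"                          # '∀' is three bytes, so '$' sits at byte index 4
j = findfirst(isequal('$'), s)     # index of the '$' character

# j - 1 points into the middle of '∀', so it is not a valid index.
println(isvalid(s, j - 1))

# prevind walks back to the start of the preceding character instead.
p = prevind(s, j)
println(p)
println(s[1:p])                    # the text before '$'
```

Here the character at `j` is ASCII, so `j+1` would be fine as a next index; it is the step backwards that has to account for the width of the preceding character.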
@stevengj Merge now?
Thanks @bkamins!
PR following https://discourse.julialang.org/t/do-not-allow-substring-to-grow/4438