Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rand(::String) #22224

Merged
merged 7 commits into from
Jun 15, 2017
Merged

add rand(::String) #22224

merged 7 commits into from
Jun 15, 2017

Conversation

rfourquet
Copy link
Member

julia> rand("asd", 3)
3-element Array{Char,1}:
 'd'
 'a'
 'a'

This should work more generally for any iterable (with length) in principle, but I'm hesitant to introduce a method of rand with unconstrained parameters: rand([rng], iter), which would mean that this would be the fallback method anytime rand is called on a type for which it's not defined (so maybe this should wait till some traits for iterators are available).

test/random.jl Outdated
(rand(Int, 100), Int)]
(rand(Int, 100), Int),
("qwèrtï", Char),
(Base.Test.GenericString("qwèrtï"), Char)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw in a test for empty strings since that seems like a corner case that could easily break. (I checked this out specifically to test that – it works.)

base/random.jl Outdated

rand_iter(rng, s, ::Base.IteratorSize) = throw(ArgumentError("iterator must have a known length"))

function rand_iter(rng::AbstractRNG, s, ::Union{Base.HasShape,Base.HasLength})::eltype(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is the return type annotation necessary here? c is the value that will be returned and isn't that already guaranteed to have type eltype(s) (or at least <:eltype(s) if eltype(s) is abstract) given that it's an element of s?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inference probably can't determine that i == pos is guaranteed to be true for at least one value of the enumerate loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Tony is spot on, and also initially I thought to propose this PR working generally on iterators, and the return type annotation could serve as an additional way to check whether the argument was complying to the iterator interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, nice. Thanks for the explanation, both of you.

base/random.jl Outdated
for (i, c) in enumerate(s)
i == pos && return c
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little skeptical of the rand_iter function and AbstractString support. People using rand with other string types will probably not expect this function to be O(n); it seems preferable to throw a MethodError rather than to use this implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I had the same hesitation and had planned to update the doc today to warn about the complexity. Then I think that with a proper warning those convenience methods (similarly for AbstractSet and Associative at #22228) are not harmful and can be handy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also do rejection sampling: randomly choose and index from start(s):endof(s) and retry if it's not a valid start of a character – i.e. the same thing you do for String.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great! for some reason, I skimmed too fast on the generic code of isvalid and thought it was O(n).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was confused not about isvalid, but about the fact that I thought I needed to compute the length somehow, which is O(n) in the generic method. Thanks again @StefanKarpinski, I updated the code.

stevengj
stevengj previously approved these changes Jun 6, 2017
base/random.jl Outdated
* a type: the set of values to pick from is then equivalent to `typemin(S):typemax(S)` for
integers (this is not applicable to [`BigInt`](@ref)), and to ``[0, 1)`` for floating
point numbers;

`S` defaults to [`Float64`](@ref).

Note that the complexity of `rand(rng, s::AbstractString)` is linear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really necessary, but this could be a Documenter note block, like

!!! note
    The complexity of `rand(rng, s::AbstractString)` is linear in the
    length of `s` if `s` is not of type `String`. If called more than
    a few times, use `rand(rng, collect(s))` instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe rand(rng, collect(s)) or rand(rng, String(s)), just to emphasize that String is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If called more than a few times" is a bit awkward because of the ambiguous subject of "called". Maybe "For generating many random characters, use ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but your new formulation seems to include also the case were I call rand(rng, s, dims) which already does a collect... WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "For multiple calls, consider using ..."

base/random.jl Outdated
return s[rand(rng, g)]
end
end
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StefanKarpinski would you mind having a look at this updated generic implementation, to confirm it complies with AbstractString API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to do i = rand(rng, g); isvalid(s, i) && return i since various string types are expected to implement that without the try/catch. The try/catch only exists as a last resort fall-back, and I believe it is more targeted and only catches UnicodeError exceptions (if it's not already, it should be).

Copy link
Member Author

@rfourquet rfourquet Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks (I believe the idea being that try/catch an exception is expected to occur, and slow down the computation when the isvalid call would generally be faster). I will update. Note that the generic isvalid catches all exceptions (not only UnicodeError). I'm not familiar with strings, but I can open a corresponding simple PR if it helps.

@rfourquet
Copy link
Member Author

I rebased (wile changing only the docstring, including "a string (considered as a collection of characters)"), and removed the try/catch block in favor of isvalid.

base/random.jl Outdated
* an `Associative` or `AbstractSet` object, or
* an indexable collection (for example `1:n` or `['x','y','z']`),
* an `Associative` or `AbstractSet` object,
* a string (considered as a collection of characters),
Copy link
Member

@stevengj stevengj Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need "or" at the end of this line (or at the beginning of the next line)

@stevengj stevengj dismissed their stale review June 13, 2017 18:23

realized a problem

@rfourquet
Copy link
Member Author

Does it need a NEWS.md entry?

@rfourquet
Copy link
Member Author

Thanks all for your help!

@rfourquet rfourquet merged commit 1c40bab into master Jun 15, 2017
@rfourquet rfourquet deleted the rf/rand-from-string branch June 15, 2017 09:07
quinnj pushed a commit that referenced this pull request Jun 20, 2017
@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Jul 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants