-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pretty printing for DataLoader
#122
Conversation
Codecov Report
@@ Coverage Diff @@
## main #122 +/- ##
==========================================
+ Coverage 88.36% 88.72% +0.36%
==========================================
Files 13 13
Lines 533 559 +26
==========================================
+ Hits 471 496 +25
- Misses 62 63 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
if Base.haslength(e) | ||
print(io, length(e), "-element ") | ||
else | ||
print(io, "Unknown-length ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this can happen, can DataLoader be used with iterators which don't support indexing? Are there any which allow indexing but don't have a length?
julia> DataLoader(1:10)
10-element DataLoader(::UnitRange{Int64})
with first element:
1-element UnitRange{Int64}
julia> DataLoader(x for x in 1:10)
10-element DataLoader(::Base.Generator{UnitRange{Int64}, typeof(identity)})
with first element:Error showing value of type DataLoader{Base.Generator{UnitRange{Int64}, typeof(identity)}, Random._GLOBAL_RNG, Val{nothing}}:
ERROR: MethodError: no method matching getindex(::Base.Generator{UnitRange{Int64}, typeof(identity)}, ::UnitRange{Int64})
Stacktrace:
[1] getobs(data::Base.Generator{UnitRange{Int64}, typeof(identity)}, idx::UnitRange{Int64})
@ MLUtils ~/.julia/dev/MLUtils/src/observation.jl:49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think DataLoader
expects random access indexing (for shuffling, batching, etc.). Even an infinite data container must support this through getobs
(falls back to getindex
) despite being slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random access indexing is required as Kyle said
# Base uses this function for composable array printing, e.g. adjoint(view(::Matrix))) | ||
function Base.showarg(io::IO, e::DataLoader, toplevel) | ||
print(io, "DataLoader(") | ||
Base.showarg(io, e.data, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention with overloading this (not just show
) is that things like CuIterator(DataLoader(adjoint(::Matrix{...
could then work without packages knowing about each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if that could also apply to something like Iterators.flatten(Iterators.repeated(d, n))
, or Iterators.take(Iterators.cycle(d), n * length(d))
... but these seem obscure & have very complicated types.
Maybe we should make repeat(d::DataLoader, epochs::Int)
just work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I thought there would be a default implementation for repeat
that calls the iterator repeatedly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there should be. I thought it was only arrays, but turns out to repeat strings too:
julia> methods(repeat)
# 6 methods for generic function "repeat" from Base:
[1] repeat(A::AbstractArray; inner, outer)
@ abstractarraymath.jl:392
[2] repeat(A::AbstractArray, counts...)
@ abstractarraymath.jl:355
[3] repeat(c::Char, r::Integer)
@ strings/string.jl:349
[4] repeat(c::AbstractChar, r::Integer)
@ strings/string.jl:348
[5] repeat(s::Union{SubString{String}, String}, r::Integer)
@ strings/substring.jl:251
[6] repeat(s::AbstractString, r::Integer)
@ strings/basic.jl:715
I guess it's eager for all of those, so perhaps unclear whether repeat((x for x in 1:3), 3)
should make an iterator or collect.
function _expanded_summary(xs::NamedTuple) | ||
parts = ["$k = "*_expanded_summary(x) for (k,x) in zip(keys(xs), xs)] | ||
"(; " * join(parts, ", ") * ")" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perhaps slightly odd, but the idea is to show what's inside something like this:
julia> nt = (x=rand(10,100), y=rand(Bool,10,100));
julia> summary(nt) # no sizes
"NamedTuple{(:x, :y), Tuple{Matrix{Float64}, Matrix{Bool}}}"
julia> repr(nt) # no sizes, too long
"(x = [0.4301236060985135 0.30528931378945046 0.9865249486891879 0.880700604424396 0.0411513866531249 0.5940861560957025 0.8857114440668031 0.07167460028948913 0.011754567396461968 0.1843492781834919 0.6304545382381233 0.4430950147392062 0.8014564359131793 0.8161412755930899 0.47800508950976983 0.11072415162810345 0.6459516433095668 0.6872" ⋯ 20611 bytes ⋯ " 1 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1; 1 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 1])"
julia> MLUtils._expanded_summary(nt)
"(; x = 10×100 Matrix{Float64}, y = 10×100 Matrix{Bool})"
Maybe ideally Base.summary
would do something more useful here.
This looks good to me 👍 CI failure on Nightly seems to be this unrelated error: |
Stack is #119 BTW. |
if Base.haslength(e) | ||
print(io, length(e), "-element ") | ||
else | ||
print(io, "Unknown-length ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random access indexing is required as Kyle said
# Base uses this function for composable array printing, e.g. adjoint(view(::Matrix))) | ||
function Base.showarg(io::IO, e::DataLoader, toplevel) | ||
print(io, "DataLoader(") | ||
Base.showarg(io, e.data, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I thought there would be a default implementation for repeat
that calls the iterator repeatedly?
Before:
After:
This prints a summary, because it seems useful to see the types & sizes. To see any actual data, you would have to call
first(ans)
or something.Thoughts?