Pretty printing for DataLoader #122
@@ -255,3 +255,41 @@ end
    e.parallel && throw(ArgumentError("Transducer fold protocol not supported on parallel data loads"))
    _dataloader_foldl1(rf, val, e, ObsView(e.data))
end

# Base uses this function for composable array printing, e.g. adjoint(view(::Matrix))
function Base.showarg(io::IO, e::DataLoader, toplevel)
    print(io, "DataLoader(")
    Base.showarg(io, e.data, false)
    e.buffer == false || print(io, ", buffer=", e.buffer)
    e.parallel == false || print(io, ", parallel=", e.parallel)
    e.shuffle == false || print(io, ", shuffle=", e.shuffle)
    e.batchsize == 1 || print(io, ", batchsize=", e.batchsize)
    e.partial == true || print(io, ", partial=", e.partial)
    e.collate == Val(nothing) || print(io, ", collate=", e.collate)
    e.rng == Random.GLOBAL_RNG || print(io, ", rng=", e.rng)
    print(io, ")")
end

Base.show(io::IO, e::DataLoader) = Base.showarg(io, e, false)
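To see the "print only non-default options" pattern in isolation, here is a minimal sketch on a hypothetical `ToyLoader` struct (not part of MLUtils) with just two options, assuming defaults of `batchsize = 1` and `shuffle = false`. As in the diff, the two-argument `show` simply forwards to `showarg`, and the wrapped data describes itself through Base's own `showarg`.

```julia
# Hypothetical stand-in for DataLoader, with only two options for illustration.
struct ToyLoader{D}
    data::D
    batchsize::Int
    shuffle::Bool
end
ToyLoader(data; batchsize = 1, shuffle = false) = ToyLoader(data, batchsize, shuffle)

function Base.showarg(io::IO, l::ToyLoader, toplevel)
    print(io, "ToyLoader(")
    Base.showarg(io, l.data, false)                             # wrapped data describes itself
    l.batchsize == 1 || print(io, ", batchsize=", l.batchsize)  # omit default values
    l.shuffle == false || print(io, ", shuffle=", l.shuffle)
    print(io, ")")
end

# Compact two-argument show reuses showarg, mirroring the diff.
Base.show(io::IO, l::ToyLoader) = Base.showarg(io, l, false)

println(ToyLoader(zeros(3)))                  # ToyLoader(::Vector{Float64})
println(ToyLoader(zeros(3); batchsize = 2))   # ToyLoader(::Vector{Float64}, batchsize=2)
```

Omitting defaults keeps the one-line form short for the common case while still surfacing every option a user actually changed.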

function Base.show(io::IO, m::MIME"text/plain", e::DataLoader)
    if Base.haslength(e)
        print(io, length(e), "-element ")
    else
        print(io, "Unknown-length ")
    end
    Base.showarg(io, e, false)
    print(io, "\n with first element:")
    print(io, "\n ", _expanded_summary(first(e)))
end

Comment on the `print(io, "Unknown-length ")` line:

Not sure this can happen: can DataLoader be used with iterators that don't support indexing? Are there any which allow indexing but don't have a length?

I think

Random-access indexing is required, as Kyle said.
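The three-argument method chooses between a length prefix and "Unknown-length", then summarises the first element. The same logic can be sketched for any iterable; `describe` is a hypothetical helper, and testing the `Base.IteratorSize` trait directly (the trait that says whether `length` is available) is an assumption about how one might generalise it, not what the PR does.

```julia
# Hypothetical helper mirroring the text/plain show above for arbitrary iterables.
function describe(io::IO, itr)
    # Only ask for length when the iterator advertises one via its IteratorSize trait.
    if Base.IteratorSize(itr) isa Union{Base.HasLength, Base.HasShape}
        print(io, length(itr), "-element ")
    else
        print(io, "Unknown-length ")
    end
    print(io, nameof(typeof(itr)))          # short type name stands in for showarg
    print(io, "\n with first element:")
    print(io, "\n ", summary(first(itr)))   # note: consumes an element if itr is stateful
end

describe(stdout, (zeros(2, 2) for _ in 1:5))
println()
describe(stdout, Iterators.filter(x -> x > 0.5, [0.1, 0.7, 0.9]))
```

The generator reports `5-element Generator` because its size is known up front, while the filtered iterator can only report `Unknown-length Filter`.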

_expanded_summary(x) = summary(x)
function _expanded_summary(xs::Tuple)
    parts = [_expanded_summary(x) for x in xs]
    "(" * join(parts, ", ") * ",)"
end
function _expanded_summary(xs::NamedTuple)
    parts = ["$k = "*_expanded_summary(x) for (k,x) in zip(keys(xs), xs)]
    "(; " * join(parts, ", ") * ")"
end
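To try the helper outside MLUtils, the three methods can be copied as-is (renamed `expanded_summary` here so they are not mistaken for the package's own function). Applied to a nested, batch-like named tuple, it recurses into tuples and prints sizes instead of contents. Note that the `Tuple` method always adds a trailing comma, which matches 1-tuple syntax and is harmless for longer tuples.

```julia
# Standalone copy of the helper from the diff, so it runs without MLUtils.
expanded_summary(x) = summary(x)                 # leaves: "10×4 Matrix{Float64}" etc.
function expanded_summary(xs::Tuple)
    parts = [expanded_summary(x) for x in xs]    # recurse into each element
    "(" * join(parts, ", ") * ",)"
end
function expanded_summary(xs::NamedTuple)
    parts = ["$k = " * expanded_summary(x) for (k, x) in zip(keys(xs), xs)]
    "(; " * join(parts, ", ") * ")"
end

batch = (x = zeros(10, 4), y = (zeros(4), zeros(2, 4)))
println(expanded_summary(batch))
# (; x = 10×4 Matrix{Float64}, y = (4-element Vector{Float64}, 2×4 Matrix{Float64},))
```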

Comment on lines +291 to +294
This is perhaps slightly odd, but the idea is to show what's inside something like this:

julia> nt = (x=rand(10,100), y=rand(Bool,10,100));

julia> summary(nt)  # no sizes
"NamedTuple{(:x, :y), Tuple{Matrix{Float64}, Matrix{Bool}}}"

julia> repr(nt)  # no sizes, too long
"(x = [0.4301236060985135 0.30528931378945046 0.9865249486891879 0.880700604424396 0.0411513866531249 0.5940861560957025 0.8857114440668031 0.07167460028948913 0.011754567396461968 0.1843492781834919 0.6304545382381233 0.4430950147392062 0.8014564359131793 0.8161412755930899 0.47800508950976983 0.11072415162810345 0.6459516433095668 0.6872" ⋯ 20611 bytes ⋯ " 1 1 0 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1; 1 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 1])"

julia> MLUtils._expanded_summary(nt)
"(; x = 10×100 Matrix{Float64}, y = 10×100 Matrix{Bool})"

Maybe ideally

The intention with overloading `showarg` (not just `show`) is that things like `CuIterator(DataLoader(adjoint(::Matrix{...` could then work without packages knowing about each other.
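The composability argument can be checked with Base's own array wrappers. Below, `Wrapper` is a hypothetical stand-in for a package type such as `CuIterator` (it is not that type's real implementation): it only calls `Base.showarg` on whatever it holds, so it picks up LinearAlgebra's `adjoint(...)` printing and nests with itself without knowing about either.

```julia
using LinearAlgebra  # provides the Adjoint wrapper and its showarg method

# Hypothetical wrapper that knows nothing about what it wraps.
struct Wrapper{T}
    inner::T
end

function Base.showarg(io::IO, w::Wrapper, toplevel)
    print(io, "Wrapper(")
    Base.showarg(io, w.inner, false)  # delegate: the inner object describes itself
    print(io, ")")
end
Base.show(io::IO, w::Wrapper) = Base.showarg(io, w, false)

A = adjoint(zeros(2, 3))
println(Wrapper(A))           # Wrapper(adjoint(::Matrix{Float64}))
println(Wrapper(Wrapper(A)))  # Wrapper(Wrapper(adjoint(::Matrix{Float64})))
```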

It would be nice if that could also apply to something like `Iterators.flatten(Iterators.repeated(d, n))`, or `Iterators.take(Iterators.cycle(d), n * length(d))`... but these seem obscure and have very complicated types. Maybe we should make `repeat(d::DataLoader, epochs::Int)` just work?
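For reference, the lazy multi-epoch iteration mentioned above fits in a one-line helper. `epochs` is a hypothetical name, not an MLUtils function, and a plain vector of batches stands in for a DataLoader; both spellings from the comment produce the same sequence.

```julia
# Hypothetical helper: iterate `d` lazily `n` times in a row.
# Each pass starts a fresh iterate(d), so an iterable that reshuffles when a pass
# begins would reshuffle once per epoch.
epochs(d, n::Integer) = Iterators.flatten(Iterators.repeated(d, n))

d = [[1, 2], [3, 4]]  # stand-in for the batches a loader would yield

for batch in epochs(d, 2)
    println(batch)    # [1, 2], [3, 4], [1, 2], [3, 4] on separate lines
end

# The other spelling from the comment needs length(d) but yields the same batches.
alt = Iterators.take(Iterators.cycle(d), 2 * length(d))
```

The `flatten(repeated(...))` form does not need `length(d)`, which is one reason it may be the more general of the two.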

I agree, but I thought there would be a default implementation for `repeat` that calls the iterator repeatedly?

Maybe there should be. I thought it was only for arrays, but it turns out it repeats strings too:

I guess it's eager for all of those, so it's perhaps unclear whether `repeat((x for x in 1:3), 3)` should make an iterator or collect.
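The two readings of `repeat` on a generator can be spelled out with Base alone. This is a sketch of the trade-off, not a proposed method: `repeat` on arrays and strings builds a new object, while the lazy form defers all work until iteration.

```julia
# Base.repeat on arrays and strings is eager: it returns a fully built copy.
repeat([1, 2, 3], 2)   # [1, 2, 3, 1, 2, 3]
repeat("ab", 3)        # "ababab"

gen = (x^2 for x in 1:3)  # a generator: neither an AbstractArray nor an AbstractString

# Reading 1, "collect": materialise the generator, then repeat eagerly.
eager = repeat(collect(gen), 2)

# Reading 2, "iterator": stay lazy; squares are only computed while iterating.
lazy = Iterators.take(Iterators.cycle(gen), 2 * length(gen))

collect(lazy) == eager  # true
```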