Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support comprehensions with unknown-length iterables #1457

Closed
toivoh opened this issue Oct 27, 2012 · 13 comments
Closed

Support comprehensions with unknown-length iterables #1457

toivoh opened this issue Oct 27, 2012 · 13 comments

Comments

@toivoh
Copy link
Contributor

toivoh commented Oct 27, 2012

Currently, there's very little I can do with an iterable that doesn't support length, such as Filter and Task. Examples:

julia> xs = filter(isodd, 1:5)
Filter{Range1{Int32}}(isodd,1:5)

julia> println(xs...)
no method length(Filter{Range1{Int32}},)
 in method_missing at base.jl:70
 in append_any at base.jl:123

julia> [xs...]
no method length(Filter{Range1{Int32}},)
 in method_missing at base.jl:70
 in append_any at base.jl:123

julia> [x for x in xs]
no method length(Filter{Range1{Int32}},)
 in method_missing at base.jl:70
 in anonymous at no file

Would it be reasonable to provide a fallback for these cases (when length is missing) to e g allocate a Vector and push each item as it is produced? The minimal request would be to get [xs...] to work in this case, since that at least would allow to circumvent the others with modest effort.

One thing to consider is that while iterables with a length method usually can be reiterated to produce the same items again, e g Task does not behave like that.
Maybe the fallback should be only for 1d comprehensions, since

[x*y for x in task1, y in task2]

would be asking for trouble. (Consumes all output from one task in the first step of the outer loop,
hard to expand a 2d Array) All the more reason to be able to capture the output first, like

xs = [task1...]
ys = [task2...]
[x*y for x in xs, y in ys]

If there were a fallback for [x*y for x in task1, y in task2], it should do that implicitly.

See also #1454, and discussion at commit 3a0e570.

@staticfloat
Copy link
Member

I think this makes sense for any "dimensionality" of this problem: just precompute the result vector for any task that does not have a length call. For example:

[x*y + z for x in task1, y in 1:10, z in task2]

Would first compute result vectors for x and z, and then fall back onto the behavior we already have for elements that have length defined. I don't really see the issue you mentioned about consuming all output from one task in the first step of the outer loop; even in cases where length is defined, we should only be pulling an item from the input iterables once.

@toivoh
Copy link
Contributor Author

toivoh commented Oct 27, 2012

I agree. Just wasn't sure about how much automagic conversion to vectors that would be desirable.

@staticfloat
Copy link
Member

Perhaps we should just do it for any object for which length throws an
exception?
On Oct 27, 2012 1:17 PM, "toivoh" [email protected] wrote:

I agree. Just wasn't sure about how much automagic conversion to vectors
that would be desirable.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1457#issuecomment-9839373.

@JeffBezanson
Copy link
Member

Stefan had suggested collect as a standard function for converting something to a vector. Comprehensions could then call that on the ranges automatically (and it could be the identity function for collections that are already indexable and have a length). Or we could use some syntax like for i <- task1 for things without a length.

@toivoh
Copy link
Contributor Author

toivoh commented Oct 28, 2012

And it seems that [taks...] works for this now too. Great, thanks!

@kmsquire
Copy link
Member

kmsquire commented Nov 4, 2012

Here's a I struggled with a little. I was trying to put the zlib version into a tuple. It's available as a string, such as "1.2.7" or "1.2.3.4". I wanted this parsed into a tuple so I could easily compare versions, even when the tuples were of different length. Here are a few things I tried or wished existed:

julia> # Tuple comprehension (Python-like generator syntax, doesn't exist in julia)

julia> (int(c) for c in each_match(r"\d+", Zlib.zlib_version))
syntax error: missing separator in tuple

julia> tuple(int(c) for c in each_match(r"\d+", Zlib.zlib_version))
syntax error: missing comma or ) in argument list

julia> # List comprehension (requires length of each_match, in this case)

julia> [int(c.match) for c in each_match(r"\d+", Zlib.zlib_version)]
no method length(RegexMatchIterator,)
 in method_missing at base.jl:70
 in anonymous at no file

julia> # Works, but I can't compare lists properly, and need to know the number of components

julia> ver = [int(c) for c in match(r"(\d+)\.(\d+)\.(\d+)\.?(\d)?", Zlib.zlib_version).captures]
4-element Any Array:
 1
 2
 3
 4

julia> ver < [1,2,3,5]
< not defined for arrays. Try .< or isless.
 in < at abstractarray.jl:803

julia> ver .< [1,2,3,5]
4-element Bool Array:
 false
 false
 false
  true

julia> ver .< [1,2,7]
argument dimensions must match
 in promote_shape at operators.jl:148
 in .< at array.jl:979

julia> # This one works, but only if I know the exact number of components

julia> tuple([int(c) for c in match(r"(\d+)\.(\d+)\.(\d+)\.?(\d)?", Zlib.zlib_version).captures]...)
(1,2,3,4)

julia> tuple([int(c) for c in match(r"(\d+)\.(\d+)\.(\d+)\.?(\d)?", "1.2.7").captures]...)
no method convert(Type{Int64},Nothing)
 in method_missing at base.jl:70
 in int at base.jl:75
 in anonymous at no file

julia> # Working version (somewhat messy, IMHO)

julia> [int(c.match) for c in [each_match(r"\d+", Zlib.zlib_version)...]]
4-element Any Array:
 1
 2
 3
 4

julia> # Version I ended up with

julia> tuple([int(c) for c in split(Zlib.zlib_version, '.')]...)
(1,2,3,4)

julia> # better yet...

julia> tuple(int(split(Zlib.zlib_version, '.'))...)
(1,2,3,4)

So, in the end, I didn't need regular expressions, but would have been happiest if the first list comprehension had worked.

all_matches in #1491 would have also worked:

julia> function all_matches(r::Regex, s::String)
           matches = RegexMatch[]
           for m in each_match(r, s)
               push(matches, m)
           end
           return matches
       end

julia> [int(c.match) for c in all_matches(r"\d+", Zlib.zlib_version)]
4-element Int64 Array:
 1
 2
 3
 4

Might also be nice if this worked:

julia> int(all_matches(r"\d+", Zlib.zlib_version))
no method iround(BitsKind,RegexMatch)
 in method_missing at base.jl:70
 in iround_to at abstractarray.jl:121
 in int at abstractarray.jl:138

Perhaps this would be reasonable once Stephan's ideas in #1448 are addressed.

@StefanKarpinski
Copy link
Member

@kmsquire – this doesn't address all these bumps, which are general issues, but there is a VersionNumber type that may do what you want:

julia> convert(VersionNumber,"1")
v"1.0.0"

julia> convert(VersionNumber,"1.2")
v"1.2.0"

julia> convert(VersionNumber,"1.2.3")
v"1.2.3"

Valid version numbers are determined as specified by the semantic versioning standard. They compare as specified by the precedence rules in the standard, which is just as you would want them to.

@StefanKarpinski
Copy link
Member

Of course, four numbers isn't supported by the standard, so that kind of shoots using that.

@JeffBezanson
Copy link
Member

As the error message suggested, isless would have worked to compare arrays. Argument spreading also works with any iterable, [each_match()...].

@kmsquire
Copy link
Member

kmsquire commented Nov 5, 2012

Thanks for the feedback. I wasn't aware of the VersionNumber type (though I vaguely remember it mentioned, now that you brought it up). Zlib itself only uses 3 numbers, but the Debian/Ubuntu package uses 4. I did miss isless, though, even though it was in front of me:

julia> isless([1,2,3], [1,2,3,4])
true

I assumed it was the same as .<.

I'm sure I'll get faster at things like this as I become more familiar with the language. I'm most used to Python, where the solution is typically obvious or easy to figure out (though it may not work as efficiently). I'm hoping Julia will get there at some point, and I'm hoping my reports and contributions help it along.

Cheers!

@ggggggggg
Copy link
Contributor

+1 Ran into this with RegexMatchIterator.

@StefanKarpinski
Copy link
Member

Collect works for this.

@yurivish
Copy link
Contributor

I just came across this issue while refactoring a loop into a list comprehension.

Here's what I tried to write:

[item for item in @task(x, y)]

After getting an error about how there's no method length(Task), I ended up turning the original code:

for item in @task f(x, y) display(item) end

into the following:

convert(Array{ItemType}, collect(@task f(x, y)))

(The ItemType conversion is there so that my custom writemime function is applied when displaying the array)

As someone fairly new to the language, the fact that comprehensions only work on some iterables (those that have a defined length) came as a surprise – I had been thinking of comprehensions as something like syntax sugar for map or a loop that collects its results.

JeffBezanson added a commit that referenced this issue Jun 27, 2016
this also uses the new lowering for typed comprehensions, allowing all
comprehensions on unknown-length iterables (fixes #1457)
JeffBezanson added a commit that referenced this issue Jun 28, 2016
this also uses the new lowering for typed comprehensions, allowing all
comprehensions on unknown-length iterables (fixes #1457)
JeffBezanson added a commit that referenced this issue Jun 29, 2016
this also uses the new lowering for typed comprehensions, allowing all
comprehensions on unknown-length iterables (fixes #1457)
JeffBezanson added a commit that referenced this issue Jul 6, 2016
this also uses the new lowering for typed comprehensions, allowing all
comprehensions on unknown-length iterables (fixes #1457)
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
JuliaLang#4867)

this also uses the new lowering for typed comprehensions, allowing all
comprehensions on unknown-length iterables (fixes JuliaLang#1457)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants