Iterators.cycle doesn't report length correctly when cycling stateful iterators #47753

Closed
rben01 opened this issue Nov 30, 2022 · 1 comment

rben01 commented Nov 30, 2022

Prelims

Installed Julia through Mac app download (symlinked into /usr/local/bin)

Version info:

julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, haswell)
  Threads: 4 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 4

Description

Iterators.cycle is unable to cycle finite stateful iterators (such as those produced by Iterators.Stateful on a finite collection): once the underlying iterator is exhausted, there is nothing left to restart. On its own this is benign, if perhaps a bit unexpected. However, a Cycle always reports its IteratorSize as IsInfinite(), even when it will actually yield only finitely many elements. For instance,

julia> using .Iterators: take, cycle, Stateful, IteratorSize

julia> c = cycle(Stateful(1:5));

julia> IteratorSize(c)
Base.IsInfinite()

julia> [c...]  # given the above, this should exhaust our memory, but...
5-element Vector{Int64}:
 1
 2
 3
 4
 5
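
The early stop above comes down to Stateful keeping its position inside the object itself, so asking it to iterate "from the start" again yields nothing new. A minimal sketch of that behavior, assuming (as the output above suggests) that cycle restarts by calling iterate on the wrapped iterator once it runs out:

```julia
using Base.Iterators: Stateful

s = Stateful(1:5)

first_pass = collect(s)    # consumes all five elements
second_pass = collect(s)   # same object again: nothing is left to yield

println(first_pass)
println(second_pass)
println(isempty(s))        # the Stateful wrapper is now exhausted
```

A second pass over a plain 1:5 would produce the same five elements again, which is what cycle relies on.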

This can lead to some interesting bugs down the line. For instance, since Iterators.take trusts its underlying iterator about its length, it can overestimate its own length (since it takes the minimum of, say, 10 and ∞ instead of 10 and 5). This in turn can lead to incorrect results when passed to further functions that expect a correct value for length(itr). For instance,

julia> make_bad_iter() = take(cycle(Stateful(1:5)), 10);

julia> length(make_bad_iter())
10

julia> [make_bad_iter()...]
5-element Vector{Int64}:
 1
 2
 3
 4
 5

julia> collect(make_bad_iter())  # a vector of length 10 was preallocated for this result
10-element Vector{Int64}:
          1
          2
          3
          4
          5
 4489486928
 4489486960
 4489486992
 4489487024
 4489487088
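
Until the size trait is fixed, one possible user-side workaround is to hide the size claim so that collect grows its result instead of preallocating. The wrapper below is only a sketch: UnknownSize is a hypothetical name, not something in Base, and it simply forwards iteration while reporting SizeUnknown().

```julia
using Base.Iterators: take, cycle, Stateful

# Hypothetical wrapper (not part of Base): forwards iteration unchanged,
# but reports SizeUnknown() so consumers do not trust a length.
struct UnknownSize{I}
    itr::I
end

Base.IteratorSize(::Type{<:UnknownSize}) = Base.SizeUnknown()
Base.eltype(::Type{UnknownSize{I}}) where {I} = eltype(I)
Base.iterate(u::UnknownSize, state...) = iterate(u.itr, state...)

make_safer_iter() = take(UnknownSize(cycle(Stateful(1:5))), 10)

# With SizeUnknown, collect pushes elements as they arrive,
# so no uninitialized slots can appear in the result.
result = collect(make_safer_iter())
println(result)
```

Because take derives its own size trait from the iterator it wraps, the take here also reports SizeUnknown(), and length is no longer defined for it rather than being wrong.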

A few solutions come to mind:

  1. The iterator returned by cycle is changed to be aware of whether it's truly cycleable — i.e., whether its underlying iterator isn't stateful — and if it's not, then it reports SizeUnknown() as its size. I'm not sure whether this is possible in general — is there a Base.isrestartable?
  2. The iterator returned by cycle stores the items produced by the underlying iterator in a vector, one by one as they are produced, so that it does in fact cycle in all cases. Unbounded memory might be an issue, but it's no worse than collecting the underlying iterator first, which is what you currently have to do to get true cycling behavior anyway.
  3. The iterator returned by cycle always has SizeUnknown(). With the current implementation, this actually is what Iterators.IteratorSize should return; IsInfinite() is not always correct. Not sure what other areas this change would affect though.
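
To make option 2 concrete, here is a rough sketch of a caching cycle written as a standalone type. CachedCycle and its state layout are hypothetical and not a proposal for how Base should implement it; the cache lives in the iteration state, and the type reports SizeUnknown() because an empty source yields nothing.

```julia
using Base.Iterators: take, Stateful

# Hypothetical type (not part of Base): records items on the first pass,
# then replays the recorded items forever.
struct CachedCycle{I}
    itr::I
end

Base.IteratorSize(::Type{<:CachedCycle}) = Base.SizeUnknown()
Base.eltype(::Type{CachedCycle{I}}) where {I} = eltype(I)

function Base.iterate(c::CachedCycle)
    y = iterate(c.itr)
    y === nothing && return nothing      # empty source: nothing to cycle
    val, st = y
    cache = eltype(c)[val]
    return val, (cache, true, st)        # (cache, still filling?, inner state)
end

function Base.iterate(c::CachedCycle, state)
    cache, filling, s = state
    if filling
        y = iterate(c.itr, s)
        if y !== nothing
            val, st = y
            push!(cache, val)
            return val, (cache, true, st)
        end
        s = 0                            # source exhausted: replay from cache
    end
    i = mod1(s + 1, length(cache))       # s is the index yielded last
    return cache[i], (cache, false, i)
end

cycled = collect(take(CachedCycle(Stateful(1:5)), 12))
println(cycled)
```

The memory cost is one copy of the source, the same as calling collect on the underlying iterator before passing it to cycle, but the copy is built lazily. Note that a stateful source is still consumed by the first pass, so each CachedCycle over it can only be iterated from the start once.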
@jakobnissen

Duplicate of #43235

@jakobnissen jakobnissen marked this as a duplicate of #43235 Nov 30, 2022