Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override Iterator methods when possible in std #24214

Closed
9 of 11 tasks
aturon opened this issue Apr 8, 2015 · 30 comments · Fixed by #28101
Closed
9 of 11 tasks

Override Iterator methods when possible in std #24214

aturon opened this issue Apr 8, 2015 · 30 comments · Fixed by #28101
Labels
E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@aturon
Copy link
Member

aturon commented Apr 8, 2015

Now that Iterator has re-emerged as a single trait, it's possible to override default methods like count, last and nth for iterators on slices, vectors, and so on. We should audit the applicable iterators and take advantage of overrides when possible.

libcore:

@aturon aturon added A-libs E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Apr 8, 2015
@aturon
Copy link
Member Author

aturon commented Apr 8, 2015

I'd be happy to mentor work on this bug.

@djmally
Copy link
Contributor

djmally commented Apr 9, 2015

I would be interested in working on this.

@GuillaumeGomez
Copy link
Member

Me too.

@tamird
Copy link
Contributor

tamird commented Apr 17, 2015

Also interested in working on this. Anything left to do?

@djmally
Copy link
Contributor

djmally commented Apr 20, 2015

@aturon Any word on this?

@pnkfelix
Copy link
Member

Also, I'm curious if methods like skip(n) are required to actually call next n tines, or if they are allowed to "jump ahead" on random-access iterators. (Maybe that sort of trick will have to wait for specialization in the language, to avoid composing that with e.g. a map or other potentially side-effectful iterator

@Stebalien
Copy link
Contributor

@pnkfelix skip can call nth on the underlying iterator and the underlying iterator can optimize nth if possible.

@pnkfelix
Copy link
Member

@Stebalien I think that is my point: You say "the underlying iterator can optimize nth if possible", but I claim to allow that would require changing the specification of nth, which currently is specified as "going through n iterations", which I interpret as "calls the next() method n times" -- a side-effectful iterator chain (e.g. one using inspect) can observe the difference between that or jumping ahead n steps when given a random-access iterator.

@aturon
Copy link
Member Author

aturon commented Apr 21, 2015

Oh dear, I seem to've lost track of this thread, sorry to everyone who was waiting on me to organize things!

I think probably the first place to start is just to get a list of the public iterators in std, which we can turn into a checklist and track assignees for. Part of the work will just be figuring out which cases are actually opportunities for doing better.

Anyone interested in putting such a list together? I'd be glad to help coordinate.

@aturon
Copy link
Member Author

aturon commented Apr 21, 2015

(And if no one is in a position to make a list, I can put one together tomorrow.)

@Stebalien
Copy link
Contributor

@pnkfelix It doesn't because:

  1. Skip can assume that the underlying iterator correctly implements nth.
  2. Some iterators (e.g. slice::Iter) have side-effect-free next methods. These iterators can implement optimized nth functions.

So vec![1, 2, 3, 4, 5].iter().skip(2) can have a constant time skip given an optimized slice::Iter::nth method. Unfortunately, as implemented, ranges can't optimize nth this way because they are generic.

@Stebalien
Copy link
Contributor

Here's core:

  • slice::Iter(Mut): count, last, nth
  • slice::Windows: count, last, nth
  • slice::Chunks(Mut): count, last, nth
  • char::{EscapeDefault, EscapeUnicode}: count, last, nth (techically but probably not worth it)
  • str::Bytes: count, last, nth
  • iter::Chain: count, last, nth
  • iter::Enumerate: count, nth
  • iter::Peekable: count, last, nth
  • iter::Skip: count, last, next (should call nth), nth
  • iter::Take: nth
  • iter::Fuse: count, last, nth

@aturon
Copy link
Member Author

aturon commented Apr 22, 2015

Thanks @Stebalien! I've added this as a checklist to the top of the issue.

Anyone on thread: feel free to let me know if you want to take on a particular module/crate and I can put your name next to the item to "claim" it.

@wldcordeiro
Copy link

I'm interested in this @aturon would I just fork the repo and work on master or should I make a separate branch like rust-lang/slice-iter or something?

@aturon
Copy link
Member Author

aturon commented Apr 22, 2015

@wldcordeiro

It usually works best to fork, and then create a branch in your local copy that tracks the master branch on this repo. We use the "fork and pull" model here.

Once you get cooking, write a comment here to say which modules you're taking so we can avoid overlap. Thanks!

@Stebalien
Copy link
Contributor

I have one for Iter(Mut). However, I'm not happy with the performance of nth (see the inline comment). Issue linked above ^.

I also have implementations for the iterator adapters but I'm getting an ICE on compile (code: https://github.com/Stebalien/rust/tree/iter)
ICE:

rustc: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd
error: internal compiler error: no type for node 85759: expr buf.as_mut_ptr() (id=85759) in fcx 0x361d0b857e0
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: run with `RUST_BACKTRACE=1` for a backtrace
thread 'rustc' panicked at 'Box<Any>', /home/steb/Code/Projects/rust/src/libsyntax/diagnostic.rs:230

stack backtrace:
   1:      0x361d80d4819 - sys::backtrace::write::hdd4d62d456867831u6r
   2:      0x361d80dba7d - panicking::on_panic::hbb2188b8b35dcf6cdww
   3:      0x361d80a557c - rt::unwind::begin_unwind_inner::h0259abc6ecf88252lbw
   4:      0x361d573e749 - rt::unwind::begin_unwind::h17934595601296836584
   5:      0x361d573ed1b - diagnostic::Handler::bug::h57a278f965e936bb0JB
   6:      0x361d622568b - session::Session::bug::h3a23a9d8685d047fhYp
   7:      0x361d787b4ec - check::FnCtxt<'a, 'tcx>::node_ty::h61c97da1870f112fwrp
   8:      0x361d789b2db - check::regionck::type_of_node_must_outlive::hc215bcd8f34283beppe
   9:      0x361d7899fa3 - check::regionck::visit_expr::h6693b71679b2ea0aiGd
  10:      0x361d789a3bb - check::regionck::visit_expr::h6693b71679b2ea0aiGd
  11:      0x361d7897d7d - check::regionck::Rcx<'a, 'tcx>::visit_fn_body::h2708ba44632c99bcFid
  12:      0x361d78fdb4c - check::check_bare_fn::h4c12ac148dc4a251AUn
  13:      0x361d78fbbb4 - check::CheckItemBodiesVisitor<'a, 'tcx>.Visitor<'tcx>::visit_item::hc46ecad48c847aeeBRn
  14:      0x361d78fbd9f - check::CheckItemBodiesVisitor<'a, 'tcx>.Visitor<'tcx>::visit_item::hc46ecad48c847aeeBRn
  15:      0x361d78fbd9f - check::CheckItemBodiesVisitor<'a, 'tcx>.Visitor<'tcx>::visit_item::hc46ecad48c847aeeBRn
  16:      0x361d78fbd9f - check::CheckItemBodiesVisitor<'a, 'tcx>.Visitor<'tcx>::visit_item::hc46ecad48c847aeeBRn
  17:      0x361d79a325e - check_crate::closure.36316
  18:      0x361d79a0700 - check_crate::h52c99a549b4c8975pDC
  19:      0x361d8610d22 - driver::phase_3_run_analysis_passes::h8e3fb093e4361425tGa
  20:      0x361d85f9084 - driver::compile_input::h00a02dc5812e10deQba
  21:      0x361d86992d0 - run_compiler::h8e40b347429203fbX4b
  22:      0x361d8696f2f - boxed::F.FnBox<A>::call_box::h11085863581512576413
  23:      0x361d8696739 - rt::unwind::try::try_fn::h17400314009759166474
  24:      0x361d81571a8 - rust_try_inner
  25:      0x361d8157195 - rust_try
  26:      0x361d86969b8 - boxed::F.FnBox<A>::call_box::h5995785377111646518
  27:      0x361d80da9cc - sys::thread::create::thread_start::h67b8f63f899fd3ecIvv
  28:      0x361d2e19373 - start_thread
  29:      0x361d7d3927c - clone
  30:                0x0 - <unknown>

/home/steb/Code/Projects/rust/mk/target.mk:167: recipe for target 'x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/stamp.std' failed
make: *** [x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/stamp.std] Error 101

Which I believe is: #24199

@Stebalien
Copy link
Contributor

To add to the list, nth should be overridden in the by_ref iterators (impl<I: Iterator> Iterator for &(mut) I { .. }

@pnkfelix
Copy link
Member

@Stebalien wrote:

Some iterators (e.g. slice::Iter) have side-effect-free next methods. These iterators can implement optimized nth functions.

So vec![1, 2, 3, 4, 5].iter().skip(2) can have a constant time skip given an optimized slice::Iter::nth method. Unfortunately, as implemented, ranges can't optimize nth this way because they are generic.

I think I see. So you are saying vec![...].iter() would provide the optimized nth (and thus could be used in an optimized fashion by an added .skip(n) call), while vec![...].iter().map(...) and vec![...].iter().inspect(...) would be forced to fall back to calling next() n times, right?

Update: (seems obvious in hindsight. ah well.)

@Stebalien
Copy link
Contributor

@pnkfelix Exactly. It should also be possible to optimize methods like any, all, min, max, etc. by avoiding bounds checks but I don't know if it's worth it. However, it probably is worthwhile to optimize those methods for the Bytes/Chars iterators because doing so will allow us to read in large chunks instead of byte-by-byte.

@aturon
Copy link
Member Author

aturon commented Apr 27, 2015

Note that in the long run, we'll need full "specialization" in order to optimize chained cases -- so this issue is primarily about the lowest-hanging fruit that we can tackle with today's trait system.

bors added a commit that referenced this issue Apr 27, 2015
Instead of using the O(n) defaults, define O(1) shortcuts. I also copied (and slightly modified) the relevant tests from the iter tests into the slice tests just in case someone comes along and changes them in the future.

Partially implements  #24214.
@Stebalien
Copy link
Contributor

Additional iterators to consider optimizing:

  • path::Iter: last
  • path::Components: last
  • io::{Bytes, Chars}: count, last, nth, partition, fold, all, any, max, min, max_by, min_by, unzip, sum, product
    • Implementing these methods could significantly reduce the number of system calls by reading in as much as possible. Is this worth it or should we just expect users to use a buffered reader?
  • env::{Vars, VarsOs, Args, ArgsOs} count, nth, last
    • Calls down to sys::os::{Env, Args}.
  • sys::os::{Env, Args} count, nth, last
  • vec::IntoIter count (done), nth, last
    • These may not be worth it because we have to drop each element anyways...

bors added a commit that referenced this issue May 4, 2015
Override methods `count`, `last`, and `nth` in vec::IntoIter.

#24214
@ticki
Copy link
Contributor

ticki commented Jan 17, 2016

EscapeDefault can be checked. My PR was merged.

@isomorpheme
Copy link
Contributor

Just noticed str::Bytes can be checked as well.

ranma42 added a commit to ranma42/rust that referenced this issue Jan 20, 2016
Trivial implementation, as both are `ExactSizeIterator`s.

Part of rust-lang#24214.
ranma42 added a commit to ranma42/rust that referenced this issue Jan 20, 2016
ranma42 added a commit to ranma42/rust that referenced this issue Jan 20, 2016
as a step from the appropriate state.

Part of rust-lang#24214.
@Manishearth Manishearth removed the E-help-wanted Call for participation: Help is requested to fix this issue. label Feb 29, 2016
ranma42 added a commit to ranma42/rust that referenced this issue Apr 20, 2016
bors added a commit that referenced this issue May 19, 2016
Implement `last` for `EscapeUnicode`

The implementation is quite trivial as the last character is always `'{'`.
As a side-effect it also improves the implementation of `last` for `EscapeUnicode`.

Part of #24214, split from #31049.

Maybe this (and the other changes that I will split from #31049) should wait for a test like `ed_iterator_specializations` to be added. Would it be sufficient to do the same for each possible escape length?
ranma42 added a commit to ranma42/rust that referenced this issue May 26, 2016
Trivial implementation, as both are `ExactSizeIterator`s.

Part of rust-lang#24214.
Manishearth added a commit to Manishearth/rust that referenced this issue May 28, 2016
…richton

Implement `count` for `EscapeUnicode`

and cleanup the code for `count` for `EscapeDefault` (instead of repeating the `match` for `size_hint` and `count`).

This PR marks EscapeUnicode and EscapeDefault as ExactSizeIterator. The constraints for the trait implementations held even before this PR, but I am not sure if this is something we want to guarantee/expose (I would love feedback on this, especially on what would be the appropriate way to handle stabilisation, if needed).

Part of rust-lang#24214, split from rust-lang#31049.

The test for `count` was added in rust-lang#33103.
ranma42 added a commit to ranma42/rust that referenced this issue May 28, 2016
as a step from the appropriate state.

Part of rust-lang#24214.
@brson
Copy link
Contributor

brson commented Jun 27, 2016

From comments it appears this is done. Thanks all.

@brson brson closed this as completed Jun 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.