[Discussion] Is the implementation of `cycler` efficient? #742

NivekT · 2022-08-17T22:55:30Z

TL;DR: In most cases users are probably better off using `.flatmap(lambda x: [x for _ in range(n_repeat)])` rather than `.cycle(n_repeat)`.

Here is the implementation: basically, Cycler reads from the source DataPipe n times.

Things to consider:

This means repeating certain operations (e.g. reading from disk, complicated transformations) n times, unless you use in_memory_cache.
If shuffle is used afterwards, I believe `.flatmap(lambda x: [x for _ in range(n_repeat)])` is strictly better than `.cycle(n_repeat)`.
For input = [0, 1, 2] and n_repeat = 2, the major difference is that .cycle returns [0, 1, 2, 0, 1, 2] compared to .flatmap(...) returning [0, 0, 1, 1, 2, 2].
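
To make the trade-off above concrete, here is a minimal sketch in plain Python. It does not use torchdata: `make_source`, `cycle_like`, and `repeat_like` are hypothetical stand-ins for a source DataPipe, `.cycle(n_repeat)`, and the `.flatmap(...)` alternative. The `log` list counts how often the source is read, standing in for an expensive disk read or transformation.

```python
def make_source(log):
    """Return a re-iterable source that records every element it yields."""
    class Source:
        def __iter__(self):
            for x in [0, 1, 2]:
                log.append(x)  # stands in for an expensive read or transform
                yield x
    return Source()


def cycle_like(source, n_repeat):
    # Re-iterate the whole source n_repeat times (cycle-style ordering).
    for _ in range(n_repeat):
        yield from source


def repeat_like(source, n_repeat):
    # Read each element once and emit it n_repeat times (flatmap-style ordering).
    for x in source:
        for _ in range(n_repeat):
            yield x


cycle_reads, repeat_reads = [], []
cycled = list(cycle_like(make_source(cycle_reads), 2))
repeated = list(repeat_like(make_source(repeat_reads), 2))

print(cycled, len(cycle_reads))      # [0, 1, 2, 0, 1, 2] 6
print(repeated, len(repeat_reads))   # [0, 0, 1, 1, 2, 2] 3
```

In this sketch the cycle-style version reads the source six times, while the repeat-style version reads it three times and only duplicates the already-produced elements.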

Questions:

Should we change the implementation?
Should we add something like .repeat() which basically does `.flatmap(lambda x: [x for _ in range(n_repeat)])`?
Should we advise users to use .flatmap(...) instead unless they specifically want the ordering of [0, 1, 2, 0, 1, 2]?

@VitalyFedyunin @ejguan Let me know what you think.


NivekT · 2022-08-17T23:01:48Z

I did some quick profiling and cycle is slower for a relatively simple DataPipe.

ejguan · 2022-08-18T14:26:49Z

> Should we add something like .repeat() which basically does `.flatmap(lambda x: [x for _ in range(n_repeat)])`?

I would support this proposal because I believe those two have different use cases. Also, the Python ecosystem already offers this functionality via more_itertools.repeat_each: https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.repeat_each

BTW, which tool are you using for profiling? It looks great.
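
For reference, the element ordering of `repeat_each` can be sketched with the standard library alone. This is an illustrative equivalent, not necessarily how more_itertools implements it, and the local function name below is only chosen to mirror the library's.

```python
from itertools import chain, repeat


def repeat_each(iterable, n=2):
    # Emit every element n times in a row before moving to the next one.
    return chain.from_iterable(repeat(x, n) for x in iterable)


result = list(repeat_each([0, 1, 2], 2))
print(result)  # [0, 0, 1, 1, 2, 2]
```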

NivekT · 2022-08-18T14:36:20Z

Adding an operation sounds good. I will probably modify the docstrings as well to mention that the other option exists and may be more suitable for some use cases.

I am using scalene for profiling.

NivekT · 2022-08-30T18:57:09Z

Closing this as #748 has landed.

NivekT mentioned this issue Aug 19, 2022

[DataPipe] Adding RepeaterIterDataPipe #748

Closed

NivekT closed this as completed Aug 30, 2022
