random_combination_with_replacement recipe has misleading docstring #102653
Comments
I cannot see anything in the documentation that says or implies that. The text immediately under the heading "Recipes" says: "These recipes show how to efficiently make random selections from the combinatoric iterators in the itertools module" But it doesn't say anything about being equivalent to |
@pochmann Please assign these to me as you produce one issue after another on the various recipes. In every case, the author of the code should be looped in on the conversation. |
@stevendaprano I didn't see that code, I'm saying that's what "random selection from [an iterator]" sounds like. How else do you interpret that? @rhettinger Ok, next time. |
I agree with @pochmann that there is an issue here. The docs do imply (falsely) that the output of the itertool is what is being sampled. The actual intent of the recipe was to model selection with replacement and then subsequently disregarding order. That happens to not be the same as making equiprobable selections from a deduped result space. I'll spend some more time thinking about this. For the moment, I'm inclined to just update the docstring to make clear what random process is being modeled. |
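The difference between "draw with replacement, then disregard order" and "pick uniformly from the deduplicated combinations" can be made concrete with a small enumeration. The sketch below is illustrative only: the recipe body is written from memory of the `random` documentation and should be treated as an assumption, and `exact_distribution` is a hypothetical helper that does not appear in the thread. It enumerates every equally likely sequence of index draws and counts which sorted combination each one collapses to.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement, product


def random_combination_with_replacement(iterable, r):
    """Choose r elements with replacement, ordered to match the iterable."""
    # Assumed to mirror the documented recipe: r independent draws, then sort.
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.choices(range(n), k=r))
    return tuple(pool[i] for i in indices)


def exact_distribution(pool, r):
    """Hypothetical helper: exact outcome probabilities of the process above.

    Enumerates all len(pool)**r equally likely index draws and counts how
    many of them sort to each combination.
    """
    n = len(pool)
    counts = Counter(
        tuple(pool[i] for i in sorted(draw))
        for draw in product(range(n), repeat=r)
    )
    total = n ** r
    return {combo: Fraction(count, total) for combo, count in counts.items()}


dist = exact_distribution((0, 1), 4)
for combo in combinations_with_replacement((0, 1), 4):
    print(combo, dist[combo])
# (0, 0, 0, 0) 1/16
# (0, 0, 0, 1) 1/4
# (0, 0, 1, 1) 3/8
# (0, 1, 1, 1) 1/4
# (1, 1, 1, 1) 1/16
```

A uniform choice over the five combinations would give each one probability 1/5, so the two processes visibly differ even on this tiny pool.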
Here's a possible new docstring:
This is likely overkill and perhaps only the first line is needed. This recipe has been around for a while, and there haven't previously been any misunderstandings. This is likely because it matches what people usually want and because the four-line recipe is clear about what it does. |
Sounds alright. I'd probably remove the middle paragraph; I don't think it really helps, and it might even be distracting. Then the third paragraph could shrink to just extend the first:

```python
def random_combination_with_replacement(iterable, r):  # baseline
    """Choose r elements with replacement. Order the result to match the iterable,
    so it would be contained in:

        set(itertools.combinations_with_replacement(iterable, r))
    """
```

Or maybe even without the `set`:

```python
def random_combination_with_replacement(iterable, r):  # baseline
    """Choose r elements with replacement. Order the result to match the iterable,
    so it would occur in:

        itertools.combinations_with_replacement(iterable, r)
    """
```
Might also be because it's rarely used. A GitHub search found 434 occurrences, most of which just define the function. I had to go to page 5 of the results to find someone using it, and that was for test data in a benchmark about container types and their membership test speeds, where I doubt they cared about the distribution. |
Documentation
The `random` module has four recipes that are supposed to "efficiently make random selections from the combinatoric iterators in the itertools module". And their docstrings all say "Random selection from [iterator]". Both suggest they're equivalent to `random.choice(list(iterator))`, just efficiently.

For example, `itertools.combinations_with_replacement([0, 1], r=4)` produces these five combinations: (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1) and (1, 1, 1, 1).

So `random.choice(list(iterator))` would return one of those five with 20% probability each.

But the `random_combination_with_replacement` recipe instead produces these probabilities: 1/16, 4/16, 6/16, 4/16 and 1/16, in the same order.

Here's an implementation that is equivalent to `random.choice(list(iterator))`:

One can view the combinations as the result of actually simulating r random draws with replacement, where the multiset `{0,0,1,1}` indeed occurs more often, namely as 0011, 0101, 0110, etc. But that is not the only valid view, and it isn't the view suggested by the documentation (as my first paragraph argued). If that view and its bias are intended, then I suggest the documentation should mention the bias.
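One way to get the uniform behaviour of `random.choice(list(iterator))` without materialising the list is the stars-and-bars bijection: every combination with replacement of r items from n corresponds to exactly one r-element subset of `range(n + r - 1)`. The sketch below is not the implementation from the original post, which did not survive on this page. The names `indices_from_positions` and `uniform_combination_with_replacement` are hypothetical, chosen here for illustration.

```python
import random
from itertools import combinations, combinations_with_replacement


def indices_from_positions(positions):
    """Map a sorted r-subset of range(n + r - 1) to nondecreasing pool indices."""
    # Subtracting the rank k removes the forced gaps between distinct positions.
    return tuple(p - k for k, p in enumerate(positions))


def uniform_combination_with_replacement(iterable, r):
    """Hypothetical helper: every combination with replacement is equally likely."""
    pool = tuple(iterable)
    n = len(pool)
    # A uniformly random r-subset, sampled without replacement, then sorted.
    positions = sorted(random.sample(range(n + r - 1), k=r))
    return tuple(pool[i] for i in indices_from_positions(positions))


# Sanity check of the bijection on the [0, 1], r=4 example: the five subsets
# of range(5) of size 4 map onto the five combinations, one each.
pool, r = (0, 1), 4
mapped = [
    tuple(pool[i] for i in indices_from_positions(subset))
    for subset in combinations(range(len(pool) + r - 1), r)
]
print(sorted(mapped) == sorted(combinations_with_replacement(pool, r)))  # True
```

Because `random.sample` picks each subset with equal probability and the mapping is one-to-one, each combination is returned with probability 1/5 in this example, in contrast to the 1/16 to 6/16 spread of the draw-and-sort process.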