Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit repr of arrays containing long strings #3900

Merged
merged 2 commits into from
Mar 28, 2020

Conversation

max-sixty
Copy link
Collaborator

@max-sixty max-sixty commented Mar 27, 2020

This also more strictly limits the length of array reprs where the max_width is small, such that it doesn't always display the first and last item. Lmk if that could have a adverse impact anywhere.

"""Summarize a variable in one line, e.g., for the Dataset.__repr__."""
if max_width is None:
max_width = OPTIONS["display_width"]
max_width_options = OPTIONS["display_width"]
if not isinstance(max_width_options, int):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a reasonable way to enforce types for objects coming from dicts? Or too much cruft and we should just use type hints?

@dcherian
Copy link
Contributor

LGTM. thanks

@max-sixty max-sixty merged commit acf7d41 into pydata:master Mar 28, 2020
@max-sixty max-sixty deleted the coord-repr branch March 28, 2020 02:22
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 28, 2020
* upstream/master: (54 commits)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  Raise error when assigning to IndexVariable.values & IndexVariable.data (pydata#3862)
  Re-enable tests xfailed in pydata#3808 and fix new CFTimeIndex failures due to upstream changes (pydata#3874)
  add spacing in the versions section of the issue report (pydata#3876)
  map_blocks: allow user function to add new unindexed dimension. (pydata#3817)
  Delete associated indexes when deleting coordinate variables. (pydata#3840)
  ...
Comment on lines +137 to +140
# NB: Probably not ideal; an alternative would be cutting after the
# first ellipsis
actual = formatting.format_array_flat(np.arange(100.0), 11)
expected = "0.0 ... 99.0"
expected = "0.0 ... ..."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd cut after the first ellipsis, two of them in a row look strange to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but what's the rule? Backtrack and see whether the previous values were ...?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like that, yes:

    if len(pprint_str) > max_width:
        pprint_str = pprint_str[: max(max_width - 3, 0)].rstrip()
        if not pprint_str.endswith("..."):
            pprint_str = pprint_str + "..."

Does that seem reasonable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is very reasonable, for sure. Just a tradeoff of complexity vs. results.

For example, this case wouldn't change as a result of the function's change, because it ends with a space... We could add that logic in though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but we could also run rstrip first, truncate the string and run rstrip again (apologies if that's what you meant, I'm not really sure).

My suggestion might result in something like 783759.270 ... 592000... but avoids 0.0 ... .... If we'd also like to avoid the former, we could not use "".join() immediately but instead save the iterable, check if the result of "".join() is short enough and remove the last element of the iterable if it isn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 100%, I agree that's better. I'm probably a bit less motivated to make these stat as I feel like the egregious tails have been limited, which was the pressing issue. But can come back to it in the future if no one else takes it up.

# replace the end with an ellipsis
# NB: this will still returns a full 3-character ellipsis when max_width < 3
if len(pprint_str) > max_width:
pprint_str = pprint_str[: max(max_width - 3, 0)] + "..."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pprint_str = pprint_str[: max(max_width - 3, 0)] + "..."
pprint_str = pprint_str[: max(max_width - 3, 0)] + padding

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to accept this, maybe because the PR is closed...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we'd need a new PR for that.

dcherian added a commit to MeraX/xarray that referenced this pull request Mar 29, 2020
* upstream/master: (75 commits)
  Implement idxmax and idxmin functions (pydata#3871)
  Update pre-commit-config.yaml (pydata#3911)
  Revert "Use `fixes` in PR template (pydata#3886)" (pydata#3912)
  update the docstring of diff (pydata#3909)
  Un-xfail test_dayofyear_after_cftime_range (pydata#3907)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Truncate long lines in repr of coords
3 participants