Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command: Show dvc list output in pretty JSON format #6647

Merged
merged 1 commit into from
Sep 29, 2021

Conversation

eggqq007
Copy link
Contributor

@eggqq007 eggqq007 commented Sep 19, 2021

  1. update ls init.py to handle --show-pretty-json
  2. introduce pygments into requirements/default.txt

Closes iterative/dvc.org#2838
Closes #6646

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

@eggqq007 eggqq007 requested a review from a team as a code owner September 19, 2021 09:25
@eggqq007 eggqq007 requested a review from efiop September 19, 2021 09:25
@eggqq007 eggqq007 force-pushed the dev-add-show-pretty-json branch from e4a8eff to fff3317 Compare September 19, 2021 10:43
@eggqq007 eggqq007 changed the title command: Show dvc list output in pretty JSON format (#6646) command: Show dvc list output in pretty JSON format Sep 19, 2021
@@ -72,6 +84,11 @@ def add_parser(subparsers, parent_parser):
list_parser.add_argument(
"--show-json", action="store_true", help="Show output in JSON format."
)
list_parser.add_argument(
"--show-pretty-json",
Copy link
Member

@skshetry skshetry Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just prettify the JSON by default by checking if we are in isatty mode and then prettify that output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! The patch was updated

@skshetry
Copy link
Member

skshetry commented Sep 20, 2021

Btw, why not just do dvc list --show-json . | python -m rich.json - or dvc list --show-json | jq?

@@ -44,3 +44,4 @@ typing_extensions>=3.10.0.2
fsspec[http]>=2021.8.1
aiohttp-retry==2.4.5
diskcache>=5.2.1
pygments>=2.10.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We depend on rich, so you could just do something like following:

from rich.json import JSON

obj = JSON.from_data(data)
ui.write(obj, styled=True)

@eggqq007 eggqq007 force-pushed the dev-add-show-pretty-json branch from fff3317 to a268c79 Compare September 20, 2021 06:37
Comment on lines 41 to 46
if sys.stdout.isatty():
from rich.json import JSON

ui.write(JSON.from_data(entries), styled=True)
else:
ui.write(json.dumps(entries))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to dvc/ui/__init__.py as ui.write_json, so that we can reuse it in other parts of dvc.

Also, support for indent= and prettify=True|False that keeps the old behaviour would be nice to have. πŸ™‚

Copy link
Member

@skshetry skshetry Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to bump rich requirement, as from_data is only available since 10.9.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense. patch updated.

@eggqq007 eggqq007 force-pushed the dev-add-show-pretty-json branch from a268c79 to 486c1b3 Compare September 20, 2021 13:39
@skshetry
Copy link
Member

The PR looks good. @iterative/dvc, if you have any opinions if dvc should format json. Otherwise, we'll be formatting in other places as well.

import json

ui.write(json.dumps(entries))
if sys.stdout.isatty():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could hide this check under ui.write_json (if rich doesn't do that already for us) CC @skshetry

Copy link
Contributor

@efiop efiop Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though that would get a little confusing with prettify, as it is not clear if it should enforce it or be more like "prettify, but only if it is a tty". Still, it feels like tty check belongs to ui.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, is there any significant performace hit for prettifying? If not, maybe we should indeed consider just showing pretty json by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments!. It is hard to tell the performance gap between json.dumps and prettifying. I can do some evaluation on it later. Besides that, I tend to believe the prettifying is not the bottleneck since dvc list should fetch repo info from remote first which may spend more time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did two simple tests on it.

  1. A python array has 100000 element, each element is a dict which contains 3 key-value. json.dump takes about 0.05s, rich.json.from_date takes about 2.6s

  2. dvc list takes about 1.8s to fetch https://github.com/iterative/example-get-started while prettifying takes 0.03s

Copy link
Member

@skshetry skshetry Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think performance should be a concern. We could choose a certain threshold like 1000 or so entries where we fallback to json.dumps(indent=4) instead of prettifying it. No one should expect us to prettify the 100000 many entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regretfully dvc list(and others) are not yet fully optimized themselves and having 100K entries in the resulting json is not unusual for our use case, so looks like prettifying is kinda expensive to do always. Plus we have other projects depending on this json output, so that migth be slowing them down too. But isatty check kinda solves that, because if someone will be using the json output - they will redirect it somewhere anyway, so we should be safe and efficient, while being pretty. That should make the threshold approach unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments! I move the isatty into ui init.py and the patch updated. :)

1. update ls __init__.py to output pretty JSON format
2. introduce write_json function into ui __init__.py
Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this today in the team, and everyone is okay with this getting merged. For consistency, we'd like this to be used in all of the other commands that has --json flags (though in a separate PR).

@skshetry
Copy link
Member

skshetry commented Sep 29, 2021

@eggqq007, would you like to create a PR (or, issue) for #6647 (review)? Thanks for this PR.

@eggqq007
Copy link
Contributor Author

@eggqq007, would you like to create a PR (or, issue) for #6647 (review)? Thanks for this PR.

Sure, new issue in #6712

@efiop efiop added the enhancement Enhances DVC label Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

introduce --show-pretty-json into dvc list options introduce --show-pretty-json into dvc list options
4 participants