Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

top-level plot collections: fail on empty dict #10233

Merged
merged 2 commits into from
Jan 12, 2024

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jan 12, 2024

If you have a top-level plots with empty dict as follows:

# dvc.yaml
plots:
- {}

dvc would fail with traceback that looks like:

StopIteration                             Traceback (most recent call last)
Cell In[3], line 1
----> 1 repo.index._plot_sources

File ~/Projects/dvc/.venv/lib/python3.12/site-packages/funcy/objects.py:25, in cached_property.__get__(self, instance, type)
     23 if instance is None:
     24     return self
---> 25 res = instance.__dict__[self.fget.__name__] = self.fget(instance)
     26 return res

File ~/Projects/dvc/dvc/repo/index.py:448, in Index._plot_sources(self)
    445 from dvc.repo.plots import _collect_pipeline_files
    447 sources: List[str] = []
--> 448 for data in _collect_pipeline_files(self.repo, [], {}).values():
    449     for plot_id, props in data.get("data", {}).items():
    450         if isinstance(props.get("y"), dict):

File ~/Projects/dvc/dvc/repo/plots/__init__.py:490, in _collect_pipeline_files(repo, targets, props, onerror)
    488         dvcfile_defs_dict[elem] = None
    489     else:
--> 490         k, v = next(iter(elem.items()))
    491         dvcfile_defs_dict[k] = v
    493 resolved = _resolve_definitions(
    494     repo.dvcfs,
    495     targets,
   (...)
    499     onerror=onerror,
    500 )

StopIteration:

The empty dict is not supported by dvc, and we should fail validation in this case.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d711ecd) 90.47% compared to head (3a82fba) 90.23%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10233      +/-   ##
==========================================
- Coverage   90.47%   90.23%   -0.25%     
==========================================
  Files         493      493              
  Lines       37595    37596       +1     
  Branches     5455     5455              
==========================================
- Hits        34014    33924      -90     
- Misses       2953     3022      +69     
- Partials      628      650      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @skshetry for a quick turnaround and for empty YAML catch ...

  • can we make it a validation error though? Otherwise it's easy to miss this kind of conditions
  • I think we need a test for this ... it's easy to add and it's easy to get the same regression later in this (edge) case

@shcheklein
Copy link
Member

Also, please add some description to PR :)

@shcheklein shcheklein added bug Did we break something? A: plots Related to the plots product: Studio Integration with Studio and removed product: Studio Integration with Studio labels Jan 12, 2024
@skshetry
Copy link
Member Author

There are just too many edge cases with schema and too many negative cases, that I don't find it worth it to cover it with tests. :)

And the schema is very declarative.

@shcheklein
Copy link
Member

There are just too many edge cases with schema and too many negative cases, that I don't find it worth it to cover it with tests. :)

that's exactly what we need I think, especially as we hit more and more edge cases like this. It's totally fine to have 1000s of tests for this. And even consider generating different combinations, etc if needed.

@skshetry
Copy link
Member Author

I'd look for property-based testing than writing 1000s of tests.

@skshetry
Copy link
Member Author

skshetry commented Jan 12, 2024

that's exactly what we need I think, especially as we hit more and more edge cases like this. It's totally fine to have 1000s of tests for this. And even consider generating different combinations, etc if needed.

Negative tests will just test voluptuous. Here, we'll be testing if Required works or not.

@shcheklein
Copy link
Member

Negative tests will just test voluptuous. Here, we'll be testing if Required works or not.

I would argue that you are testing the DVC schema in this case (e.g. that someone would not drop the Required). But I agree if this is becomes part of the validation it's way better and less error prone.

@skshetry skshetry force-pushed the plot-skip-empty-dict branch from b91ff5d to 3a82fba Compare January 12, 2024 04:00
@skshetry skshetry changed the title top-level plot collections: skip empty dict top-level plot collections: fail on empty dict Jan 12, 2024
Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still add a test. Your call on this.

@skshetry
Copy link
Member Author

I would still add a test. Your call on this.

The problem here is that there is a mismatch between schema validation and parsing/deserialization.
Even though we might add a test for schema validation, there is no guarantee that the deserialization logic will be correct.

I don't think they are a different thing. They should be just one single thing. And we should just depend on validation to rule those edgecases out.

@skshetry skshetry merged commit b46bd9c into iterative:main Jan 12, 2024
22 checks passed
@skshetry skshetry deleted the plot-skip-empty-dict branch January 12, 2024 04:11
BradyJ27 pushed a commit to BradyJ27/dvc that referenced this pull request Apr 22, 2024
* top-level plot collections: skip empty dict

* tighten schema for plots; require key name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: plots Related to the plots bug Did we break something?
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants