Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openpyxl Worksheet columns is incorrectly typed #10744

Closed
ostr00000 opened this issue Sep 21, 2023 · 2 comments · Fixed by #10787
Closed

openpyxl Worksheet columns is incorrectly typed #10744

ostr00000 opened this issue Sep 21, 2023 · 2 comments · Fixed by #10787

Comments

@ostr00000
Copy link

ostr00000 commented Sep 21, 2023

Property columns should return a generator of tuple[Cell] instead of a single Cell.

This is a very similar issue to #9904, except for column.

@kltws
Copy link

kltws commented Sep 22, 2023

Looking at the implementation of Worksheet, particularly in the _clean_merge_range method (called by merge_cells), it seems that the Worksheet._cells dict can contain MergedCell instances in addition to Cell instances. Shouldn't columns (and for that matter, all the other methods like cell, getitem, rows, iter_rows/iter_cols that returns things found in _cells) be typed to reflect this?

@ostr00000
Copy link
Author

Type of internal _cells

Yes, the type of Worksheet()._cells is dict[tuple[int, int], Cell | MergedCell], so the return type is indeed Cell | MergedCell.
Current openpyxl implementation for reference: https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.1/openpyxl/worksheet/worksheet.py#L609

Possible solution

If I understand tips from CONTRIBUTING.md correctly union type should be avoided, because to satisfy type check an explicit if must be written on result from such function.
The preferred solution is to use Any and then annotate the result with expected type (when we know that MergedCell is not used at all) or perform isinstance check.

I see 3 possible solutions:

  1. Keep in stub only Cell type - after all MergedCell is not so frequently used.
  2. Refactor stub to use Cell | MergedCell - this is real type, but it requires additional check in user code + it is not recommended by typeshed (avoid union return types in https://github.com/python/typeshed/blob/main/CONTRIBUTING.md).
  3. Refactor stub to use Any - this is recommended by typeshed, but provides no typing value (without additional, explicit annotation).
  4. Refactor stub to use common class for Cell and MergedCell. Fortunately, there is common class StyleableObject. But in that case, a typing.cast would be required (instead of annotating) if somebody would want to use Cell.

There is python/typing#566 that would be handy in this case.
I am ok with no additional type (option 1), so I am not going to create new issue/PR.

Real solution

Of course, a function returning many types is a code smell. According to current usage, most usages want only to get a value:

  • increment column until normal cell: while isinstance(cell, MergedCell): column += 1,
  • get real cell: cell = self.normalize_merged_cell_link,
  • skip processing: warnings.warn(...); continue.

The best solution, in my opinion, is to redesign Worksheet class interface:

  • let cell return always Cell object,
  • if you are interested whenever the cell is merged, call an additional function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants