Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Consistent terminology for outline items #1156

Merged
merged 7 commits into from
Jul 30, 2022

Conversation

mtd91429
Copy link
Contributor

@mtd91429 mtd91429 commented Jul 23, 2022

This PR makes sure PyPDF2 uses a consistent nomenclature for the outline:

  • Outline: A document has exactly one outline (also called "table of contents", in short toc). That outline might be empty.
  • Outline Item: An element within an outline. This is also called a "bookmark" by some PDF viewers.

Concretely, this means that some names will be deprecated to ensure consistency:

PdfReader

  • outlinesoutline
  • _build_outline()_build_outline_item()

PdfWriter

  • Keep get_outline_root()
  • add_bookmark_dict()add_outline()
  • add_bookmark()add_outline_item()

PdfMerger

  • _write_bookmarks()_write_outline()
  • _write_bookmark_on_page()_write_outline_item_on_page()
  • _associate_bookmarks_to_pages()_associate_outline_items_to_pages()
  • find_bookmark()find_outline_item()
  • Keep _trim_outline()

generic.py

  • BookmarkOutlineItem

Closes #1098

@codecov
Copy link

codecov bot commented Jul 24, 2022

Codecov Report

Merging #1156 (b78fccd) into main (844f238) will decrease coverage by 0.15%.
The diff coverage is 88.96%.

@@            Coverage Diff             @@
##             main    #1156      +/-   ##
==========================================
- Coverage   92.24%   92.08%   -0.16%     
==========================================
  Files          24       24              
  Lines        4811     4866      +55     
  Branches      991      996       +5     
==========================================
+ Hits         4438     4481      +43     
- Misses        231      242      +11     
- Partials      142      143       +1     
Impacted Files Coverage Δ
PyPDF2/_writer.py 88.21% <76.92%> (-0.80%) ⬇️
PyPDF2/_reader.py 90.33% <81.48%> (-0.29%) ⬇️
PyPDF2/_utils.py 97.77% <88.23%> (-1.01%) ⬇️
PyPDF2/_merger.py 94.07% <94.52%> (-0.93%) ⬇️
PyPDF2/generic.py 91.57% <100.00%> (ø)
PyPDF2/types.py 100.00% <100.00%> (ø)
PyPDF2/_page.py 92.57% <0.00%> (+0.01%) ⬆️
PyPDF2/_encryption.py 85.98% <0.00%> (+0.58%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 844f238...b78fccd. Read the comment docs.

@MartinThoma
Copy link
Member

Wow, very nice!

I've seen that you followed the the PyPDF2 deprecation process
🎉 👍

I think for types.BookmarkTypes and types.OutlinesType you also need to add aliases for a while. I would be ok if they don't have deprecation notices (I'm uncertain if those work for type annotations). You could just add a comment over the alias "# deprecated types:" or similar.

I've just noticed deprecate_bookmark and I love it 😍 (I still need to go through it in detail, though)

@mtd91429
Copy link
Contributor Author

Good catch. I didn't even think about deprecating the types. I've amended the last commit with changes to address this issue.

I pushed a second commit to add unit tests for the decorator function. There are 3 (could be more, but I thought this was enough).

  • one for the _writer module to show that the UserWarning is emitted.
  • I added two to the _merger module:
    • one to show that the UserWarning is emitted
    • the other to show that the method invoked still produces the desired result (namely, to import the outline on the merged page still works with the deprecated keyword)

@mtd91429 mtd91429 force-pushed the outline-nomenclature branch 4 times, most recently from 772ec0c to ebed584 Compare July 25, 2022 19:24
@mtd91429
Copy link
Contributor Author

So many nits (all the forced pushes). I hope @MartinThoma you and the other repo managers don't get emailed for every failed CI like I do. If so, I'll be more thorough with unit testing on my end before pushing to GitHub. 😳

@MartinThoma
Copy link
Member

Don't worry about that 😄

However, you don't need to make force-pushes. I will make a squashing commit in the end in any case and adjust the commit message. So it doesn't make a difference if you make normal pushes or force pushes.

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Jul 27, 2022
@MartinThoma
Copy link
Member

@mtd91429 I've adjusted your first comment so that it contains the deprecations. That comment will be used as a commit message when I squash the commits of this PR.

@MartinThoma
Copy link
Member

@pubpub-zz @MasterOdin Do you want to have a look at the PR? To you agree with the renamings?

@MartinThoma MartinThoma changed the title Use consistent internal nomenclature for outline items MAINT: Consistent terminology for outline items Jul 27, 2022
@pubpub-zz
Copy link
Collaborator

pubpub-zz commented Jul 27, 2022

looks good
good job @mtd91429

Copy link
Member

@MasterOdin MasterOdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with the rename, some slight review comments.

"""
deprecate_with_replacement("getOutlines", "outlines")
return self._get_outlines(node, outlines)
deprecate_with_replacement("getOutlines", "outline")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the PR renaming, but this isn't a true replacement since there's no way to pass the optional parameters of this method to the property? Just no one uses them so it's fine?

PyPDF2/generic.py Outdated Show resolved Hide resolved
PyPDF2/generic.py Outdated Show resolved Hide resolved
Use :py:attr:`outline` instead.
"""
deprecate_with_replacement("outlines", "outline")
return self._get_outline()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'd rather see this call the replacement than the underlying internal function, just so if we decide to alter self.outline, then won't potentially need to also change self.outlines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the replacement also makes it more clear how to use the replacement / not use use the internal function. It's not a big overhead as well 👍

tests/test_reader.py Outdated Show resolved Hide resolved
@MartinThoma
Copy link
Member

@MasterOdin Thank you for the thorough review 🙏

@MartinThoma
Copy link
Member

@mtd91429 I see that you've already spent a lot of time with this awesome PR / preparing it with the issue before. Thank you 🙏

All of the comments @MasterOdin added make a lot of sense to me. Especially the fact of moving some internal functions to the internal _utils.py module. Having many things only used internally / keeping the public interface slim allows us to change those quick without having to go through the deprecation process.

From my side, this PR is ready. I would appreciate it a lot if you incorporated the changes suggested by MasterOdin. However, if you prefer I can also do that after merging. Just let me know :-)

In any way, this will be merged soon :-)

@MartinThoma MartinThoma added the is-maintenance Anything that is just internal: Simplifying code, syntax changes, updating docs, speed improvements label Jul 28, 2022
@mtd91429
Copy link
Contributor Author

@MasterOdin thank you for the thorough review as well with so many lines of code changed, I'm glad there were more eyes on this.

@MartinThoma I am perfectly happy to allow you to make any of the changes. I won't be able to do them until tomorrow, so don't wait for me. I also agree with @MasterOdin suggestions.

@MartinThoma
Copy link
Member

I'm actually uncertain if I can do it before the weekend 😄 Let's see who is faster 😄 🚀

@MartinThoma MartinThoma merged commit 8c532a0 into py-pdf:main Jul 30, 2022
@MartinThoma
Copy link
Member

@mtd91429 Thank you so much for all of the effort you put into this topic 🤗 You improved PyPDF2 🎉 ❤️

I'll make a release tomorrow :-)

MartinThoma added a commit that referenced this pull request Jul 31, 2022
New Features (ENH):
-  Add ability to add hex encoded colors to outline items (#1186)
-  Add support for pathlib.Path in PdfMerger.merge (#1190)
-  Add link annotation (#1189)
-  Add capability to filter text extraction by orientation  (#1175)

Bug Fixes (BUG):
-  Named Dest in PDF1.1 (#1174)
-  Incomplete Graphic State save/restore (#1172)

Documentation (DOC):
-  Update changelog url in package metadata (#1180)
-  Table extraction (#1179)
-  Mention pyHanko for signing PDF documents (#1178)
-  We now have CMAP support (#1177)

Maintenance (MAINT):
-  Consistant usage of warnings / log messages (#1164)
-  Consistent terminology for outline items (#1156)

Code Style (STY):
-  Apply pre-commit (#1188)

Full Changelog: 2.8.1...2.9.0
@mtd91429 mtd91429 deleted the outline-nomenclature branch August 1, 2022 17:49
MartinThoma added a commit that referenced this pull request Dec 25, 2023
Also don't use a default for the deprecation functions: If the version is mentioned explicitly with every call, we can
more easily search for it.

Bookmarks: `MAINT: Consistent terminology for outline items` (#1156, pypdf==2.9.0) introduced it. Meaning in pypdf==4.0.0 we can remove them
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is-maintenance Anything that is just internal: Simplifying code, syntax changes, updating docs, speed improvements soon PRs that are almost ready to be merged, issues that get solved pretty soon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utilize consistent internal nomenclature for outlines/bookmarks
4 participants