ENH: Improve/rewrite PDF permission retrieval #2400

stefan6419846 · 2024-01-08T11:30:25Z

Fixes #2391. Fixes #2399.

pypdf/_writer.py

pypdf/_reader.py

pypdf/constants.py

MartinThoma

Nice work!

MartinThoma · 2024-01-08T22:18:59Z

@pubpub-zz What do you think about this approach? Is it fine in your opinion as well?

pubpub-zz · 2024-01-09T21:32:24Z

From my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.
It may be good to add a constructor to properly initialize the R1/R2/R6/R7 (for #2401). We should not merge this PR with this issue active.

MartinThoma · 2024-01-09T21:37:30Z

Thanks for pointing this out. That's something I wasn't aware of. I will read up on the specs before merging it 👍

stefan6419846 · 2024-01-10T07:54:06Z

> From my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.

Do we really have support for certificate-based encryption anywhere? AFAIK we can only use user and owner passwords at the moment.

According to table 23 of PDF 32000-1:2008, the most important difference is "If bit 2 is set to 1, all other bits are ignored and all operations are permitted. If bit 2 is set to 0, permission for operations are based on the values of the remaining flags defined in Table 24." Nevertheless, this seems to change the semantics of R2 only.
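The rule quoted above could be sketched roughly as follows. This is a stdlib-only illustration and not pypdf code: the function name `effective_permissions`, the `is_public_key_handler` parameter, and the `ALL_PERMISSIONS` constant are hypothetical, and the bit numbering assumes the specification's convention of counting from 1 at the low-order end.

```python
# Hypothetical sketch, not pypdf's implementation.
BIT_2 = 1 << 1  # bits are counted from 1, so bit 2 has the integer value 2
ALL_PERMISSIONS = 0xFFFFFFFF  # assumed stand-in for "all operations permitted"


def effective_permissions(p: int, is_public_key_handler: bool) -> int:
    """Return the flag word that should be evaluated for a raw value ``p``."""
    # Normalize a possibly negative signed value to an unsigned 32-bit word.
    flags = p & 0xFFFFFFFF
    if is_public_key_handler and flags & BIT_2:
        # Bit 2 set: all other bits are ignored, everything is permitted.
        return ALL_PERMISSIONS
    # Otherwise the remaining flags decide what is allowed.
    return flags
```

With password-based encryption, which is the only kind this PR deals with, the function simply returns the normalized flags unchanged.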

> It may be good to add a constructor to properly initialize the R1/R2/R6/R7 (for #2401). We should not merge this PR with this issue active.

IMHO this is a separate issue (and a bug rather than an enhancement) and should get a dedicated PR. This PR is primarily about PDF permission retrieval, as the title indicates.

MartinThoma · 2024-01-11T21:54:24Z

I've just checked other software:

- PyMuPDF: What we call `PdfReader.user_access_permissions` is simply called `Document.permissions` there.
- QPDF: `qpdf --show-encryption 005-libreoffice-writer-password/libreoffice-writer-password.pdf --password=openpassword` gives a dictionary-like output.
- PDFium: `permission_flags = pdfium_c.FPDF_GetDocPermission(pdf)`

> From my understanding of §3.5.3, permissions can also be attached to different users through PKCS#7.
> It may be good to add a constructor to properly initialize the R1/R2/R6/R7

For the moment, I think it's best to ignore that complexity. We already have a user interface which is public and I think the user_access_permissions property is better than the existing interface. It's also similar to PDFium and PyMuPDF - both are very good projects which I respect a lot.
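To illustrate what a flag-based permissions value looks like from the caller's side, here is a minimal stdlib-only sketch that decodes a raw permissions integer into named flags. The `Permissions` class and `decode_permissions` helper are hypothetical stand-ins, not pypdf's actual classes. The bit positions assume the user access permission bits of the standard security handler (bits 3 to 6 and 9 to 12, counted from 1).

```python
from enum import IntFlag
from typing import Dict


class Permissions(IntFlag):
    """Hypothetical stand-in for a permissions enum; not pypdf's class."""

    PRINT = 1 << 2                      # bit 3
    MODIFY = 1 << 3                     # bit 4
    EXTRACT = 1 << 4                    # bit 5
    ADD_OR_MODIFY = 1 << 5              # bit 6
    FILL_FORM_FIELDS = 1 << 8           # bit 9
    EXTRACT_FOR_ACCESSIBILITY = 1 << 9  # bit 10
    ASSEMBLE_DOC = 1 << 10              # bit 11
    PRINT_HIGH_QUALITY = 1 << 11        # bit 12


def decode_permissions(p: int) -> Dict[str, bool]:
    """Map each named flag to whether it is set in the raw value ``p``."""
    # The raw value is often negative (signed 32-bit), so normalize it first.
    unsigned = p & 0xFFFFFFFF
    return {flag.name: bool(unsigned & flag.value) for flag in Permissions}


# Example: -44 allows printing and extraction, but not modification.
permissions = decode_permissions(-44)
```

A dictionary like this is close in spirit to the dictionary-like output mentioned for QPDF, while the enum itself corresponds to the integer flags returned by PDFium and PyMuPDF.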

I'll leave this open until Saturday (maybe Sunday), but if there are no new arguments or insights, I will merge it.

pubpub-zz

A proposal to prepare for the future.

pypdf/_reader.py

codecov · 2024-01-18T19:52:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (480e840) 94.42% compared to head (e24b305) 94.42%.
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2400   +/-   ##
=======================================
  Coverage   94.42%   94.42%           
=======================================
  Files          49       49           
  Lines        7961     7998   +37     
  Branches     1608     1616    +8     
=======================================
+ Hits         7517     7552   +35     
- Misses        274      276    +2     
  Partials      170      170


@shartzog

## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch first to pypdf<4.0.0 to get helpful error messages that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever. Kudos to @shartzog who added the layout-mode with his first contribution!

### Deprecations (DEP)
- Drop Python 3.6 support (#2369) by @MartinThoma
- Remove deprecated code (#2367) by @MartinThoma
- Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)
- Add "layout" mode for text extraction (#2388) by @shartzog
- Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
- Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

### Bug Fixes (BUG)
- PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
- Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)
- Add pmiller66 for #2406 as a contributor by @MartinThoma
- Add missing expand parameter (#2393) by @Atomnp
- Resolve build warnings (#2380) by @stefan6419846
- Fix testing prerequisites (#2381) by @stefan6419846
- Improve formatting of contributors page (#2383) by @stefan6419846
- Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)
- Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
- Fail on Sphinx issues (#2405) by @stefan6419846
- Move title check to own workflow (#2384) by @MasterOdin
- Write to temporary files instead of the working directory (#2379) by @stefan6419846
- Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)
- Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
- Return None instead of -1 when page is not attached (#2376) by @MartinThoma
- Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)
- Add missing pytest.mark.samples annotations (#2412) by @kitterma
- Correctly close temporary files (#2396) by @stefan6419846
- Fix side effect #2379 (#2395) by @pubpub-zz
- Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)
- Use the UserAccessPermissions enum (#2398) by @MartinThoma
- Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)

stefan6419846 added 2 commits January 8, 2024 12:26

ENH: Improve/rewrite PDF permission retrieval

553ed6e

fix ruff violations

316471c

MartinThoma reviewed Jan 8, 2024

pypdf/_writer.py

MartinThoma reviewed Jan 8, 2024

pypdf/_reader.py

MartinThoma reviewed Jan 8, 2024

pypdf/constants.py (outdated)

MartinThoma previously approved these changes Jan 8, 2024

MartinThoma requested a review from pubpub-zz January 8, 2024 22:18

perform proposed renaming

8ed9a9f

stefan6419846 dismissed MartinThoma’s stale review via 8ed9a9f January 9, 2024 08:53

pubpub-zz reviewed Jan 12, 2024

pypdf/_reader.py

Merge branch 'main' into user-access-permissions

e24b305

MartinThoma approved these changes Jan 18, 2024

MartinThoma merged commit bd571f5 into py-pdf:main Jan 18, 2024
15 checks passed

stefan6419846 deleted the user-access-permissions branch January 18, 2024 19:58
