MAINT: New LZW decoding implementation #2887

Merged on Oct 3, 2024 (14 commits)

@MartinThoma (Member) commented Sep 30, 2024

The basis for this implementation is https://github.com/empira/PDFsharp/blob/master/src/foundation/src/PDFsharp/src/PdfSharp/Pdf.Filters/LzwDecode.cs (MIT licensed)

Because this removes the `LZWDecode` class from a public module, this change has to ship in a major release.
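
For readers unfamiliar with the filter, the decoding that PDF's `LZWDecode` filter performs can be sketched roughly as below. This is an illustrative sketch only, not the code from this PR and not a port of the PDFsharp source. The function name `lzw_decode` and its helper are hypothetical, and the sketch assumes the PDF defaults: codes packed most significant bit first, widths growing from 9 to 12 bits, code 256 clearing the table, code 257 marking end of data, and the default "early change" behaviour.

```python
CLEAR_TABLE = 256  # code that resets the string table
END_OF_DATA = 257  # code that terminates the stream


def _fresh_table() -> list:
    # Entries 0..255 are the single bytes; 256 and 257 are reserved markers.
    return [bytes([i]) for i in range(256)] + [b"", b""]


def lzw_decode(data: bytes) -> bytes:
    """Hypothetical helper: decode LZW data as used by PDF's LZWDecode filter."""
    table = _fresh_table()
    code_width = 9
    previous = None  # last emitted sequence, None right after a reset
    out = bytearray()
    bit_buffer = 0
    bit_count = 0
    pos = 0

    while True:
        # Refill the bit buffer until a full code is available.
        while bit_count < code_width and pos < len(data):
            bit_buffer = (bit_buffer << 8) | data[pos]
            pos += 1
            bit_count += 8
        if bit_count < code_width:
            break  # input exhausted without an explicit end-of-data code

        # Take the next code from the top of the buffer (MSB first).
        bit_count -= code_width
        code = (bit_buffer >> bit_count) & ((1 << code_width) - 1)
        bit_buffer &= (1 << bit_count) - 1

        if code == END_OF_DATA:
            break
        if code == CLEAR_TABLE:
            table = _fresh_table()
            code_width = 9
            previous = None
            continue

        if code < len(table):
            entry = table[code]
        elif previous is not None and code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"Invalid LZW code {code}")

        if previous is not None:
            table.append(previous + entry[:1])
        out += entry
        previous = entry

        # Early change: widen the code one entry before the table fills up.
        if code_width < 12 and len(table) + 1 >= (1 << code_width):
            code_width += 1

    return bytes(out)
```

As a usage example, the nine bytes `80 0B 60 50 22 0C 0C 85 01` carry the 9-bit codes 256, 45, 258, 258, 65, 259, 66, 257, so `lzw_decode(bytes.fromhex("800B6050220C0C8501"))` yields `b"-----A---B"`.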

codecov bot commented Sep 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.35%. Comparing base (8e1799e) to head (9bf0e56).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2887      +/-   ##
==========================================
+ Coverage   96.27%   96.35%   +0.08%     
==========================================
  Files          52       52              
  Lines        8689     8735      +46     
  Branches     1733     1723      -10     
==========================================
+ Hits         8365     8417      +52     
+ Misses        187      186       -1     
+ Partials      137      132       -5     

@stefan6419846 (Collaborator) left a comment

Looks good to me now and keeps the necessary bits of backwards compatibility, so I do not see any issue with merging this.

@stefan6419846 stefan6419846 merged commit e825ac0 into main Oct 3, 2024
16 checks passed
@stefan6419846 stefan6419846 deleted the lzw-decoder-improvement branch October 3, 2024 14:10
stefan6419846 added a commit that referenced this pull request Oct 27, 2024
## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)
2 participants