MAINT: Unnecessary character mapping process #2888

Conversation

ssjkamei
Contributor

@ssjkamei ssjkamei commented Oct 4, 2024

This is a fix for the problem that occurred when #2882 was changed.

Previously, the length of each character was checked after conversion by cmap. However, cmap conversion can turn a single character into a string of more than one character, so the length cannot be measured accurately at that point.

This matters, for example, when deciding whether to measure the distance from the ligature or from the base character corresponding to the ligature in fixing #1351.
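To illustrate the length problem described above, here is a minimal sketch. It uses a toy dictionary as a stand-in for a font's character map; `toy_cmap` and both helper functions are hypothetical names for illustration and are not part of pypdf's API.

```python
from typing import Dict

# Toy stand-in for a font's character map (an assumption for illustration,
# not pypdf's real data structure).
toy_cmap: Dict[str, str] = {
    "\x01": "A",    # one character code maps to one character
    "\x03": "ffi",  # one character code (a ligature) maps to three characters
}


def length_after_mapping(codes: str, cmap: Dict[str, str]) -> int:
    """Measure the length after mapping: ligatures inflate the count."""
    return len("".join(cmap.get(code, code) for code in codes))


def length_before_mapping(codes: str) -> int:
    """Measure the length on the raw character codes instead."""
    return len(codes)


codes = "\x01\x03"  # two character codes in the content stream
after = length_after_mapping(codes, toy_cmap)
before = length_before_mapping(codes)
print(before, after)
```

In this toy case the raw input holds two character codes, while the mapped string has four characters, which is why a length check placed after the mapping step can be misleading.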

The change in handle_tj was needed because the function would otherwise fail Ruff's check:
Error: PLR0915 Too many statements (nnn > 176)

The following code is only used to get the character code for a space.
However, I think it would be better to split the code into parts for obtaining the character code.
Style changes are considered in another PR.

_, space_code = parse_encoding(cmap[3], space_code)
_, space_code, _ = parse_to_unicode(cmap[3], space_code)

ssjkamei and others added 30 commits September 24, 2024 13:07
This reverts commit 5400f5a.

BUG: Missing spaces in extract_text() method (py-pdf#1328)

BUG: Missing spaces in extract_text() method (py-pdf#1328) add test

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.36%. Comparing base (e825ac0) to head (b25b28f).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2888   +/-   ##
=======================================
  Coverage   96.35%   96.36%           
=======================================
  Files          52       52           
  Lines        8735     8738    +3     
  Branches     1723     1727    +4     
=======================================
+ Hits         8417     8420    +3     
  Misses        186      186           
  Partials      132      132           


@ssjkamei
Contributor Author

ssjkamei commented Oct 4, 2024

I have addressed the parts that were flagged by the code check errors.
Also, I'm sorry, it seems that I made a mistake and the past commit history is displayed.

Collaborator

@stefan6419846 stefan6419846 left a comment

Thanks. I could not see any apparent issue with this and all the tests still pass without having to change anything. Thus I am going to merge this for now.

@stefan6419846 stefan6419846 merged commit abb62ac into py-pdf:main Oct 4, 2024
17 checks passed
@stefan6419846
Collaborator

> Also, I'm sorry, it seems that I made a mistake and the past commit history is displayed.

Your previous PR was opened from your main branch, while this PR used a separate branch. Since branches usually originate from main, your previous commits were included here as well.

I recommend resetting the changes on your main branch and syncing it with upstream. If this is too complex, consider deleting and re-creating your fork, provided no work would be lost in the process. Always use dedicated branches for further PRs.

@ssjkamei ssjkamei deleted the MAINT--No-unnecessary-character-mapping-process branch October 4, 2024 09:46
stefan6419846 added a commit that referenced this pull request Oct 27, 2024
## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)