Page Labels -- IMPLEMENTED #782
-
It has not been incorporated in MuPDF. But anyway: could you please elaborate a bit on what you would like to see?
-
@chapmanwilliam As per writing the labels entry, I think your contribution is more or less final enough to become a
What I will do next is to make two (private)
Looking forward to your reaction.
-
Hi William, just call me Jorj 😎.
>>> doc=fitz.open("pillow.pdf")
>>> doc._get_page_labels()
'0<</S/D>>2<</S/r>>6<</S/D>>'
>>> # if without labels, "" is returned
So your code parsing that string is fully applicable. The function is extremely fast - as is typical for MuPDF. So computing the label for any page number simply needs to check in which of the intervals
For a computation "label => pno-list" one could use a Python list comprehension:
pno_list = [page.number for page in doc if page.label() == label]
This is not the most performant way, but extremely simple.
for page in doc:
    if page.label() == label:
        break
This is obviously faster, because not all pages are loaded. Even better would be a function that returns a label for a given integer ... without needing a page object for this. This function can then be called for computing
pno_list = [pno for pno in range(doc.pageCount) if label_function(pno) == label]
# etc.
For creating / updating the page label info, the code would be like this:
doc._set_page_labels(create_nums(labels))
# with **your** function 'create_nums' - that's it.
# only omit prefix "Nums[" and suffix "]"
I believe you use Mac OSX? Let me generate a pre-release wheel. When done I will let you know from where to download it.
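To make the "label for a given integer" idea concrete, here is a minimal pure-Python sketch of such a `label_function`. It is not the code that went into the library: the rule layout `(start_pno, style, prefix, first_number)` and the helper names `to_roman`, `to_alpha` and `format_number` are assumptions made for illustration. The numbering styles (D, R, r, A, a) follow the PDF page label conventions.

```python
from bisect import bisect_right

def to_roman(n):
    """Return n (>= 1) as an upper-case Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def to_alpha(n):
    """PDF letter style: 1..26 -> A..Z, 27..52 -> AA..ZZ, and so on."""
    repeat, index = divmod(n - 1, 26)
    return chr(ord("A") + index) * (repeat + 1)

def format_number(n, style):
    """Format n according to a PDF numbering style code."""
    if style == "D":
        return str(n)
    if style == "R":
        return to_roman(n)
    if style == "r":
        return to_roman(n).lower()
    if style == "A":
        return to_alpha(n)
    if style == "a":
        return to_alpha(n).lower()
    return ""  # no style given: the label consists of the prefix only

def label_function(pno, rules):
    """Label of 0-based page number pno.

    rules: list of (start_pno, style, prefix, first_number) tuples,
    sorted by start_pno (hypothetical layout, chosen for this sketch).
    """
    starts = [rule[0] for rule in rules]
    i = bisect_right(starts, pno) - 1  # last rule starting at or before pno
    if i < 0:
        return ""  # page lies before the first rule
    start, style, prefix, first = rules[i]
    return prefix + format_number(first + pno - start, style)

# Rules matching the "pillow.pdf" string shown above.
rules = [(0, "D", "", 1), (2, "r", "", 1), (6, "D", "", 1)]
labels = [label_function(pno, rules) for pno in range(8)]
# labels == ['1', '2', 'i', 'ii', 'iii', 'iv', '1', '2']
pno_list = [pno for pno in range(8) if label_function(pno, rules) == "1"]
# pno_list == [0, 6]
```

No page object is loaded anywhere, so the reverse lookup only costs one interval search per page number. Note that a label may map to more than one page, as "1" does here.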
-
The catalog just contains
-
Okay, modified the
Anyway this is the output for your example PDF:
>>> import fitz
>>> doc=fitz.open("1 Court Guides.pdf")
>>> from pprint import pprint
>>> pprint(doc._get_page_labels())
[(0, '<</S/r>>'), (12, '<</S/D>>')] # this is what you now have to parse
>>> doc.pdf_catalog()
4166
>>> print(doc.xref_object(4166))
<<
/AcroForm 4176 0 R
/Metadata 3415 0 R
/PageLabels 4082 0 R # indirect ref successfully resolved
/Pages 4085 0 R
/Type /Catalog
>>
>>> print(doc.xref_object(4082))
<<
/Nums [ 0 4083 0 R 12 4084 0 R ] # indirect refs successfully resolved
>>
>>> print(doc.xref_object(4083))
<<
/S /r
>>
>>> print(doc.xref_object(4084))
<<
/S /D
>>
>>>
Old example also still works:
>>> doc=fitz.open("pillow.pdf")
>>> pprint(doc._get_page_labels())
[(0, '<</S/D>>'), (2, '<</S/r>>'), (6, '<</S/D>>')]
>>>
I will let you know when a new wheel has been generated.
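For the parsing step, a rough sketch of how the `(page number, dictionary text)` pairs shown above could be turned into plain Python dictionaries, plus the reverse direction for writing. This is only an illustration: the key names and the functions `parse_rule` and `build_nums` are hypothetical, and the regular expressions assume simple literal-string prefixes without parentheses or escapes.

```python
import re

def parse_rule(startpage, rule_text):
    """Turn one (page number, PDF dictionary text) pair into a plain dict."""
    style = re.search(r"/S\s*/([DRrAa])", rule_text)      # numbering style
    prefix = re.search(r"/P\s*\(([^()]*)\)", rule_text)   # label prefix
    first = re.search(r"/St\s+(\d+)", rule_text)          # first page number
    return {
        "startpage": startpage,
        "style": style.group(1) if style else "",
        "prefix": prefix.group(1) if prefix else "",
        "firstpagenum": int(first.group(1)) if first else 1,
    }

def build_nums(rules):
    """Serialize rule dicts back to text, without "Nums[" and "]"."""
    parts = []
    for rule in sorted(rules, key=lambda r: r["startpage"]):
        text = ""
        if rule["style"]:
            text += "/S/" + rule["style"]
        if rule["prefix"]:
            text += "/P(" + rule["prefix"] + ")"  # no escaping in this sketch
        if rule["firstpagenum"] > 1:
            text += "/St " + str(rule["firstpagenum"])
        parts.append(f"{rule['startpage']}<<{text}>>")
    return "".join(parts)

# Output of the "pillow.pdf" example above.
raw = [(0, "<</S/D>>"), (2, "<</S/r>>"), (6, "<</S/D>>")]
rules = [parse_rule(pno, text) for pno, text in raw]
nums = build_nums(rules)
# nums == '0<</S/D>>2<</S/r>>6<</S/D>>'
```

The round trip reproduces the string from the earlier `pillow.pdf` example, which is the form expected when the "Nums[" prefix and "]" suffix are omitted.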
-
A new wheel has been generated. You can now check whether your large example works.
-
We are approaching the goal millimeter by millimeter:
-
Thanks William, I have submitted wheel generation based on what we discussed / found. Should be available in half an hour or so. Note: This version includes your other request for identifiable link insertions and the ability to modify TOC item colors.
-
Wheels are ready to go!
-
Just built another set of pre-release wheels here.
As I said, you can define them as normal Python functions that can be included in
-
Huh?
-
Jorj, you are quite right. My mistake, apologies. My problem was reading the colours and putting them into Treeview. William
On 6 Jan 2021, at 17:23, Jorj X. McKie wrote:
>> ... the colour of the bookmarks is not being saved
> Huh?
> Please show me the complete code you used: it works for me.
-
Hi William, Main changes I made:
With this code, the following new methods become available:
-
The new version 1.18.6 has been uploaded.
-
Which ones? I am not getting errors if no labels are defined:
I already have a method
-
Ok - I’ll handle it my side
On 14 Jan 2021, at 13:19, Jorj X. McKie wrote:
> As per a method like parse_page_string: I do not think it makes much sense.
> - It seems a rather mundane requirement - which realistic situation would require an answer like this? It would probably be better to document how the respective functions in utils.py, which are not "published" as Document methods, can be used to build one's own specific solution. One could also explain how the single label rule dictionaries can be used to build whatever special solution is needed: those dictionaries are well designed, easy to access and able to answer any inquiry - just by using Python capabilities.
> - The parameter specification as a string does not make much sense either: after all, how can we know that there never will be rule prefixes containing commas? Or "-"? It is practically impossible to disambiguate a string parameter in that respect.
> As a general comment:
> I have put our names - when mentioned as authors - outside the """ ... """ comment blocks for a reason: they would be displayed if someone does a help(fitz.Document.function) inquiry. Embarrassing and immodest.
-
Yes, fair point, although Adobe Acrobat seems to accept page labels as user input for page ranges.
-
Adobe Acrobat is a viewer, not a programming library. Also, I doubt that this logic is foolproof: how does it distinguish between a page range "5-7" and a prefix looking exactly like this? Or like this: "1,4,3-II"?
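The ambiguity can be shown with a few lines of Python. The sketch below is only a demonstration of the argument, not a proposal for a parser: `interpretations` is a hypothetical helper that lists every way a string could be read, either as one label or as a "first-last" range of labels, given the labels of all pages in a document.

```python
def interpretations(text, labels):
    """Return every plausible reading of text as a page selection.

    labels: list of page labels, where the list index is the 0-based
    page number. Each reading is a (kind, page_numbers) tuple.
    """
    readings = []
    # Reading 1: the whole string is the label of one or more pages.
    if text in labels:
        pages = [pno for pno, lab in enumerate(labels) if lab == text]
        readings.append(("single label", pages))
    # Reading 2: "first-last", trying every hyphen as the separator.
    for pos, char in enumerate(text):
        if char != "-":
            continue
        first, last = text[:pos], text[pos + 1:]
        if first in labels and last in labels:
            a, b = labels.index(first), labels.index(last)
            if a <= b:
                readings.append(("range", list(range(a, b + 1))))
    return readings

# Page 0 uses the prefix "5-" with number 7, pages 1..7 are labelled 1..7.
labels = ["5-7", "1", "2", "3", "4", "5", "6", "7"]
result = interpretations("5-7", labels)
# result == [('single label', [0]), ('range', [5, 6, 7])]
```

Two valid readings come back for the same input, so a string parameter cannot be resolved without extra rules. Passing a list of labels, or explicit page numbers, avoids the problem.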
-
Amazing library. But it is missing something fundamental: support for page labels.
Mr Bloomfield claims he has provided such a feature here in Sep 2015, but I don’t think it has been incorporated.
https://bugs.ghostscript.com/show_bug.cgi?id=695351
Any chance that could be done?
With many thanks in advance.