Invalid OCGs not ignored by SVG image creation #3569

Closed
JorjMcKie opened this issue Jun 12, 2024 Discussed in #3567 · 2 comments
Labels: upstream bug (bug outside this package)

Comments

@JorjMcKie
Collaborator

Discussed in #3567

Originally posted by serhii-brovarnyk June 11, 2024
Hello!

I have a single-page PDF file, produced by another PDF tool, and this document contains some OCGs (optional content groups).
Unfortunately, I cannot provide the actual file.

If I get the pixmap of the page, it looks completely OK, but when I get an SVG image via the page.get_svg_image(text_as_path=False) method, the appearance of the page is completely different.

Investigating the issue, I've concluded that some of the clip-paths affect the appearance of the drawing that I see.
The defs section does not have any relation to the layers or OCGs, but some of the groups look like this:

<g clip-path="url(#clip_1)">
  <g id="layer_1" data-name="SomeName">
    <path transform="matrix(0,-.06,-.06,-0,3024,2160)"
          d="M28564 13431V14031L27914 13431ZM28564 14031 27914 13431V14031Z"
          fill="#7f7f7f"/>
  </g>
</g>

If I delete a certain clip-path in the defs section, more content becomes visible in the SVG image. So I suppose the only reason for this result is that the SVG contains invisible data from some of the OCGs, and since that data is not being managed by the PDF, I see it whether I am supposed to or not.
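
To see which layer groups sit under which clip-path before deleting anything, the SVG text can be inspected with the standard library. This is only a sketch: the helper name `layers_by_clip` is hypothetical, and it assumes the output follows the shape of the excerpt above (groups with a `clip-path="url(#...)"` attribute that contain groups whose `id` starts with `layer_` and that carry a `data-name`).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def layers_by_clip(svg_text):
    """Map each referenced clip-path id to the data-name values of the
    layer groups found inside the element that references it."""
    root = ET.fromstring(svg_text)
    result = {}
    for el in root.iter():
        ref = el.get("clip-path", "")
        # References are expected to look like url(#clip_1).
        if not (ref.startswith("url(#") and ref.endswith(")")):
            continue
        clip_id = ref[len("url(#"):-1]
        names = result.setdefault(clip_id, [])
        # el.iter() also visits nested groups, so a layer inside nested
        # clipped groups is listed under every enclosing clip id.
        for g in el.iter(f"{{{SVG_NS}}}g"):
            if g.get("id", "").startswith("layer_"):
                names.append(g.get("data-name"))
    return result
```

With a real page, the input would be the string returned by page.get_svg_image(text_as_path=False). The result only shows the grouping; it does not say whether a layer is meant to be visible.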

So my question is: how can I detect and delete invisible and unnecessary OCGs from my PDF document, so that I won't see a difference between the SVG image and the pixmap that I get from the pymupdf Page object?

It is important to note that the pymupdf Document object does not have any info about layers or OCGs.
I have tried the doc.get_layers(), doc.get_ocgs() and doc.layer_ui_configs() methods, but they all return empty results.
But page.get_oc_items() returns this list of OCGs:

> [('oc10', 68, 'ocg'),
>  ('oc1009', 67, 'ocg'),
>  ('oc1010', 66, 'ocg'),
>  ...
>  ('oc945', 7, 'ocg'),
>  ('oc946', 6, 'ocg'),
>  ('oc947', 5, 'ocg')]
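
The mismatch described here (nothing at document level, many entries at page level) can be checked mechanically. A minimal sketch, assuming that doc.get_ocgs() returns a mapping keyed by xref and that page.get_oc_items() returns (name, xref, type) tuples as shown above. The helper name `undeclared_ocgs` is hypothetical, and treating "not declared at document level" as the reason these OCGs are invalid is an assumption, not something the report states.

```python
def undeclared_ocgs(doc, page):
    """Return the page's OCG entries whose xref is not among the OCGs
    that the document itself reports."""
    # Iterating the mapping yields its keys, i.e. the declared xrefs.
    declared = set(doc.get_ocgs())
    return [
        item
        for item in page.get_oc_items()
        if item[2] == "ocg" and item[1] not in declared
    ]
```

For the file in this report, every tuple listed above would be returned, because the document-level call comes back empty.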

I also used this code:

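# doc is the already opened pymupdf Document
page_xref = doc.page_xref(0)  # xref of the first (and only) page
# print every key of the page object together with its (type, value) pair
for key in doc.xref_get_keys(page_xref):
    print(f"KEY: {key}")
    print(doc.xref_get_key(page_xref, key))
    print('---------------')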

to get this info:

> KEY: Contents
> ('xref', '80 0 R')
> ---------------
> KEY: MediaBox
> ('array', '[0 0 2160 3024]')
> ---------------
> KEY: Parent
> ('xref', '82 0 R')
> ---------------
> KEY: Resources
> ('dict', '<</ExtGState<</GT255 79 0 R>>/Font<</F1 74 0 R/F2 69 0 R>>/ProcSet[/PDF/Text/ImageC]/Properties<</oc10 68 0 R/oc1009 67 0 R/oc1010 66 0 R/oc1011 65 0 R/oc1013 64 0 R/oc1014 63 0 R/oc1023 62 0 R/oc1027 61 0 R/oc16 60 0 R/oc17 59 0 R/oc19 58 0 R/oc2 57 0 R/oc3 56 0 R/oc4 55 0 R/oc5 54 0 R/oc507 53 0 R/oc6 52 0 R/oc7 51 0 R/oc8 50 0 R/oc832 49 0 R/oc833 48 0 R/oc834 47 0 R/oc835 46 0 R/oc840 45 0 R/oc842 44 0 R/oc843 43 0 R/oc844 42 0 R/oc848 41 0 R/oc850 40 0 R/oc852 39 0 R/oc853 38 0 R/oc855 37 0 R/oc856 36 0 R/oc858 35 0 R/oc861 34 0 R/oc862 33 0 R/oc863 32 0 R/oc868 31 0 R/oc869 30 0 R/oc870 29 0 R/oc875 28 0 R/oc876 27 0 R/oc877 26 0 R/oc878 25 0 R/oc880 24 0 R/oc883 23 0 R/oc884 22 0 R/oc885 21 0 R/oc898 20 0 R/oc9 19 0 R/oc909 18 0 R/oc925 17 0 R/oc926 16 0 R/oc929 15 0 R/oc931 14 0 R/oc934 13 0 R/oc935 12 0 R/oc936 11 0 R/oc937 10 0 R/oc942 9 0 R/oc943 8 0 R/oc945 7 0 R/oc946 6 0 R/oc947 5 0 R>>>>')
> ---------------
> KEY: Rotate
> ('int', '270')
> ---------------
> KEY: Type
> ('name', '/Page')
> ---------------
> KEY: VP
> ('array', '[]')
> ---------------
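
The /Properties entry in the Resources value above is where the page-level OCG names come from. As a rough sketch, the name-to-xref pairs can be pulled out of that string with a regular expression. The helper name `ocg_properties` is hypothetical, and the code assumes /Properties is an inline dictionary containing only indirect references of the form `/name N 0 R`, as in this dump.

```python
import re


def ocg_properties(resources):
    """Return {name: xref} for the entries of an inline /Properties
    dictionary inside a Resources dictionary string."""
    # Non-greedy match up to the first '>>'; this relies on the
    # /Properties dictionary containing no nested dictionaries.
    match = re.search(r"/Properties\s*<<(.*?)>>", resources, re.S)
    if match is None:
        return {}
    pairs = re.findall(r"/([^\s/<>\[\]()]+)\s+(\d+)\s+\d+\s+R", match.group(1))
    return {name: int(xref) for name, xref in pairs}
```

The input would be the second element of doc.xref_get_key(page_xref, "Resources"). Each resulting xref can then be passed to doc.xref_object(xref) to look at the OCG definition itself. If Resources is stored as an indirect reference rather than an inline dictionary, the string would need to be fetched from that object first.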

In conclusion, this document has some OCGs that are accessible only at the page level. I want to preserve only the visible OCGs, so that the resulting SVG image has the right appearance, and delete the rest. Can you give me some advice on how to do it?
I have read 2 similar discussions (about OCGs) but did not find the answer in them :(

@JorjMcKie added the "upstream bug" (bug outside this package) label on Jun 12, 2024
@julian-smith-artifex-com
Collaborator

Associated MuPDF bug is: https://bugs.ghostscript.com/show_bug.cgi?id=707824

@julian-smith-artifex-com
Collaborator

Fixed in 1.24.10.
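
To check whether an installed release is at least the one named here, version strings should be compared numerically, since a plain string comparison ranks "1.24.9" above "1.24.10". A small sketch, with hypothetical helper names, assuming the version above refers to the PyMuPDF release number:

```python
import re


def version_tuple(version):
    """Turn '1.24.10' into (1, 24, 10); any non-numeric suffix of a
    component (for example 'rc1') is ignored."""
    return tuple(
        int(re.match(r"\d*", piece).group() or 0)
        for piece in version.split(".")
    )


def includes_fix(installed, fixed_in="1.24.10"):
    """True if the installed version is at least the fixed version."""
    return version_tuple(installed) >= version_tuple(fixed_in)
```

The installed version string can be read with importlib.metadata.version("PyMuPDF") and passed to includes_fix.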
