PDF created with Adobe Illustrator are wrongly detected as .ai files #360

Closed
fungiboletus opened this issue May 13, 2020 · 15 comments · Fixed by #396
Labels
enhancement Add new functionality

Comments

@fungiboletus

Since #323 (src: Add support for AI files (Adobe Illustrator)), file-type looks for the text "Adobe Illustrator" in PDF documents and, if it finds a match, assumes the document is an Adobe .ai file.

It seems that normal PDFs created with Adobe Illustrator also contain the text "Adobe Illustrator" several times in their metadata, even though they are not Adobe Illustrator files.
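
To make the failure mode concrete, here is a minimal sketch of the kind of heuristic described above. It is a hypothetical re-creation for illustration, not file-type's actual source: the names `indexOfBytes`, `ascii`, and `naiveDetect` are made up, and the two sample inputs are tiny hand-written byte strings rather than real files.

```typescript
/** Returns the index of `needle` in `haystack`, or -1 if it is absent. */
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

/** Encodes an ASCII string as bytes. */
function ascii(text: string): Uint8Array {
  return Uint8Array.from(text, (ch) => ch.charCodeAt(0));
}

/**
 * Hypothetical heuristic: anything that starts with the PDF magic bytes and
 * mentions "Adobe Illustrator" anywhere is reported as an .ai file.
 */
function naiveDetect(bytes: Uint8Array): 'ai' | 'pdf' | undefined {
  const isPdf = indexOfBytes(bytes.subarray(0, 5), ascii('%PDF-')) === 0;
  if (!isPdf) return undefined;
  return indexOfBytes(bytes, ascii('Adobe Illustrator')) !== -1 ? 'ai' : 'pdf';
}

// Made-up sample inputs: the second one only names Illustrator in its metadata.
const plainPdf = ascii('%PDF-1.5\n<</Producer (Some PDF printer)>>');
const pdfFromIllustrator = ascii('%PDF-1.5\n<</Creator (Adobe Illustrator 24.1)>>');

console.log(naiveDetect(plainPdf)); // pdf
console.log(naiveDetect(pdfFromIllustrator)); // ai (the false positive)
```

Any PDF whose metadata names its creator application trips a check like this, which matches the behavior reported here.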

@sindresorhus
Owner

// @vladfrangu

@vladfrangu
Contributor

Can you send me an AI PDF file please? You can attach it here, email it or send it on Discord if it's easier for you (Vladdy#0002)

I'll look into it as soon as I can! 😅

@fungiboletus
Author

fungiboletus commented May 13, 2020

I installed Adobe Illustrator's trial and made a few test documents:

  • One normal AI
    • ✅ detected as ai
    • normal.ai
  • One AI without PDF compatibility (I guess it cannot be opened by PDF viewers)
    • ✅ detected as ai
    • without-pdf-comptability.ai
  • One PDF saved from Adobe Illustrator, using the default "[Illustrator Default]" preset
    • ❌ detected as ai
    • adobe-illustrator.pdf
  • One PDF saved from Adobe Illustrator, using the preset "smallest PDF"
    • ✅ detected as pdf
    • smallest.pdf
  • One PDF saved from Adobe Illustrator, using the default "[Illustrator Default]" preset, but enabling "Optimize for Fast Web View"
    • ✅ detected as pdf
    • fast-web.pdf
  • One PDF printed from Adobe Illustrator, but with a PDF printer.
    • ✅ detected as pdf
    • printed.pdf

issue_360_filetype.zip

@vladfrangu
Contributor

Well, it's good to know that out of 6 cases, only one fails 😅
I'll look into it asap and let you know!

@fungiboletus
Author

True, but it's the one with the default settings for PDF in Adobe Illustrator.

@vladfrangu
Contributor

Hey! Sorry to keep you in the dark for 8 whole days. I just took a quick look at the contents using a text diff viewer. Running a diff between the fixture.ai file in this repository's fixtures folder and the adobe-illustrator.pdf file from your archive yielded... a big middle finger from the metadata! However, one thing I spotted in the PDF is this section of data:

Data slice
20 0 obj
<</AIMetaData 21 0 R/AIPDFPrivateData1 22 0 R/AIPDFPrivateData2 23 0 R/AIPDFPrivateData3 24 0 R/AIPDFPrivateData4 25 0 R/AIPDFPrivateData5 26 0 R/ContainerVersion 11/CreatorVersion 24/NumBlock 5/RoundtripStreamType 2/RoundtripVersion 24>>
endobj
21 0 obj
<</Length 1162>>stream
%!PS-Adobe-3.0
%%Creator: Adobe Illustrator(R) 24.0
%%AI8_CreatorVersion: 24.1.2
%%For: (Antoine Pultier) ()
%%Title: (test-no-pdf.ai)
%%CreationDate: 5/13/2020 1:22 PM
%%Canvassize: 16383
%%BoundingBox: 145 -255 407 -166
%%HiResBoundingBox: 145.537109375 -254.33642578125 406.0888671875 -166.9560546875
%%DocumentProcessColors: Black
%AI5_FileFormat 14.0
%AI12_BuildNumber: 408
%AI3_ColorUsage: Color
%AI7_ImageSettings: 0
%%CMYKProcessColor: 1 1 1 1 ([Registration])
%AI3_Cropmarks: 0 -841.8897637795 595.2755905512 0
%AI3_TemplateBox: 298.5 -421.5 298.5 -421.5
%AI3_TileBox: 27.6377952756002 -780.944881889751 567.637795275601 -60.9448818897499
%AI3_DocumentPreview: None
%AI5_ArtSize: 14400 14400
%AI5_RulerUnits: 1
%AI9_ColorModel: 2
%AI5_ArtFlags: 0 0 0 1 0 0 1 0 0
%AI5_TargetResolution: 800
%AI5_NumLayers: 1
%AI9_OpenToView: -381 23 1.13 1542 988 18 0 0 120 87 0 0 0 1 1 0 1 1 0 0
%AI5_OpenViewLayers: 7
%%PageOrigin:-8 -817
%AI7_GridSettings: 72 8 72 8 1 0 0.800000011920929 0.800000011920929 0.800000011920929 0.899999976158142 0.899999976158142 0.899999976158142
%AI9_Flatten: 1
%AI12_CMSettings: 00.MS
%%EndComments
endstream
endobj

Technically, this could be used to detect whether the file is, in the end, a PDF. However, I don't know how many cans of worms this would open, as I'm not an active user of Adobe products. I can, however, attempt to implement a PR for this!
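
As a rough illustration of how the data slice above could be inspected, the sketch below lists the Illustrator-specific dictionary keys found in a PDF's text. The function name `findIllustratorMarkers` is hypothetical, the input is assumed to be the file contents already decoded as a single-byte (latin1-style) string, and the thread does not settle how the presence or absence of these keys should map to "pdf" or "ai", so the sketch only reports what it finds.

```typescript
/**
 * Returns the Illustrator-specific key names (without the leading slash)
 * that appear in the given PDF text, in order of appearance.
 * Assumption: `pdfText` is the raw file decoded one byte per character.
 */
function findIllustratorMarkers(pdfText: string): string[] {
  const matches = pdfText.match(/\/(?:AIMetaData|AIPDFPrivateData\d+)\b/g);
  return matches === null ? [] : matches.map((name) => name.slice(1));
}

// Shortened copy of the dictionary shown in the data slice.
const slice =
  '<</AIMetaData 21 0 R/AIPDFPrivateData1 22 0 R/AIPDFPrivateData2 23 0 R' +
  '/ContainerVersion 11/CreatorVersion 24>>';

console.log(findIllustratorMarkers(slice));
// [ 'AIMetaData', 'AIPDFPrivateData1', 'AIPDFPrivateData2' ]

console.log(findIllustratorMarkers('<</Producer (Some PDF printer)>>'));
// []
```

A real detector would also need to decide how much of the file to scan, since these objects are not guaranteed to sit near the start of the document.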

@CSoellinger-IDS

Any chance of getting this fixed? The console version of "file-type" detects the correct file type (it's using npm file-type v12.xx, I think). I'm using file-type as an upload validator, so my only two options are to allow AI file types too or to wait for a bugfix :)

Rechnung_400006880095.pdf.zip

@vladfrangu
Contributor

It's fixable, but I have to mess around with it a lot because of the way PDF files are structured... Basically:

@CSoellinger-IDS

Ok, so for now I "just" also accept AI files and hope it will be fixed at some point :)
I am not familiar with the code of the file-type package, but maybe I'll find some time to look into this problem too, based on your three steps :)

Either way, it would be cool to get this fixed :)
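
The workaround of accepting AI results alongside PDF results could look roughly like the sketch below. It assumes the detection result has the `{ext, mime}` shape that file-type returns; the `DetectedType` interface, the `PDF_LIKE_EXTENSIONS` set, and the `isAcceptedPdfUpload` helper are hypothetical names for illustration.

```typescript
/** Shape of a detection result; undefined means the type was not recognized. */
interface DetectedType {
  ext: string;
  mime: string;
}

// Temporary allow-list: 'ai' is included only because some PDFs are
// currently misreported as Illustrator files.
const PDF_LIKE_EXTENSIONS = new Set<string>(['pdf', 'ai']);

function isAcceptedPdfUpload(detected: DetectedType | undefined): boolean {
  return detected !== undefined && PDF_LIKE_EXTENSIONS.has(detected.ext);
}

console.log(isAcceptedPdfUpload({ ext: 'pdf', mime: 'application/pdf' })); // true
console.log(isAcceptedPdfUpload({ ext: 'ai', mime: 'application/postscript' })); // true
console.log(isAcceptedPdfUpload({ ext: 'png', mime: 'image/png' })); // false
console.log(isAcceptedPdfUpload(undefined)); // false
```

The trade-off is that genuine .ai files pass validation too, so this only suits pipelines that can tolerate them until detection is fixed.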

@Borewit added the enhancement (Add new functionality) label Jul 22, 2020

@thekiwi

thekiwi commented Aug 24, 2020

We've also encountered this same regression. We rely on the PDF detection functionality to validate specific PDF processing requests but as of [email protected] this process breaks as file-type returns the files as .ai (application/postscript).

Is there a solution in mind here? I'd suggest it's more of a 'bug' than an 'enhancement' as it is falsely identifying one file type as another.

For now, we've pinned to 14.0.0 until this is resolved. We'll also look at submitting a PR if we can determine a nice fix on our side.

@cmcgrath13

cmcgrath13 commented Apr 14, 2022

This appears to be back. I am currently using [email protected], and PDFs exported from Illustrator with parameters similar to @fungiboletus's are improperly detected as .ai files. @vladfrangu

@vladfrangu
Contributor

Well, the parsing was changed in #396 from what I did, so I don't really know what the issue is. The best thing you can probably do is attach a sample file with the broken detection, and someone will hopefully take a look.

@cmcgrath13

> Well, the parsing was changed in #396 from what I did, so I don't really know what the issue is. The best thing you can probably do is attach a sample file with the broken detection, and someone will hopefully take a look.

@vladfrangu I can DM someone the file for testing, but would prefer to not share it in a public setting. Where should I send this?

@vladfrangu
Contributor

Could you replicate the PDF with non-sensitive information? (That also helps, since it can be added as a test fixture in the repo.)

@cmcgrath13

> Could you replicate the PDF with non-sensitive information? (That also helps, since it can be added as a test fixture in the repo.)

Sure, let me generate something

7 participants