src: Add support for AI files (Adobe Illustrator) #323

vladfrangu · 2020-02-04T17:56:14Z

Thought I'd give a shot at adding support for Adobe's AI files. This might also allow other sequence-based searches to be done! Closes #64 (what an oldie!)

How does it work? There's no easy way to differentiate between AI and PDF. The first bytes are the same (%PDF in bytes). What you can look for however is the string <xmp:CreatorTool>Adobe Illustrator 24.0 (Windows)</xmp:CreatorTool> or similar, which is near the start of the file, before all the data!

The only other way I could think of is creating a PDF parser of sorts and reading it through that, but that doesn't sound easy enough 😝
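To make the idea above concrete, here is a minimal sketch of the heuristic, not the code from this PR. The helper name `detectPdfOrAi` and the `searchLimit` default are assumptions for illustration only; the sketch confirms the shared `%PDF` magic bytes first, then looks for the `Adobe Illustrator` marker in a bounded prefix.

```javascript
// Hypothetical helper for illustration; names and the default limit are assumptions.
const PDF_MAGIC = Buffer.from('%PDF');
const AI_MARKER = Buffer.from('Adobe Illustrator');

function detectPdfOrAi(buffer, searchLimit = 4096) {
	// Both formats begin with the same four bytes, so this only rules out other types.
	if (buffer.length < PDF_MAGIC.length || !buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)) {
		return undefined;
	}

	// Scan only a bounded prefix; subarray clamps to the buffer's end for short inputs.
	const head = buffer.subarray(0, searchLimit);
	return head.includes(AI_MARKER) ? 'ai' : 'pdf';
}

const aiSample = Buffer.from('%PDF-1.5\n<xmp:CreatorTool>Adobe Illustrator 24.0 (Windows)</xmp:CreatorTool>');
const pdfSample = Buffer.from('%PDF-1.7\n<xmp:CreatorTool>Some Other Tool</xmp:CreatorTool>');

console.log(detectPdfOrAi(aiSample)); // 'ai'
console.log(detectPdfOrAi(pdfSample)); // 'pdf'
```

One caveat of any marker-based check: a regular PDF exported from Illustrator can carry the same CreatorTool string, which is what the later issue #360 linked in this thread's timeline reports.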

Link: https://en.wikipedia.org/wiki/Adobe_Illustrator_Artwork

Question

If wanted, we could also just remove the limit when reading the buffer for a sequence check, which might help for stripped files.

Quick note

Some of the changes in this PR have been made to make xo happy.

IssueHunt Summary

Referenced issues

This pull request has been submitted to:

#64: Support for Adobe Illustrator file ai


Borewit · 2020-02-04T18:17:01Z

I need to fix an error message: Borewit/strtok3#147

vladfrangu · 2020-02-04T18:33:35Z

@Borewit while you're here, hi!

I found an odd bug? (or misuse issue on my part) with peekBuffer, the tests for aac (fixture-id3v2.aac to be specific) always fail, and that happens only if the check for the AI files happens...
How can this.position be changed with a peekBuffer? Or am I missing something?


Borewit · 2020-02-04T18:34:23Z

core.js

+
+	const buffer = Buffer.alloc(minimumBytes);
+
+	await tokenizer.peekBuffer(buffer, {position: options.position, length: 512, mayBeLess: true});


You can safely use tokenizer.readBuffer (advancing the position) once you have detected that the file is of the PDF kind.

What you can look for however is the string <xmp:CreatorTool>Adobe Illustrator 24.0 (Windows)</xmp:CreatorTool> or similar, which is near the start of the file, before all the data!

Do you think it is possible to use tokenizer.read... as long as we only read the part before the data starts?

I assume you're talking about readToken, correct? If so, we can probably read the buffer starting at the 1350th byte, read 512 bytes (for safety), and check whether the resulting string includes Adobe Illustrator. Is that what you're referring to?

I have not looked at the format at all, but I was (naively) hoping that a structured way of reading it is possible, similar to how the zip or png files are handled.

Here is a screenshot of how it looks. The items marked with a square are those that can (and will) change, specifically the document title (the file name you save it as) and some arbitrary length... This is why the code skips 1350 bytes (which gives plenty of leeway even for long file names) and reads 512 bytes, which should hopefully catch the CreatorTool.

Oh my god, Adobe XMP using RDF, serialized with RDF/XML inside PDF. And maybe I did not notice all layers yet.

If we can somehow figure out the length, then ignore till the title start, ignore title length, then till CreatorTool, we could probably use readToken... But, at least in this case... It's easier to read 512 bytes than do that. Plus, this method makes it work with other file types, if needed! Thoughts?

If there is a header or an encoded field length, it could be worth trying to decode that. Your current solution is simple, which is also worth something.
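The "skip 1350 bytes, read 512" window discussed in this thread can be sketched as follows. This is an illustration under stated assumptions rather than the merged code: `windowIncludes` is a hypothetical helper, and only the 1350 and 512 defaults come from the discussion above.

```javascript
// Hypothetical helper; only the 1350/512 defaults are taken from the discussion.
function windowIncludes(buffer, needle, {offset = 1350, length = 512} = {}) {
	// subarray clamps to the end of the buffer, so a short file yields a
	// shorter (possibly empty) window instead of throwing.
	const window = buffer.subarray(offset, offset + length);
	return window.includes(needle);
}

// Build a fake file: header, 1400 bytes of padding, then the marker.
const fakeFile = Buffer.concat([
	Buffer.from('%PDF'),
	Buffer.alloc(1400, 0x20),
	Buffer.from('Adobe Illustrator')
]);

console.log(windowIncludes(fakeFile, 'Adobe Illustrator')); // true
console.log(windowIncludes(fakeFile, '%PDF')); // false, the header lies before the window
```

The trade-off of a fixed window is that a marker starting before the offset, or straddling the end of the window, is missed. Decoding a header or field length, as suggested above, would avoid that at the cost of more parsing.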

Borewit · 2020-02-04T18:40:04Z

I found an odd bug? (or misuse issue on my part) with peekBuffer, the tests for aac (fixture-id3v2.aac to be specific) always fail, and that happens only if the check for the AI files happens...
How can this.position be changed with a peekBuffer? Or am I missing something?

~~No, I think you have a point. Created issue: Borewit/strtok3#149~~

this.position does not change; it is probably the absolute position that causes it.
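To illustrate the distinction between peeking at an absolute position and reading from the cursor, here is a toy in-memory tokenizer. It is a hypothetical stand-in and not strtok3's implementation: the class name, the synchronous methods, and their signatures are all assumptions (the tokenizer calls in the diff above are awaited).

```javascript
// Toy in-memory tokenizer (hypothetical; this is not strtok3's implementation).
class ToyTokenizer {
	constructor(data) {
		this.data = data;
		this.position = 0;
	}

	// Return bytes without moving the cursor; `position` is absolute within the data.
	peek(length, position = this.position) {
		return this.data.subarray(position, position + length);
	}

	// Return bytes starting at the cursor and advance the cursor past them.
	read(length) {
		const chunk = this.peek(length);
		this.position += chunk.length;
		return chunk;
	}
}

const tokenizer = new ToyTokenizer(Buffer.from('ID3 header then audio'));
tokenizer.read(4); // the cursor is now at 4
const peeked = tokenizer.peek(6, 0); // absolute position 0, not relative to the cursor

console.log(peeked.toString()); // 'ID3 he'
console.log(tokenizer.position); // 4
```

In this toy model the cursor is untouched by `peek`, but a caller who passes a stale absolute position gets bytes from a different place than the cursor, which is the kind of mismatch the comment above points at.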


vladfrangu added 3 commits February 3, 2020 20:25

src: Add support for .ai files

9af0e7d

src: Increase buffer read space in case of long file names

61e0d52

src: Revert changes that fixed tests locally

10faad4

src: Fix errors in tests

ce1029c

Repository owner deleted a comment from vladfrangu Feb 4, 2020

Borewit requested changes Feb 4, 2020


src: Do first requested change

a857783

Borewit reviewed Feb 4, 2020


src: Another requested change

dc45850

Borewit approved these changes Feb 4, 2020


Borewit added the enhancement Add new functionality label Feb 4, 2020

Borewit merged commit 5eb8458 into sindresorhus:master Feb 4, 2020

fungiboletus mentioned this pull request May 13, 2020

PDF created with Adobe Illustrator are wrongly detected as .ai files #360

Closed
