src: Add support for AI files (Adobe Illustrator) #323

Merged

Conversation

vladfrangu
Contributor

@vladfrangu vladfrangu commented Feb 4, 2020

Thought I'd give a shot at adding support for Adobe's AI files. This might also allow other sequence-based searches to be done! Closes #64 (what an oldie!)

How does it work? There's no easy way to differentiate between AI and PDF. The first bytes are the same (%PDF in bytes). What you can look for however is the string <xmp:CreatorTool>Adobe Illustrator 24.0 (Windows)</xmp:CreatorTool> or similar, which is near the start of the file, before all the data!
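To illustrate the idea, here is a minimal sketch of a sequence-based check, assuming the first bytes of the file are already in a `Buffer`. The helper name `classifyPdfLike` and the two constants are hypothetical and are not the code in this PR; the marker is shortened to `Adobe Illustrator` so that it does not depend on the version or platform.

```javascript
// Hypothetical names for illustration; not the implementation merged in this PR.
const PDF_MAGIC = Buffer.from('%PDF');
const AI_MARKER = Buffer.from('Adobe Illustrator');

function classifyPdfLike(buffer) {
	// AI and PDF files share the same leading bytes, so check those first.
	if (buffer.length < PDF_MAGIC.length || !buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC)) {
		return undefined;
	}

	// Buffer#includes searches for the byte sequence anywhere in the buffer.
	return buffer.includes(AI_MARKER) ? 'ai' : 'pdf';
}
```

A file whose XMP metadata has been stripped would still be reported as `pdf` by a check like this.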

The only other way I could think of is creating a PDF parser of sorts and reading it through that, but that doesn't sound easy enough 😝

Link: https://en.wikipedia.org/wiki/Adobe_Illustrator_Artwork

Question

If wanted, we could also just remove the limit when reading the buffer for a sequence check, which might help for stripped files.

Quick note

Some of the changes in this PR have been made to make xo happy.


@Borewit
Collaborator

Borewit commented Feb 4, 2020

I need to fix an error message: Borewit/strtok3#147

Repository owner deleted a comment from vladfrangu Feb 4, 2020
@vladfrangu
Contributor Author

@Borewit while you're here, hi!

I found an odd bug? (or misuse issue on my part) with peekBuffer, the tests for aac (fixture-id3v2.aac to be specific) always fail, and that happens only if the check for the AI files happens...
How can this.position be changed with a peekBuffer? Or am I missing something?

core.js Outdated

const buffer = Buffer.alloc(minimumBytes);

await tokenizer.peekBuffer(buffer, {position: options.position, length: 512, mayBeLess: true});
Collaborator

You can safely use tokenizer.readBuffer (advancing the position) once you have detected that the file is of the PDF kind.

What you can look for however is the string <xmp:CreatorTool>Adobe Illustrator 24.0 (Windows)</xmp:CreatorTool> or similar, which is near the start of the file, before all the data!

Do you think it is possible to use tokenizer.read... as long as we only read up to the point where all the data starts?

Contributor Author

I assume you're talking about readToken, correct? If so, we can probably read the buffer starting at the 1350th byte, read 512 bytes (for safety) and check whether the resulting string includes Adobe Illustrator. Is that what you're referring to?

Collaborator

I have not looked at the format at all, but I was (naively) hoping that a structured way of reading it is possible, similar to how zip or png files are handled.

Contributor Author

Here is a screenshot of how it looks. The items marked with a square are those that can (and will) change; specifically the document title (the file name you save it as) and some arbitrary length... This is why the code skips 1350 bytes (which gives plenty of leeway even for long file names) and reads 512 bytes, which should hopefully catch the CreatorTool.

[Image: screenshot of the start of an AI file, with the variable fields marked with squares]
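The skip-and-read approach described above can be sketched as follows, assuming the whole file is available as a `Buffer`. The function name `looksLikeIllustrator` and the constant names are hypothetical; the 1350 and 512 values are the ones mentioned in this thread.

```javascript
// Hypothetical sketch of the windowed check; not the code from this PR.
const SKIP_BYTES = 1350; // offset discussed in this thread
const WINDOW_BYTES = 512; // window size discussed in this thread

function looksLikeIllustrator(fileBytes) {
	// subarray clamps to the buffer length, so short files yield a shorter (or empty) slice.
	const slice = fileBytes.subarray(SKIP_BYTES, SKIP_BYTES + WINDOW_BYTES);

	// latin1 maps every byte to one character, so binary data cannot break the decoding.
	return slice.toString('latin1').includes('Adobe Illustrator');
}
```

A marker that starts before byte 1350 or ends after byte 1862 is missed by this sketch, which is the trade-off of a fixed window compared with decoding the structure.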

Collaborator

@Borewit Borewit Feb 4, 2020

Oh my god, Adobe XMP using RDF, serialized with RDF/XML inside PDF. And maybe I did not notice all layers yet.

Contributor Author

If we can somehow figure out the length, then ignore till the title start, ignore title length, then till CreatorTool, we could probably use readToken... But, at least in this case... It's easier to read 512 bytes than do that. Plus, this method makes it work with other file types, if needed! Thoughts?

Collaborator

If there is a header or field length encoded, it could be worth trying to decode that. Your current solution is simple, which is also worth something.

@Borewit
Collaborator

Borewit commented Feb 4, 2020

I found an odd bug? (or misuse issue on my part) with peekBuffer, the tests for aac (fixture-id3v2.aac to be specific) always fail, and that happens only if the check for the AI files happens...
How can this.position be changed with a peekBuffer? Or am I missing something?

No, I think you have a point. Created issue: Borewit/strtok3#149

this.position does not change; it is probably the absolute position that causes it.
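For reference, the peek versus read behaviour being discussed can be sketched with a toy in-memory tokenizer. This is an assumption-laden illustration only: `ToyTokenizer` is hypothetical, it is not strtok3's implementation, and it does not attempt to reproduce the problem tracked in Borewit/strtok3#149. It only shows the expected contract, where peeking at an absolute position leaves `this.position` untouched and reading advances it.

```javascript
// Toy in-memory tokenizer for illustration; not strtok3's implementation.
class ToyTokenizer {
	constructor(bytes) {
		this.bytes = bytes;
		this.position = 0;
	}

	// Copies bytes from an absolute offset into target; this.position is not modified.
	peekBuffer(target, {position = this.position, length = target.length} = {}) {
		const chunk = this.bytes.subarray(position, position + length);
		target.set(chunk);
		return chunk.length;
	}

	// Copies bytes from the current position into target and advances this.position.
	readBuffer(target, {length = target.length} = {}) {
		const bytesRead = this.peekBuffer(target, {position: this.position, length});
		this.position += bytesRead;
		return bytesRead;
	}
}
```

In this model a peek with an explicit `position` never affects later reads, so a test that fails only when such a peek happens first points at how the absolute offset is handled rather than at the read position itself.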

@Borewit Borewit added the enhancement Add new functionality label Feb 4, 2020
@Borewit Borewit merged commit 5eb8458 into sindresorhus:master Feb 4, 2020
Successfully merging this pull request may close these issues.

Support for Adobe Illustrator file ai