Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is_file check on html #307

Closed
blaaat opened this issue Mar 11, 2019 · 12 comments
Closed

is_file check on html #307

blaaat opened this issue Mar 11, 2019 · 12 comments

Comments

@blaaat
Copy link

blaaat commented Mar 11, 2019

PHP Warning is_file(): open_basedir restriction in effect. File(<!DOCTYPE HTML>
<html>

phpwkhtmltopdf/src/Pdf.php:325 mikehaertl\wkhtmlto\Pdf::processOptions

On my production machine (CentOS7; PHP 7.2) PHP_MAXPATHLEN is quite high. Somehow the html header-footer-html triggers the openbase_dir error in the is_file call.

Maybe better to do the other checks (html/xml/url regex or strip_tags) before the is_file call?

@blaaat
Copy link
Author

blaaat commented Mar 11, 2019

For now; if anyone is interested; this is my work-around;

class Pdf extends \mikehaertl\wkhtmlto\Pdf
{
	public function setHeader($input)
	{
		return $this->setOptionHtml('header-html', $input);
	}
	public function setFooter($input)
	{
		return $this->setOptionHtml('footer-html', $input);
	}
	
	protected function setOptionHtml($option, $input) {
        $options[$option] = $this->_tmpFiles[] = new File($input, '.html', self::TMP_PREFIX, $this->tmpDir);
        $this->setOptions($options);
	}
}

@mikehaertl
Copy link
Owner

Maybe better to do the other checks (html/xml/url regex or strip_tags) before the is_file call?

It's tricky. We simply can't know, that what you passed is some static text or a file name. Using is_file() was the only option we saw. Note, that what you pass here doesn't need to be HTML. If you see a better way, feel free to suggest a change. Also see #100 where we already changed this.

@mikehaertl
Copy link
Owner

We could maybe add another check for HTML via the Regex we already have. This would at least help in the cases, where a full HTML document is passed.

@blaaat
Copy link
Author

blaaat commented Mar 11, 2019

Isn't the processInput() method suitable to reuse?

@mikehaertl
Copy link
Owner

Yeah, your're probably right. It's been a while since I wrote this. I'll try to modify things a little and find better method names. Will let you know when I've updated it. It would be great if you could then help testing.

@blaaat
Copy link
Author

blaaat commented Mar 11, 2019

Thanks! and off course!

@mikehaertl
Copy link
Owner

@blaaat I've created a MR with my refactorings, see the comments there. Could you give it a try?

As a side note: This changes the default behavior. Since we follow Semver this change will probably also require another major release.

@blaaat
Copy link
Author

blaaat commented Mar 11, 2019

Thanks, I'll and let you know.

The strip_tags way to determine HTML looks risky. A path might include a tag which could be stripped:

/home/files<2018>/test.html

I'd prefer the old regex (maybe even with a ^prefixed)

@blaaat
Copy link
Author

blaaat commented Mar 11, 2019

@blaaat I've created a MR with my refactorings, see the comments there. Could you give it a try?

Fixes my problem! Thanks.

As a side note: This changes the default behavior. Since we follow Semver this change will probably also require another major release.

I think it isn't necessary to change default behavior and still fix the problem. An extra regex (prefixed with ^) won't affect any path's, because I don't think a filesystem exists that allows a path starting with <html or <doctype

@mikehaertl
Copy link
Owner

mikehaertl commented Mar 11, 2019

The strip_tags way to determine HTML looks risky. A path might include a tag which could be stripped:

Good point, thanks. I somehow assumed < and > are not valid filename characters - but they are. So I've reverted to the old regex HTML check and removed the strip_tags() completely. I've not added the ^ though, as this would definitely break BC: Before it also accepted strings with content before <html>.

Right now it tries to stay away from is_file() as long as possible. There may still be situations where you hit your initial problem. So I've now also included the option to pass a File instance if all other methods fail.

$pdf->setOption([
    'header-html' => new File('some complex content', '.html'),
]);

See the updated MR here: #308

Maybe you can take another look?

@blaaat
Copy link
Author

blaaat commented Mar 12, 2019

Looks good and works perfect! Thanks.

Right now it tries to stay away from is_file() as long as possible. There may still be situations where you hit your initial problem. So I've now also included the option to pass a File instance if all other methods fail.

Nice solution!

mikehaertl added a commit that referenced this issue Mar 12, 2019
mikehaertl added a commit that referenced this issue Mar 12, 2019
Issue #307 Refactor check for temp file creation
@mikehaertl
Copy link
Owner

Great. Just released 2.4.0 including this change. Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants