Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion about security #3394

Closed
craigfrancis opened this issue Sep 10, 2020 · 4 comments
Closed

Discussion about security #3394

craigfrancis opened this issue Sep 10, 2020 · 4 comments

Comments

@craigfrancis
Copy link

I'm glad to see Twig auto-escaping values by default (with html), but I'd like to discuss the future, and how Twig could prevent the most common security issues that affect the web today.

Going on the basis that Twig is used by thousands of developers, with many being unaware of the security issues they are introducing into their projects, I believe it's Twig's job to fix these issues (developer training has not worked, where many assume that html auto escaping solves everything, e.g. #2623).

The main issue is following up on the discussion on Contextual Auto-Escaping, where it's very easy for developers to get this wrong:

https://twigfiddle.com/ss6qc3

<a href="{{ url }}" data-value={{ value }}>My Link</a>

url: "javascript:alert(document.domain)"
value: "abc onclick=alert(document.location)"

In this example, the developer is not aware of how dangerous inline JavaScript is, and isn't using html_attr on an un-quoted attribute (this example "works" with the values you expect, but it's still wrong).

Secondly, I'm trying to introduce a concept into PHP where we can identify trusted developer strings (aka "literals", which have been defined within the PHP script) which need to be used for HTML templates, SQL strings, CLI strings; and keep those completely separate from user controlled (attacker tainted) strings. Where, in the case of HTML, we need a trusted templating system to carefully combine these 2 data sources in a safe/context-aware way.

@stof
Copy link
Member

stof commented Sep 10, 2020

Contextual auto-escaping is hard, because Twig is not aware of the format being output (it is not tied to outputting HTML). anything outside the Twig syntax markers ({{ }}, {% %} and {# #}) is opaque text as far as Twig is concerned. So that would require huge changes to Twig.

@craigfrancis
Copy link
Author

Where do you see Twig going in the future?

Will it be a proper/secure templating engine, or something that's very basic/simple (requiring the developer to know about, and be responsible for, escaping in every context).

In a way Twig already has some awareness of the context, as ['autoescape' => 'name'] sets the auto-escaping strategy based on the template filename; or it defaults to 'html'.

I'm concerned that many developers are using Twig as a way to get the job done, and they are assuming it's escaping everything correctly.

@fabpot
Copy link
Contributor

fabpot commented Sep 11, 2020

First, let's keep in mind that Twig is not an HTML template system, but a template system that can also use for HTML generation. Most people are using Twig for HTML, but it's not limited to HTML.

Regarding auto-escaping, and as @stof already mentioned, we would need to parse the HTML to understand the context, something we don't do at all right now.

A few years back, there was a discussion about this but nobody stepped up.

Go templating system has something like what you describe, but to my knowledge, no other Twig like templating system has it.

In any case, that would represent a huge amount of work and would probably slow down compilation a lot. That's not something I will work on in the forseeable future, so I'm closing here for now. If you want to give it a try, don't hesitate to ping me if I can help.

@fabpot fabpot closed this as completed Sep 11, 2020
@craigfrancis
Copy link
Author

I'm not familiar with how Twig works, but I've just made a quick mock up (only took a couple of hours to write, please don't use in production), but it's able to determine a variables parent tag, and if/which attribute it's in.

<?php

class html_parser {

    private $vars = [];
    private $tokens = [];

    public function __construct(\Twig\TokenStream $stream) {

        $html = '';
        $token_id = 0;
        $var_id = 0;
        $var_open = false;

        try {
            while (($token = $stream->next()) !== NULL) {

                $type = $token->getType();

                if ($type === \Twig\Token::VAR_START_TYPE) {
                    $var_id++;
                    $var_open = true;
                }

                if ($var_open) {
                    $this->tokens[$token_id] = $var_id;
                    if ($type === \Twig\Token::VAR_END_TYPE) {
                        $var_open = false;
                        $html .= '{VAR-' . $var_id . '}';
                    }
                } else {
                    $this->tokens[$token_id] = false;
                    if ($type === \Twig\Token::TEXT_TYPE) {
                        $html .= $token->getValue();
                    }
                }

                $token_id++;

            }
        } catch (\Twig\Error\SyntaxError $e) {
            // print_r($e);
        }

        libxml_use_internal_errors(true);

        $dom = new DomDocument();
        $dom->loadHTML($html);

        foreach (libxml_get_errors() as $error) {
            // print_r($error);
        }

        $this->nodes_parse($dom);

        // print_r($this->vars);
        // print_r($this->tokens);

    }

    public function token_context_get($token_id) {
        $variable_id = $this->tokens[$token_id];
        if ($variable_id !== false) {
            return $this->vars[$variable_id];
        }
    }

    private function nodes_parse($parent) {
        foreach ($parent->childNodes as $node) {
            if ($node->nodeType === XML_TEXT_NODE) {
                $this->var_search($node->wholeText, $parent->nodeName);
            } else {
                if ($node->hasAttributes()) {
                    foreach ($node->attributes as $attr) {
                        $this->var_search($attr->nodeValue, $node->nodeName, $attr->nodeName);
                    }
                }
                if ($node->hasChildNodes()) {
                    $this->nodes_parse($node);
                }
            }
        }
    }

    private function var_search($value, $tag = NULL, $attribute = NULL) {
        preg_match_all('/{VAR-([0-9]+)}/is', $value, $matches, PREG_SET_ORDER);
        foreach ($matches as $match) {
            $this->vars[$match[1]] = [$tag, $attribute];
        }
    }

}

?>

Tested with:

<?php

$html = '<!DOCTYPE html>
    <html lang="en-GB">
        <head>
            <meta charset="UTF-8" />
            <title>Example</title>
        </head>
        <body>
            <nav>
            {% for item in navigation %}
                <li><a href="{{ item.href }}" class={{ item.class }}>{{ item.text }}</a></li>
            {% endfor %}
            </nav>
            <h1>Title</h1>
            <p>Hi {{ name }}</p>
        </body>
    </html>';

$loader = new \Twig\Loader\ArrayLoader([]);
$twig = new \Twig\Environment($loader);

$stream = $twig->tokenize(new \Twig\Source($html, 'example'));

$start = microtime(true);

$k = 0;
$c = 1000;
while (++$k <= $c) {

    $stream_clone = clone $stream;

    $html_parser = new html_parser($stream_clone);

    if ($k == 1) {
        print_r($html_parser->token_context_get(8)); // ['a', 'href']
        print_r($html_parser->token_context_get(14)); // ['a', 'class']
        print_r($html_parser->token_context_get(20)); // ['a', NULL]
        print_r($html_parser->token_context_get(30)); // ['p', NULL]
    }

}

echo round(((microtime(true) - $start) / $c), 5);

?>

I'm getting 0.00024 seconds for this to complete, and I don't think it needs to be re-done when Twig is using cached templates.

Personally I'd like:

  1. Getting Twig to recognise the document type it's working with (maybe default ['autoescape' => 'name']).
  2. If working with a HTML document, use something like the above to determine if html or html_attr is appropriate (assuming html is dangerous, as it gives a false sense of security).
  3. In html_attr, depending on the tag/attribute, start to warn about unsafe values (we have to assume these have come from the end-user); e.g. <a href="{{ var }}"> should not start with javascript:.

I wouldn't expect the first version to be perfect... but if the basic parts are in place, security researchers can find exceptions, then the implementation in Twig can be changed/improved, and that improves the security for every single website that uses Twig.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants