Add taint flows for remaining built-in pure functions such as utf8_decode, max($strings), etc #3636

Open
TysonAndre opened this issue Jun 22, 2020 · 10 comments

Comments

@TysonAndre
Contributor

TysonAndre commented Jun 22, 2020

It seems that Psalm only knows about taint flows for the functions in src/Psalm/Internal/Stubs/CoreGenericFunctions.phpstub. It may help to expand that list to other pure functions such as json_encode(), base64_decode(), trim(), etc.

<?php
$globalVariable = $_GET['evil'];
eval('echo ' . json_encode($globalVariable));

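This example still allows evaluating arbitrary code: json_encode() emits a double-quoted string literal and does not escape $, so eval() will run PHP interpolation payloads such as {${phpinfo()}}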
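
A minimal sketch of one way the eval() call above can be abused, assuming the attacker controls the string passed to json_encode(). The function attacker_controlled() and the variable $x are hypothetical stand-ins introduced only for this illustration; the function just records that it ran.

```php
<?php
// Hypothetical stand-in for attacker-chosen code; it only records that it ran.
function attacker_controlled(): string
{
    $GLOBALS['ran'] = true;
    return 'x';
}

$ran = false;
$x = 'placeholder'; // looked up through the variable-variable ${'x'} below

// Simulates $_GET['evil']; note there are no quotes or backslashes to escape.
$payload = '{${attacker_controlled()}}';

// json_encode() wraps the value in double quotes but leaves "$", "{" and "}" alone.
$encoded = json_encode($payload);

// Inside a double-quoted PHP literal, {${expr}} evaluates expr first.
$result = eval('return ' . $encoded . ';');

var_dump($encoded, $ran, $result);
```

The encoded value is still valid JSON; the problem is only that a JSON string literal is not a safe PHP string literal.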

https://github.com/phan/phan/blob/master/src/Phan/Plugin/Internal/UseReturnValuePlugin.php may be of help, because it lists many common "pure" functions that return a value based on their inputs.

Off-topic notes:

Aside: $_REQUEST combines $_GET and $_POST, so should that also be included as a source?
$_COOKIE can be set by browsers, so should that be considered a source for eval (but possibly not html)?

Aside: json_encode() technically escapes html, because the only reasonable place to echo it is inside <script>. However, it might be a useful sanity check to assert that the file contains </script> after the echo line but before the next <script> substring, if any occur. Probably not worth the effort.

  • e.g. <p><?= json_encode($unsafe) ?></p> is potentially unsafe for emitting (malformed) html tags
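
To illustrate the bullet above, here is a small sketch comparing json_encode() with its default options against the JSON_HEX_* flags. The payload string is a made-up example, and the sketch only inspects the encoded output; it does not claim anything about how Psalm models these flags.

```php
<?php
// Made-up payload standing in for user input.
$unsafe = '<img src=x onerror=alert(1)>';

// Default options: "<" and ">" pass through unchanged.
$default = json_encode($unsafe);

// With the JSON_HEX_* flags, "<", ">", "&", "'" and '"' inside the value
// are emitted as \uXXXX escapes instead.
$hex = json_encode($unsafe, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo $default, "\n"; // "<img src=x onerror=alert(1)>"
echo $hex, "\n";     // "\u003Cimg src=x onerror=alert(1)\u003E"
```

With the default options the output still contains a literal tag, so echoing it between <p> and </p> would emit markup; the JSON_HEX_TAG variant does not contain < or >.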
muglug added a commit that referenced this issue Jun 22, 2020
@muglug
Collaborator

muglug commented Jun 22, 2020

$_REQUEST combines $_GET and $_POST

Good point, I've fixed that

@TysonAndre
Contributor Author

TysonAndre commented Jun 22, 2020

Miscellaneous notes I might find useful if working on this in the future:

In src/Psalm/Internal/Analyzer/Statements/Expression/Call/FunctionCallAnalyzer.php, $function_storage->return_source_params and attributes such as $function_storage->added_taints and $function_storage->removed_taints seem to be how these propagate. (This may change.)

  • Adding core stubs may not work in all cases, such as conflicting with php 7.x or 8.x adding a brand new param? Not familiar with psalm's handling of core stubs or best practices.

    src/Psalm/PluginRegistrationSocket.php seems to provide one way for end users to register stubs with rare or internal PECL modules (config->addStubFile()), and those are handled the same way as core stubs

  • Alternately, a plugin could theoretically provide just the additional properties to inherit taintedness from and the removed_taints/added_taints for various functions.

@muglug
Collaborator

muglug commented Jun 22, 2020

I improved things a little in e8be2c5, adding support for (l|r)?trim and explode, which led to the discovery of 12 new XSS bugs in Vimeo's code.

@muglug
Collaborator

muglug commented Jun 22, 2020

json_encode is an interesting case where the victim of tainted input is normally going to be a JavaScript app that Psalm doesn't know about (at least that's the case at Vimeo).

It might be useful to generate a map of all tainted json_encoded data that could be passed to a JS taint analysis tool.

@TysonAndre
Contributor Author

TysonAndre commented Jun 23, 2020

Some code to generate candidates is below - this excludes functions that are possibly impure depending on their args. Some obvious ones are commented out.

chop() is an alias of rtrim().

I don't know how taint detection currently works with array keys/values, or how it is meant to work.

<?php

use Phan\Plugin\Internal\UseReturnValuePlugin;
use Phan\Language\FQSEN\FullyQualifiedFunctionName;
use Phan\Language\UnionType;
use Phan\Language\Element\FunctionInterface;

require_once dirname(__DIR__) . '/src/Phan/Bootstrap.php';
$code_base = require(dirname(__DIR__) . '/src/codebase.php');
$unsafe_types = UnionType::fromFullyQualifiedRealString('string|array');

$isPotentialTaintPropogator = function (FunctionInterface $function) use ($code_base, $unsafe_types): bool {
    $function_return_type = $function->getUnionType();
    if (!$function_return_type->canCastToUnionType($unsafe_types)) {
        return false;
    }
    foreach ($function->getParameterList() as $param) {
        if ($param->getUnionType()->canCastToUnionType($unsafe_types)) {
            return true;
        }
    }
    return false;
};

foreach (UseReturnValuePlugin::HARDCODED_FQSENS as $fqsen_string => $value) {
    if (strpos($fqsen_string, '::') !== false) {
        continue;
    }
    if ($value !== true) {
        continue;
    }
    $fqsen = FullyQualifiedFunctionName::fromFullyQualifiedString($fqsen_string);
    if (!$code_base->hasFunctionWithFQSEN($fqsen)) {
        continue;
    }
    $function = $code_base->getFunctionByFQSEN($fqsen);
    // echo "looking up $fqsen\n";
    if (!$isPotentialTaintPropogator($function)) {
        continue;
    }
    echo "$function\n";
}

<?php

// Limitations:
// - Excludes uncommon functions like hebrev()
// - Excludes potentially impure functions such as var_export(), highlight_string()

// Returns original string if no translation is found
function _(string $message) : string;
// prefer htmlentities/escapeshellarg()
function addcslashes(string $str, string $charlist) : string;
function addslashes(string $str) : string;
// Taint checking probably won't be able to check if keys are tainted.
function array_change_key_case(array $input, int $case = unknown) : associative-array<mixed,mixed>;
function array_chunk(array $input, int $size, bool $preserve_keys = unknown) : list<array>;
function array_column(array $array, mixed $column_key, mixed $index_key = unknown) : array;
function array_combine(int[]|string[] $keys, array $values) : associative-array<mixed,mixed>|false;
function array_count_values(array $input) : associative-array<mixed,int>;
function array_diff_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_diff_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_diff(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_fill_keys(array $keys, mixed $val) : array;
function array_fill(int $start_key, int $num, mixed $val) : array<int,mixed>;
function array_filter(array $input, callable(mixed):bool|callable(mixed,mixed):bool $callback = unknown, int $flag = unknown) : associative-array<mixed,mixed>;
function array_flip(array $input) : associative-array<mixed,int>|associative-array<mixed,string>;
function array_intersect_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_intersect_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_intersect(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed>;
function array_key_first(array $array) : int|null|string;
function array_key_last(array $array) : int|null|string;
function array_keys(array $input, mixed $search_value = unknown, bool $strict = unknown) : list<int>|list<string>;
function array_map(?callable $callback, array $input1, array ...$args) : array;
function array_merge_recursive(array $arr1, array ...$args) : array;
function array_merge(array $arr1, array ...$args) : array;
function array_pad(array $input, int $pad_size, mixed $pad_value) : array;
function array_rand(array $input, int $num_req) : array<int,int>|array<int,string>|int|string;
function array_reduce(array $input, callable(mixed,mixed):mixed $callback, mixed $initial = unknown) : mixed;
function array_replace_recursive(array $arr1, array $arr2, array ...$args) : array;
function array_replace(array $arr1, array $arr2, array ...$args) : array;
function array_reverse(array $input, bool $preserve = unknown) : array;
function array_search(mixed $needle, array $haystack, bool $strict = unknown) : false|int|string;
function array_slice(array $input, int $offset, ?int $length = null, bool $preserve_keys = unknown) : array;
function array_unique(array $input, int $sort_flags = unknown) : associative-array<mixed,mixed>;
function array_values(array $input) : list<mixed>;
function base64_decode(string $str, bool $strict = unknown) : false|string;
// function base64_encode(string $str) : string;
// function base_convert(string $number, int $frombase, int $tobase) : string;
function basename(string $path, string $suffix = unknown) : string;
// function bin2hex(string $data) : string;
function bzcompress(string $source, int $blocksize100k = unknown, int $workfactor = unknown) : int|string;
function bzdecompress(string $source, int $small = unknown) : int|string;
function chop(string $str, string $character_mask = unknown) : string;
function chunk_split(string $str, int $chunklen = unknown, string $ending = unknown) : string;
// function class_implements(object|string $what, bool $autoload = unknown) : array<string,class-string>|false;
// function class_parents(object|string $instance, bool $autoload = unknown) : array<string,class-string>|false;
function compact(array|string $var_name, array|string ...$var_names) : array;
function convert_cyr_string(string $str, string $from, string $to) : string;
function convert_uudecode(string $data) : string;
function convert_uuencode(string $data) : string;
// function count_chars(string $input, int $mode = unknown) : array<int,int>|false|string;
function current(array|object $array_arg) : false|mixed;
function date(string $format, int $timestamp = unknown) : string;
function dirname(string $path, int $levels = unknown) : string;
function each(array &$arr) : array;
// eval safe?
function escapeshellarg(string $arg) : string;
function explode(string $separator, string $str, int $limit = unknown) : list<string>;
// function fgetcsv(resource $fp, int $length = unknown, string $delimiter = unknown, string $enclosure = unknown, string $escape = unknown) : false|list<?string>;
// function file(string $filename, int $flags = unknown, resource $context = unknown) : false|list<string>;

// filter_input filter types depends on $type/$filter
// function filter_input_array(int $type, array|int $definition = unknown, bool $add_empty = unknown) : false|mixed;
// function filter_input(int $type, string $variable_name, int $filter = unknown, array|int $options = unknown) : false|mixed;
// function filter_var(mixed $variable, int $filter = unknown, mixed $options = unknown) : false|mixed;
// function filter_var_array
// function get_cfg_var(string $option_name) : array[]|false|string|string[];
// function get_class_methods(mixed $class) : list<string>;
// function getenv(string $varname, bool $local_only = unknown) : false|string;
// function getimagesize(string $imagefile, array &$info = unknown) : false|int[]|string[];
// function get_parent_class(mixed $object = unknown) : class-string|false;
function gettext(string $msgid) : string;
// function gettype(mixed $var) : string;
// Can unescape $format with backslashes if user controlled
function gmdate(string $format, int $timestamp = unknown) : false|string;
function gzcompress(string $data, int $level = unknown, int $encoding = unknown) : false|string;
function gzdecode(string $data, int $length = unknown) : false|string;
function gzdeflate(string $data, int $level = unknown, int $encoding = unknown) : false|string;
function gzencode(string $data, int $level = unknown, int $encoding_mode = unknown) : false|string;
function gzinflate(string $data, int $length = unknown) : false|string;
function gzuncompress(string $data, int $length = unknown) : false|string;
// unsafe with $raw_output = true
// function hash_hmac(string $algo, string $data, string $key, bool $raw_output = unknown) : string;
// function hash_pbkdf2(string $algo, string $password, string $salt, int $iterations, int $length = unknown, bool $raw_output = unknown) : string;
// function hash(string $algo, string $data, bool $raw_output = unknown) : string;
function hex2bin(string $data) : false|string;
function htmlentities(string $string, int $quote_style = unknown, string $encoding = unknown, bool $double_encode = unknown) : string;
function html_entity_decode(string $string, int $quote_style = unknown, string $encoding = unknown) : string;
function htmlspecialchars_decode(string $string, int $quote_style = unknown) : string;
function htmlspecialchars(string $string, int $quote_style = unknown, string $encoding = unknown, bool $double_encode = unknown) : string;
function http_build_query(array|object $querydata, string $prefix = unknown, string $arg_separator = unknown, int $enc_type = unknown) : string;
function iconv(string $in_charset, string $out_charset, string $str) : false|string;
function implode(string $glue, array $pieces) : string;
//function inet_ntop(string $in_addr) : false|string;
//function inet_pton(string $ip_address) : false|string;
// function ini_get(string $varname) : false|string;
function join(string $glue, array $pieces) : string;
function json_decode(string $json, bool $assoc = unknown, int $depth = unknown, int $options = unknown) : mixed;
function json_encode(mixed $data, int $options = unknown, int $depth = unknown) : false|string;
function key(array|object $array_arg) : int|null|string;
function lcfirst(string $str) : string;
// function long2ip(int|string $proper_address) : string;
// already done
function ltrim(string $str, string $character_mask = unknown) : string;
// max() also works on strings.
function max(array $arg1) : mixed;
function mb_convert_case(string $sourcestring, int $mode, string $encoding = unknown) : false|string;
function mb_convert_encoding(string $str, string $to_encoding, string|string[] $from_encoding = unknown) : false|string;
function mb_detect_encoding(string $str, mixed $encoding_list = unknown, bool $strict = unknown) : false|string;
function mb_strtolower(string $str, string $encoding = unknown) : false|string;
function mb_substr(string $str, int $start, ?int $length = null, string $encoding = unknown) : false|string;
// Probably unrealistically wrong if $raw_output = true and sent to a sink
// function md5_file(string $filename, bool $raw_output = unknown) : false|string;
// function md5(string $str, bool $raw_output = unknown) : string;

// metaphone filters out non-letters?
// function metaphone(string $text, int $phones = unknown) : false|string;
function min(array $arg1) : mixed;
function ngettext(string $msgid1, string $msgid2, int $n) : string;
function nl2br(string $str, bool $is_xhtml = unknown) : string;
// $key is probably secret from application
// function openssl_encrypt(string $data, string $method, string $key, int $options = unknown, string $iv = unknown, string &$tag = unknown, string $aad = unknown, int $tag_length = unknown) : false|string;
function pack(string $format, mixed ...$args) : string;
// function parse_ini_file(string $filename, bool $process_sections = unknown, int $scanner_mode = unknown) : array|false;
// depends on arguments
// function parse_url(string $url, int $url_component = unknown) : array{scheme?:string,host?:string,port?:int,user?:string,pass?:string,path?:string,query?:string,fragment?:string}|false|int|null|string;
// function pathinfo(string $path, int $options = unknown) : array|string;
// function php_uname(string $mode = unknown) : string;
// function phpversion(string $extension = unknown) : false|string;
function preg_filter(mixed $regex, mixed $replace, mixed $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_grep(string $regex, array $input, int $flags = unknown) : array;
function preg_quote(string $str, string $delim_char = unknown) : string;
function preg_replace_callback(array|string $regex, callable(array):string $callback, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_replace_callback_array(array<string,callable(array):string> $pattern, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_replace(array|string $regex, array|string $replace, array|string $subject, int $limit = unknown, int &$count = unknown) : string|string[];
function preg_split(string $pattern, string $subject, ?int $limit = null, int $flags = unknown) : list<string>;
function quoted_printable_decode(string $str) : string;
function quoted_printable_encode(string $str) : string;
function quotemeta(string $str) : string;
// function range(mixed $low, mixed $high, float|int $step = unknown) : array;
function rawurldecode(string $str) : string;
function rawurlencode(string $str) : string;
function readlink(string $filename) : false|string;
function realpath(string $path) : false|string;
// already done
function rtrim(string $str, string $character_mask = unknown) : string;
function serialize(mixed $variable) : string;
// depends on raw_output, but impractical
// function sha1(string $str, bool $raw_output = unknown) : string;
// function soundex(string $str) : string;
function sprintf(string $format, float|int|string ...$vars) : string;
// function stat(string $filename) : array|false;
function strchr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
// function stream_resolve_include_path(string $filename) : false|string;
function strftime(string $format, int $timestamp = unknown) : string;
function stripcslashes(string $str) : string;
function stripslashes(string $str) : string;
function strip_tags(string $str, string|string[] $allowable_tags = unknown) : string;
function str_ireplace(array|string $search, array|string $replace, array|string $subject, int &$replace_count = unknown) : string|string[];
function stristr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
function str_pad(string $input, int $pad_length, string $pad_string = unknown, int $pad_type = unknown) : string;
function strpbrk(string $haystack, string $char_list) : false|string;
function strrchr(string $haystack, int|string $needle) : false|string;
function str_repeat(string $input, int $multiplier) : string;
function str_replace(array|string $search, array|string $replace, array|string $subject, int &$replace_count = unknown) : string|string[];
function strrev(string $str) : string;
function str_rot13(string $str) : string;
function str_split(string $str, int $split_length = unknown) : list<string>;
// depends on $needle (and $haystack if $before_needle)
function strstr(string $haystack, int|string $needle, bool $before_needle = unknown) : false|string;
function strtolower(string $str) : string;
function strtoupper(string $str) : string;
function strtr(string $str, string $from, string $to) : string;
function strval(mixed $var) : string;
function str_word_count(string $string, int $format = unknown, string $charlist = unknown) : array<int,string>|int;
function substr_replace(string|string[] $str, mixed $repl, mixed $start, mixed $length = unknown) : string|string[];
function substr(string $str, int $start, int $length = unknown) : false|string;
// mostly html safe but can contain " and >?
function tempnam(string $dir, string $prefix) : false|string;
function token_get_all(string $source, int $flags = unknown) : list<array{0:int,1:string,2:int}>|list<string>;
function trim(string $str, string $character_mask = unknown) : string;
function ucfirst(string $str) : string;
function ucwords(string $str, string $delims = unknown) : string;
function uniqid(string $prefix = unknown, bool $more_entropy = unknown) : string;
function unpack(string $format, string $data, int $offset = unknown) : array|false;
function urldecode(string $str) : string;
// mostly safe
// function urlencode(string $str) : string;
function utf8_decode(string $data) : string;
function utf8_encode(string $data) : string;
function vsprintf(string $format, array $args) : string;
function wordwrap(string $str, int $width = unknown, string $break = unknown, bool $cut = unknown) : string;
function zlib_decode(string $data, int $max_decoded_len = unknown) : string;
function zlib_encode(string $data, int $encoding, int|string $level = unknown) : string;

@TysonAndre
Contributor Author

TysonAndre commented Jun 24, 2020

And then there's other helpers like UConverter->convert().

I wonder if fuzzing would help build a larger list ahead of time - e.g. in docker, instantiate classes, call methods to check for inputs that would emit < or " in the return value or result, and terminate abnormally, to cover rarer code such as UConverter::convert
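
A rough sketch of the probing idea above, restricted to single-argument string functions. The helper name find_propagators(), the probe string, and the candidate list are all assumptions made for this illustration. A single probe can only suggest that a function passes < or " through; it does not prove that a function is safe.

```php
<?php
/**
 * Hypothetical helper: call each candidate with a probe string and report
 * whether the returned string still contains "<" or '"'.
 *
 * @param string[] $candidates names of functions accepting one string argument
 * @return array<string, bool> function name => probe characters survived
 */
function find_propagators(array $candidates, string $probe = '<"probe">'): array
{
    $results = [];
    foreach ($candidates as $name) {
        try {
            $output = $name($probe);
        } catch (\Throwable $e) {
            // Treat abnormal termination as "nothing observed" in this sketch.
            $results[$name] = false;
            continue;
        }
        $results[$name] = is_string($output) && strpbrk($output, '<"') !== false;
    }
    return $results;
}

$results = find_propagators([
    'trim', 'strtoupper', 'strrev',
    'htmlspecialchars', 'base64_encode', 'urlencode',
]);

foreach ($results as $name => $propagates) {
    printf("%-18s %s\n", $name, $propagates ? 'propagates' : 'no < or " observed');
}
```

Covering methods such as UConverter::convert would additionally need object construction and multi-argument calls, which this sketch leaves out.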

@TysonAndre
Contributor Author

TysonAndre commented Jun 24, 2020

A second pass at adding functions to src/Psalm/Internal/Stubs/CoreGenericFunctions.phpstub, based on the earlier snippet, is below. This helps with join(), strval(), etc., and probably has some incorrect entries.

Because of the missing type information, it may cause issues, and I'm not sure how Psalm will handle the PHP 8.0 changes (e.g. dropping support for int $needle in string functions such as strpos).

It could be put into a plugin until those issues are worked out, though.

/**
 * @psalm-pure
 * @psalm-flow ($message) -> return
 */
function _(string $message) : string {}
// prefer htmlentities/escapeshellarg()
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function addcslashes(string $str, string $charlist) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function addslashes(string $str) : string {}
// Taint checking probably won't be able to check if keys are tainted.
// /** @return associative-array<mixed, mixed> */
// function array_change_key_case(array $input, int $case = 0) : associative-array<mixed,mixed> {}
// function array_chunk(array $input, int $size, bool $preserve_keys = false) : list<array> {}
// function array_column(array $array, $column_key, $index_key = null) : array {}
// function array_combine(int[]|string[] $keys, array $values) : associative-array<mixed,mixed> {}
// function array_count_values(array $input) : associative-array<mixed,int> {}
// function array_diff_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_diff_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_diff(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_fill_keys(array $keys, $val) : array {}
// function array_fill(int $start_key, int $num, $val) : array<int,mixed> {}
// function array_filter(array $input, callable(mixed):bool|callable(mixed,mixed):bool $callback = null, int $flag = 0) : associative-array<mixed,mixed> {}
// function array_flip(array $input) : associative-array<mixed,int>|associative-array<mixed,string> {}
// function array_intersect_assoc(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_intersect_key(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_intersect(array $arr1, array $arr2, array ...$args) : associative-array<mixed,mixed> {}
// function array_key_first(array $array) : int|null|string {}
// function array_key_last(array $array) : int|null|string {}
// function array_keys(array $input, $search_value = unknown, bool $strict = false) : list<int>|list<string> {}
// function array_map(?callable $callback, array $input1, array ...$args) : array {}
// function array_merge_recursive(array $arr1, array ...$args) : array {}
// function array_merge(array $arr1, array ...$args) : array {}
// function array_pad(array $input, int $pad_size, $pad_value) : array {}
// function array_rand(array $input, int $num_req) : array<int,int>|array<int,string>|int|string {}
// function array_reduce(array $input, callable(mixed,mixed):$callback, $initial = null) {}
// function array_replace_recursive(array $arr1, array $arr2, array ...$args) : array {}
// function array_replace(array $arr1, array $arr2, array ...$args) : array {}
// function array_reverse(array $input, bool $preserve = false) : array {}
// function array_search($needle, array $haystack, bool $strict = false) : false|int|string {}
// function array_slice(array $input, int $offset, ?int $length = null, bool $preserve_keys = false) : array {}
// function array_unique(array $input, int $sort_flags = 2) : associative-array<mixed,mixed> {}
// function array_values(array $input) : list<mixed> {}
/**
 * @psalm-pure
 *
 * @return string|false
 *
 * @psalm-flow ($str) -> return
 */
function base64_decode(string $str, bool $strict = false) {}
// function base64_encode(string $str) : string {}
// function base_convert(string $number, int $frombase, int $tobase) : string {}

/**
 * @psalm-pure
 * @psalm-flow ($path) -> return
 */
function basename(string $path, string $suffix = '') : string {}
// function bin2hex(string $data) : string {}
/**
 * @return int|string
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function bzcompress(string $source, int $blocksize100k = 4, int $workfactor = 0) {}
/**
 * @return int|string
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function bzdecompress(string $source, int $small = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function chop(string $str, string $character_mask = " \n\r\t\v\0") : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str, $ending) -> return
 */
function chunk_split(string $str, int $chunklen = 76, string $ending = "\r\n") : string {}
// function class_implements(object|string $what, bool $autoload = unknown) : array<string,class-string>|false {}
// function class_parents(object|string $instance, bool $autoload = unknown) : array<string,class-string>|false {}
/**
 * @psalm-pure
 * @psalm-flow ($var_name, $var_names) -> return
 */
function compact($var_name, ...$var_names) : array {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function convert_cyr_string(string $str, string $from, string $to) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function convert_uudecode(string $data) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function convert_uuencode(string $data) : string {}
// function count_chars(string $input, int $mode = unknown) : array<int,int>|false|string {}
/**
 * @param object|array $array_arg
 * @psalm-pure
 * @psalm-flow ($array_arg) -> return
 */
function current($array_arg) {}
/**
 * @psalm-pure
 * @psalm-flow ($path) -> return
 */
function dirname(string $path, int $levels = 1) : string {}
/**
 * @psalm-taint-specialize
 * @psalm-flow ($arr) -> return
 */
function each(array &$arr) : array {}
// eval safe?
/**
 * @psalm-pure
 * @psalm-flow ($arg) -> return
 * @psalm-taint-escape shell
 */
function escapeshellarg(string $arg) : string {}
// function fgetcsv(resource $fp, int $length = unknown, string $delimiter = unknown, string $enclosure = unknown, string $escape = unknown) : false|list<?string> {}
// function file(string $filename, int $flags = unknown, resource $context = unknown) : false|list<string> {}

// filter_input filter types depends on $type/$filter
// function filter_input_array(int $type, array|int $definition = unknown, bool $add_empty = unknown) : false|mixed {}
// function filter_input(int $type, string $variable_name, int $filter = unknown, array|int $options = unknown) : false|mixed {}
// function filter_var($variable, int $filter = unknown, $options = unknown) : false|mixed {}
// function filter_var_array
// function get_cfg_var(string $option_name) : array[]|false|string|string[] {}
// function get_class_methods($class) : list<string> {}
// function getenv(string $varname, bool $local_only = unknown) : false|string {}
// function getimagesize(string $imagefile, array &$info = unknown) : false|int[]|string[] {}
// function get_parent_class($object = unknown) : class-string|false {}
function gettext(string $msgid) : string {}
// function gettype($var) : string {}
// Can unescape $format with backslashes if user controlled
function gmdate(string $format, int $timestamp = null) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzcompress(string $data, int $level = -1, int $encoding = 15) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzdecode(string $data, int $length = 0) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzdeflate(string $data, int $level = -1, int $encoding = -15) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzencode(string $data, int $level = -1, int $encoding_mode = 31) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzinflate(string $data, int $length = 0) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function gzuncompress(string $data, int $length = 0) {}
// unsafe with $raw_output = true
// function hash_hmac(string $algo, string $data, string $key, bool $raw_output = unknown) : string {}
// function hash_pbkdf2(string $algo, string $password, string $salt, int $iterations, int $length = unknown, bool $raw_output = unknown) : string {}
// function hash(string $algo, string $data, bool $raw_output = unknown) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($data) -> return
 */
function hex2bin(string $data) {}
/**
 * @psalm-pure
 * @param array|object $querydata
 * @psalm-flow ($querydata) -> return
 */
function http_build_query($querydata, string $prefix = '', string $arg_separator = '', int $enc_type = 1) : string {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function iconv(string $in_charset, string $out_charset, string $str) {}
//function inet_ntop(string $in_addr) {}
//function inet_pton(string $ip_address) {}
// function ini_get(string $varname) {}
/**
 * @psalm-pure
 * @psalm-flow ($glue, $pieces) -> return
 */
function join(string $glue, array $pieces) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($json) -> return
 * TODO What taints does this unescape? (\uxxxx can quote)
 */
function json_decode(string $json, bool $assoc = null, int $depth = 512, int $options = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 * @psalm-taint-escape html
 * @return false|string
 */
function json_encode($data, int $options = 0, int $depth = 512) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function lcfirst(string $str) : string {}
// function long2ip(int|string $proper_address) : string {}
// already done
// max() also works on strings.
/**
 * @psalm-pure
 * @psalm-flow ($arg1) -> return
 */
function max(array $arg1) {}
/**
 * @psalm-pure
 * @psalm-flow ($sourcestring) -> return
 */
function mb_convert_case(string $sourcestring, int $mode, string $encoding = null) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_convert_encoding(string $str, string $to_encoding, $from_encoding = false) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_detect_encoding(string $str, $encoding_list = null, bool $strict = false) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_strtolower(string $str, string $encoding = null) {}
/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($str) -> return
 */
function mb_substr(string $str, int $start, ?int $length = null, string $encoding = '') {}
// Probably unrealistically wrong if $raw_output = true and sent to a sink
// function md5_file(string $filename, bool $raw_output = unknown) {}
// function md5(string $str, bool $raw_output = unknown) : string {}

// metaphone filters out non-letters?
// function metaphone(string $text, int $phones = unknown) {}

/**
 * @psalm-pure
 * @return false|string
 * @psalm-flow ($arg1) -> return
 */
function min(array $arg1) {}
/**
 * @psalm-pure
 * @return string
 * @psalm-flow ($msgid1, $msgid2) -> return
 */
function ngettext(string $msgid1, string $msgid2, int $n) : string {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function nl2br(string $str, bool $is_xhtml = false) : string {}
// $key is probably secret from application
// function openssl_encrypt(string $data, string $method, string $key, int $options = unknown, string $iv = unknown, string &$tag = unknown, string $aad = unknown, int $tag_length = unknown) {}
// function pack(string $format, mixed ...$args) : string {}
// function parse_ini_file(string $filename, bool $process_sections = unknown, int $scanner_mode = unknown) {}
// depends on arguments
// function parse_url(string $url, int $url_component = unknown) : array{scheme?:string,host?:string,port?:int,user?:string,pass?:string,path?:string,query?:string,fragment?:string}|false|int|null|string {}
// function pathinfo(string $path, int $options = unknown) {}
// function php_uname(string $mode = unknown) : string {}
// function phpversion(string $extension = unknown) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_filter($regex, $replace, $subject, int $limit = -1, int &$count = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_replace_callback_array(array $pattern, $subject, int $limit = -1, int &$count = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($subject) -> return
 */
function preg_split(string $pattern, string $subject, ?int $limit = -1, int $flags = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quoted_printable_decode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quoted_printable_encode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function quotemeta(string $str) {}
// function range($low, $high, float|int $step = unknown) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-unescape html
 */
function rawurldecode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-escape html
 */
function rawurlencode(string $str) {}
// not pure
// function readlink(string $filename) {}
// already done
/**
 * @psalm-pure depending on definition
 * @psalm-flow ($variable) -> return
 */
function serialize($variable) {}
// depends on raw_output, but impractical
// function sha1(string $str, bool $raw_output = unknown) {}
// function soundex(string $str) {}
// function stat(string $filename) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 * The result is always a portion of $haystack, with or without $before_needle
 */
function strchr(string $haystack, $needle, bool $before_needle = false) {}
// function stream_resolve_include_path(string $filename) {}
/**
 * @psalm-pure
 * @psalm-flow ($format) -> return
 * Backslashes can be used for special characters
 */
function strftime(string $format, int $timestamp = null) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function stripcslashes(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function stripslashes(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($replace, $subject) -> return
 */
function str_ireplace($search, $replace, $subject, int &$replace_count = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function stristr(string $haystack, $needle, bool $before_needle = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($input, $pad_string) -> return
 */
function str_pad(string $input, int $pad_length, string $pad_string = '', int $pad_type = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function strpbrk(string $haystack, string $char_list) {}
/**
 * @psalm-pure
 * @psalm-flow ($haystack, $needle) -> return
 */
function strrchr(string $haystack, $needle) {}
/**
 * @psalm-pure
 * @psalm-flow ($input) -> return
 */
function str_repeat(string $input, int $multiplier) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function strrev(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function str_rot13(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function str_split(string $str, int $split_length = 1) {}
// returns a portion of $haystack, with or without $before_needle
/**
 * @psalm-pure
 * @psalm-flow ($haystack) -> return
 */
function strstr(string $haystack, string $needle, bool $before_needle = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function strtr(string $str, string $from, string $to) {}
/**
 * @psalm-pure
 * @psalm-flow ($var) -> return
 */
function strval($var) {}
/**
 * @psalm-pure
 * @psalm-flow ($string) -> return
 */
function str_word_count(string $string, int $format = 0, string $charlist = '') {}
/**
 * @psalm-pure
 * @psalm-flow ($str, $repl) -> return
 */
function substr_replace($str, $repl, $start, $length = 0) {}
// mostly html safe but can contain " and >?
/**
 * @psalm-pure
 * @psalm-flow ($dir, $prefix) -> return
 */
function tempnam(string $dir, string $prefix) {}
/**
 * @psalm-pure
 * @psalm-flow ($source) -> return
 */
function token_get_all(string $source, int $flags = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function ucfirst(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function ucwords(string $str, string $delims = " \t\r\n\f\v") {}
/**
 * @psalm-pure
 * @psalm-flow ($prefix) -> return
 */
function uniqid(string $prefix = '', bool $more_entropy = false) {}
/**
 * @psalm-pure
 * TODO
 */
function unpack(string $format, string $data, int $offset = 0) {}
/**
 * TODO: This also may add taints other than html?
 * @psalm-pure
 * @psalm-flow ($str) -> return
 * @psalm-taint-unescape html
 */
function urldecode(string $str) {}
// mostly safe
// function urlencode(string $str) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function utf8_decode(string $data) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function utf8_encode(string $data) {}
/**
 * @psalm-pure
 * @psalm-flow ($format, $args) -> return
 */
function vsprintf(string $format, array $args) {}
/**
 * @psalm-pure
 * @psalm-flow ($str) -> return
 */
function wordwrap(string $str, int $width = 75, string $break = "\n", bool $cut = false) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function zlib_decode(string $data, int $max_decoded_len = 0) {}
/**
 * @psalm-pure
 * @psalm-flow ($data) -> return
 */
function zlib_encode(string $data, int $encoding, $level = -1) {}
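
One annotation in the list above that may deserve a second look is @psalm-taint-escape html on json_encode(). The following is only a minimal sketch, assuming default flags and a made-up payload in place of real request data. It suggests that angle brackets survive json_encode() unless JSON_HEX_TAG is passed:

```php
<?php
// Hypothetical attacker-controlled value, standing in for $_GET['evil'].
$evil = '<img src=x onerror=alert(1)>';

// Default flags: "<" and ">" pass through unchanged.
$default = json_encode($evil);

// JSON_HEX_TAG: "<" and ">" are emitted as \u003C and \u003E instead.
$hexTag = json_encode($evil, JSON_HEX_TAG);

// Report whether a raw "<" is still present in each encoding.
var_dump(strpos($default, '<') !== false);
var_dump(strpos($hexTag, '<') !== false);
```

If that holds, treating json_encode() as an html escape is only safe when the output lands inside a script block, or when the caller passes JSON_HEX_TAG.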

@TysonAndre TysonAndre changed the title Support taint analysis for remaining common pure functions such as json_encode() Support taint analysis for remaining common pure functions such as json_decode()/strval()/join() Jun 24, 2020
LukasReschke added a commit to LukasReschke/psalm that referenced this issue Nov 16, 2020
This adds string functions from
https://www.php.net/manual/en/ref.strings.php

This commit adds the flows for functions from "addcslashes" to "sprintf".
More are to follow in later commits.

Ref vimeo#3636
muglug pushed a commit that referenced this issue Nov 16, 2020
This adds string functions from
https://www.php.net/manual/en/ref.strings.php

This commit adds the flows for functions from "addcslashes" to "sprintf".
More are to follow in later commits.

Ref #3636
LukasReschke added a commit to LukasReschke/psalm that referenced this issue Nov 17, 2020
LukasReschke added a commit to LukasReschke/psalm that referenced this issue Nov 17, 2020
LukasReschke added a commit to LukasReschke/psalm that referenced this issue Nov 17, 2020
LukasReschke added a commit to LukasReschke/psalm that referenced this issue Nov 21, 2020
muglug added a commit that referenced this issue Nov 21, 2020
* Add string functions from sscanf to wordwrap

This should conclude all string functions from https://www.php.net/manual/en/book.strings.php

Continuation of #4576

Ref #3636

* Add StrTrReturnTypeProvider

* Fix psalm error

* phpcs

* Line length

* Ignore false return on vsprintf

Co-authored-by: Matthew Brown <[email protected]>
danog pushed a commit to danog/psalm that referenced this issue Jan 29, 2021
danog pushed a commit to danog/psalm that referenced this issue Jan 29, 2021
@orklah
Collaborator

orklah commented Nov 7, 2021

I believe all functions in the callmap are assumed pure unless they're listed in an 'impure list' array somewhere. Is there still a point to this issue that I missed?

@TysonAndre
Contributor Author

TysonAndre commented Nov 13, 2021

My original request was to add @psalm-flow annotations to the internal stubs, indicating how taint flows from inputs to outputs for those functions, which had no such annotations at the time.

danog@4de2bf8 and other associated commits did that for the most commonly used ones.

However, some less common functions from the list in my comment still aren't detected and don't have stubs like the others in stubs/CoreGenericFunctions.phpstub, for example echo utf8_decode($_GET['foo']);. (Using https://www.php.net/utf8_decode this way is obviously not an example of good code, but it is an example of tainted code.)

Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

https://psalm.dev/r/9b5d105c35 emits "No issues" but I'd expect a taint warning
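
As a rough runtime illustration of why a warning seems warranted here, utf8_decode() appears to leave ASCII markup untouched, so whatever reaches it also reaches echo. This is a sketch only: the payload is made up and stands in for $_GET['foo'], and the call is silenced because utf8_decode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical payload, standing in for $_GET['foo'].
$payload = '<script>alert(1)</script>';

// utf8_decode() is deprecated as of PHP 8.2; @ hides the notice.
$decoded = @utf8_decode($payload);

// ASCII bytes are identical in UTF-8 and ISO-8859-1, so the markup is unchanged.
var_dump($decoded === $payload);
```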

@psalm-github-bot

I found these snippets:

https://psalm.dev/r/9b5d105c35
<?php // --taint-analysis

echo urldecode($_GET['x']);
Psalm output (using commit c21aefa):

No issues!

@TysonAndre TysonAndre changed the title Support taint analysis for remaining common pure functions such as json_decode()/strval()/join() Add taint flows for remaining built-in pure functions such as utf8_decode, max($strings), etc Nov 13, 2021