Fix: Provide _is_utf8_charset() in compat.php for early use. #7052

Closed
41 changes: 39 additions & 2 deletions src/wp-includes/compat.php
@@ -40,6 +40,43 @@ function _wp_can_use_pcre_u( $set = null ) {
return $utf8_pcre;
}

/**
* Indicates if a given slug for a character set represents the UTF-8 text encoding.
Member

Some other functions in this file mention that they're for internal use, like this:

Internal compat function to mimic mb_substr().

Should we mention that this is an internal function only intended to prevent load-order issues and that is_utf8_charset should be used in most cases?

Member Author

I considered this, but also I don't see a reason why this shouldn't be made available 🤷‍♂️

Member

is_utf8_charset is the API we expect folks to use, not this new function. To me, that's a good reason this should include @ignore and be noted as an internal compat function.

I also think we need a note explaining why this is here, since it's not a compat function like everything else in the file.

*
* A charset is considered to represent UTF-8 if it is a case-insensitive match
* of "UTF-8" with or without the hyphen.
*
* Example:
*
* true === _is_utf8_charset( 'UTF-8' );
* true === _is_utf8_charset( 'utf8' );
* false === _is_utf8_charset( 'latin1' );
* false === _is_utf8_charset( 'UTF 8' );
*
* // Only strings match.
* false === _is_utf8_charset( [ 'charset' => 'utf-8' ] );
*
* `is_utf8_charset` should be used outside of this file.
*
* @ignore
* @since 6.6.1
*
* @param string $charset_slug Slug representing a text character encoding, or "charset".
* E.g. "UTF-8", "Windows-1252", "ISO-8859-1", "SJIS".
*
* @return bool Whether the slug represents the UTF-8 encoding.
*/
function _is_utf8_charset( $charset_slug ) {
if ( ! is_string( $charset_slug ) ) {
return false;
}

return (
0 === strcasecmp( 'UTF-8', $charset_slug ) ||
0 === strcasecmp( 'UTF8', $charset_slug )
);
}

if ( ! function_exists( 'mb_substr' ) ) :
/**
* Compat function to mimic mb_substr().
@@ -91,7 +128,7 @@ function _mb_substr( $str, $start, $length = null, $encoding = null ) {
* The solution below works only for UTF-8, so in case of a different
* charset just use built-in substr().
*/
- if ( ! is_utf8_charset( $encoding ) ) {
+ if ( ! _is_utf8_charset( $encoding ) ) {
return is_null( $length ) ? substr( $str, $start ) : substr( $str, $start, $length );
}

@@ -176,7 +213,7 @@ function _mb_strlen( $str, $encoding = null ) {
* The solution below works only for UTF-8, so in case of a different charset
* just use built-in strlen().
*/
- if ( ! is_utf8_charset( $encoding ) ) {
+ if ( ! _is_utf8_charset( $encoding ) ) {
return strlen( $str );
}
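
For illustration, here is a minimal standalone sketch of the matching rule that `_mb_substr()` and `_mb_strlen()` now call, run against the examples from the docblock. The function body is copied from the diff above. The `$cases` list and the printing loop are assumptions added for demonstration and are not part of the patch.

```php
<?php
// Body copied from the compat.php diff; it needs nothing from functions.php.
function _is_utf8_charset( $charset_slug ) {
	if ( ! is_string( $charset_slug ) ) {
		return false;
	}

	return (
		0 === strcasecmp( 'UTF-8', $charset_slug ) ||
		0 === strcasecmp( 'UTF8', $charset_slug )
	);
}

// Hypothetical inputs, taken from the docblock examples.
$cases = array( 'UTF-8', 'utf8', 'latin1', 'UTF 8', array( 'charset' => 'utf-8' ) );

foreach ( $cases as $case ) {
	// Print each input next to the boolean the function returns for it.
	printf( "%s => %s\n", json_encode( $case ), var_export( _is_utf8_charset( $case ), true ) );
}
```

Because the function only uses `is_string()` and `strcasecmp()`, it can be called before the rest of WordPress has loaded, which is the load-order concern raised in the review thread.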

18 changes: 4 additions & 14 deletions src/wp-includes/functions.php
@@ -7496,27 +7496,17 @@ function get_tag_regex( $tag ) {
* $is_utf8 = is_utf8_charset();
*
* @since 6.6.0
* @since 6.6.1 A wrapper for _is_utf8_charset
*
* @see _is_utf8_charset
*
* @param string|null $blog_charset Optional. Slug representing a text character encoding, or "charset".
* E.g. "UTF-8", "Windows-1252", "ISO-8859-1", "SJIS".
* Default value is to infer from "blog_charset" option.
* @return bool Whether the slug represents the UTF-8 encoding.
Member

This should include an @see _is_utf8_charset, since this is now essentially a wrapper function.

*/
function is_utf8_charset( $blog_charset = null ) {
- $charset_to_examine = $blog_charset ?? get_option( 'blog_charset' );
-
- /*
- * Only valid string values count: the absence of a charset
- * does not imply any charset, let alone UTF-8.
- */
- if ( ! is_string( $charset_to_examine ) ) {
- return false;
- }
-
- return (
- 0 === strcasecmp( 'UTF-8', $charset_to_examine ) ||
- 0 === strcasecmp( 'UTF8', $charset_to_examine )
- );
+ return _is_utf8_charset( $blog_charset ?? get_option( 'blog_charset' ) );
}

/**
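
As a rough sketch of how the wrapper behaves after this change, the snippet below runs both functions outside WordPress. The two function bodies are copied from the diffs. The `get_option()` stub is a hypothetical stand-in that only knows a hard-coded `blog_charset` value, so treat it as an assumption rather than WordPress's real implementation.

```php
<?php
// Hypothetical stand-in for WordPress's get_option(); it returns false for unknown names.
function get_option( $name ) {
	$options = array( 'blog_charset' => 'UTF-8' );
	return $options[ $name ] ?? false;
}

// Copied from the compat.php diff.
function _is_utf8_charset( $charset_slug ) {
	if ( ! is_string( $charset_slug ) ) {
		return false;
	}

	return (
		0 === strcasecmp( 'UTF-8', $charset_slug ) ||
		0 === strcasecmp( 'UTF8', $charset_slug )
	);
}

// Copied from the functions.php diff: the wrapper only adds the option fallback.
function is_utf8_charset( $blog_charset = null ) {
	return _is_utf8_charset( $blog_charset ?? get_option( 'blog_charset' ) );
}

var_dump( is_utf8_charset() );           // No argument: falls back to the stubbed option.
var_dump( is_utf8_charset( 'latin1' ) ); // An explicit argument takes precedence over the option.
```

The split keeps the option lookup in `functions.php`, where `get_option()` is expected to be available, while the pure string check lives in `compat.php`.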