PHP mb_chr() Function

What is the PHP mb_chr() Function?

To get the character that corresponds to a Unicode code point (also written codepoint or code position), use the mb_chr() function.

Syntax:

mb_chr(codepoint, encoding)

Parameters:

The function has one required parameter and one optional parameter:

codepoint (Required): The Unicode code point value that we’ll convert to a character. Check example 1.

encoding (Optional): The character encoding in which the function returns the character. If you omit this parameter or pass NULL, the internal character encoding is used. You can read the current value with mb_internal_encoding(); by default it follows the default_charset setting in the php.ini file. Check line 4 of example 1.
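To see what the encoding parameter changes, the following sketch prints the bytes that mb_chr() returns for the same code point under two encodings. The code point 233 (é) and the use of bin2hex() are illustrative choices, not part of the examples in this article.

```php
<?php
// Same Unicode code point, two target encodings.
$codepoint = 233; // U+00E9, "é"

// In UTF-8 the character takes two bytes.
$utf8 = mb_chr($codepoint, "UTF-8");

// In ISO-8859-1 the same character takes a single byte.
$latin1 = mb_chr($codepoint, "ISO-8859-1");

echo "UTF-8 bytes: " . bin2hex($utf8) . "<br />";        // c3a9
echo "ISO-8859-1 bytes: " . bin2hex($latin1) . "<br />"; // e9
```

The code point stays the same in both calls; only the bytes of the returned string differ.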

Return Values:

The function returns-

  • a string that contains the requested character – on success.
  • FALSE – on failure. Check example 6.
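Because mb_chr() can return FALSE, it is safest to compare the result with === instead of relying on truthiness: the character 0 (code point 48) is a valid result, yet PHP treats the string "0" as false in a loose check. The sketch below wraps this in a helper; the name codepoint_to_char() and the "?" fallback are hypothetical choices made for illustration.

```php
<?php
// Hypothetical helper: returns the character, or a fallback when mb_chr() fails.
function codepoint_to_char(int $codepoint, string $fallback = "?"): string
{
    $char = mb_chr($codepoint, "UTF-8");

    // Strict comparison: "0" is a valid character but is falsy in a loose check.
    return $char === false ? $fallback : $char;
}

echo codepoint_to_char(48) . "<br />";  // 0
echo codepoint_to_char(-97) . "<br />"; // ? (invalid code point)
```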

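How do Unicode code points relate to other encodings?

Let’s look at how Unicode relates to the other encodings that you can pass in the encoding parameter. Whichever encoding you choose, the codepoint argument is always a Unicode code point; the encoding only decides which bytes represent the returned character.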

ASCII: American Standard Code for Information Interchange is a character encoding standard for electronic communication.

  • It is a single-byte (7-bit) encoding system.
  • It has 128 code points.
  • Encoded characters include control codes (code point 0-31 and 127), digits (code point 48-57), uppercase (code points 65-90) or lowercase (97-122) English letters, punctuation symbols (e.g. !, “, #, %), miscellaneous symbols.

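ISO/IEC 8859-1: Also known as Latin-1, this is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).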

  • It is a single-byte (8-bit) encoding system.
  • It has 256 code points.
  • It is used throughout the Americas, Western Europe, Oceania, and much of Africa.
  • It is backward compatible with ASCII.

Windows-1252: It has a few other names: ANSI, Extended ASCII, Windows Latin 1, Code Page 1252, CP-1252. It is the most widely used single-byte encoding system.

  • It is a single-byte (8-bit) encoding system.
  • It contains 256 code points.
  • It is a superset of ASCII; the first 128 characters (code points 0-127) are the same.
  • It is a superset of ISO 8859-1 in terms of printable characters.

UTF-8: Unicode Transformation Format-8 encoding is a variable length character encoding used for electronic communication.

  • It is a multi-byte (1-4 byte) encoding system.
  • It can encode all 1,112,064 valid Unicode code points.
  • It is backward compatible with 7-bit ASCII; the first 128 code points are encoded identically.
  • UTF-8 is not fully backward compatible with ISO 8859-1 or ANSI/Windows-1252; only code points 0-127 are encoded with the same bytes.
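The variable-length nature of UTF-8 can be observed with strlen(), which counts bytes rather than characters. This is a small sketch; the four sample code points are arbitrary picks, one for each byte length.

```php
<?php
// One sample code point for each UTF-8 byte length (arbitrary picks).
$samples = [
    97     => "Latin small letter a",
    233    => "Latin small letter e with acute",
    19968  => "CJK ideograph for one",
    128512 => "Grinning face emoji",
];

$lengths = [];
foreach ($samples as $codepoint => $label) {
    $char = mb_chr($codepoint, "UTF-8");
    // strlen() returns the number of bytes in the encoded character.
    $lengths[$codepoint] = strlen($char);
    echo $label . ": " . $lengths[$codepoint] . " byte(s)<br />";
}
```

The four characters occupy 1, 2, 3, and 4 bytes respectively.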

Examples:

Example 1: Simple mb_chr() function-

<?php
echo "The character for the code point 97 is: " . mb_chr(97) . "<br />";
echo "The character for the code point 50 is: " . mb_chr(50) . "<br />";
echo "The character for the code point 19968 is: " . mb_chr(19968, "UTF-8") . "<br />";
echo mb_chr(80).mb_chr(72).mb_chr(80).mb_chr(32).mb_chr(99).mb_chr(104).mb_chr(114).mb_chr(40).mb_chr(41);
?>

Output:

The character for the code point 97 is: a
The character for the code point 50 is: 2
The character for the code point 19968 is: 一
PHP chr()

Explanation:

“a”, “2”, and “一” are the characters for the Unicode code points 97, 50, and 19968 respectively. The mb_chr() function converts these numbers to their corresponding characters, encoded here in UTF-8 (the default character encoding). Similarly, the nine mb_chr() calls in line 5 produce the string “PHP chr()”.

Example 2: Displaying the letters of an alphabet (Arabic here) –

<?php
for ($i=1575; $i<=1610;$i++){
    echo mb_chr($i) . " ";
}
?>

Output:

ا ب ة ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ػ ؼ ؽ ؾ ؿ ـ ف ق ك ل م ن ه و ى ي

Explanation:

In Unicode, the code points of the Arabic characters printed here run from 1575 to 1610. The loop converts each of these numbers into the corresponding Arabic character and prints it.

Example 3: Random password generation with mb_chr() function-

<?php
$password = "";
for($i=0;$i<3;$i++){
    $password .= mb_chr(rand(65,90)).mb_chr(rand(33,47)).mb_chr(rand(97,122)).mb_chr(rand(48,57));
}
echo "Random Password: " . $password;
?>

Output:

Random Password: X)c3Y+s4O'z3

Explanation:

Line 4: The rand() function generates a random integer between its first and second parameters (inclusive), and the mb_chr() function converts this random Unicode code point to a character. The code points 65 to 90 represent the uppercase English letters. Similarly, code points 33 to 47 represent some special characters, code points 97 to 122 represent the lowercase English letters, and code points 48 to 57 represent the digits. The loop runs three times and appends four characters each time, so the password is 12 characters long. Note that rand() is not cryptographically secure; for real passwords, random_int() is the safer choice.

Note: because the password is randomly generated, you will get a different password each time you run the code.
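For passwords that protect anything real, a cryptographically secure source of randomness is usually preferred. The sketch below repeats the same idea with random_int() (available since PHP 7.0); the function name generate_password() and its $groups parameter are hypothetical.

```php
<?php
// Hypothetical helper: builds a password from groups of four characters
// (uppercase letter, special character, lowercase letter, digit).
function generate_password(int $groups = 3): string
{
    $password = "";
    for ($i = 0; $i < $groups; $i++) {
        // random_int() draws from a cryptographically secure source.
        $password .= mb_chr(random_int(65, 90), "UTF-8")   // A-Z
                   . mb_chr(random_int(33, 47), "UTF-8")   // ! to /
                   . mb_chr(random_int(97, 122), "UTF-8")  // a-z
                   . mb_chr(random_int(48, 57), "UTF-8");  // 0-9
    }
    return $password;
}

echo "Random Password: " . generate_password();
```

The fixed order of character classes is kept from example 3 for easy comparison; shuffling the result would make the pattern less predictable.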

Example 4: Binary/Octal/Hexadecimal values to character with mb_chr() function-

<?php
echo "The character for the Unicode code point of Binary 0b100111000000000 (Chinese 1) is: " . mb_chr(0b100111000000000) . "<br />";
echo "The character for the Unicode code point of Octal 047214 (Chinese 2) is: " . mb_chr(047214) . "<br />";
echo "The character for the Unicode code point of Hexadecimal 0x4e09 (Chinese 3) is: " . mb_chr(0x4e09);
?>

Output:

The character for the Unicode code point of Binary 0b100111000000000 (Chinese 1) is: 一
The character for the Unicode code point of Octal 047214 (Chinese 2) is: 二
The character for the Unicode code point of Hexadecimal 0x4e09 (Chinese 3) is: 三

Explanation:

The mb_chr() function receives the code point as an integer, so the notation of the literal does not matter. The binary value 0b100111000000000 gives the Chinese character 一, the octal value 047214 gives 二, and the hexadecimal value 0x4e09 gives 三.
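Code points are often written in the U+XXXX notation in Unicode charts. As a sketch, such a string can be turned into a character by parsing the hexadecimal part with hexdec() and passing the result to mb_chr(). The helper name char_from_u_notation() is hypothetical, and it returns FALSE for input that does not match the expected pattern.

```php
<?php
// Hypothetical helper: converts a string such as "U+4E09" to its character.
function char_from_u_notation(string $notation)
{
    // Accept "U+" followed by 4 to 6 hexadecimal digits.
    if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $matches) !== 1) {
        return false;
    }

    // hexdec() turns the hexadecimal digits into an integer code point.
    return mb_chr((int) hexdec($matches[1]), "UTF-8");
}

echo char_from_u_notation("U+4E00")
   . char_from_u_notation("U+4E8C")
   . char_from_u_notation("U+4E09");
```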

Example 5: Finding missing digits-

<?php
$arr = mb_str_split("۰۱۳۴۶۷۹");
for($i=0; $i<count($arr); $i++){
    $numbers[] = mb_ord($arr[$i], "UTF-8");
}
$missingNumbers = "";
for ($i=1776; $i<=1785; $i++){
    if(!in_array($i, $numbers)){
        $missingNumbers .= mb_chr($i) . " ";
    }
}
echo $missingNumbers;
?>

Output:

۲ ۵ ۸

Explanation:

Line 2: The mb_str_split() function splits the numeric string “۰۱۳۴۶۷۹”, which consists of Extended Arabic-Indic digits (the digit forms used in Persian and Urdu), into an array with one digit per element.

Line 3: The loop stores the code points of the digits in the $numbers array.

Line 7: Code points 1776 to 1785 are the ten Extended Arabic-Indic digits (۰۱۲۳۴۵۶۷۸۹), which are equivalent to the digits 0,1,2,3,4,5,6,7,8,9.

Line 8: If a code point is not among the code points stored in the array, we append its character to the variable $missingNumbers in line 9. We use the mb_chr() function to convert the code point back to a digit character.
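The same result can be reached more compactly with array functions. The following is an alternative sketch, not a replacement for the code in this example: it assumes PHP 7.4 or later (for mb_str_split() and arrow functions) and uses array_diff() to find the code points that are absent. The variable names are illustrative.

```php
<?php
// Code points of the digits that are present in the input string.
$present = array_map(
    fn($char) => mb_ord($char, "UTF-8"),
    mb_str_split("۰۱۳۴۶۷۹", 1, "UTF-8")
);

// Code points in the full digit range 1776-1785 that are not present.
$missing = array_diff(range(1776, 1785), $present);

// Convert the missing code points back to characters.
$missingDigits = implode(" ", array_map(fn($cp) => mb_chr($cp, "UTF-8"), $missing));

echo $missingDigits;
```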

Example 6: mb_chr() returns FALSE on failure-

<?php
var_dump(mb_chr(99999999)); // larger than the highest code point, 1114111
echo "<br />";
var_dump(mb_chr(-97));
?>

Output:

bool(false)
bool(false)

Explanation:

Because 99999999 is larger than the highest Unicode code point (1114111, or 0x10FFFF), the function returns FALSE. Similarly, a negative number can’t be a valid Unicode code point, so the function returns FALSE for -97 too.

Practical Usages of mb_chr() Function:

  • You can use the mb_chr() function to generate a password. Check example 3.
  • You can find the characters that are missing from a sequence in any language with this function. Check example 5.

Notes on mb_chr() Function:

  • You can specify the code point not only in decimal, but also in binary, octal, or hexadecimal notation.
    • A binary value starts with 0b or 0B, e.g. 0b1111000 (the ASCII character x). Check example 4.
    • An octal value starts with 0, or with 0o or 0O as of PHP 8.1, e.g. 0171 (the ASCII character y). Check example 4.
    • A hexadecimal value starts with 0x or 0X, e.g. 0x7A (the ASCII character z). Check example 4.
  • The opposite of the mb_chr() function is mb_ord() function.
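
To illustrate the relationship between mb_chr() and mb_ord(), the sketch below converts a code point to a character and back again, which returns the original number. The sample code points are arbitrary.

```php
<?php
// Arbitrary sample code points: "a", Arabic alef, CJK "one".
$codepoints = [97, 1575, 19968];

$roundTrip = [];
foreach ($codepoints as $codepoint) {
    $char = mb_chr($codepoint, "UTF-8");
    // mb_ord() maps the character back to its code point.
    $roundTrip[] = mb_ord($char, "UTF-8");
    echo $codepoint . " -> " . $char . " -> " . mb_ord($char, "UTF-8") . "<br />";
}
```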

PHP Version Support:

PHP 7 (7.2.0 and later), PHP 8. Note that example 5 also uses mb_str_split(), which requires PHP 7.4 or later.

Summary: PHP mb_chr() Function

mb_chr() is a dependable way to convert a valid Unicode code point to a character. It is part of PHP’s mbstring extension, which is not enabled by default, so make sure the extension is available before you use the function in your project.

Reference:

https://www.php.net/manual/en/function.mb-chr.php