Presenting several UTF-8 / Multibyte-aware escape functions.
These functions represent alternatives to mysqli::real_escape_string, as long as your DB connection and Multibyte extension are using the same character set (UTF-8), they will produce the same results by escaping the same characters as mysqli::real_escape_string.
This is based on research I did for my SQL Query Builder class:
https://github.com/twister-php/sql
<?php
if (function_exists('mb_ereg_replace'))
{
function mb_escape(string $string)
{
return mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x27\x5C]', '\\\0', $string);
}
} else {
function mb_escape(string $string)
{
return preg_replace('~[\x00\x0A\x0D\x1A\x22\x27\x5C]~u', '\\\$0', $string);
}
}
?>
Characters escaped are (the same as mysqli::real_escape_string):
00 = \0 (NUL)
0A = \n
0D = \r
1A = ctl-Z
22 = "
27 = '
5C = \
Note: preg_replace() is in PCRE_UTF8 (UTF-8) mode (`u`).
Enhanced version:
When escaping strings for `LIKE` syntax, remember that you also need to escape the special characters _ and %
So this is a more fail-safe version (even when compared to mysqli::real_escape_string, because % characters in user input can cause unexpected results and even security violations via SQL injection in LIKE statements):
<?php
if (function_exists('mb_ereg_replace'))
{
function mb_escape(string $string)
{
return mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]', '\\\0', $string);
}
} else {
function mb_escape(string $string)
{
return preg_replace('~[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]~u', '\\\$0', $string);
}
}
?>
Additional characters escaped:
25 = %
5F = _
Bonus function:
The original MySQL `utf8` character-set (for tables and fields) only supports 3-byte sequences.
4-byte characters are not common, but I've had queries fail to execute on 4-byte UTF-8 characters, so you should be using `utf8mb4` wherever possible.
However, if you still want to use `utf8`, you can use the following function to replace all 4-byte sequences.
<?php
function mysql_utf8_sanitizer(string $str)
{
return preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $str);
}
?>
Pick your poison and use at your own risk!