You can't simply print the individual characters of a text encoded in a multibyte character set this way,
because fgetc() reads one byte at a time and so splits every multibyte character into its separate bytes. Consider this example:
<?php
$path = 'foo/cyrillic.txt';
$handle = fopen($path, 'rb');
while (FALSE !== ($ch = fgetc($handle))) {
    $curs = ftell($handle);
    print "[$curs]: $ch\n";
}
/* The result will be something like this:
[1]: <
[2]: h
[3]: 2
[4]: >
[5]: ?
[6]: ?
[7]: ?
[8]: ?
[9]: ?
[10]: ?
[11]:
[12]: ?
[13]: ?
[14]: ?
[15]: ?
[16]: ?
*/ ?>
I don't think this is the best solution, but it can serve as a workaround:
<?php
$path = 'path/to/your/file.ext';
if (!$handle = fopen($path, 'rb')) {
    echo "Can't open ($path) file";
exit;
}
$mbch = ''; // keeps the first byte of 2-byte cyrillic letters
while (FALSE !== ($ch = fgetc($handle))) {
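// check for the lead byte of a 2-byte Cyrillic letter (208 = 0xD0, 209 = 0xD1 in UTF-8);
// 129 (0x81) is only ever a continuation byte, so it is not tested here
if (empty($mbch) && in_array(ord($ch), array(208, 209), true)) {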
$mbch = $ch; // keep the first byte
continue;
}
$curs = ftell($handle);
print "[$curs]: " . $mbch . $ch . PHP_EOL;
// or print "[$curs]: $mbch$ch\n";
if (!empty($mbch)) $mbch = ''; // erase the byte after using
}
?>
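The workaround above only recognizes the two lead bytes 208 and 209. If the file is UTF-8 (an assumption, the note does not name the encoding), the lead byte itself tells how many bytes follow, so the same idea can be sketched more generally. The function name fgetc_utf8() is hypothetical, not a PHP built-in, and the sketch does not validate malformed input:

```php
<?php
// Hypothetical helper (not part of PHP): read one whole UTF-8 character.
// Returns false at end of file, like fgetc().
function fgetc_utf8($handle)
{
    $ch = fgetc($handle);
    if ($ch === false) {
        return false;
    }
    $byte = ord($ch);
    // The lead byte encodes the number of continuation bytes that follow.
    if ($byte >= 0xF0) {
        $extra = 3;        // 4-byte sequence
    } elseif ($byte >= 0xE0) {
        $extra = 2;        // 3-byte sequence
    } elseif ($byte >= 0xC0) {
        $extra = 1;        // 2-byte sequence (includes Cyrillic)
    } else {
        $extra = 0;        // ASCII, or a stray continuation byte passed through as-is
    }
    for ($i = 0; $i < $extra; $i++) {
        $next = fgetc($handle);
        if ($next === false) {
            break;         // truncated sequence at end of file
        }
        $ch .= $next;
    }
    return $ch;
}

// Demo on an in-memory stream so that no file on disk is needed.
// The escaped bytes are two Cyrillic letters (0xD0 0x9F and 0xD1 0x80).
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "<h2>\xD0\x9F\xD1\x80</h2>");
rewind($handle);
while (false !== ($ch = fgetc_utf8($handle))) {
    $curs = ftell($handle);
    print "[$curs]: $ch" . PHP_EOL;
}
fclose($handle);
?>
```

As in the workaround, ftell() is called after the whole character has been read, so the printed position is the offset just past the character's last byte.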