Transliterator::transliterate

transliterator_transliterate

(PHP 5 >= 5.4.0, PHP 7, PHP 8, PECL intl >= 2.0.0)

Transliterator::transliterate -- transliterator_transliterate -- Transliterate a string

Description

Object-oriented style

public Transliterator::transliterate ( string $subject , int $start = ? , int $end = ? ) : string|false

Procedural style

transliterator_transliterate ( mixed $transliterator , string $subject , int $start = ? , int $end = ? ) : string|false

Transforms a string or part thereof using an ICU transliterator.

Parameters

transliterator

In the procedural version, either a Transliterator or a string from which a Transliterator can be built.

subject

The string to be transformed.

start

The start index (in UTF-16 code units) from which the string will start to be transformed, inclusive. Indexing starts at 0. The text before will be left as is.

end

The end index (in UTF-16 code units) until which the string will be transformed, exclusive. Indexing starts at 0. The text after will be left as is.

Return Values

The transformed string on success, or false on failure.
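
Because failure is reported through the return value, callers may want to check for it explicitly. The following is a minimal sketch, not part of the original page: the helper name transliterateOrNull() is hypothetical, and it assumes the intl extension is loaded with intl.use_exceptions left at its default (off), so that Transliterator::create() signals an invalid ID by returning null.

```php
<?php
// Hypothetical helper: returns the transliterated string, or null on any failure.
function transliterateOrNull(string $id, string $subject): ?string
{
    // Transliterator::create() returns null when the ID or rules are invalid.
    $transliterator = Transliterator::create($id);
    if ($transliterator === null) {
        return null;
    }

    // transliterate() returns false on failure, so compare strictly.
    $result = $transliterator->transliterate($subject);
    return $result === false ? null : $result;
}

var_dump(transliterateOrNull('Any-Upper', 'abc'));
var_dump(transliterateOrNull('No-Such-Transliterator', 'abc'));
```

When the helper returns null, intl_get_error_message() can be consulted for the reason.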

Examples

Example #1 Converting escaped UTF-16 code units

<?php
$s = "\u304A\u65E9\u3046\u3054\u3056\u3044\u307E\u3059";
echo transliterator_transliterate("Hex-Any/Java", $s), "\n";

//now the reverse operation with a supplementary character
$supplChar = html_entity_decode('&#x1D11E;');
echo mb_strlen($supplChar, "UTF-8"), "\n";
$encSupplChar = transliterator_transliterate("Any-Hex/Java", $supplChar);
//echoes two encoded UTF-16 code units
echo $encSupplChar, "\n";
//and back
echo transliterator_transliterate("Hex-Any/Java", $encSupplChar), "\n";
?>

The above example will output something similar to:

お早うございます
1
\uD834\uDD1E
𝄞
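
The start and end parameters described above can restrict the transformation to part of the string. The sketch below is an illustration rather than an example from the original page: the helper name upperRange() is hypothetical, and "Any-Upper" is used only because its effect is easy to see. The indices count UTF-16 code units, so a supplementary character such as the one in the example above occupies two positions.

```php
<?php
// Hypothetical helper: upper-cases only the UTF-16 code units in [$start, $end).
function upperRange($subject, $start, $end)
{
    return transliterator_transliterate('Any-Upper', $subject, $start, $end);
}

// Only the first five code units ("hello") are transformed;
// the text after index 5 is left as is.
echo upperRange('hello world', 0, 5), "\n"; // HELLO world
```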

See Also

User Contributed Notes

Anonymous 14-Nov-2016 10:04
There are some possibly undesirable conversions with ASCII//TRANSLIT//IGNORE or your users may require some custom stuff.

You might want to run a substitution up front for certain things, such as when you want 3 letter ISO codes to replace currency symbols. £ transliterates to "lb", for example, which is incorrect since it's a currency symbol, not a weight symbol (#).

ASCII//TRANSLIT//IGNORE does a great job within the realm of possibility :-)

When it doesn't do something you want it to, you can set up a CSV with one replacement per line and run a function like:

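    // Applies one "search,replacement" pair per line of $mapFile to $inputString.
    function stripByMap($inputString, $mapFile)
    {
        // Read the map, dropping line endings and empty lines.
        $csv = file($mapFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        foreach ($csv as $line)
        {
            $arrLine = explode(',', trim($line));
            if (count($arrLine) < 2) {
                continue; // skip malformed lines that have no replacement
            }
            $inputString = str_replace($arrLine[0], $arrLine[1], $inputString);
        }
        return $inputString;
    }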

or you can write some regexes. Transliterating using ASCII//TRANSLIT//IGNORE works so well that your map probably won't be very long...
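
As an illustration of the up-front substitution this note describes (a sketch, not the note author's code), a small in-memory map can replace currency symbols with ISO 4217 codes before any transliteration runs. The function name replaceCurrencySymbols() and the three map entries are assumptions chosen for the example.

```php
<?php
// Hypothetical helper: swaps currency symbols for ISO 4217 codes.
function replaceCurrencySymbols(string $input): string
{
    // Assumed sample map; extend it with whatever symbols your data contains.
    $map = [
        '£' => 'GBP',
        '€' => 'EUR',
        '¥' => 'JPY',
    ];
    // strtr() with an array replaces every key in a single pass.
    return strtr($input, $map);
}

echo replaceCurrencySymbols('Price: £10'), "\n"; // Price: GBP10
```

The result can then be passed to the transliteration step, which no longer sees the symbols it would otherwise convert unexpectedly.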
simonsimcity at gmail dot com 05-Jun-2014 08:16
Sorry for posting again, but I found a bug in my code:

If you have a character like the Cyrillic ь (the soft sign, which has no sound), "Any-Latin" transliterates it to a prime character, and "Latin-ASCII" doesn't touch prime characters. Therefore I added a rule that removes all characters from \u0100 upward (the rule below covers \u0100 to \u7fff).

Here's my new code, including an example:

var_dump(transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0100-\u7fff] remove',
    "A æ Übérmensch på høyeste nivå! И я люблю PHP! есть. ﬁ"));
// string(50) "A ae Ubermensch pa hoyeste niva! I a lublu PHP! est. fi"

Another approach I found quite helpful, if you don't want to remove any characters, is to use iconv() in addition. This will return only ASCII characters.

See: http://stackoverflow.com/a/3542748/517914

Also an example here:

var_dump(iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", transliterator_transliterate('Any-Latin; Latin-ASCII',
    "A æ Übérmensch på høyeste nivå! И я люблю PHP! есть. ﬁ")));
// string(50) "A ae Ubermensch pa hoyeste niva! I a lublu PHP! est'. fi"
simonsimcity at gmail dot com 15-Apr-2013 06:11
I quite like hdogan's idea, but there's at least one group of characters it misses: ligatures.
They're used in Norwegian at least, and I have read that French uses them too. Some are used purely for styling (e.g. ﬁ).

Here's an example that supports all characters (should at least, according to the documentation):
<?php
var_dump(transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', "A æ Übérmensch på høyeste nivå! И я люблю PHP! ﬁ"));
// string(41) "a ae ubermensch pa hoyeste niva! i a lublu php! fi"
?>

In this example, every character is first converted to a Latin character. The Latin characters are then replaced by their ASCII equivalents, and the result is lowercased.
hdogan at gmail dot com 11-Nov-2012 02:33
You can create slugs easily with:

<?php
function slugify($string) {
    $string = transliterator_transliterate("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();", $string);
    $string = preg_replace('/[-\s]+/', '-', $string);
    return trim($string, '-');
}

echo slugify("Я люблю PHP!");
?>
jinmoku at hotmail dot com 09-Feb-2011 12:59
OOP version:

<?php
$str = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿ
ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ';
$rule = 'NFD; [:Nonspacing Mark:] Remove; NFC';

$myTrans = Transliterator::create($rule);
echo $myTrans->transliterate($str);

//aaaaaceeeeiiiinooooouuuuyy
//AAAAACEEEEIIIINOOOOOUUUUY
?>