📚 Looking for current Exponential documentation? This site contains legacy SDK & API reference docs (eZ Publish era). For up-to-date 2026 product documentation, visit doc.exponential.earth.


TextCodec conversion

The TextCodec provides uniform conversion from one charset to another. It converts using codepage tables, and uses the mbstring extension for some charset conversions.
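To illustrate what codepage-based conversion means, here is a minimal, self-contained sketch that converts a small slice of ISO-8859-5 (Cyrillic) to UTF-8 with a plain offset lookup. It uses neither eZTextCodec nor mbstring, and it is not how eZTextCodec is implemented. The function names `codepointToUtf8` and `cyrillicToUtf8` are hypothetical, and only ASCII plus the byte range 0xB0 to 0xCF (the capital letters) is mapped.

```php
<?php
// Hypothetical helper: encode one Unicode code point (up to U+FFFF) as UTF-8.
function codepointToUtf8( $cp )
{
    if ( $cp < 0x80 )
        return chr( $cp );
    if ( $cp < 0x800 )
        return chr( 0xC0 | ( $cp >> 6 ) ) . chr( 0x80 | ( $cp & 0x3F ) );
    return chr( 0xE0 | ( $cp >> 12 ) )
         . chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
         . chr( 0x80 | ( $cp & 0x3F ) );
}

// Hypothetical helper: convert ISO-8859-5 bytes to UTF-8.
// Only ASCII and 0xB0..0xCF are handled; anything else becomes "?".
function cyrillicToUtf8( $text )
{
    $out = "";
    $len = strlen( $text );
    for ( $i = 0; $i < $len; ++$i )
    {
        $byte = ord( $text[$i] );
        if ( $byte < 0x80 )
            $out .= chr( $byte );
        else if ( $byte >= 0xB0 && $byte <= 0xCF )
            $out .= codepointToUtf8( $byte + 0x360 ); // 0xB0 maps to U+0410
        else
            $out .= "?";
    }
    return $out;
}

$utf8 = cyrillicToUtf8( chr( 0xB0 ) . chr( 0xB1 ) . "A" );
print( bin2hex( $utf8 ) . "\n" ); // prints d090d09141
```

A real codepage table lists every byte explicitly rather than relying on a single offset; the offset works here only because this part of ISO-8859-5 is laid out in the same order as the matching Unicode block.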

The table below shows some of the charsets it can convert, together with their UTF-8 conversions. Converting everything to UTF-8 lets us display all of these characters on one page.

[Image not available: "mbstring Charset Conversion Table" (/doc/images/mbstring.jpg). Screenshot of a table in which each row lists a charset name and its UTF-8 conversion output.]

This table can be produced by the following code:

include_once( "lib/ezi18n/classes/eztextcodec.php" );

header( "Content-Type: text/html; charset=utf-8" );

// Build each sample string as raw bytes in its source charset.
$latinstr = "";
for ( $i = 0x41; $i < 0x60; ++$i )
    $latinstr .= chr( $i );

$cyrstr = "";
for ( $i = 0xb0; $i < 0xd0; ++$i )
    $cyrstr .= chr( $i );

$greekstr = "";
for ( $i = 0xc1; $i < 0xe0; ++$i )
    $greekstr .= chr( $i );

$hebrewstr = "";
for ( $i = 0xe0; $i < 0xfb; ++$i )
    $hebrewstr .= chr( $i );

$arabicstr = "";
for ( $i = 0xc0; $i < 0xdf; ++$i )
    $arabicstr .= chr( $i );

$northeurstr = "";
for ( $i = 0xc0; $i < 0xe0; ++$i )
    $northeurstr .= chr( $i );

// These cp932 (Shift-JIS) characters take two bytes each: high byte, then low byte.
$sjisstr = "";
for ( $i = 0x829f; $i < 0x82c2; ++$i )
    $sjisstr .= chr( $i >> 8 ) . chr( $i & 0xff );

$sjisstr2 = "";
for ( $i = 0x8340; $i < 0x8362; ++$i )
    $sjisstr2 .= chr( $i >> 8 ) . chr( $i & 0xff );

$sjisstr3 = "";
for ( $i = 0x889f; $i < 0x88cb; ++$i )
    $sjisstr3 .= chr( $i >> 8 ) . chr( $i & 0xff );

// Halfwidth Katakana take a single byte each in cp932.
$sjisstr4 = "";
for ( $i = 0xa6; $i < 0xc5; ++$i )
    $sjisstr4 .= chr( $i );

$convert_list = array( array( "text" => $latinstr,
                              "charset" => "iso-8859-1",
                              "name" => "Latin1" ),
                       array( "text" => $cyrstr,
                              "charset" => "cyrillic",
                              "name" => "Cyrillic" ),
                       array( "text" => $greekstr,
                              "charset" => "iso-8859-7",
                              "name" => "Greek" ),
                       array( "text" => $northeurstr,
                              "charset" => "iso-8859-4",
                              "name" => "Northern Europe" ),
                       array( "text" => $hebrewstr,
                              "charset" => "iso-8859-8",
                              "name" => "Hebrew" ),
                       array( "text" => $arabicstr,
                              "charset" => "iso-8859-6",
                              "name" => "Arabic" ),
                       array( "text" => $sjisstr,
                              "charset" => "cp932",
                              "name" => "Hiragana" ),
                       array( "text" => $sjisstr2,
                              "charset" => "cp932",
                              "name" => "Katakana" ),
                       array( "text" => $sjisstr3,
                              "charset" => "cp932",
                              "name" => "CJK Unified" ),
                       array( "text" => $sjisstr4,
                              "charset" => "cp932",
                              "name" => "Halfwidth Katakana" )
                       );

print( "<p>
The TextCodec provides uniform conversion from one charset to another. It converts using codepage tables,
and uses the mbstring extension for some charset conversions. This page does not use the mbstring extension,
to show that the TextCodec can handle these charsets itself.
</p>

<p>
The table below shows some of the charsets it can convert, together with their UTF-8 conversions. Converting
everything to UTF-8 lets us display all of these characters on one page.
</p>
" );

print( "<table>
<tr><th>Requested charset</th><th>Charset</th><th>Name</th><th>Text</th><th>Original strlen</th><th>Correct strlen</th></tr>\n" );

// Turn mbstring off so the TextCodec performs the conversions itself.
eZTextCodec::setUseMBString( false );

foreach( $convert_list as $convert_item )
{
    $charset = $convert_item["charset"];
    $text = $convert_item["text"];
    $name = $convert_item["name"];
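    $codec =& eZTextCodec::instance( $charset, "utf-8" );
    // The charset name as requested, and the name the codec reports for its input
    // (shown as "Requested charset" and "Charset" in the table).
    $req_charset = $codec->requestedInputCharsetCode();
    $charset = $codec->inputCharsetCode();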

    $out = $codec->convertString( $text );
    print( "<tr><td>$req_charset</td><td>$charset</td><td>$name</td><td>$out</td><td>" . strlen( $text ) . "</td><td>" . $codec->strlen( $text ) . "</td></tr>\n" );
}
print( "</table>
<p class=\"footnote\">Not all characters may be visible depending on your browser font support.</p>
" );
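
The last two table columns contrast PHP's byte-oriented strlen() with a character count for the input charset. As a rough illustration of why they differ for cp932, the sketch below counts characters by skipping the trail byte of each two-byte sequence. It is an assumption about the general technique, not eZTextCodec's actual implementation; `sjisCharCount` is a hypothetical name, and the input is assumed to be well formed.

```php
<?php
// Hypothetical helper: count characters in a cp932 (Shift-JIS) byte string.
// Assumes well-formed input: lead bytes 0x81..0x9F and 0xE0..0xFC start a
// two-byte character; every other byte is a character on its own.
function sjisCharCount( $text )
{
    $count = 0;
    $len = strlen( $text );
    for ( $i = 0; $i < $len; ++$i )
    {
        $byte = ord( $text[$i] );
        if ( ( $byte >= 0x81 && $byte <= 0x9F ) || ( $byte >= 0xE0 && $byte <= 0xFC ) )
            ++$i; // skip the trail byte of a two-byte character
        ++$count;
    }
    return $count;
}

// Same construction as $sjisstr in the listing above (Hiragana, two bytes each).
$hiragana = "";
for ( $i = 0x829f; $i < 0x82c2; ++$i )
    $hiragana .= chr( $i >> 8 ) . chr( $i & 0xff );

// Same construction as $sjisstr4 (halfwidth Katakana, one byte each).
$halfwidth = "";
for ( $i = 0xa6; $i < 0xc5; ++$i )
    $halfwidth .= chr( $i );

print( strlen( $hiragana ) . " bytes, " . sjisCharCount( $hiragana ) . " characters\n" );   // 70 bytes, 35 characters
print( strlen( $halfwidth ) . " bytes, " . sjisCharCount( $halfwidth ) . " characters\n" ); // 31 bytes, 31 characters
```

For the single-byte charsets in the table (Latin1, Cyrillic, Greek and so on) the two counts are the same, so the difference only shows up in the cp932 rows that use two-byte characters.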

Exponential Copyright © 1998 - 2026 7x