 Unicode ready?

On Tuesday 2 April 2002 12:53, you wrote:

> Is PostgreSQL unicode compliant/ready?
> Does it store/export text in Unicode wide-character format, or single
> character strings?

[By the way: there are several Unicode encodings (UTF-8, UTF-16, UCS-2).
UTF-8 is the most popular because each character is encoded as 1 to 4 bytes
and plain ASCII characters keep their single-byte form. Thus UTF-8 extracts
can be read in a normal text editor. Conversely, UTF-16 is built from 16-bit
units, thus can't be read
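
To make the difference between the encodings concrete, here is a small Python sketch. It is only an illustration of the byte layouts and has nothing to do with PostgreSQL internals; the sample characters and the helper names (utf8_lengths, looks_like_ascii) are my own choices.

```python
# Compare how a few characters are laid out in UTF-8 and UTF-16.
# Sample characters: ASCII letter, accented letter, euro sign, musical G clef.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]


def utf8_lengths(chars):
    """Return the number of UTF-8 bytes used by each character."""
    return [len(c.encode("utf-8")) for c in chars]


def looks_like_ascii(data):
    """True when every byte is a 7-bit ASCII byte other than NUL."""
    return all(0 < b < 128 for b in data)


text = "hello"
utf8_bytes = text.encode("utf-8")       # same bytes as plain ASCII
utf16_bytes = text.encode("utf-16-le")  # every letter is followed by a NUL byte

print(utf8_lengths(samples))
print(utf8_bytes, looks_like_ascii(utf8_bytes))
print(utf16_bytes, looks_like_ascii(utf16_bytes))
```

The first line printed is [1, 2, 3, 4]: one byte for ASCII, up to four for characters outside the basic plane. The UTF-8 form of "hello" passes the ASCII check, the UTF-16 form does not.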

I guess your question was "Is PostgreSQL multi-byte safe and Unicode ready?"

1) Server-side :
a) PostgreSQL needs to be compiled with
b) Create a database with

Several other multi-byte encodings are available. In the case of Unicode,
data is stored in UTF-8 format. Comparisons and searches operate on whole
characters, not on 8-bit bytes.
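
The distinction between characters and bytes can be sketched in Python as follows. This only illustrates the idea of character-based versus byte-based length and search; it does not show how the server implements it, and the sample string is arbitrary.

```python
# Length and search position measured in characters versus bytes.
word = "d\u00e9j\u00e0 vu"          # "deja vu" with its two accented letters
encoded = word.encode("utf-8")      # the form in which UTF-8 data is stored

char_length = len(word)             # counts characters
byte_length = len(encoded)          # counts stored bytes

char_pos = word.find("vu")          # offset of "vu" in characters
byte_pos = encoded.find(b"vu")      # offset of "vu" in bytes

print(char_length, byte_length)
print(char_pos, byte_pos)
```

This prints 7 9 and then 5 7: each accented letter takes two bytes, so every byte-based count drifts away from the character-based one.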

2) Client side
By default, a connection uses the server encoding. But it is possible to
have a connection recoded automatically on the fly using:

SET CLIENT_ENCODING = Latin9 (this example recodes Unicode streams to Western
European with the Euro symbol). Each connection has its own client encoding,
so several streams can be recoded at the same time.
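
What such a recoding does to the bytes can be sketched in Python. This is a stand-alone illustration, not the server's conversion code; the recode and can_encode helpers are hypothetical names, and "iso8859_15" is Python's codec name for Latin9.

```python
# Recode a UTF-8 byte stream to Latin9 (ISO 8859-15), which has the euro sign.
def recode(data, source="utf-8", target="iso8859_15"):
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


def can_encode(text, encoding):
    """True when every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


server_bytes = "10 \u20ac".encode("utf-8")  # "10" followed by the euro sign
client_bytes = recode(server_bytes)

print(server_bytes)   # the euro sign takes three bytes in UTF-8
print(client_bytes)   # and one byte (0xA4) in Latin9
print(can_encode("\u20ac", "iso8859_15"), can_encode("\u20ac", "latin-1"))
```

The last line prints True False: Latin9 can represent the euro sign, the older Latin1 cannot, which is why Latin9 is the one to pick for Western European clients.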

3) ODBC interface
The current ODBC interface provides Unicode data in UTF-8 encoding. But
Microsoft platforms need the UCS-2 Unicode encoding (e.g. Access 2K).
Therefore, you will be able to view data under OpenOffice but not under
Microsoft Office.

The new ODBC driver in CVS supports UCS-2.
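
For the curious, the gap between the two encodings looks roughly like this in Python. It is a sketch of the conversion only, not the ODBC driver's code; utf8_to_ucs2 is a hypothetical helper, and I assume little-endian UCS-2, which matches UTF-16 for characters of the basic plane.

```python
# Convert UTF-8 bytes to little-endian UCS-2 (two bytes per character).
def utf8_to_ucs2(data):
    """Return the UCS-2 form of UTF-8 data, refusing characters above U+FFFF."""
    text = data.decode("utf-8")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the basic plane does not fit in UCS-2")
    # For basic-plane characters, UTF-16 and UCS-2 produce the same bytes.
    return text.encode("utf-16-le")


utf8_data = "\u00e9\u20ac".encode("utf-8")   # accented e, then the euro sign
ucs2_data = utf8_to_ucs2(utf8_data)

print(len(utf8_data), len(ucs2_data))
print(ucs2_data)
```

The two characters take 5 bytes in UTF-8 (2 + 3) and 4 bytes in UCS-2 (2 + 2), so a client expecting one layout cannot read the other without a conversion step.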

4) Server side languages
Server-side languages are the traditional weakness of Unicode programming.
When writing code, you need to calculate the length of a string, crop the
left side of it, etc. In PHP, this is done using the special mbstring
library. Usually, this breaks existing code, because the multi-byte versions
are additional functions with their own names rather than the standard ones.
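
Here is the kind of problem I mean, sketched in Python rather than PHP. The helper names (left_bytes, left_chars, is_valid_utf8) are made up for the illustration and the sample name is arbitrary.

```python
# Cropping the left side of a UTF-8 string by bytes versus by characters.
def left_bytes(data, n):
    """Byte-oriented crop: keeps the first n bytes, may split a character."""
    return data[:n]


def left_chars(data, n):
    """Character-oriented crop: keeps the first n characters."""
    return data.decode("utf-8")[:n].encode("utf-8")


def is_valid_utf8(data):
    """True when data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "Zo\u00eb M\u00fcller".encode("utf-8")  # third character takes two bytes

print(left_bytes(name, 3), is_valid_utf8(left_bytes(name, 3)))
print(left_chars(name, 3), is_valid_utf8(left_chars(name, 3)))
```

The byte-oriented crop cuts the third character in half and leaves invalid UTF-8 behind; the character-oriented crop returns the first three characters intact.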

This is not the case in PostgreSQL where all PLpgSQL functions are multi-byte
safe. Because of PHP instability, I ported several functions to PLpgSQL.

PostgreSQL is a pure marvel.

Jean-Michel POURE

