The mb_substitute_character() Function in PHP
PHP's core string type knows nothing about encodings or Unicode. A string is simply a sequence of bytes, and the built-in string functions treat each byte as one character, as if all text were in a single-byte encoding such as ASCII. This means that when a programmer works with text in a multibyte encoding such as UTF-8, they have to keep track of how many bytes each character occupies and convert between byte sequences and code points by hand. This is complicated, slow and error-prone.
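The byte-oriented behaviour described above can be seen with the built-in string functions alone. The following is a minimal sketch, assuming the text is UTF-8; the variable names are illustrative only.

```php
<?php
// "caf\xC3\xA9" is "café" encoded as UTF-8; the escapes keep the example
// independent of the encoding of the source file.
$word = "caf\xC3\xA9";

$byteCount = strlen($word);          // counts bytes, not characters: 5
$firstFour = substr($word, 0, 4);    // cuts the two-byte "é" in half
$lastByte  = bin2hex($firstFour[3]); // "c3", a lone lead byte that is not valid UTF-8 on its own
```

The word has four characters, yet strlen() reports five, and substr() returns a string that ends in a broken character.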
Fortunately, an extension called mbstring was developed to handle the situation. It provides encoding-aware versions of many of PHP's standard string functions, such as mb_substr(), mb_strlen() and mb_ereg() for substr(), strlen() and the old ereg(). It also provides a function, mb_convert_encoding(), to convert between different encodings, a job that is otherwise spread across other extensions such as iconv, Imap or GNU Recode.
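As a rough illustration of the encoding-aware functions, the sketch below assumes the mbstring extension is loaded and that the input is UTF-8. The encoding is passed explicitly to each call so the result does not depend on configuration; the variable names are illustrative.

```php
<?php
$word = "caf\xC3\xA9"; // "café" in UTF-8 (5 bytes, 4 characters)

$charCount = mb_strlen($word, 'UTF-8');       // 4
$lastChar  = mb_substr($word, 3, 1, 'UTF-8'); // the whole two-byte "é"

// Convert to ISO-8859-1, where "é" is the single byte 0xE9.
$latin1      = mb_convert_encoding($word, 'ISO-8859-1', 'UTF-8');
$latin1Bytes = strlen($latin1);               // 4
```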
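However, the mbstring extension is quite large and is not enabled by default: it has to be switched on when PHP is built or installed as a separate package, so it cannot be assumed to be present on every PHP install. Furthermore, its functions do not form a clean and consistent API. Although they all carry the mb_ prefix, their names inherit the irregular conventions of the native functions they shadow (compare mb_strlen() with mb_str_split()). This is exacerbated by the fact that older PHP versions assumed ISO-8859-1 in several places (default_charset has been UTF-8 only since PHP 5.6), which is not compatible with multibyte encodings, so the encoding often has to be set explicitly, for example with mb_internal_encoding(), for the functions to work properly.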
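Because the extension may be missing and its default encoding may not match the data, a script can check for it and set the encoding before calling any mb_ function. This is a hedged sketch of one way to do that, not a required pattern; the exception message and variable names are assumptions made for the example.

```php
<?php
// Fail early when mbstring is unavailable, rather than at the first mb_* call.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is not loaded.');
}

// Remember the current setting, then select UTF-8 for subsequent mb_* calls.
$previousEncoding = mb_internal_encoding();
mb_internal_encoding('UTF-8');

// "na\xC3\xAFve" is "naïve" in UTF-8; no explicit encoding argument is needed now.
$length = mb_strlen("na\xC3\xAFve"); // 5 characters (6 bytes)
```

Passing the encoding to each call, as mb_strlen($s, 'UTF-8'), avoids the global setting altogether and is often the safer choice in library code.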
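What is needed is a separate Unicode text string type and a new API which offers a complete and consistent solution for dealing with encodings and Unicode in PHP. The mb_substitute_character() function, which sets or returns the character that mbstring substitutes for input that is invalid or cannot be represented in the target encoding, is an example of the sort of functionality such an API should provide.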
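To make the role of mb_substitute_character() concrete, the sketch below converts a UTF-8 string containing a euro sign to ASCII, which has no such character. It assumes the mbstring extension is loaded; the input string and variable names are invented for the example.

```php
<?php
// "\xE2\x82\xAC" is the UTF-8 encoding of the euro sign, which ASCII cannot represent.
$input = "\xE2\x82\xAC5";

// Called without arguments, the function returns the current setting:
// a code point (int) or a mode name (string).
$previous = mb_substitute_character();

// Substitute unconvertible characters with '*' (U+002A).
mb_substitute_character(0x2A);
$starred = mb_convert_encoding($input, 'ASCII', 'UTF-8'); // "*5"

// Drop unconvertible characters instead of replacing them.
mb_substitute_character('none');
$dropped = mb_convert_encoding($input, 'ASCII', 'UTF-8'); // "5"

// The setting is global, so restore it for the rest of the script.
mb_substitute_character($previous);
```

Since the setting applies to every later conversion in the script, saving and restoring it as above keeps the change from leaking into unrelated code.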