Using the PHP Function mysql_set_charset() and SET NAMES to Change the Character Set and Collation of a MySQL Database
As a PHP developer working with multibyte strings and MySQL, you must understand how character sets and collations work. Even a slight misunderstanding can lead to corrupted data being stored, or to a database that compares and sorts text in unexpected ways.
A character set determines which characters can be stored in a column and how they are encoded as bytes. A collation is a set of rules for comparing and sorting the characters of a given character set; for example, the _ci suffix in utf8mb4_unicode_ci marks a case-insensitive collation. It is important to consider both when designing a database, as together they govern how data is stored, compared and sorted.
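The default character set and collation of a MySQL database can be set when it is created, using the CHARACTER SET and COLLATE clauses of the CREATE DATABASE statement, or changed later with ALTER DATABASE. At runtime, the character set and collation of the current connection can be changed with SET NAMES (which accepts an optional COLLATE clause) or SET CHARACTER SET, and the collation alone by assigning the collation_connection system variable. Server-wide defaults can also be set with the character-set-server and collation-server options in the [mysqld] section of the MySQL configuration file (see Configuring the Server).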
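For example, you could change a database's default character set to utf8mb4, with a matching collation, using a statement such as ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci (mydb is a placeholder database name).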
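A minimal PHP sketch of how such a statement could be assembled, assuming the ALTER DATABASE form is the one intended here. The function name buildAlterDatabaseSql and the database name app_db are hypothetical, and utf8mb4_unicode_ci is only one of several utf8mb4 collations you might choose.

```php
<?php
/**
 * Build an ALTER DATABASE statement that changes a database's default
 * character set and collation. Identifiers cannot be bound as query
 * parameters, so each one is checked against a conservative pattern.
 * (Hypothetical helper, not part of any PHP extension.)
 */
function buildAlterDatabaseSql(string $database, string $charset, string $collation): string
{
    foreach ([$database, $charset, $collation] as $name) {
        if (preg_match('/^[A-Za-z0-9_]+$/', $name) !== 1) {
            throw new InvalidArgumentException("Unsafe identifier: {$name}");
        }
    }
    return "ALTER DATABASE `{$database}` CHARACTER SET {$charset} COLLATE {$collation}";
}

// app_db is a placeholder database name.
echo buildAlterDatabaseSql('app_db', 'utf8mb4', 'utf8mb4_unicode_ci'), PHP_EOL;
```

The resulting string can then be passed to mysqli::query() on an open connection. Note that this changes only the database default; it does not convert tables that already exist.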
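Changing the database default to utf8mb4 means that new tables store their text as UTF-8, which should be fine for most applications; existing tables and columns keep their current character set until they are converted. However, some applications will still expect to receive latin1 bytes from the database, and will display garbled text if the connection character set is not handled properly.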
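To illustrate the kind of bad result a character set mismatch produces, here is a small sketch that needs no database. It assumes the iconv extension is available, and it simulates a client that reads UTF-8 bytes as if they were latin1 (ISO-8859-1); the variable names are illustrative only.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8Bytes = "\xC3\xA9";

// Simulate a client that assumes the bytes are latin1 (ISO-8859-1) and
// converts them to UTF-8 for display: each byte becomes its own character.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8Bytes);

echo bin2hex($utf8Bytes), PHP_EOL; // c3a9
echo bin2hex($misread), PHP_EOL;   // c383c2a9, which displays as "Ã©"
```

One character has turned into two, which is the classic symptom of a connection whose declared character set does not match the bytes actually being sent.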
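You can also set the connection encoding to utf8mb4 immediately after connecting, which ensures that your MySQL data is transmitted and received as UTF-8. mysql_connect() itself takes no character set argument; in the legacy mysql extension (deprecated in PHP 5.5 and removed in PHP 7.0) this is done by calling mysql_set_charset('utf8mb4') on the open connection, and in mysqli by calling mysqli_set_charset(). UTF-8 is the most widely supported Unicode encoding. MySQL's utf8mb4 character set covers both Basic Multilingual Plane (BMP) and supplementary-plane characters such as emoji, whereas the older utf8 (utf8mb3) character set can store only BMP characters.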
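The difference between BMP and supplementary-plane characters shows up in their UTF-8 byte lengths. The following sketch assumes valid UTF-8 input and PHP 7 or later (for the \u{...} escapes); needsUtf8mb4 is a hypothetical helper name.

```php
<?php
/**
 * Return true when a valid UTF-8 string contains a character outside the
 * BMP, i.e. one encoded as a 4-byte sequence (lead byte 0xF0 to 0xF4).
 * (Hypothetical helper for illustration.)
 */
function needsUtf8mb4(string $utf8): bool
{
    return preg_match('/[\xF0-\xF4]/', $utf8) === 1;
}

var_dump(strlen("\u{00E9}"));           // int(2): é, in the BMP
var_dump(strlen("\u{20AC}"));           // int(3): €, in the BMP
var_dump(strlen("\u{1F600}"));          // int(4): 😀, supplementary plane
var_dump(needsUtf8mb4("caf\u{00E9}"));  // bool(false)
var_dump(needsUtf8mb4("hi \u{1F600}")); // bool(true)
```

With the mysqli extension, the connection character set would typically be set right after connecting with $mysqli->set_charset('utf8mb4'), so that 4-byte characters like the one above survive the round trip.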