CNK's Blog

MySQL Character Set Configuration

I want to set up MySQL server and all to it connections to it to use utf8. For work, we are using MySQL server 5.5 and RHEL 6.5 (or the equivalent CentOS on our Vagrant VMs).

Step 1. What is it currently using?

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | latin1_swedish_ci |
    | collation_server     | latin1_swedish_ci |
    +----------------------+-------------------+
    3 rows in set (0.00 sec)

Step 2. Changing the server’s character set.

The my.cnf file has a number of sections:

    mysqld_safe
    mysqld
    mysql
    client

Settings in the [mysqld] section apply to the server. Settings in the [mysql] section apply to the mysql command line client. And settings in the [client] section apply to all clients. Since I am mainly trying to change how the server and our python (or ruby) clients interact with the database, I want to make changes in the [mysqld] and [client] sections.

If I add default-character-set = utf8 to the [mysqld] section, and restart the server, my settings are now:

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | utf8_general_ci   |
    | collation_server     | utf8_general_ci   |
    +----------------------+-------------------+
    3 rows in set (0.01 sec)

When I create a new database without specifying a character set, it is created with server’s default, UTF8. That’s good. But the client and connection character set still say latin1.

    mysql> create database project_database;
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT schema_name, DEFAULT_CHARACTER_SET_NAME,
    DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
    +--------------------+----------------------------+------------------------+
    | schema_name        | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
    +--------------------+----------------------------+------------------------+
    | information_schema | utf8                       | utf8_general_ci        |
    | mysql              | utf8                       | utf8_general_ci        |
    | project_database   | utf8                       | utf8_general_ci        |
    +--------------------+----------------------------+------------------------+
    3 rows in set (0.00 sec)

Step 3a. Setting the connection characterset explicitly

I can explicitly set the character set for character_set_client, character_set_connection, and character_set_results after I have logged into the database but doing either: mysql> SET NAMES 'utf8'; OR mysql> charset utf8

In either case, all of my character set variables are now utf8. With ‘set names’, the character set reverts to the default ‘latin1’ if the client has to reconnect but with the ‘charset utf8’ command, the change survives needing to reconnect. (Note the syntax differences betweeen the 2 commands - set names requires quoting ‘utf8’ and a semicolon to execute. Charset does not require either.)

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

But both of those commands have to be issued after the connection is established - and need to be repeated each time I connect. I want something more automatic.

Step 3b. Changing the client / connection character set.

It looked like I should be able to bet the client settings working by adding the same line, default-character-set = utf8, to the [client] section. But in my initial trials, that didn’t seem to change anything. I also tried adding default-character-set = utf8 to both the [client] and [mysql] sections. But I still get:

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.01 sec)

    mysql> show variables like '%collation%';
    +----------------------+-------------------+
    | Variable_name        | Value             |
    +----------------------+-------------------+
    | collation_connection | latin1_swedish_ci |
    | collation_database   | utf8_general_ci   |
    | collation_server     | utf8_general_ci   |
    +----------------------+-------------------+
    3 rows in set (0.00 sec)

Some of the things I was reading indicated that my problem might be that connections for superusers (like root) might behave differently than connections for normal users. I tried creating a normal user and then retesting various combinations of parameters - with the same results.

I finally found something that consistently gave me utf8 connection parameters without having to set them expliclty (see digression below). In this article about converting to a better version of MySQL utf8, utf8mb4, I see this directive: character-set-client-handshake = FALSE. Adding that to my [mysqld] section gives me consistent utf8 connection parameters with root and other users. However, I don’t think I really want to do that. I may want to be able to override the character set information for some connections (for example if I am connecting to legacy databases that are not in utf8). I think most of the frameworks I use already pass a character set parameter with the connection - at least Rails has a character set option in it’s database.yml config file.

So I am still looking for the right answer.