CNK's Blog

Upgrading to MySQL 8

Character set considerations

We have largely been using UTF8 for our databases for some time. However, if we want to support emojis or other more exotic characters, we need to start using utf8mb4. In fact, our Rails projects are set up to require that character set.

So while we are upgrading databases versions (from the EOL MySQL 5.6 to MySQL 8), we decided to upgrade the character sets of our databases at the same time. I initially tried to configure our database with collation_server as utf8mb4_0900_ai_ci to get all the latest unicode goodness, but AWS said I couldn’t set collation_server to that value. The closest option was utf8mb4_unicode_ci so I went with that. Once I got into the database, I found the following settings:

  MySQL [mysql]> show variables like '%collation%';
  | Variable_name                 | Value              |
  | collation_connection          | utf8_general_ci    |
  | collation_database            | utf8mb4_0900_ai_ci |
  | collation_server              | utf8mb4_unicode_ci |
  | default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
  4 rows in set (0.00 sec)

  MySQL [mysql]> show variables like '%character_set%';
  | Variable_name            | Value                                     |
  | character_set_client     | utf8                                      |
  | character_set_connection | utf8                                      |
  | character_set_database   | utf8mb4                                   |
  | character_set_filesystem | binary                                    |
  | character_set_results    | utf8                                      |
  | character_set_server     | utf8mb4                                   |
  | character_set_system     | utf8                                      |
  | character_sets_dir       | /rdsdbbin/mysql-8.0.23.R3/share/charsets/ |
  8 rows in set (0.00 sec)

So from that we see databases should automatically be created with character set utf8b4 and collation utf8mb4_0900_ai_ci because that is the default collation for the character set utf8mb4. However, I suspect I may want to set the character set (and collation) for the client and connection to utf8b4.

Changing the values in my terraform aws_db_parameter_group did not change the values in the running database.

Authentication plugin

I was expecting the password hashing mechanism for the new database to be caching_sha2_password since I had read that was the new method for MySQL8. But when I created my first user on the new database, I found they had their password stored as mysql_native_password.

  MySQL [mysql]> show variables like '%auth%';
  | Variable_name                 | Value                 |
  | default_authentication_plugin | mysql_native_password |
  1 row in set (0.00 sec)

We do still have some clients that will need to connect with mysql_native_password but I want those to be the exceptions, not the rule.

Trying to update that in place gave:

  Error: Error modifying DB Parameter Group: InvalidParameterValue: The parameter default_authentication_plugin cannot be modified.
    on line 726, in resource "aws_db_parameter_group" "mysql-sensible-defaults-8-0":
   726: resource "aws_db_parameter_group" "mysql-sensible-defaults-8-0" {

Timezone information

The Django docs about using MySQL say you need to load the timezone tables when creating a new MySQL instance. Both the docker containers we use in dev and the AWS RDS MySQL instances take care of loading this data automatically.

Notes from MySQL 8 Docs The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose.

The values you see for show variables like '%collation%'; will vary depending on the session details. The database you are in (e.g. one of the user schemas vs in mysql or information_schema) will affect what you see for collation_database and character_set_database. And how you connected will affect the values you see for collation_connection and character_set_connection, character_set_results, and character_set_client. It looks to me like the controlling variable there is character_set_client.