CNK's Blog

Django manage.py Commands

sendtestemail

Someone on the PyDev Slack channel was having trouble getting emails from the Django admin system and another member mentioned there is a sendtestemail manage.py command that can help one debug your email settings. That made me wonder what else is available that I didn’t know about.

diffsettings

The Django settings object is a bit odd in that you can’t do print(settings.__dict__) to figure out what is available - and even if you know (or guess) the name of a setting, how do you know if the value is the default or something you have overridden in your app? There is a manage.py command for that! The most useful version is python manage.py diffsettings --output unified. That gives you all the settings - with the overridden versions in red.

showmigrations

I have used the default version to check to see if I have applied all the existing migrations in my test and prod environments - that’s the python manage.py showmigrations --list version. But there is also a python manage.py showmigrations --plan version. That will show you the order in which Django will apply migrations.

inspectdb

If you run python manage.py inspectdb against an existing database, it will generate the Django models that would have created those tables (and indexes and constraints). This command is for projects that must use a legacy database so all of the models are created with a Meta class with managed = False.

ping_google

If your site has had a bunch of changes that you want Google to recrawl, you can use this command to submit your sitemap: python manage.py ping_google. If you have the Django sitemaps app installed and a url configured for your sitemap, this command will figure out what the url should be.

New Django Query Tricks

Union queries

Union queries are surprisingly easy to create. I need a list of ids and content type ids from a bunch of different models. I was very surpised at how straightforward it is in Django 3.2 to create the UNION query I want.

  union_query = None
  for content_type in <queryset of content types>:
      model = apps.get_model(content_type.app_label, content_type.model)
      query = model.objects.filter(<criteria>).values('pk', <content_type.id>)
      if union_query is None:
         union_query = query
      else:
          union_query = union_query.union(query, all=True)

Note: I used all=True because I will never have duplicates in my (id, content_type_id) tuples and UNION ALL is faster than UNION in this case because we can skip the DISTINCT operation on the final result.

The observant among you will have noticed a bit of pseudocode in the example above. I want to insert the content_type_id from python into my query. In SQL this would be something like:

  SELECT id, 99 FROM myapp_model;

In the Django ORM, that turns out to be something I didn’t know how to do. I can’t leave it as a bare name and I can’t quote it or the ORM tries to turn it into a column name or relation that could be turned into a column name. Turns out I need to use Value:

  query = model.objects \
               .filter(<criteria>) \
               .values('pk', Value(content_type.id, output_field=IntegerField()))

OK so that now will give me a queryset that produces a list of dicts like: [{pk: 3, content_type_id: 44}, {pk: 3, content_type_id: 48}] But when I tried to use those results in the filter section of another query… I had my next problem.

Querying by value - without Foreign Key relationships

So now I need to use those ids and content_type_ids to filter another model that has rows with content_type_id and object_id columns. I want all the lines in the table for the ModelLogEntry model where the (object_id, content_type_id) tuple is in the list of (pk, content_type_id) tuples created by our UNION query above.

If I only needed to match on a single value, I would probably evaluate the UNION query, and then do something like .filter(pk__in=<list of pks>) - as I did to get the list of content types I need. But I need to match the id and content_type_id fields. In SQL, I would do:

  SELECT wagtailcore_modellogentry.*
    FROM wagtailcore_modellogentry
    INNER JOIN (
    (
    ((SELECT `link_farms_audience`.`id`, 104 AS `content_type_id`  FROM `link_farms_audience` WHERE `link_farms_audience`.`site_id` =  12)
     UNION
     (SELECT `link_farms_collection`.`id`, 105 AS `content_type_id` FROM `link_farms_collection` WHERE `link_farms_collection`.`site_id` = 12))
     UNION
     (SELECT `link_farms_link`.`id`, 106 AS `content_type_id` FROM `link_farms_link` WHERE `link_farms_link`.`site_id` = 12))
     UNION
     (SELECT `core_didyouknowitem`.`id`, 110 AS `content_type_id` FROM `core_didyouknowitem` WHERE `core_didyouknowitem`.`site_id` = 12 ORDER BY `core_didyouknowitem`.`text` ASC)
    ) AS models
    ON models.id = wagtailcore_modellogentry.object_id
    AND models.content_type_id = wagtailcore_modellogentry.content_type_id

This was relatively straightforward to write in SQL, so I tried using raw SQL, e.g. ModelLogQuery.objets.raw('<query here>'). That definitely gave me the data I was looking for when I ran it in shell_plus. But when I tried to use it in my monkey patch, the calling function wanted to use values(), which is a function only defined on real ORM QuerySets - and not available when using raw.

At this point I suspect I won’t want to use this in production. Goodness only knows how big the union query is likely to get. But it is bothering me that I don’t know how to get Django to let me do a relatively straightforward join without having defined a ForeignQuery relationship in my Python model code.

I still don’t know how to tell Django “do this join damn it!”, but after some reading and thinking of alternate ways to write the SQL, I think I have found a way to write this in the ORM using Exists to create a correlated subquery.

    from django.apps import apps
    from django.db.models import Exists, IntegerField, OuterRef, Value
    from django.contrib.contenttypes.models import ContentType

    request = get_current_request()
    site = Site.find_for_request(request)
    union_query = None
    content_types = (
        ContentType.objects
                   .filter(id__in=ModelLogEntry.objects.values_list('content_type_id', flat=True).distinct())
    )
    for content_type in content_types:
        model = apps.get_model(content_type.app_label, content_type.model)
        query = (model.objects.filter(site_id=site.id)
                 .values('pk', content_type_id=Value(content_type.id, output_field=IntegerField()))
                 )
        if union_query is None:
            union_query = query
        else:
            union_query = union_query.union(query, all=True)

    return ModelLogEntry.objects.filter(Exists(
        union_query.filter(pk=OuterRef('object_id'), content_type_id=OuterRef('content_type_id'))
    ))

Sigh. One can’t combine .filter with a union query.

    NotSupportedError at /admin/reports/site-history/
    Calling QuerySet.filter() after union() is not supported.

I tested the Exists query by setting the union_query to be just one type and it works fine. So I learned something useful about the Django ORM - even if I can’t apply that knowledge in the context in which I wanted to to use it this time.

Hosting static websites on AWS

I have had a VM for web hosting for ….. about as long as VMs have been a thing. Before that I shared a physical machine with some old work collegues from ArsDigita. But ever since I hosed the PHP on my current VM, I haven’t really been doing anything that needed dynamic hosting. So it’s about time I saved some time and effort and move things to static hosting.

We do some AWS stuff at work so one of the simpler ways to move away from having my own server would be to host static sites in S3. The AWS console is a hot mess AND constantly changing AND isn’t version controlled. But terraform will let me manage my AWS configuration using code that can be version controlled. And will give me some additional experience using terraform and AWS. So win, win, win.

I found this blog post that does a pretty good job of explaining how to use terraform to set up a static web site on aws: The only things I had questions about where:

  1. Why use CloudFront for a low traffic site?
  2. Why not move DNS registration to AWS?
  3. Why redirect everything to www instead of to the bare hostname?

Item 1: CloudFront. I think for the author, part of the answer might be to get good scores on page response speed. I don’t really do a lot of speed optimization (other than mostly having just text on my site) but CloudFront is important for another reason. From the AWS docs as of August 2021:

Amazon S3 website endpoints do not support HTTPS or access points. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3.

Item 2: DNS registration. I had originally been thinking of moving my DNS registration to AWS so I had everything in one place. I have been using DotYou.com for ages and they are fine, but nothing special. So while I was moving things, why not consolidate everything? Because AWS doesn’t have a super simple way to do email forwarding. With DotYou.com, I can navigate to the configuration for any of my domains and add one or more email addresses and where the email to those addresses should go. It is a personal life goal never to run a mail server. So I’ll be keeping my domains registered elsewhere - at least until setting up email forwarding on AWS is as simple as with my current provider.

Note bene: Because I want to continue to use the email forwarding from DotYou.com, I had to keep using their default nameservers rather than setting the name servers to be the Route53 name servers that Terraform set up for me. AND because CloudFront gives me a name rather than an IP to point my domain at, I can no longer have an A record for *.example.com; it needs to be a cname. I am not 100% there aren’t some implications of this that I am missing - but so far having a cname seems to work just fine….. or mostly, which brings me to item 3, what should be my canonical url?

Item 3: Canonial url. Google prefers if there is one canonical copy of any page - but it treats http://example.com, https://example.com, http://www.example.com, and https://www.example.com as 4 different sites - even if they all serve the exact same content. The best way to consolidate your analytics is to set up redirects so that all of those similar urls actually end up on one canonical url. I usually choose https://example.com as my real url. But the blog post I was following chose to redirect everything to https://www.example.com. It wasn’t until I had everything set up and was updating the DNS records, that I realized why it might have been better if I had redirected everything to www.example.com as in the example. I can create a wildcard cname record and point everything to some CloudFront distribution, and I have a wildcard SSL certificate. But if I try accessing http://foo.example.com, I get “403 Forbidden” and a message about this being a mis-configuration. If I try https://foo.example.com, I get an error with the error code: SSL_ERROR_NO_CYPHER_OVERLAP

CloudFlare supports wildcard CNAMEs so I might be able to set things up to catch all names - but that if that record points to the canonical distribution, then no redirect happens - so no consolidation of requests. So I think what I need to do is reverse my setup and make https://www.example.com my real bucket and real url. I did find this post about wildcard CloudFront distributions not behaving as the author expected - but I am not sure I understand how this is going to affect my setup.


Nope

After much fooling around with DNS settings and S3 bucket settings, I finally concluded that having working email addresses AND a web site served by S3/Cloudflare on my bare domain name (e.g. cynthiakiser.com) are incompatible. So I guess I will be keeping my VM after all.

Upgrading to MySQL 8

Character set considerations

We have largely been using UTF8 for our databases for some time. However, if we want to support emojis or other more exotic characters, we need to start using utf8mb4. In fact, our Rails projects are set up to require that character set.

So while we are upgrading databases versions (from the EOL MySQL 5.6 to MySQL 8), we decided to upgrade the character sets of our databases at the same time. I initially tried to configure our database with collation_server as utf8mb4_0900_ai_ci to get all the latest unicode goodness, but AWS said I couldn’t set collation_server to that value. The closest option was utf8mb4_unicode_ci so I went with that. Once I got into the database, I found the following settings:

  MySQL [mysql]> show variables like '%collation%';
  +-------------------------------+--------------------+
  | Variable_name                 | Value              |
  +-------------------------------+--------------------+
  | collation_connection          | utf8_general_ci    |
  | collation_database            | utf8mb4_0900_ai_ci |
  | collation_server              | utf8mb4_unicode_ci |
  | default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
  +-------------------------------+--------------------+
  4 rows in set (0.00 sec)

  MySQL [mysql]> show variables like '%character_set%';
  +--------------------------+-------------------------------------------+
  | Variable_name            | Value                                     |
  +--------------------------+-------------------------------------------+
  | character_set_client     | utf8                                      |
  | character_set_connection | utf8                                      |
  | character_set_database   | utf8mb4                                   |
  | character_set_filesystem | binary                                    |
  | character_set_results    | utf8                                      |
  | character_set_server     | utf8mb4                                   |
  | character_set_system     | utf8                                      |
  | character_sets_dir       | /rdsdbbin/mysql-8.0.23.R3/share/charsets/ |
  +--------------------------+-------------------------------------------+
  8 rows in set (0.00 sec)

So from that we see databases should automatically be created with character set utf8b4 and collation utf8mb4_0900_ai_ci because that is the default collation for the character set utf8mb4. However, I suspect I may want to set the character set (and collation) for the client and connection to utf8b4.

Changing the values in my terraform aws_db_parameter_group did not change the values in the running database.

Authentication plugin

I was expecting the password hashing mechanism for the new database to be caching_sha2_password since I had read that was the new method for MySQL8. But when I created my first user on the new database, I found they had their password stored as mysql_native_password.

  MySQL [mysql]> show variables like '%auth%';
  +-------------------------------+-----------------------+
  | Variable_name                 | Value                 |
  +-------------------------------+-----------------------+
  | default_authentication_plugin | mysql_native_password |
  +-------------------------------+-----------------------+
  1 row in set (0.00 sec)

We do still have some clients that will need to connect with mysql_native_password but I want those to be the exceptions, not the rule.

Trying to update that in place gave:

  Error: Error modifying DB Parameter Group: InvalidParameterValue: The parameter default_authentication_plugin cannot be modified.
    on main.tf line 726, in resource "aws_db_parameter_group" "mysql-sensible-defaults-8-0":
   726: resource "aws_db_parameter_group" "mysql-sensible-defaults-8-0" {

Timezone information

The Django docs about using MySQL say you need to load the timezone tables when creating a new MySQL instance. Both the docker containers we use in dev and the AWS RDS MySQL instances take care of loading this data automatically.

Notes from MySQL 8 Docs

https://dev.mysql.com/doc/refman/8.0/en/charset-server.html The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose.

The values you see for show variables like '%collation%'; will vary depending on the session details. The database you are in (e.g. one of the user schemas vs in mysql or information_schema) will affect what you see for collation_database and character_set_database. And how you connected will affect the values you see for collation_connection and character_set_connection, character_set_results, and character_set_client. It looks to me like the controlling variable there is character_set_client.

Managing Databases with Docker Compose

We could probably use a single database with multiple schemas for all our development work. That’s how we have been doing it for years. But sometimes that leads to us making compromises - like running a slightly different version of the database in dev vs production. And it leads to being somewhat conservative when trying out different database options. Now that we are using Docker for deploying our applications, it makes sense to use docker-compose to create all the services our apps use: relational databases, ElasticSearch, caches. The docker-compose file also manages mounted volumes (for preserving the data in our development databases) AND a private network for each set of containers.

Version 1

Creating a database and user

If we want to take full advantage of docker-compose’s automated workflow, we need to be able to recreate our databases - including loading dev data - automatically. The official MySQL database container image supports a really easy way to do this. If you set a handful of environment variables in your docker-compose.yml file, the first time you start the database container, it will create the database instance and set the root password to the value from MYSQL_ROOT_PASSWORD. If you include values for MYSQL_DATABASE, MYSQL_USER and MYSQL_PASSWORD, the first startup of the database will create that database and grant all privileges on that database to the specified user. Excellent! That gets us most of the way there. Now if we could only load some initial data….

Loading initial data

The MySQL image provides for this too. In the section “Initializing a fresh instance”:

When a container is started for the first time, a new database with the specified name will be created and initialized with the provided configuration variables. Furthermore, it will execute files with extensions .sh, .sql and .sql.gz that are found in /docker-entrypoint-initdb.d. Files will be executed in alphabetical order. You can easily populate your mysql services by mounting a SQL dump into that directory and provide custom images with contributed data. SQL files will be imported by default to the database specified by the MYSQL_DATABASE variable.

So the database section of our standard docker-compose.yml looks like:

  mysql:
    image: mysql:8.0.23
    container_name: "db-vote"
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: vote
      MYSQL_USER: vote
      MYSQL_PASSWORD: password
    cap_add:
      - SYS_NICE  # removes error messages like: mbind: Operation not permitted
    volumes:
      - ./sql/docker/mysql-data:/var/lib/mysql
      - ./sql/docker/my.cnf:/etc/mysql/conf.d/dev.cnf
      - ./sql/docker:/docker-entrypoint-initdb.d

And our project’s sql/docker/ directory has:

  sql/
    docker/
      my.cnf
      initial_data.sql
      mysql-data/
        ....various files and directories for the actual database

And don’t forget to exclude that mysql-data directory from your image by including it in your .dockerignore file:

  .git
  .idea

  # Ignore the mysql data files that the dev mysql server creates.
  sql/docker/mysql-data

Version 2a - custom user creation

The stock setup above works great - until you have some super old code that you don’t want to have to upgrade to recent MySQL libraries. If you want to connect those projects to MySQL 8, you need to tell the database that this user will be using an older authentication plugin. So you need to issue the user create yourself. One option is to only do the user creation and privilege setting in your script, leaving the database creation and loading as above. Put the following in sql/docker/init.sql:

  create user 'vote' identified with mysql_native_password by 'password';
  grant all privileges on vote.* to 'vote';

And then remove MYSQL_USER and MYSQL_PASSWORD from the docker-compose.yml environment.

Version 2b - fully custom script

The version above works fine, but if you need additional configuration of the database - or just want to take full control of the user and database creation and data loading - you can use the following.

First, move the initial_data.sql file out of the project’s sql/docker/ directory:

  sql/
    initial_data.sql
    docker/
      my.cnf
      mysql-data/
        ....various files and directories for the actual database

Then remove MYSQL_DATABASE, MYSQL_USER, and MYSQL_PASSWORD variables from your docker-compose.yml and add another volume to mount the directory where you put initial_data.sql. I just moved the file up one level and then mounted /sql as /sql_data in the container. The docker-compose.yml should now look like this:

  mysql:
    image: mysql:8.0.23
    container_name: "db-vote"
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: root_password
    cap_add:
      - SYS_NICE  # removes error messages like: mbind: Operation not permitted
    volumes:
      - ./sql/docker/mysql-data:/var/lib/mysql
      - ./sql/docker/my.cnf:/etc/mysql/conf.d/dev.cnf
      - ./sql/docker:/docker-entrypoint-initdb.d
      - ./sql:/sql_data

Then, in sql/docker/init.sql, create your user and database; then load your data from the mounted file:

  create database 'vote';
  create user 'vote' identified with mysql_native_password by 'password';
  grant all privileges on vote.* to 'vote';

  -- load initial data
  use vote
  source /sql_data/initial.sql