CNK's Blog

Tuning Django REST Framework Serializers

One problem that often comes up when you are using an object-relational mapper is the N+1 query problem: inadvertently doing one query and then a separate query for the related objects of each row. When you build sites with Ruby on Rails, the framework logs all SQL queries (while you are in development mode), so you tend to fix these inefficient queries as you develop - if nothing else, in self-defense so you can actually see the things you care about in your logs.
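
To make the pattern concrete, here is a minimal sketch in plain Python. It uses the standard library's sqlite3 module rather than an ORM, and the two tables and their rows are made up for illustration. It runs the same lookup twice, once as N+1 queries and once as a single join, and counts the statements each version sends.

```python
import sqlite3

# Hypothetical two-table schema, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metagoal (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE goal (id INTEGER PRIMARY KEY, name TEXT,
                       metagoal_id INTEGER REFERENCES metagoal(id));
    INSERT INTO metagoal VALUES (1, 'Meta A'), (2, 'Meta B');
    INSERT INTO goal VALUES (1, 'Goal 1', 1), (2, 'Goal 2', 1), (3, 'Goal 3', 2);
""")

# Record every SQL statement executed from here on.
statements = []
conn.set_trace_callback(statements.append)


def n_plus_one():
    """One query for the goals, then one more per goal for its metagoal."""
    rows = conn.execute(
        "SELECT id, name, metagoal_id FROM goal ORDER BY id"
    ).fetchall()
    result = []
    for _goal_id, name, metagoal_id in rows:
        meta = conn.execute(
            "SELECT name FROM metagoal WHERE id = ?", (metagoal_id,)
        ).fetchone()
        result.append((name, meta[0]))
    return result


def single_join():
    """One query that fetches each goal together with its metagoal."""
    return conn.execute(
        "SELECT goal.name, metagoal.name FROM goal "
        "INNER JOIN metagoal ON goal.metagoal_id = metagoal.id "
        "ORDER BY goal.id"
    ).fetchall()


statements.clear()
slow = n_plus_one()
slow_count = len(statements)

statements.clear()
fast = single_join()
fast_count = len(statements)

print(slow_count, fast_count)
```

With three goals, the first version sends four statements (one for the list plus one per row) and the second sends one, and both return the same data. The join is roughly what Django's select_related asks the ORM to generate.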

Django, on the other hand, does not log anything except the timestamp, request, response_code, and response size. Its default logging configuration doesn’t log any request parameters or database queries, so it’s easy to overlook inefficient queries. When we finally put a reasonable amount of test data into our staging server, we found that several of our API endpoints were agonizingly slow. Time for some tuning!

Setup

Lots of people use the Django Debug Toolbar but I really prefer log files. So I installed and configured Django Query Inspector. That was helpful for identifying some of the worst offenders but for the real tuning, I needed this stanza in my settings to log all database queries:

    LOGGING = {
        'version': 1,
        'disable_existing_loggers': False,
        'handlers': {
            'console': {
                'level': 'DEBUG',
                'class': 'logging.StreamHandler',
            }
        },
        'loggers': {
            'django.db.backends': {
                'handlers': ['console'],
                'level': 'DEBUG',
            },
        }
    }
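
Django hands the LOGGING setting to the standard library's logging.config.dictConfig by default, so the stanza can be sanity-checked outside Django. The sketch below applies the same dictionary and checks that the django.db.backends logger accepts DEBUG records and has the console handler attached. The variable names enabled and handler_types are just for this illustration.

```python
import logging
import logging.config

# Same stanza as above, applied directly with the standard library.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
        }
    },
    'loggers': {
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    }
}

logging.config.dictConfig(LOGGING)

db_logger = logging.getLogger('django.db.backends')
# True when DEBUG-level records (which is how Django logs SQL) get through.
enabled = db_logger.isEnabledFor(logging.DEBUG)
# Names of the handler classes attached to this logger.
handler_types = [type(handler).__name__ for handler in db_logger.handlers]

print(enabled, handler_types)
```

Note that Django only emits these SQL log records when settings.DEBUG is True, so this configuration stays quiet in production.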

Once I had that going, I started looking at some of my nested serializers. With a couple of well placed “select_related”s on the queries in my views, I was able to get rid of most of the excess queries but I was consistently seeing an extra query that I couldn’t figure out - until I started to write up an issue to post on IRC.

The extra query was coming in because I was using DRF’s browsable API to do my query tuning. The browsable API includes a web form for experimenting with the create and update actions in a ModelViewSet and that form has a select menu for each foreign key relationship that needs to be created. So when I made a request in the browser, I saw:

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    (0.000) QUERY = '
    SELECT "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_metagoal"' - PARAMS = (); args=()

    [SQL] 2 queries (0 duplicates), 0 ms SQL time, 101 ms total request time
    [15/Jul/2016 01:40:53] "GET /api/goals/3/ HTTP/1.1" 200 10565

But when I made the same request using curl, I saw only the one join query that I was expecting:

    $ curl http://127.0.0.1:8000/api/goals/3/ | jq .
    {"id": 3,
     "url": "http://127.0.0.1:8000/api/goals/3/",
     "name": "Subgoal 3",
     "metagoal": "http://127.0.0.1:8000/api/metagoals/1/"
    }

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    [SQL] 1 queries (0 duplicates), 0 ms SQL time, 12 ms total request time
    [15/Jul/2016 01:40:47] "GET /api/goals/3/ HTTP/1.1" 200 5398

Bash_it using git diff as diff

I used Kitchenplan to set up my new Mac. There is a newer configuration option based on Ansible by the same author - Superlumic. I would like to try it but didn’t have time to experiment with it this time around.

The big plus for using Kitchenplan was that our small development team ended up with Macs that are all configured more or less the same way. Another plus is it installs bash_it which does a lot more shell configuring than I have ever bothered to do. The only thing I have found not to like is that it wants to invoke git’s diff tool instead of the regular unix diff. To shut that off, I just edited the place where that was set up. In /etc/bash_it/custom/functions.bash (line 72) I commented out:

    # Use Git’s colored diff when available
    hash git &>/dev/null
    if [ $? -eq 0 ]; then
      function diff() {
        git diff --no-index --color-words "$@"
      }
    fi

Testing File Uploads (in Django)

I am trying to improve the test coverage of our work project and needed to test the avatar upload that is associated with creating users in our project. I didn’t find any place that laid out how to test file uploads. Fortunately the tests for the easy-thumbnails app we use are pretty good and I was able to piece something together using their code as a model.

In case anyone else is looking for something like this, I updated my easy thumbnails example project to include a couple of tests.

    from PIL import Image
    from django.core.files.base import ContentFile
    from django.core.files.uploadedfile import SimpleUploadedFile
    from django.test import TestCase, Client
    from django.core.urlresolvers import reverse
    from django.utils.six import BytesIO
    from .factories import UserFactory
    from .models import UserProfile


    # "borrowed" from easy_thumbnails/tests/test_processors.py
    def create_image(storage, filename, size=(100, 100), image_mode='RGB', image_format='PNG'):
        """
        Generate a test image, returning the filename that it was saved as.

        If ``storage`` is ``None``, the BytesIO containing the image data
        will be passed instead.
        """
        data = BytesIO()
        Image.new(image_mode, size).save(data, image_format)
        data.seek(0)
        if not storage:
            return data
        image_file = ContentFile(data.read())
        return storage.save(filename, image_file)


    class UserTests(TestCase):
        def setUp(self):
            self.user = UserFactory(username='me')

        # deleting the user will remove the user, the user_profile, AND the avatar image
        def tearDown(self):
            self.user.delete()

        def test_adding_an_avatar_image(self):
            # make sure we start out with no UserProfile (and thus no avatar)
            self.assertIsNone(UserProfile.objects.filter(user_id=self.user.id).first())
            myClient = Client()
            myClient.login(username=self.user.username, password='password')

            # set up form data
            avatar = create_image(None, 'avatar.png')
            avatar_file = SimpleUploadedFile('front.png', avatar.getvalue())
            form_data = {'avatar': avatar_file}

            response = myClient.post(reverse('avatar_form'), form_data, follow=True)
            self.assertRegex(response.redirect_chain[0][0], r'/users/profile/$')
            # And now there is a user profile with an avatar
            self.assertIsNotNone(self.user.profile.avatar)

        def test_uploading_non_image_file_errors(self):
            # make sure we start out with no UserProfile (and thus no avatar)
            self.assertIsNone(UserProfile.objects.filter(user_id=self.user.id).first())
            myClient = Client()
            myClient.login(username=self.user.username, password='password')

            # set up form data
            text_file = SimpleUploadedFile('front.png', b'this is some text - not an image')
            form_data = {'avatar': text_file}

            response = myClient.post(reverse('avatar_form'), form_data, follow=True)
            self.assertFormError(response, 'avatar_form', 'avatar',
                                 'Upload a valid image. The file you uploaded was either not an image or a corrupted image.')

Django's GenericForeignKeys and GenericRelations

I am working on a project that has two separate but interrelated Django web sites (projects in Django’s parlance). In an earlier blog post, I described setting up the second project (mk_ai) to have read-only access to the first project’s database (mk_web_core) in dev but then getting around those access restrictions for testing. The main thing I need for testing is a big set of hierarchical data to be loaded into the first project’s test database. I can use the manage commands dumpdata and loaddata to preserve data in my development environment, but when I tried to load that same data into the test database, I ran into problems.

We are using GenericForeignKeys and GenericRelations. Django implements GenericForeignKeys by creating a database foreign key into the django_content_type table. In our mixed database setup, my django_content_type table is in the mk_ai schema. So, even if I set up my database router to allow_relation across databases AND the postgres database adapter would even attempt to make that join, the content types in the references in mk_web_core would not be in mk_ai’s django_content_type table. So we can’t use Django’s GenericForeignKeys. What shall we do instead?

Rails implements a similar type of relationship with a feature it calls Polymorphic Associations. Django stores the object’s id + a FK link to the row in the content_type table representing the object’s model. Rails stores the object’s id + the object’s class name in a field whose name ends in _type. I decided to use the Rails method to set up my database representations. That replaces the GenericForeignKey aspect. To replace the GenericRelation part, I just created a case statement that allows queries to chain in the appropriate related model based on the ... content type. Perhaps showing an example will make this clearer. The original way, using Django's GenericForeignKey:

    class PageBlock(models.Model):
        page = models.ForeignKey('Page')
        position = models.PositiveSmallIntegerField()
        allowed_block_types = models.Q(app_label='materials', model='text') | \
                models.Q(app_label='materials', model='video') | \
                models.Q(app_label='course_materials', model='image')
        block_type = models.ForeignKey(ContentType, limit_choices_to=allowed_block_types)
        object_id = models.PositiveSmallIntegerField()
        material = GenericForeignKey('block_type', 'object_id')

The 'rails' way, using a block_type_name field that can be read directly in the mk_ai schema:

    class PageBlock(models.Model):
        """
        This is a mapping table to allow us to access collections of
        blocks regardless of their actual type.

        TODO:
        Figure out how to make the object_id options fill a select
        list once the user chooses a block_type in the form on the
        admin interface.
        """
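        # Each choice is (stored value, human-readable label); the stored
        # value is the class name, which the block property compares against.
        BLOCK_TYPE_NAMES = [('TextBlock', 'text'),
                            ('VideoBlock', 'video'),
                            ('ImageBlock', 'image'),
                           ]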
        page = models.ForeignKey('Page')
        position = models.PositiveSmallIntegerField()
        block_type_name = models.CharField(max_length=100, choices=BLOCK_TYPE_NAMES)
        # The block_id would be a ForeignKey field into a Video, Image... if we were mapping to just one model
        block_id = models.PositiveSmallIntegerField()

        @property
        def block(self):
            if self.block_type_name == 'TextBlock':
                return TextBlock.objects.get(pk=self.block_id)
            if self.block_type_name == 'VideoBlock':
                return VideoBlock.objects.get(pk=self.block_id)
            if self.block_type_name == 'ImageBlock':
                return ImageBlock.objects.get(pk=self.block_id)

GenericForeignKey and GenericRelation are two sides of the same coin - they allow you to easily make queries in both directions. In our domain, I don't really have much occasion to go from Block to Page, so I don't really need the GenericRelation. However, if you need to replace it, you can create a method to do the appropriate query.

    # ORIGINALLY
    class VideoBlock(models.Model):
        title = models.CharField(max_length=256)
        content = models.FileField(upload_to='videos/')
        page_block = GenericRelation(PageBlock,
                                     object_id_field='object_id',
                                     content_type_field='block_type')

        @property
        def model_name(self):
            return "VideoBlock"

    # AFTER REMOVING THE GenericForeignKey
    class VideoBlock(models.Model):
        title = models.CharField(max_length=256)
        content = models.FileField(upload_to='videos/')

        @property
        def model_name(self):
            return "VideoBlock"

        @property
        def page_block(self):
            return PageBlock.objects.filter(block_type_name='VideoBlock',
                                            block_id=self.id)
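
The block property on PageBlock picks a model with a chain of if statements, which quietly returns None for a name it does not recognize. One possible alternative, sketched here in plain Python, is to look the class up in a dictionary keyed by the stored name. The classes below are stand-ins rather than Django models, and BLOCK_CLASSES is a name made up for this illustration.

```python
# Plain-Python stand-ins for the three block models. In the real project
# these would be Django models fetched with Model.objects.get(pk=...).
class Block:
    def __init__(self, pk):
        self.pk = pk


class TextBlock(Block):
    pass


class VideoBlock(Block):
    pass


class ImageBlock(Block):
    pass


# Hypothetical registry: maps the stored type name to the class it names.
BLOCK_CLASSES = {cls.__name__: cls for cls in (TextBlock, VideoBlock, ImageBlock)}


class PageBlock:
    def __init__(self, block_type_name, block_id):
        self.block_type_name = block_type_name
        self.block_id = block_id

    @property
    def block(self):
        try:
            block_class = BLOCK_CLASSES[self.block_type_name]
        except KeyError:
            # Fail loudly instead of returning None for an unknown name.
            raise ValueError(
                "Unknown block type: %r" % self.block_type_name
            ) from None
        # With Django models this would be:
        #     block_class.objects.get(pk=self.block_id)
        return block_class(self.block_id)


video = PageBlock("VideoBlock", 7).block
print(type(video).__name__, video.pk)
```

Adding a new block type then means adding one class to the tuple rather than another branch to the property.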

Review of 'React Under the Hood'

We are using React at work. The official documentation is really good - especially the Thinking in React section. But I could still use some additional examples, ideas, etc. so when I saw an offer for several books on React, including one from Arkency that I had been considering buying for a while, I broke down and bought the bundle. The first one I read is “React Under the Hood” by Freddy Rangel.

Overall it is a really good book. The author points out that some of the ideas in React come from the game development world. To emphasize that, the example code for the book is a Star Trek ‘game’. The author provides a git repository you can clone to get started. The project is set up to use Babel, webpack, and a node dev server - all of which just work out of the box. I need to dig into one of the other books from the Indie Bundle, Survive JS, to learn more about how to set these up. You build almost all of the code yourself - except for the rather hairy navigation and animation parts, which are already in the repository you clone to get started.

The example stresses good engineering practices - especially having one or two smart components that control all state mutation and lots of well separated dumb components that just render the appropriate info for a given state. I really liked the EditableElement component and will probably steal it for a play project I want to do after completing this book.

The author did not use ES6 syntax because it might be unfamiliar to some people. I actually find the new syntax easier so I translated most things into using ‘let’ instead of var and all seemed to go just fine. The other change I made throughout is to the module.exports for each .jsx file. The book suggests starting each class like this:

    module.exports = React.createClass({
      render: function() {
        //whatever
      },
    });

If you do this, the React plugin for the Chrome Developer tools just labels each component as ... which means you have to dig around for the section of rendered code you want to inspect. The project I am on at work uses a slightly different syntax but one that is a lot easier to read and understand:

    let Something = React.createClass({
      render: function() {
        //whatever
      },
    });

    module.exports = Something;

If you do this, then the React debugging tab now shows this component as <Something attr1=xxx…></Something> which makes it a LOT easier to find the code you want to inspect.

The example was good but the best part was the great material in the final chapter. It discusses

  1. PropTypes (which I had heard of but forgotten).

  2. getInitialState and getDefaultProps (haven’t used them, but they might come in handy).

  3. How to profile your code with Perf - and then some suggestions about what to do with what you find. Good information about how to improve the performance of components that are basically render-only (per the design espoused in the rest of the book) using a React add-on called PureRenderMixin. I am going to have to look into mixins.