VSCode Adventures - Episode 1

January 10, 2021

But Why?

I have been (was?) a longtime emacs user - ever since I got a Unix account in graduate school. In the first dot com boom, I worked for a company that was mainly populated by emacs users and picked up a few tricks - including using emacs without a mouse. Unlike most emacs users, I never did very much customization. So in some people’s eyes, I was never really an emacs power user. But that did have the advantage that I could sit down to any computer, install a fresh emacs build, and immediately be completely comfortable with the environment.

Well…. except for the fact that on a Mac keyboard, the alt key (on a Mac, that’s the option key), is in the wrong place. For a decade, I simply swapped the keyboard bindings for the option and command keys. Worked great for me but it meant that no one could sit down to my keyboard and do anything. A few years ago I started working with a new team and thought we might do some pair programming. I didn’t want to learn Vim and I couldn’t expect them to switch to Emacs, so we all settled on SublimeText - and I quit remapping my keys at the OS level.

SublimeText worked pretty well for our team and there are a number of things I really like about it. I use the multiline editing a lot - and code folding can be pretty convenient. I know emacs can do both of those things, but I never learned how to set up either of those features in emacs. On the other hand, I never learned to rewrap paragraphs in Sublime (alt-q in emacs). And if I have to do any heavy search and replace, I still prefer using the keyboard shortcuts in emacs to the equivalent replace function in Sublime.

I recently decided it was time for some professional self-improvement and that configuring and really learning one text editor is one thing I need to do. VSCode appears to be where all the cool kids are moving these days, so I decided it was time for me to try it too.

Plugins and Configuration

Emacs Keymap

Since I used emacs without a mouse for literally decades, I use the emacs keybindings to move around - even in a terminal. Having more of my familiar keybindings available would make me a lot happier. I never installed them in Sublime because the point of moving to Sublime was to have a unified editor experience with my team. Unfortunately that team is no longer together, so no reason for me not to set up my text editor to suit me and only me!

I initially found Emacs Keymap but later found a Reddit that pointed out that plugin had not been updated in a while. I took Reddit’s advice and did a little comparison shopping. Awesome Emacs Keymap has excellent ratings and recent updates. So let’s learn to use that. Based on the instructions, I set editor.find.seedSearchStringFromSelection to false. I also need the command key to work as Alt, so I set emacs-mcx.useMetaPrefixMacCmd to true.

Nice key bindings:

Cmd-s and C-x C-s BOTH work to save a file!
C-a (and C-e) behave in an interesting way. The first time you use it, it goes to the start of the visible line; the second time you use it, it goes to the start of the logical line. Not what I am used to from emacs but super handy!

Things I need to get used to or fix:

VSCode shares the kill ring with the OS (BIG WIN!), however, once you get into VSCode, paste is NOT Cmd-v, it’s C-y. I like C-y working, but switching keyboard mappings in mid-operation may break my brain. We’ll have to see if I get used to it.
I am not sure what emacs-mcx.cursorMoveOnFindWidget is. Sounds like I might need to adjust it but not sure yet.

In the mean time, I have printed out the cheat sheet for keybindings.

Rewrap

One thing I miss in Sublime Text is the ability to rewrap paragraphs using M-q. I use that A LOT when writing prose such as documentation or blog posts - or when I edit a block comment and want to reflow the text. This VSCode extension, Rewrap, looks like what I want - it even says “Similar to wrap/fill paragraph in Sublime (alt+q) Emacs (M-q) or Vim (gq); but more powerful.”. The only wrinkle is that it’s default keybinding is Option-q; M-q is already bound to VSCode’s workspace.action.quit. A little searching told me how to remove that keybinding (find it, right click, and choose “remove keybinding” from the dropdown). Then I searched for the Rewrap plugin’s keybindings and swapped in my usual M-q for rewrapping a paragraph.

I also turned on auto wrap (like emacs auto-file mode) and adjusted the editor.wordWrapColumn to 100 since I think the default (80) is a bit narrow these days. We’ll see what I think of that. If I don’t like it, I can either set per file type widths OR try the ‘wrap to rulers’ option.

According to the docs, I should be able to rewrap code comments without messing with doc comments within the comment and without messing with the code itself. It will be very interesting to see if that works.

Spell Right

I am a very poor speller. In emacs, I use ispell. So what can I replace that with in VSCode? Searching for spellcheckers in extensions sows “Spell Right” is popular and will use my Mac’s built-in dictionary. That will be super handy since I can add words like this: https://www.makeuseof.com/tag/add-remove-words-mac-dictionary/

Sync Settings

Now for my last trick, I would like to have the same setup on my work computer and on my personal laptop. VSCode allows me to save my settings - using a private GIST attached to my GitHub account. There are a bunch of docs about how to do partial syncs and how to resolve conflicts. But since this is a new install on both computers, I just clicked the menu item for turning sync on, told GitHub to allow VSCode to store information for me, and now I have matching settings on both my computers.

Bonus!

There is an in-editor Markdown previewer!! So I can see how this post renders as I compose it!

Customizing Wagtail Privacy Options

November 23, 2020

Out of the box, Wagtail comes with privacy options for pages and files (anything stored in a Collection, e.g. documents and images). The options that come built in include:

accessible to anyone (aka Public)
accessible to people in specific groups
accessible to anyone who can log into your site at all
accessible if they have a password you enter on that form

That would seem like a fairly comprehensive list, but at work, we often restrict things to anyone coming from specific IP addresses. So when we rolled out our new CMS, we had requests for it to support the “on campus” option - just like the old ones had.

Adding the “On Campus” option comes in two parts: adding it to the options offered to the editor and then enforcing that restriction when serving pages or documents.

Adding a New Restriction Choice

The privacy options are defined in the BaseViewRestriction class in wagtail.core.models. We will be deciding if a browser is coming from on or off campus with a middleware, so we will not have to add any columns to the tables defined by classes inheriting from BaseViewRestriction. But we do need to add “on campus” to the options offered to the site editor.

To override the stock RESTRICTION_CHOICES from wagtail.core.models.BaseViewRestriction with our own, we need to monkeypatch the class. We have a number of small customizations in our wagtail site, so we collect them all into their own app, wagtail_patches, which we include in INSTALLED_APPS.

    our_site
      .... other apps ...
      wagtail_patches
        monkey_patches.py
        wagtail_hook_patches.py

    # In our settings.py
    INSTALLED_APPS = [
        # Our apps...
        'www',
        # The app containing our monkey patches
        'wagtail_patches',
        # Wagtail apps.
        'wagtail.embeds',
        'wagtail.sites',
        'wagtail.users',
        'wagtail.snippets',
        'wagtail.documents',
        'wagtail.images',
        'wagtail.search',
        'wagtail.admin',
        'wagtail.core',
    ]

And now the monkeypatching code:

    # In wagtail_patches/monkey_patches.py

    RESTRICTION_CHOICES = (
        (NONE, \_("Public")),
        (LOGIN, \_("Private, accessible to logged-in users")),
        (ON_CAMPUS, \_("Private, accessible to users on campus or on VPN")),
        (PASSWORD, \_("Private, accessible with the following password")),
        (GROUPS, \_("Private, accessible to users in specific groups")),
    )
    wagtail.core.models.BaseViewRestriction.ON_CAMPUS = 'on_campus'
    wagtail.core.models.BaseViewRestriction.RESTRICTION_CHOICES = RESTRICTION_CHOICES

That will add the ON_CAMPUS choice to our form. Since there are no additional parameters needed for this restriction, you wouldn’t think we would have do make any additional changes to the form or form validations. But as of Django 3, we also need to patch the model level validations. We do that like this:

    # In wagtail_patches/monkey_patches.py

    def patched_PageViewRestriction_clean_fields(self, exclude=None):
        """
        Clean all fields and raise a ValidationError containing a dict
        of all validation errors if any occur.
        """
        if exclude is None:
            exclude = []

        errors = {}
        for f in self._meta.fields:
            # BEGIN PATCH
            if f.attname == 'restriction_type':
                f.choices = RESTRICTION_CHOICES
            # END PATCH
            if f.name in exclude:
                continue
            # Skip validation for empty fields with blank=True. The developer
            # is responsible for making sure they have a valid value.
            raw_value = getattr(self, f.attname)
            if f.blank and raw_value in f.empty_values:
                continue
            try:
                setattr(self, f.attname, f.clean(raw_value, self))
            except ValidationError as e:
                errors[f.name] = e.error_list

        if errors:
            raise ValidationError(errors)
    wagtail.core.models.PageViewRestriction.clean_fields = patched_PageViewRestriction_clean_fields

Enforcing Our New Restriction

In our setup, we have split enforcement into two parts, a middleware that determines if a request is “on campus” or not and then code that uses that information to show or not show the private page or file. We took this approach because we already have shared library that does the “on campus” checking. If you do not need to share the code that checks for on vs off campus, you may want to put that check directly into the code that enforces the rule.

On Campus Middleware

Define the following middleware somewhere in your project - customizing it with your own IP addresses.

    class OnCampusMiddleware(MiddlewareMixin):
        """
        Middleware sets ON_CAMPUS session variable to True if the request
        came from an campus IP or if the user is authenticated.
        """
        CAMPUS_ADDRESSES = [
            r'192\.168\.\d{1,3}\.\d{1,3}',
        ]

        def check_ip(self, request):
            client_ip = get_client_ip(request)

            if client_ip:
                for ip_regex in self.CAMPUS_ADDRESSES:
                    if re.match(ip_regex, client_ip):
                        return True
            return False

        def process_request(self, request):
            # A user is considered "on campus" if they are visiting from a campus IP, or are logged in to the site.
            request.session['ON_CAMPUS'] = request.user.is_authenticated or self.check_ip(request)
            return None

Then add this to the MIDDLEWARE list in your Django settings file. Since this middleware is a silent pass through in both directions (only the side effect of setting the ON_CAMPUS session variable to True or False matters), you can put this line anywhere in the list.

Updating the Enforcement Code

The meat of the restriction enforcement is in BaseViewRestriction’s accept_request method, so we need to add our new on-campus check:

    def patched_accept_request(self, request):
        if self.restriction_type == BaseViewRestriction.PASSWORD:
            passed_restrictions = request.session.get(self.passed_view_restrictions_session_key, [])
            if self.id not in passed_restrictions:
                return False

        elif self.restriction_type == BaseViewRestriction.LOGIN:
            if not request.user.is_authenticated:
                return False

        # BEGIN PATCH
        # Add a privacy mode that allows only on-campus visitors.
        elif self.restriction_type == wagtail.core.models.BaseViewRestriction.ON_CAMPUS:
            if not request.session['ON_CAMPUS']:
                return False
        # END PATCH

        elif self.restriction_type == BaseViewRestriction.GROUPS:
            if not request.user.is_superuser:
                current_user_groups = request.user.groups.all()

                if not any(group in current_user_groups for group in self.groups.all()):
                    return False

        return True
    wagtail.core.models.BaseViewRestriction.accept_request = patched_accept_request

What should happen when accept_request returns False? That depends on which restriction triggers the failure. For example, if a user fails the LOGIN restriction, they should be directed to log in - but if they fail the ON_CAMPUS restriction, they should get an error message. The correct actions for the built-in restriction types are handled in a before_serve_page hook called check_view_restrictions Since we already have monkey patched some other hooks, what we did was to monkey patch check_view_restrictions:

    # In wagtail_patches/wagtail_hook_patches.py
    from django.urls import reverse
    from wagtail.core.hooks import _hooks, get_hooks
    from wagtail.core.models import PageViewRestriction
    from wagtail.core.wagtail_hooks import require_wagtail_login


    def patched_check_view_restrictions(page, request, serve_args, serve_kwargs):
        """
        Check whether there are any view restrictions on this page which are
        not fulfilled by the given request object. If there are, return an
        HttpResponse that will notify the user of that restriction (and possibly
        include a password / login form that will allow them to proceed). If
        there are no such restrictions, return None
        """
        for restriction in page.get_view_restrictions():
            if not restriction.accept_request(request):
                if restriction.restriction_type == PageViewRestriction.PASSWORD:
                    from wagtail.core.forms import PasswordViewRestrictionForm
                    form = PasswordViewRestrictionForm(instance=restriction,
                                                       initial={'return_url': request.get_full_path()})
                    action_url = reverse('wagtailcore_authenticate_with_password', args=[restriction.id, page.id])
                    return page.serve_password_required_response(request, form, action_url)

                elif restriction.restriction_type in [PageViewRestriction.LOGIN, PageViewRestriction.GROUPS]:
                    return require_wagtail_login(next_url=request.get_full_path())
                # Begin patch: Added a code path for the on_campus restriction.
                elif restriction.restriction_type == PageViewRestriction.ON_CAMPUS:
                    # We set request.is_preview like Page.serve() would have done, since this code bypasses it.
                    request.is_preview = getattr(request, 'is_preview', False)
                    # Render the on_campus_only.html template, instead of the usual page template.
                    return TemplateResponse(request, 'core/on_campus_only.html', page.get_context(request))
                # end patch


    def patch_hooks():
    """
    This function replaces various wagtail hook implementations with our own versions.
    """
    for ndx, _ in enumerate(get_hooks('before_serve_page')):
        func = _hooks['before_serve_page'][ndx][0]
        if func.__module__ == 'wagtail.core.wagtail_hooks':
            _hooks['before_serve_page'][ndx] = (patched_check_view_restrictions, 0)

But looking at the source code for the page serve view, I don’t think we need to replace the existing check_view_restrictions. I think we can just add an additional before_serve_page hook that returns our “sorry you need to be on campus to see this page” message. If I were doing this from scratch, I would put the following code into one of my wagtail_hooks.py files (either in the wagtail_patches app or in that app that contains most of my page models).

    # In wagtail_patches/wagtail_hooks.py
    from wagtail.core import hooks

    @hooks.register('before_serve_page')
    def enforce_on_campus_restriction(page, request, serve_args, serve_kwargs):
        if not restriction.accept_request(request):
            for restriction in page.get_view_restrictions():
                if restriction.restriction_type == PageViewRestriction.ON_CAMPUS:
                    # We set request.is_preview like Page.serve() would have done, since this code bypasses it.
                    request.is_preview = getattr(request, 'is_preview', False)
                    # Render the on_campus_only.html templates
                    return TemplateResponse(request, 'core/on_campus_only.html', page.get_context(request))

Postgres Makes Scheduling Easy

May 11, 2017

At work, we need to build a scheduling system. We want to present the user with a list of possible dates - and then the possible slots on that date. I don’t want to have all the possible empty slots in the database so I thought I would have to build them procedurally using Python.

    import calendar
    from datetime import timedelta
    from pytz import timezone as pytz_timezone

    AVAILABLE_DAYS = ['Monday', 'Wednesday', 'Friday']
    AVAILABLE_START_TIME = {'hours': 8, 'minutes': 0, 'timezone': 'UTC'}
    # start + estimated duration must be earlier than this
    AVAILABLE_END_TIME = {'hours': 20, 'minutes': 0, 'timezone': 'UTC'}

    def possible_times(start_date, end_date, estimated_duration, granularity=60):
        '''
        Returns a list of times when a user may start a reservation between start_date and end_date (inclusive)
        By default reservations may start hourly from AVAILABLE_START_TIME onwards;
        you may adjust how frequently reservations may start by setting the 'granularity' (in minutes)
        '''
        possibles = []
        date = _first_slot(start_date)
        while date <= end_date:
            if not _is_possible_day(date):
                # skip this day
                date += timedelta(days=1)
                continue

            # find slots on day
            last_slot = _last_slot(date, estimated_duration)
            while date <= last_slot:
                possibles.append(date)
                date += timedelta(minutes=granularity)

            # go to next day
            date = _first_slot(date + timedelta(days=1))

        return possibles


    # ############## helper methods #######################

    def _is_possible_day(date, available_days=None):
        if not available_days:
            available_days = AVAILABLE_DAYS
        return calendar.day_name[date.weekday()] in available_days


    def _first_slot(date, start_time=None):
        '''Returns the first slot of the day'''
        if not start_time:
            start_time = AVAILABLE_START_TIME
        first_slot = date.replace(hour=start_time['hours'],
                                  minute=start_time['minutes'],
                                  tzinfo=pytz_timezone(start_time['timezone']))
        return first_slot


    def _last_slot(date, duration, end_time=None):
        if not end_time:
            end_time = AVAILABLE_END_TIME
        last_slot = date.replace(hour=end_time['hours'],
                                 minute=end_time['minutes'],
                                 tzinfo=pytz_timezone(end_time['timezone']))
        last_slot -= duration
        return last_slot

The code above loops over the days in the range - and then on available days, loops over the hours in that day and returns a list of datetimes. There is a lot of ugly adding of Python timedelta objects and resetting the time to start iterating on a new day. It works - but the next step, eliminating slots that are already full, is going to be even uglier - lots of tedious “does this interval overlap with existing scheduled events”.

    from datetime import datetime, timezone, timedelta
    from django.test import TestCase
    from ..utils import possible_times

    class ReservationUtilsTests(TestCase):
        # ############### integration tests ##########################
        def test_no_possible_times_on_sunday(self):
            start_date = datetime(2017, 4, 30, hour=8, minute=0, tzinfo=timezone.utc)
            end_date = datetime(2017, 4, 30, hour=23, minute=0, tzinfo=timezone.utc)
            duration = 60  # in minutes
            slots = possible_times(start_date, end_date, duration)
            self.assertEqual(slots, [])

        def test_hourly_possible_times_end_earlier_if_takes_longer(self):
            start_date = datetime(2017, 4, 30, hour=8, minute=0, tzinfo=timezone.utc)
            end_date = datetime(2017, 5, 2, hour=23, minute=0, tzinfo=timezone.utc)
            expected = [datetime(2017, 5, 1, 8, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 9, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 10, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 11, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 12, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 13, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 14, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 15, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 16, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 17, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 18, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 19, 0, tzinfo=timezone.utc),
                        ]
            slots = possible_times(start_date, end_date, estimated_duration=timedelta(minutes=60))
            self.assertEqual(len(slots), 12)
            self.assertEqual(slots, expected)
            slots2 = possible_times(start_date, end_date, estimated_duration=timedelta(minutes=120))
            self.assertEqual(len(slots2), 11)
            self.assertEqual(slots2, expected[0:11])

When I started looking into how to check the overlap, I started to looking into checking overlaps in the database - and found that a) Postgres has a date range data type (tstzrange), b) Django’s Postgres extensions has a field that wraps the Postgres tstzrange field (DateTimeRangeField), and c) the Postgres docs even have an example of how to create indexes that prevent you from scheduling more than one person to occupy a specific room at one time. All that ugly python, turns into:

    import calendar
    from django.db import connection

    AVAILABLE_DAYS = ['Monday', 'Wednesday', 'Friday']
    AVAILABLE_START_TIME = '08:00'
    # start + estimated duruation must be earlier than this
    AVAILABLE_END_TIME = '20:00'


    def possible_times(start_date, end_date, estimated_duration, granularity=60):
        allowed_days = [list(calendar.day_name).index(day) + 1 for day in AVAILABLE_DAYS]

        cursor = connection.cursor()
            sql = '''
                  SELECT *
                  FROM   generate_series (timestamp %(start_date)s
                                          , timestamp %(end_date)s - interval '%(duration)s minutes'
                                          , interval '%(granularity)sm') h
                  WHERE  EXTRACT(ISODOW FROM h) in %(allowed_days)s
                    AND    h::time >= %(start_time)s
                    AND    h::time <= %(end_time)s - interval '%(duration)s minutes'
                    ;
                  '''
            cursor.execute(sql, {'start_date': start_date,
                                 'start_time': AVAILABLE_START_TIME,
                                 'end_date': end_date,
                                 'end_time': AVAILABLE_END_TIME,
                                 'duration': estimated_duration,
                                 'granularity': granularity,
                                 'allowed_days': tuple(allowed_days),
                                 })

        slots = [row[0] for row in cursor.fetchall()]
        return slots

The only slightly tricky part of that was restricting allowed days to MWF. I want my constant to use the day names, not the integers Postgres uses for days of the week. So I needed to import Python’s calendar module to convert “Monday” to an integer. Python uses 0 for Monday, but Postgres thinks Monday is 1, so add 1. Then it took me a little while to figure out how to pass a list into the query in a way that everything is properly interpolated and quoted; the trick: tuple(allowed_days).

Now I just need to join to my reservations table to exclude slots where the schedule is already full.

Forcing Django to Make a Subquery

November 11, 2016

Django has extensive documentation for it’s ORM but somehow I still end up surprised by some of the queries it builds. The default logging configuration doesn’t log queries in the way Rails does (in its development environment) so when a query appears to give the correct results, I don’t usually check the SQL. But I recently had a page start to fail; I finally tracked it down to a specific query but couldn’t immediately see why I was not getting the row I expected so I printed the query and Django was not building the query I thought it had been. The python is:

    Material.objects.filter(goal_id=goal_key,
                            material_state='deployed',
                            category_id=category) \
                    .exclude(individualizations__user__id=user_id,
                             individualizations__material_state__in=('done',)) \
                    .select_related('category') \
                    .order_by('category__position', 'created_at')

This produces the following SQL (edited slightly to make it easier to read):

    SELECT "appname_materials".*, "appname_categories"."category_key", "appname_categories"."position"
    FROM "appname_materials" INNER JOIN "appname_categories"
      ON ( "appname_materials"."category_key" = "appname_categories"."category_key" )
    WHERE ("appname_materials"."material_state" = 'deployed'
           AND "appname_materials"."goal_key" = 'goal1'
           AND "appname_materials"."category_key" = 'transition'
           AND NOT ("appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                           FROM "appname_material_individualizations" U1
                                                           WHERE U1."material_state" IN ('done'))
                    AND "appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                               FROM "appname_material_individualizations" U1
                                                               WHERE U1."user_id" = 1))
          )
    ORDER BY "appname_categories"."position" ASC, "appname_materials"."created_at" ASC

Hmmm that’s not what I want. I don’t want 2 subqueries, one for each condition. I want one subquery, with two two conditions. If I had wanted 2 subqueries, I would have written 2 excludes, like this:

    Material.objects.filter(goal_id=goal_key,
                            material_state='deployed',
                            category_id=category) \
                    .exclude(individualizations__user__id=user_id) \
                    .exclude(individualizations__material_state__in=('done',)) \
                    .select_related('category') \
                    .order_by('category__position', 'created_at')

But both of those QuerySet definitions produce the same SQL. So how can I produce the following SQL using the Django ORM:

    SELECT "appname_materials".*, "appname_categories"."category_key", "appname_categories"."position"
    FROM "appname_materials" INNER JOIN "appname_categories"
      ON ( "appname_materials"."category_key" = "appname_categories"."category_key" )
    WHERE ("appname_materials"."material_state" = 'deployed'
           AND "appname_materials"."goal_key" = 'goal1'
           AND "appname_materials"."category_key" = 'transition'
           AND NOT "appname_materials"."material_key" IN (SELECT U1."material_key" AS Col1
                                                          FROM "appname_material_individualizations" U1
                                                          WHERE U1."material_state" IN ('done')
                                                          AND U1."user_id" = 1)
           )
    ORDER BY "appname_categories"."position" ASC, "appname_materials"."created_at" ASC

I tried a couple of things using Q but mostly ended up with syntax errors. Fortunately I finally found this Stack Overflow thread with references the bug report for this problem AND the solution. You can force Django to build the desired subquery by writing the subquery explicitly:

    Material.objects.filter(goal_id=goal_key,
                            material_state='deployed',
                            category_id=category) \
                    .exclude(
                         material_key__in=(
                             MaterialIndividualization.objects.values_list('material_id', flat=True)
                            .filter(user__id=user_id, material_state__in=('done',))
                         )
                     ) \
                    .select_related('category') \
                    .order_by('category__position', 'created_at')

It’s a little verbose, but it is actually a little clearer in some respects - it is more like a direct python translation of the desired SQL.

Tuning Django REST Framework Serializers

July 14, 2016

One problem that often comes up when you are using an object-relational mapper is called the N+1 query problem - inadvertently doing a query and then doing a separate query for the related objects for each row. When building sites using Ruby on Rails, the framework logs all SQL queries (while you are in development mode). So one tends to fix these inefficient queries as you are developing - if nothing else, in self-defense so you can actually see the things you care about in your logs.

Django, on the other hand, does not log anything except the timestamp, request, response_code, and response size. Its default logging configuration doesn’t log any request parameters or database queries. So it’s easy to overlook inefficient queries. So when we finally put a reasonable amount of test data into our staging server, we found that several of our API endpoints were agonizingly slow. So, time for some tuning!

Setup

Lots of people use the django debug toolbar but I really prefer log files. So I installed and configured Django Query Inspector. That was helpful for identifying some of the worst offenders but for the real tuning, I needed this stanza to log all database queries:

    LOGGING = {
        'version': 1,
        'disable_existing_loggers': False,
        'handlers': {
            'console': {
                'level': 'DEBUG',
                'class': 'logging.StreamHandler',
            }
        },
        'loggers': {
            'django.db.backends': {
                'handlers': ['console'],
                'level': 'DEBUG',
            },
        }
    }

Once I had that going, I started looking at some of my nested serializers. With a couple of well placed “select_related”s on the queries in my views, I was able to get rid of most of the excess queries but I was consistently seeing an extra query that I couldn’t figure out - until I started to write up an issue to post on IRC.

The extra query was coming in because I was using DRF’s browsable API to do my query tuning. The browsable API includes a web form for experimenting with the create and update actions in a ModelViewSet and that form has a select menu for each foreign key relationship that needs to be created. So when I made a request in the browser, I saw:

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    (0.000) QUERY = '
    SELECT "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_metagoal"' - PARAMS = (); args=()

    [SQL] 2 queries (0 duplicates), 0 ms SQL time, 101 ms total request time
    [15/Jul/2016 01:40:53] "GET /api/goals/3/ HTTP/1.1" 200 10565

But when I made the same request using curl, I only see the one join query that I was expecting:

    $ curl http://127.0.0.1:8000/api/goals/3/ | jq .
    {"id": 3,
     "url": "http://127.0.0.1:8000/api/goals/3/",
     "name": "Subgoal 3",
     "metagoal": "http://127.0.0.1:8000/api/metagoals/1/"
    }

    (0.000) QUERY = '
    SELECT "project_goal"."id", "project_goal"."name",
           "project_goal"."metagoal_id", "project_metagoal"."id",
           "project_metagoal"."name", "project_metagoal"."project_id"
    FROM "project_goal" INNER JOIN "project_metagoal"
      ON ("project_goal"."metagoal_id" = "project_metagoal"."id" )
    WHERE "project_goal"."id" = %s' - PARAMS = (3,); args=(3,)

    [SQL] 1 queries (0 duplicates), 0 ms SQL time, 12 ms total request time
    [15/Jul/2016 01:40:47] "GET /api/goals/3/ HTTP/1.1" 200 5398

← Older Blog Archives Newer →