CNK's Blog

Postgres Makes Scheduling Easy

At work, we need to build a scheduling system. We want to present the user with a list of possible dates - and then the possible slots on that date. I don’t want to have all the possible empty slots in the database so I thought I would have to build them procedurally using Python.

    import calendar
    from datetime import timedelta
    from pytz import timezone as pytz_timezone

    AVAILABLE_DAYS = ['Monday', 'Wednesday', 'Friday']
    AVAILABLE_START_TIME = {'hours': 8, 'minutes': 0, 'timezone': 'UTC'}
    # start + estimated duration must be earlier than this
    AVAILABLE_END_TIME = {'hours': 20, 'minutes': 0, 'timezone': 'UTC'}

    def possible_times(start_date, end_date, estimated_duration, granularity=60):
        '''
        Returns a list of times when a user may start a reservation between start_date and end_date (inclusive)
        By default reservations may start hourly from AVAILABLE_START_TIME onwards;
        you may adjust how frequently reservations may start by setting the 'granularity' (in minutes)
        '''
        possibles = []
        date = _first_slot(start_date)
        while date <= end_date:
            if not _is_possible_day(date):
                # skip this day
                date += timedelta(days=1)
                continue

            # find slots on day
            last_slot = _last_slot(date, estimated_duration)
            while date <= last_slot:
                possibles.append(date)
                date += timedelta(minutes=granularity)

            # go to next day
            date = _first_slot(date + timedelta(days=1))

        return possibles


    # ############## helper methods #######################

    def _is_possible_day(date, available_days=None):
        if not available_days:
            available_days = AVAILABLE_DAYS
        return calendar.day_name[date.weekday()] in available_days


    def _first_slot(date, start_time=None):
        '''Returns the first slot of the day'''
        if not start_time:
            start_time = AVAILABLE_START_TIME
        first_slot = date.replace(hour=start_time['hours'],
                                  minute=start_time['minutes'],
                                  tzinfo=pytz_timezone(start_time['timezone']))
        return first_slot


    def _last_slot(date, duration, end_time=None):
        if not end_time:
            end_time = AVAILABLE_END_TIME
        last_slot = date.replace(hour=end_time['hours'],
                                 minute=end_time['minutes'],
                                 tzinfo=pytz_timezone(end_time['timezone']))
        last_slot -= duration
        return last_slot

The code above loops over the days in the range - and then on available days, loops over the hours in that day and returns a list of datetimes. There is a lot of ugly adding of Python timedelta objects and resetting the time to start iterating on a new day. It works - but the next step, eliminating slots that are already full, is going to be even uglier - lots of tedious “does this interval overlap with existing scheduled events”.

    from datetime import datetime, timezone, timedelta
    from django.test import TestCase
    from ..utils import possible_times

    class ReservationUtilsTests(TestCase):
        # ############### integration tests ##########################
        def test_no_possible_times_on_sunday(self):
            start_date = datetime(2017, 4, 30, hour=8, minute=0, tzinfo=timezone.utc)
            end_date = datetime(2017, 4, 30, hour=23, minute=0, tzinfo=timezone.utc)
            duration = 60  # in minutes
            slots = possible_times(start_date, end_date, duration)
            self.assertEqual(slots, [])

        def test_hourly_possible_times_end_earlier_if_takes_longer(self):
            start_date = datetime(2017, 4, 30, hour=8, minute=0, tzinfo=timezone.utc)
            end_date = datetime(2017, 5, 2, hour=23, minute=0, tzinfo=timezone.utc)
            expected = [datetime(2017, 5, 1, 8, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 9, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 10, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 11, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 12, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 13, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 14, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 15, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 16, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 17, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 18, 0, tzinfo=timezone.utc),
                        datetime(2017, 5, 1, 19, 0, tzinfo=timezone.utc),
                        ]
            slots = possible_times(start_date, end_date, estimated_duration=timedelta(minutes=60))
            self.assertEqual(len(slots), 12)
            self.assertEqual(slots, expected)
            slots2 = possible_times(start_date, end_date, estimated_duration=timedelta(minutes=120))
            self.assertEqual(len(slots2), 11)
            self.assertEqual(slots2, expected[0:11])

When I started looking into how to check the overlap, I started to looking into checking overlaps in the database - and found that a) Postgres has a date range data type (tstzrange), b) Django’s Postgres extensions has a field that wraps the Postgres tstzrange field (DateTimeRangeField), and c) the Postgres docs even have an example of how to create indexes that prevent you from scheduling more than one person to occupy a specific room at one time. All that ugly python, turns into:

    import calendar
    from django.db import connection

    AVAILABLE_DAYS = ['Monday', 'Wednesday', 'Friday']
    AVAILABLE_START_TIME = '08:00'
    # start + estimated duruation must be earlier than this
    AVAILABLE_END_TIME = '20:00'


    def possible_times(start_date, end_date, estimated_duration, granularity=60):
        allowed_days = [list(calendar.day_name).index(day) + 1 for day in AVAILABLE_DAYS]

        cursor = connection.cursor()
            sql = '''
                  SELECT *
                  FROM   generate_series (timestamp %(start_date)s
                                          , timestamp %(end_date)s - interval '%(duration)s minutes'
                                          , interval '%(granularity)sm') h
                  WHERE  EXTRACT(ISODOW FROM h) in %(allowed_days)s
                    AND    h::time >= %(start_time)s
                    AND    h::time <= %(end_time)s - interval '%(duration)s minutes'
                    ;
                  '''
            cursor.execute(sql, {'start_date': start_date,
                                 'start_time': AVAILABLE_START_TIME,
                                 'end_date': end_date,
                                 'end_time': AVAILABLE_END_TIME,
                                 'duration': estimated_duration,
                                 'granularity': granularity,
                                 'allowed_days': tuple(allowed_days),
                                 })

        slots = [row[0] for row in cursor.fetchall()]
        return slots

The only slightly tricky part of that was restricting allowed days to MWF. I want my constant to use the day names, not the integers Postgres uses for days of the week. So I needed to import Python’s calendar module to convert “Monday” to an integer. Python uses 0 for Monday, but Postgres thinks Monday is 1, so add 1. Then it took me a little while to figure out how to pass a list into the query in a way that everything is properly interpolated and quoted; the trick: tuple(allowed_days).

Now I just need to join to my reservations table to exclude slots where the schedule is already full.