CNK's Blog

Using Multiple Databases in Django

I am currently working on a project that has a main public web site (mk_web_core) and then a separate AI (mk_ai) application that needs access to a large percentage of the information in the public site’s database. Since the AI only makes sense in the context of the public web site, one option might be to make them a single application. However, we are planning to experiment with different versions of the AI, so it seems sensible to separate them and develop an API contract between the two halves.

My first thought was to completely separate the two - separate code bases and separate databases. However, the main application has a deeply nested hierarchical schema and the AI needs to know all the gory details of that structure. So if we completely separate the two apps, we need to build something to keep the two views of that hierarchy in sync. We will eventually need to do that - and then build an extract, transform, and load (ETL) process for keeping the AI in sync with the main site. But for now, we are going to put that off and instead allow the AI read-only access to the information it needs from the main site.

Django has built-in support for multiple database connections, so getting my AI site set up to read from the mk_web_core database was pretty straightforward. The documentation on multiple databases indicated that I should create a database router for each database and then, in my settings.py file, give DATABASE_ROUTERS a list containing the two routers. After setting up the database configuration, I copied the model files from the mk_web_core project into corresponding app locations in the mk_ai project. I did not want the mk_ai project to make any changes to the mk_web_core schema, so I added managed = False to the Meta class for each model class.
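
For reference, the configuration looked roughly like the sketch below. The database names, engine, and the dotted router path are placeholders rather than the project's real values, and the SubjectArea model is purely illustrative of the copied, unmanaged models.

    # settings.py (sketch) - two connections plus the router list
    DATABASES = {
        'default': {                        # the mk_ai application's own database
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mk_ai',
        },
        'mk_web_core': {                    # the main site's database, used read-only
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': 'mk_web_core',
        },
    }

    # The dotted path is an assumption; point this at wherever your router classes live.
    DATABASE_ROUTERS = ['mk_ai.routers.DefaultDatabaseRouter']

    # models.py (sketch) - each model copied from mk_web_core was marked unmanaged
    # so that this project's migrations would never touch the mk_web_core schema
    from django.db import models

    class SubjectArea(models.Model):        # hypothetical model name
        name = models.CharField(max_length=100)

        class Meta:
            app_label = 'materials'
            managed = False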

Tests That Depend On The “Read-Only” Database

The original two-router configuration seemed to work, but then I decided I really had to write some unit tests for the mk_ai application. The mk_web_core application already has unit tests. And since it is fairly independent - it only interacts with the AI system through a single “next recommendation” API call - it is easy to mock out its dependence on mk_ai without compromising my confidence in the tests. However, the behavior of the AI application depends strongly on the data from the mk_web_core application, so to create any meaningful tests, we really need to be able to create specific data in a test version of the mk_web_core database. Unfortunately, all of the configuration I did to prevent the AI application from writing to the mk_web_core schema also made it impossible to set up my test database. Hmmm.

So I removed the managed = False from each model’s Meta class and tried to figure out how to set up my routers so that I could write to the mk_web_core test database but not to the mk_web_core production database. I struggled for a while and then found this blog post from NewCircle. After some trial and error, this router appears to do what I need:

    from django.conf import settings

    class DefaultDatabaseRouter(object):
        def db_for_read(self, model, **hints):
            """
            Reads for models in the shared apps go to mk_web_core;
            everything else falls through to the default (mk_ai) database.
            """
            if model._meta.app_label in ['accounts', 'materials']:
                return 'mk_web_core'
            else:
                return 'default'

        def db_for_write(self, model, **hints):
            """
            Writes normally go to the default database. Writing to the
            shared mk_web_core apps is only allowed while running unit tests.
            """
            if model._meta.app_label in ['accounts', 'materials']:
                if settings.TESTING:
                    return 'mk_web_core'
                else:
                    raise Exception('Attempt to write to mk_web_core from mk_ai when settings.TESTING not true!')
            else:
                return 'default'

        def allow_relation(self, obj1, obj2, **hints):
            """
            Relations between objects are allowed if both objects are in the same pool.
            """
            return obj1._state.db == obj2._state.db

        def allow_migrate(self, db, app_label, model=None, **hints):
            """
            Write to test_mk_web_core when we are running unit tests.

            The check for model is because the contenttypes.0002_remove_content_type_name migration fails
            with message: AttributeError: 'NoneType' object has no attribute '_meta'
            """
            if app_label in ['accounts', 'materials']:
                if db == 'mk_web_core' and settings.TESTING:
                    return True
                else:
                    return False
            else:
                # Shortcut, we do import into default (mk_ai) but not into mk_web_core
                return db == 'default'

It is somewhat confusing that even though the tables for the materials app are never created in the default database, its migrations are still listed when you run python manage.py showmigrations and are recorded in the django_migrations table.
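
One detail the router glosses over: settings.TESTING is not something Django provides. A minimal way to define it (an assumption about this project's settings, not shown above) is to flip it on whenever manage.py test is being run:

    # settings.py (sketch) - TESTING is a home-grown flag, not a built-in Django setting
    import sys

    TESTING = len(sys.argv) > 1 and sys.argv[1] == 'test'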

Django has support for test fixtures in its TestCase. But those fixtures are loaded and removed for each and every test. That is excessive for my needs and would make our tests very slow. I finally figured out how to load data into mk_web_core once at the beginning of our tests - using migrations:

    from django.db import migrations
    from django.core import management

    def load_test_data(apps, schema_editor):
        management.call_command('loaddata', 'materials/test_materials', verbosity=2, database='mk_web_core')

    class Migration(migrations.Migration):
        dependencies = [('accounts', '0001_initial'),
                        ('materials', '0001_initial'),
                       ]
        operations = [
            migrations.RunPython(load_test_data),
        ]
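
With that data migration in place, individual tests can read the seeded data through the normal ORM. The sketch below is illustrative rather than taken from the project: SubjectArea is a hypothetical model, and multi_db = True is the (pre-Django 2.2) attribute that tells the test runner to set up and flush both test databases.

    from django.test import TestCase

    from materials.models import SubjectArea   # hypothetical model


    class RecommendationTests(TestCase):
        multi_db = True   # create/flush test databases for both 'default' and 'mk_web_core'

        def test_seed_data_is_visible(self):
            # db_for_read routes 'materials' queries to mk_web_core, where the
            # data migration above loaded the fixture
            self.assertTrue(SubjectArea.objects.exists())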

Image Upload with Thumbnailing and S3 Storage

Django has great API documentation - as do most of the libraries and apps in the ecosystem. But I have been having a hard time finding examples that put all the pieces together. So as an aid to myself - and anyone else who is having trouble stringing image upload, thumbnail creation, and S3 storage together - I put together a minimal project that supports uploading a user avatar in a Django 1.8 project. (Sorry, there are no unit tests, but the tests in the django-cleanup repository might be useful examples.)

The example is here: https://github.com/cnk/easy_thumbnails_example
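
For orientation, the pieces fit together roughly like this. This is a sketch, not code copied from that repository: the Profile model and bucket name are illustrative, and it assumes easy_thumbnails for the thumbnailing plus django-storages' S3 backend for storage.

    # models.py (sketch)
    from django.db import models
    from easy_thumbnails.fields import ThumbnailerImageField

    class Profile(models.Model):
        avatar = ThumbnailerImageField(upload_to='avatars', blank=True)

    # settings.py (sketch)
    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
    AWS_STORAGE_BUCKET_NAME = 'my-avatar-bucket'        # placeholder bucket name
    THUMBNAIL_DEFAULT_STORAGE = DEFAULT_FILE_STORAGE    # send generated thumbnails to S3 as well
    THUMBNAIL_ALIASES = {
        '': {'avatar': {'size': (100, 100), 'crop': True}},
    }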

ERB in IRB

Someday I may expand this to a longer post on Ruby debugging, but until then I am writing it down so I don’t have to search for it on Stack Overflow again.

If you need to debug an erb snippet in irb, this function gives you a handy shortcut for combining the template and arbitrary instance variables. Copy it into your irb session:

    require 'erb'
    require 'ostruct'

    def erb(template, vars)
      ERB.new(template).result(OpenStruct.new(vars).instance_eval { binding })
    end

And then you can use it as follows:

    erb("Hey, <%= first_name %> <%= last_name %>", :first_name => "James", :last_name => "Moriarty")
     => "Hey, James Moriarty"

Kind of handy, especially if you need to test out some Ruby inside the <%= %> tags.

Adding Library Code to Your Chef Runs

Using Mixins Inside of Chef Recipes

There are a couple of wrinkles about mixing Ruby modules into your Chef code. The first appears to be that the Chef DSL takes over everything (or nearly everything) - including the include command. So using normal Ruby like include MyModule inside a recipe file causes a compile-time error:

    ..... omitted ...
    resolving cookbooks for run list: ["testcookbook::default"]
    Synchronizing Cookbooks:
      - testcookbook
      Compiling Cookbooks...
      [2015-07-09T13:53:52-07:00] WARN: this should log node info bar from 'Chef::Log'

    ================================================================================
    Recipe Compile Error in .../testcookbook/recipes/default.rb
    ================================================================================

    NoMethodError
    -------------
    No resource or method named `include' for `Chef::Recipe "default"'

That seems odd - but it is probably a good thing, since if it worked, it might end up including your library code into the wrong place. With nearly all libraries, you want your module code available in some specific execution scope - which is probably not the compile-time scope. To use our UsefulMethods module in our recipe context, we need the following in our recipe file:

    ::Chef::Recipe.send(:include, UsefulMethods)

This blog post on the Chef.io site does a really nice job of explaining how (and why) to write libraries and then use them in your recipes. In their example code, the library code needs to be used inside the user resource: Chef::Resource::User.

Creating Modules Inside a Namespace

The second example in the custom libraries section of Customizing Chef shows another option for how to get your library code exactly where you want it. Instead of defining a generic module and then including it in your recipe, as above, you can set up your library within the namespace in which you want to use it. In the case of our UsefulMethods code, we rewrite the library as a class inside the Chef::Recipe namespace:

    class Chef::Recipe::StopFile
        def self.stop_file_exists?
            ::File.exists?("/tmp/stop_chef")
        end
    end

And then in our recipe file we don’t have to send any message to include the new class. Because it was created inside the Chef::Recipe namespace, it gets loaded into our recipe context when the library file is loaded at the beginning of the chef run. We can just call the class method like so:

    if StopFile.stop_file_exists?
       ....

Logging in Chef

There are a couple of different techniques for logging during a Chef client run. The simplest option for debugging things in any programming language is adding print statements - or in the case of Ruby, puts statements (print with a newline added). However, in order for print statements to work, they need to be executed in a context where stdout is available AND where you, the user, can see stdout. When running Chef manually (either using chef-client or via test kitchen’s ‘kitchen converge’ command), you are watching output go by on the console. So you can do things like:

    puts "This is normal Ruby code inside a recipe file."

And in a client run, you will see that output - in the compile phase.

    $ chef-client --once --why-run --local-mode \
                  --config /Users/cnk/Code/sandbox/customizing_chef/part3_examples/solo.rb \
                  --override-runlist testcookbook::default

    Starting Chef Client, version 12.3.0
    [2015-07-09T16:25:06-07:00] WARN: Run List override has been provided.
    [2015-07-09T16:25:06-07:00] WARN: Original Run List: []
    [2015-07-09T16:25:06-07:00] WARN: Overridden Run List: [recipe[testcookbook::default]]
    resolving cookbooks for run list: ["testcookbook::default"]
    Synchronizing Cookbooks:
      - testcookbook
      Compiling Cookbooks...
      This is normal Ruby code inside a recipe file.  ########### this is the message ##########
      Converging 0 resources

    Running handlers:
      Running handlers complete
      Chef Client finished, 0/0 resources would have been updated

You can get nearly the same functionality - but with a timestamp and some terminal coloring - if you use Chef::Log in the same context:

    puts "This is a puts from the top of the default recipe; node info: #{node['foo']}"
    Chef::Log.warn("You can log node info #{node['foo']} from a recipe using 'Chef::Log'")

Gives:

     $ chef-client --once --why-run --local-mode \
                   --config /Users/cnk/Code/sandbox/customizing_chef/part3_examples/solo.rb \
                   --override-runlist testcookbook::default

     Starting Chef Client, version 12.3.0
     [2015-07-09T16:33:44-07:00] WARN: Run List override has been provided.
     [2015-07-09T16:33:44-07:00] WARN: Original Run List: []
     [2015-07-09T16:33:44-07:00] WARN: Overridden Run List: [recipe[testcookbook::default]]
     resolving cookbooks for run list: ["testcookbook::default"]
     Synchronizing Cookbooks:
       - testcookbook
       Compiling Cookbooks...
       This is a puts from the top of the default recipe; node info: bar
       [2015-07-09T16:33:44-07:00] WARN: You can log node info bar from a recipe using 'Chef::Log'
       Converging 0 resources
    Running handlers:
      Running handlers complete
      Chef Client finished, 0/0 resources would have been updated

NB: the default log level for chef-client writing messages to the terminal is warn or higher. So if you try to use Chef::Log.debug('something'), you won’t see your message unless you have turned up the verbosity. This unexpected feature caused me a bit of grief initially, as I couldn’t find my log messages anywhere. Now what I do is use Chef::Log.warn while debugging locally and then take the messages out before I commit the code.

From my experiments, just about anywhere you might use puts, you can use Chef::Log. I think the latter is probably better because it will put information into actual log files in contexts, like test kitchen, that write log files for examining later.

If you need something logged at converge time instead of compile time, you have two options: use the log resource, or wrap Chef::Log inside a ruby_block call. In either case, during the compile phase, a new resource gets created and added to the resource collection. Then during the converge phase, that resource gets executed. Creating a Chef::Log statement inside a ruby_block probably isn’t too useful on its own, though it may be useful if you have created a ruby_block for some other reason. This gist has some example code and the output: https://gist.github.com/cnk/e5fa8cafea8c2953cf91