I am currently working on a project that has a main public web site (mk_web_core) and a separate AI application (mk_ai) that needs access to a large percentage of the information in the public site’s database. Since the AI only makes sense in the context of the public web site, one option might be to make them a single application. However, we are planning to experiment with different versions of the AI, so it seems sensible to separate them and develop an API contract between the two halves.
My first thought was to completely separate the two - separate code bases and separate databases. However, the main application has a deeply nested hierarchical schema and the AI needs to know all the gory details of that structure. So if we completely separate the two apps, we need to build something to keep the two views of that hierarchy in sync. We will eventually need to do that - and then build an extract, transform, and load (ETL) process for keeping the AI in sync with the main site. But for now, we are going to put that off and instead allow the AI read-only access to the information it needs from the main site.
Django has built-in support for multiple database connections, so setting things up so that my AI site could read from the mk_web_core database was pretty straightforward. The documentation on multiple databases indicated that one should create a database router for each database and then, in the settings.py file, give DATABASE_ROUTERS a list containing the two routers. After setting up the database configuration, I copied the model files from the mk_web_core project into corresponding app locations in the mk_ai project. I did not want the mk_ai project to make any changes to the mk_web_core schema, so I added managed = False to the Meta class of each model class.
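In rough outline, the settings looked something like this; the database aliases, engine, and router paths below are placeholders rather than the project's exact values:

```python
# settings.py (sketch) -- aliases, engine, and router paths are illustrative.
DATABASES = {
    # The AI application's own database.
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mk_ai',
    },
    # The main site's database, which mk_ai should only read.
    'mk_web_core': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mk_web_core',
    },
}

# One router per database, per the multiple-databases documentation.
DATABASE_ROUTERS = [
    'mk_ai.routers.MkWebCoreRouter',
    'mk_ai.routers.DefaultRouter',
]
```

And each copied model got a Meta along these lines (the Material model here is just an illustration, not the real schema):

```python
# materials/models.py (sketch)
from django.db import models


class Material(models.Model):
    name = models.CharField(max_length=200)

    class Meta:
        managed = False  # mk_ai must never create or alter this table
```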
Tests That Depend On The “Read-Only” Database
The original two-router configuration seemed to work, but then I decided I really had to write some unit tests for the mk_ai application. The mk_web_core application already has unit tests. And since it is fairly independent - it only interacts with the AI system through a single “next recommendation” API call - it is easy to mock out its dependency on mk_ai without compromising my confidence in the tests. However, the behavior of the AI application depends strongly on the data from the mk_web_core application, so to create any meaningful tests we really need to be able to create specific data in a test version of the mk_web_core database. All of the configuration I did to prevent the AI application from writing to the mk_web_core schema made it impossible to set up my test database. Hmmm.
So I removed the managed = False from each model’s Meta class and tried to figure out how to set up my routers so that I could write to the mk_web_core test database, but not to the mk_web_core production database. I struggled for a while and then found this blog post from NewCircle.
After some trial and error, this router appears to do what I need:
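The sketch below shows the shape of it; the materials app label and the mk_web_core alias stand in for the real set of apps that mirror the main site's schema:

```python
# routers.py (sketch) -- app labels and aliases are illustrative.

# Apps whose models are copied from mk_web_core.
MK_WEB_CORE_APPS = {'materials'}


class MkWebCoreRouter:
    """Route the mirrored apps to the mk_web_core connection."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label in MK_WEB_CORE_APPS:
            return 'mk_web_core'
        return None  # no opinion; fall through to the next router

    def db_for_write(self, model, **hints):
        # In practice only the tests write to these models; the production
        # database account can stay read-only.
        if model._meta.app_label in MK_WEB_CORE_APPS:
            return 'mk_web_core'
        return None

    def allow_relation(self, obj1, obj2, **hints):
        if (obj1._meta.app_label in MK_WEB_CORE_APPS
                and obj2._meta.app_label in MK_WEB_CORE_APPS):
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in MK_WEB_CORE_APPS:
            # Mirrored apps may only be migrated into the mk_web_core
            # connection (in practice, only its test database).
            return db == 'mk_web_core'
        # Keep mk_ai's own tables out of mk_web_core.
        return db != 'mk_web_core'
```

The important part is allow_migrate: the mirrored apps can only ever be migrated into the mk_web_core connection, and since we never run migrate against the production alias, the tables only get created in the test database that Django builds for the test run.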
It is somewhat confusing that even though the tables from the materials app’s migrations are never created, those migrations are still listed when you run python manage.py showmigrations and are recorded in the django_migrations table.
Django has support for test fixtures in its TestCase, but the fixtures are loaded and removed for each and every test. That is excessive for my needs and would make our tests very slow. I finally figured out how to load data into mk_web_core once, at the beginning of our tests, using migrations:
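In sketch form, it is a data migration along these lines; the Material model and the data created here are placeholders for whatever the AI’s tests actually need:

```python
# materials/migrations/0002_load_test_data.py (sketch)
from django.db import migrations


def load_test_data(apps, schema_editor):
    # Only populate the mk_web_core connection; in practice that is the
    # test database, since we never run migrate against the production
    # mk_web_core alias.
    if schema_editor.connection.alias != 'mk_web_core':
        return
    Material = apps.get_model('materials', 'Material')
    Material.objects.using(schema_editor.connection.alias).create(
        name='Sample material for tests',
    )


class Migration(migrations.Migration):

    dependencies = [
        ('materials', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(load_test_data, migrations.RunPython.noop),
    ]
```

Because the data lives in a migration, it is created once when the test databases are built, rather than being loaded and torn down around every single test.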