Using CouchDB in Rookeries – Part 1 – Creating CouchDB Test Fixtures Using Bulk Updates

Back Story

I’ve been working on adding database persistence support to Rookeries. Instead of writing down my findings and losing them somewhere, I plan on documenting my findings and thoughts in a series of blog posts.

In the case of Rookeries that means connecting to and storing all of the journal, blog and page content as CouchDB documents. Since I want to implement this properly, I intend on adding tests to make sure I can manage CouchDB documents and databases properly. Rather than writing a number of tests that mock out CouchDB, I want to use a test database along with known test data fixtures for my tests.

Python CouchDB Integration for Rookeries

When looking at different CouchDB-Python binding libraries for Rookeries, I settled on py-couchdb. Manipulating CouchDB essentially means communicating with its REST API, so it is important that a Python binding library uses the sane approach to communicate with HTTP REST API. Unfortunately the more popular CouchDB-Python library uses only Python standard library and implements its HTTP mechanism in using standard library’s unintuitive modules. In contrast py-couchdb uses requests for querying the CouchDB server, making it a much more maintainable library.

Also py-couch offers Python query views, which I very much enjoy using at work. I still need to verify how well the library’s Python query server works in practise, but I will write a future blog post about my findings. py-couchdb lacks CouchDB-Python’s mapping functionality, which behaves similar to sqlalchemy’s ORM. However I am still debating on how I want to map between CouchDB documents and Pythonic domain objects.

Creating and Deleting CouchDB

Creating and deleting a database in a CouchDB server amounts to issuing a HTTP PUT or DELETE request against the server. This REST API provides no safety net nor confirmation about deleting a database, so one needs to be careful. py-couchdb provides a nice and simple API to create or delete a database as well.

Using cURL

# Create a CouchDB database
curl -X PUT http://admin:password@localhost:5984/my_database/

# DELETE a CouchDB database
curl -X DELETE http://admin:password@localhost:5984/my_database/

Using py-couchdb

# Create a CouchDB database
server = pycouchdb.client.Server('http://admin:password@localhost:5984')

# DELETE a CouchDB database

Inserting Fixture Data

Now that I can create a temporary test database, I need to populate it with some test data. Fortunately it turns out that CouchDB has a neat and fast way to insert data in bulk using its _bulk_docs API. With this API can easily come up with a number of documents that I want to input as test data.

Fixture Data Format

The format for inserting a mass of documents is:

  "docs": [
    {"_id": "1", "a_key": "a_value", "b_key": [1, 2, 3]},
    {"_id": "2", "a_key": "_random", "b_key": [5, 6, 7]},
    {"_id": "5", "a_key": "__etc__", "b_key": [1, 5, 5]}

Note that adding a _id specifies the CouchDB ID for the document.

Using cURL

# Bulk doc insert/update using the JSON data file.  One can also do this manually with a string.
curl -d @sample_data.json -X POST -H 'Content-Type: application/json' \

Using py-couchdb

UPDATED: 2015-Aug-22 I was totally wrong about the format of doing bulk updates to py-couchdb. Rather than the JSON format needed for CURL, a simple list of Python dictionaries works with the save_bulk() method. I’ve updated the code example.

import io
import json

# Best practice for writing unified Python 2 and 3 compatible code is 
# to use as a context manager. 
with'sample_data.json') as json_file:
    my_docs = json.load(json_file)
database = server.database('my_database')
# See my update note above, about the format save_bulk expects.


And with that, I have what I need to have repeatable tests. Hopefully this will land in Rookeries in the next couple of days.

Other Resources