Back Story
I’ve been working on adding database persistence support to Rookeries. Instead of jotting down my findings and losing them somewhere, I plan to document them, along with my thoughts, in a series of blog posts.
In the case of Rookeries, that means connecting to CouchDB and storing all of the journal, blog and page content as CouchDB documents. Since I want to implement this properly, I intend to add tests that verify I can manage CouchDB documents and databases correctly. Rather than writing a number of tests that mock out CouchDB, I want to use a test database along with known test data fixtures.
Python CouchDB Integration for Rookeries
When looking at different CouchDB binding libraries for Python, I settled on py-couchdb. Manipulating CouchDB essentially means communicating with its REST API, so it is important that a Python binding library takes a sane approach to talking to an HTTP REST API. Unfortunately, the more popular CouchDB-Python library depends only on the Python standard library and builds its HTTP mechanism on the standard library’s unintuitive modules. In contrast, py-couchdb uses requests for querying the CouchDB server, making it a much more maintainable library.
py-couchdb also offers Python query views, which I very much enjoy using at work. I still need to verify how well the library’s Python query server works in practice, and I will write a future blog post about my findings. py-couchdb lacks CouchDB-Python’s mapping functionality, which behaves similarly to SQLAlchemy’s ORM. However, I am still debating how I want to map between CouchDB documents and Pythonic domain objects.
Creating and Deleting CouchDB Databases
Creating or deleting a database on a CouchDB server amounts to issuing an HTTP PUT or DELETE request against the server. The REST API provides no safety net and asks for no confirmation before deleting a database, so one needs to be careful. py-couchdb provides a nice and simple API to create or delete a database as well.
Using cURL
# Create a CouchDB database
curl -X PUT http://admin:password@localhost:5984/my_database/

# Delete a CouchDB database
curl -X DELETE http://admin:password@localhost:5984/my_database/
Using py-couchdb
import pycouchdb

# Create a CouchDB database
server = pycouchdb.client.Server('http://admin:password@localhost:5984')
server.create('my_database')

# Delete a CouchDB database
server.delete('my_database')
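For tests, the create and delete calls can be paired so that the test database is always cleaned up afterwards. The following is only a minimal sketch and not part of py-couchdb: temporary_database is a hypothetical helper name, and it assumes nothing more than a server object with the create() and delete() methods used above.

```python
import contextlib
import uuid


@contextlib.contextmanager
def temporary_database(server, prefix='test_'):
    """Create a uniquely named database and always delete it afterwards.

    `server` is assumed to expose create(name) and delete(name), like the
    py-couchdb server object used in this post.
    """
    # CouchDB database names must start with a lowercase letter, so the
    # prefix comes first and the random hex suffix follows.
    name = prefix + uuid.uuid4().hex
    database = server.create(name)
    try:
        yield database
    finally:
        # Runs even if the test body raises, so no stale databases pile up.
        server.delete(name)
```

A test would then wrap its body in "with temporary_database(server) as database:" and never has to remember the database name or the cleanup step.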
Inserting Fixture Data
Now that I can create a temporary test database, I need to populate it with some test data. Fortunately, CouchDB has a neat and fast way to insert data in bulk: its _bulk_docs API. With this API I can easily load a number of documents as test data.
Fixture Data Format
The format for inserting a mass of documents is:
{
  "docs": [
    {"_id": "1", "a_key": "a_value", "b_key": [1, 2, 3]},
    {"_id": "2", "a_key": "_random", "b_key": [5, 6, 7]},
    {"_id": "5", "a_key": "__etc__", "b_key": [1, 5, 5]}
  ]
}
Note that adding an _id field specifies the CouchDB ID for the document.
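If the fixture documents start out as Python dictionaries, the wrapper object can be generated rather than written by hand. This is just a small sketch: build_bulk_payload is a hypothetical helper name, not something CouchDB or py-couchdb provides, and the sample documents are made up for illustration.

```python
import json


def build_bulk_payload(documents):
    """Wrap documents in the {"docs": [...]} envelope that _bulk_docs expects."""
    return json.dumps({'docs': list(documents)}, indent=2, sort_keys=True)


sample_docs = [
    # An explicit _id pins the CouchDB document ID.
    {'_id': '1', 'a_key': 'a_value', 'b_key': [1, 2, 3]},
    # Without an _id, CouchDB generates one when the document is inserted.
    {'a_key': 'no_explicit_id', 'b_key': [5, 6, 7]},
]

payload = build_bulk_payload(sample_docs)
```

The resulting string can be written to a file such as sample_data.json and reused across test runs.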
Using cURL
# Bulk doc insert/update using the JSON data file. One can also do this manually with a string.
curl -d @sample_data.json -X POST -H 'Content-Type: application/json' \
    http://admin:password@localhost:5984/my_database/_bulk_docs
Using py-couchdb
UPDATED: 2015-Aug-22 I was totally wrong about the format for doing bulk updates with py-couchdb. Rather than the JSON format needed for cURL, a simple list of Python dictionaries works with the save_bulk() method. I’ve updated the code example.
import io
import json

# Best practice for writing unified Python 2 and 3 compatible code is
# to use io.open as a context manager.
with io.open('sample_data.json') as json_file:
    my_docs = json.load(json_file)

# `server` is the py-couchdb server object created in the earlier example.
database = server.database('my_database')

# See my update note above, about the format save_bulk expects.
database.save_bulk(my_docs['docs'])
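For a test setup step, the loading and saving could be folded into one function. The sketch below is an assumption about how that might look, not code from Rookeries: load_fixture is a hypothetical helper, it expects a fixture file in the "docs" wrapper format described earlier, and it only relies on the database object having a save_bulk() method.

```python
import io
import json


def load_fixture(database, path):
    """Read a _bulk_docs style JSON file and save its documents in bulk.

    Returns the number of documents handed to the database.
    """
    with io.open(path, encoding='utf-8') as json_file:
        payload = json.load(json_file)

    # save_bulk() wants the bare list of documents, not the
    # {"docs": [...]} wrapper that the REST API takes.
    documents = payload['docs']
    database.save_bulk(documents)
    return len(documents)
```

Because the function only touches save_bulk(), it can be exercised with a simple stand-in object before pointing it at a real test database.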
Conclusion
And with that, I have what I need for repeatable tests. Hopefully this will land in Rookeries in the next couple of days.