Using CouchDB in Rookeries – Part 2 – Setting Up a Remote CouchDB Server

Overview

In the second instalment of my series on adding CouchDB support to
Rookeries, I’ll be talking about how I provisioned CouchDB on my remote
server.

It may sound counter-intuitive that I wrote about creating and
populating CouchDB databases before writing about installing CouchDB.
The reason for this backwards step is that I already have CouchDB
installed locally on my workstation: at my daytime job at Points we use
CouchDB extensively. I have also worked with the Operations team to
provision CouchDB servers. However, it is a different story when you
provision and configure CouchDB yourself on your own servers. This blog
post details some of the things I learned along the way.

Since the setup of Couch is a bit involved, I will divide this up over two blog
posts.

Provisioning Rookeries with Ansible

One of the stated goals of Rookeries is to create a developer-friendly
blogging platform that is easier to install and set up than WordPress.
That is a tall order for a Python WSGI app, since there is more setup
involved than just installing Apache and mod_php and unzipping
WordPress into a folder. (Even with WordPress there is more involved
when doing a proper and maintainable setup.)

So while putting up a production-ready Python WSGI app is more involved
technically, the end-user does not need to experience that complexity.
That is where the Rookeries Ansible role comes into play. I created that
Ansible role to encapsulate the complexity of installing Rookeries.
(This role uses
[the nginx-uwsgi-supervisord Ansible role](https://bitbucket.org/dorianpula/ansible-nginx-uwsgi-supervisor),
which I wrote to handle the actual setup of a WSGI app on a bare-bones
Ubuntu server.) All of the details concerning the setup and
configuration of a CouchDB server for a Rookeries installation are
included in the Rookeries Ansible role.

Installing Latest CouchDB on Ubuntu Linux

I use the latest Ubuntu LTS (14.04) for both my development and
deployment environments. Having the same environment reduces the effort
for me to take Rookeries from development to production. However the
latest version of CouchDB packaged for Ubuntu 14.04 is 1.5.0 and I
wanted to use the latest stable version of CouchDB. While upgrading
between CouchDB versions is straightforward, I know that I am less
likely to upgrade to the latest version of CouchDB once Rookeries
stabilizes. And there is no point in starting a project with an older
version of your database.

Fortunately the CouchDB devs distribute the latest stable version of
CouchDB via a convenient PPA. The instructions on how to install CouchDB
via the PPA are right on the Launchpad page.

Installing via Console

# add the ppa
sudo add-apt-repository ppa:couchdb/stable -y
# update cached list of packages    
sudo aptitude update -y
# remove any existing couchdb binaries
sudo aptitude remove couchdb couchdb-bin couchdb-common -yf
# install the latest
sudo aptitude install couchdb

Provisioning via Ansible

The Rookeries Ansible role translates those instructions (minus the
removal of existing packages) to:

- name: add the couchdb ppa repository
  apt_repository: repo="ppa:couchdb/stable" state=present

- name: install couchdb
  apt: pkg={{ item }} state=present
  with_items:
    - couchdb
    - couchdb-bin
    - couchdb-common

Running CouchDB

Now that we have CouchDB installed, we need to control it like we would any
other service on a Linux server. Surprisingly enough, when I tried to find the
packaged CouchDB service scripts (using the service command), I did not find
anything!

> sudo service --status-all
# ... A lot of entries but no couchdb ...

It turns out that the CouchDB package comes with an Upstart script rather
than a traditional System V init script. (That itself is probably not a bad
thing.)

> sudo status couchdb
couchdb start/running, process 5311
# There it is.

Starting and stopping a service through Upstart is done via the ‘start’ and
‘stop’ commands. There are also ‘reload’ and ‘restart’ commands.

> sudo restart couchdb
couchdb start/running, process 15987
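
Upstart only reports that the process is running. To confirm that CouchDB is actually answering HTTP requests after a restart, a check along the following lines could be used. This is a minimal sketch: the function names are my own, and it assumes CouchDB listens on the default http://localhost:5984/ and replies at its root URL with a JSON welcome document containing "couchdb" and "version" keys.

```python
import json
from urllib.request import urlopen


def parse_welcome(body):
    """Return the version string from the JSON body served at CouchDB's root URL."""
    info = json.loads(body)
    # Assumption: the root document identifies the server with "couchdb": "Welcome".
    if info.get("couchdb") != "Welcome":
        raise ValueError("response does not look like a CouchDB welcome message")
    return info["version"]


def couchdb_version(base_url="http://localhost:5984/", timeout=5):
    """GET the server root and return the version CouchDB reports."""
    with urlopen(base_url, timeout=timeout) as response:
        return parse_welcome(response.read().decode("utf-8"))
```

Calling couchdb_version() right after the restart raises an error if nothing is listening yet, which is usually enough of a signal for a quick manual check.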

Side Note About Upstart vs Services vs Systemd

Update: I found an article that explains the evolution and the current situation of Linux service management. It explains things much better than I do and in much more detail. I learned quite a bit from it.

If you follow Linux developments and news, you might have heard about the development and controversy around new init systems. I will try to explain these developments briefly here since we are on the topic of service scripts.

The old System V style for service scripts (in /etc/init.d/ or /etc/rc.d/) is not flexible when it comes to managing dependencies and running outside of the prescribed run-levels that happen during boot and shutdown.
However there is disagreement about what would be a better alternative. Upstart was Canonical/Ubuntu’s attempt to create a more flexible system for managing services. However Debian and many other Linux distributions have recently switched over to another such system called systemd. Part of the controversy about systemd stems from its architectural design (which seems monolithic at first glance, as it tries to solve service management, logging and a few other seemingly unrelated system-level issues).

Another part of the controversy stems from how the project lead handled his previous project: PulseAudio. I will admit that my first experiences with PulseAudio were pretty rocky, and I missed how well plain old ALSA worked. However these issues have since gone away, and I cannot think of any PulseAudio or other audio issues I’ve encountered in Linux recently. (Ironically Windows 7 gives me more grief with sound issues than Linux nowadays.)

I personally don’t know enough about systemd to form an opinion. Sure I am a bit anxious to see how this all plays out. However this is a case of wait and see. In the meantime be aware that the exact semantics on how you interact with services will change in the near future.

Update #2: An interview with Lennart Poettering about systemd, its design and intentions

Provisioning with Ansible

Fortunately Ansible does not care which service management system is
used underneath. The Ansible service module works with System V init
scripts, Upstart and systemd services without complaint.

In the Rookeries Ansible role, restarting the CouchDB service becomes a
single task.

- name: restart couchdb server
  service: name=couchdb state=restarted

Next Up

In the next blog post I’ll write up about configuring and securing
CouchDB.

Using CouchDB in Rookeries – Part 1 – Creating CouchDB Test Fixtures Using Bulk Updates

Back Story

I’ve been working on adding database persistence support to Rookeries. Instead of writing down my findings and losing them somewhere, I plan on documenting my findings and thoughts in a series of blog posts.

In the case of Rookeries that means connecting to and storing all of the journal, blog and page content as CouchDB documents. Since I want to implement this properly, I intend on adding tests to make sure I can manage CouchDB documents and databases properly. Rather than writing a number of tests that mock out CouchDB, I want to use a test database along with known test data fixtures for my tests.

Python CouchDB Integration for Rookeries

When looking at different CouchDB-Python binding libraries for Rookeries, I settled on py-couchdb. Manipulating CouchDB essentially means communicating with its REST API, so it is important that a Python binding library takes a sane approach to talking to an HTTP REST API. Unfortunately the more popular CouchDB-Python library uses only the Python standard library and implements its HTTP mechanism using the standard library’s unintuitive modules. In contrast py-couchdb uses requests for querying the CouchDB server, making it a much more maintainable library.

Also py-couchdb offers Python query views, which I very much enjoy using at work. I still need to verify how well the library’s Python query server works in practice, but I will write a future blog post about my findings. py-couchdb lacks CouchDB-Python’s mapping functionality, which behaves similarly to SQLAlchemy’s ORM. However I am still debating how I want to map between CouchDB documents and Pythonic domain objects.

Creating and Deleting CouchDB Databases

Creating and deleting a database in a CouchDB server amounts to issuing an HTTP PUT or DELETE request against the server. This REST API provides neither a safety net nor a confirmation step when deleting a database, so one needs to be careful. py-couchdb provides a nice and simple API to create or delete a database as well.

Using cURL

# Create a CouchDB database
curl -X PUT http://admin:password@localhost:5984/my_database/

# DELETE a CouchDB database
curl -X DELETE http://admin:password@localhost:5984/my_database/

Using py-couchdb

import pycouchdb

# Create a CouchDB database
server = pycouchdb.client.Server('http://admin:password@localhost:5984')
server.create('my_database')

# DELETE a CouchDB database
server.delete('my_database')
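
Since deleting a database comes with no confirmation step, test code may want its own guard. The following is a minimal sketch of one possible approach, not something py-couchdb provides: the helper name and the "test_" naming convention are my own assumptions. The only call it relies on is the server.delete() method shown above.

```python
TEST_PREFIX = "test_"  # hypothetical naming convention for throwaway databases


def delete_test_database(server, name, prefix=TEST_PREFIX):
    """Delete a database only if its name marks it as a test database.

    `server` is expected to be a py-couchdb Server, or any object
    with a compatible delete(name) method.
    """
    if not name.startswith(prefix):
        raise ValueError(
            "refusing to delete %r: name does not start with %r" % (name, prefix)
        )
    server.delete(name)
```

With this in place, delete_test_database(server, 'test_my_database') goes through, while delete_test_database(server, 'my_database') raises a ValueError before any request is sent.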

Inserting Fixture Data

Now that I can create a temporary test database, I need to populate it with some test data. Fortunately it turns out that CouchDB has a neat and fast way to insert data in bulk using its _bulk_docs API. With this API I can easily come up with a number of documents that I want to insert as test data.

Fixture Data Format

The format for inserting a mass of documents is:

{
  "docs": [
    {"_id": "1", "a_key": "a_value", "b_key": [1, 2, 3]},
    {"_id": "2", "a_key": "_random", "b_key": [5, 6, 7]},
    {"_id": "5", "a_key": "__etc__", "b_key": [1, 5, 5]}
 ]
}

Note that adding an _id specifies the CouchDB ID for the document.
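
If the fixture documents start out as plain Python dictionaries, the file in this format can be generated rather than written by hand. A minimal sketch using only the standard library follows; the function names are my own, and the duplicate _id check is an extra precaution I added rather than anything CouchDB requires of the file.

```python
import json


def build_bulk_payload(docs):
    """Wrap a list of documents in the {"docs": [...]} envelope shown above."""
    ids = [doc["_id"] for doc in docs if "_id" in doc]
    # Catch copy-and-paste mistakes in the fixture data early.
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate _id in fixture data")
    return {"docs": list(docs)}


def write_fixture_file(docs, path):
    """Write the documents to `path` as a JSON file in the bulk format."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(build_bulk_payload(docs), json_file, indent=2)
```

For example, write_fixture_file([{"_id": "1", "a_key": "a_value"}], "sample_data.json") produces a file with a single-element "docs" list.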

Using cURL

# Bulk doc insert/update using the JSON data file.  One can also do this manually with a string.
curl -d @sample_data.json -X POST -H 'Content-Type: application/json' \
   http://admin:password@localhost:5984/my_database/_bulk_docs

Using py-couchdb

UPDATED: 2015-Aug-22 I was totally wrong about the format for doing bulk updates with py-couchdb. Rather than the JSON format needed for cURL, a simple list of Python dictionaries works with the save_bulk() method. I’ve updated the code example.

import io
import json

# Best practice for writing unified Python 2 and 3 compatible code is 
# to use io.open as a context manager. 
with io.open('sample_data.json') as json_file:
    my_docs = json.load(json_file)
# `server` is the pycouchdb Server object created in the earlier example.
database = server.database('my_database')
# See my update note above, about the format save_bulk expects.
database.save_bulk(my_docs['docs'])
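
The create, populate and delete steps can be tied together so that a test always cleans up after itself. Here is a minimal sketch of that idea as a context manager. The name temporary_database is hypothetical, and it relies only on the server.create(), server.database(), database.save_bulk() and server.delete() calls used in the examples above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_database(server, name, docs):
    """Create `name` on `server`, load `docs` into it, and delete it afterwards."""
    server.create(name)
    try:
        database = server.database(name)
        # save_bulk takes a plain list of dictionaries, not the {"docs": [...]} envelope.
        database.save_bulk(docs)
        yield database
    finally:
        # Runs even when the test body raises, so no stale test database is left behind.
        server.delete(name)
```

A test would then wrap its body in "with temporary_database(server, 'test_database', my_docs['docs']) as database:" and query the database inside the block.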

Conclusion

And with that, I have what I need to have repeatable tests. Hopefully this will land in Rookeries in the next couple of days.

Other Resources