The Open Knowledge Data Packager is a web app for quickly creating and publishing Tabular Data Packages. For more information, see the blog post on CKAN.org and the about page on datapackager.okfn.org.
To install Data Packager:
Get an Ubuntu 12.04, 64-bit server or virtual machine, ssh into it and follow the steps below.
Make sure Ubuntu’s package index is up to date:
sudo apt-get update
Install the Ubuntu packages that Data Packager requires:
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5 postgresql solr-jetty
Download the latest version of the Data Packager package from the releases page, for example (replacing the URL with the URL of the deb file for the latest release):
(If you get
wget: command not found then do
sudo apt-get install wget
and try again.)
Install the package, for example (replacing the filename with the name of the file you downloaded):
sudo dpkg -i datapackager_X.Y.Z-I_amd64.deb
If you get this error:
Syntax error on line 1 of /etc/apache2/sites-enabled/ckan_default: Invalid command 'WSGISocketPrefix', perhaps misspelled or defined by a module not included in the server configuration Action 'configtest' failed. The Apache error log may have more information. ...fail!
It means that for some reason the Apache WSGI module was not enabled. Enable it by running these commands:
sudo a2enmod wsgi sudo service apache2 restart
Then try again.
Follow these instructions to setup Solr for CKAN: http://docs.ckan.org/en/ckan-2.2/install-from-source.html#setting-up-solr
Follow these instructions to setup PostgreSQL for CKAN:
then edit the
sqlalchemy.url setting in your
/etc/ckan/default/production.ini file and set the correct password,
database and database user.
Initialize your CKAN database:
sudo ckan db init
Follow these instructions to enable file uploads: http://docs.ckan.org/en/ckan-2.2/filestore.html#setup-file-uploads
Restart Apache and Nginx:
sudo service apache2 restart sudo service nginx restart
You’re done! Data Packager should be running on port 80 on your server.
If you’ve already installed Data Packager and now you want to upgrade to a new release, follow these steps:
Download the new release from the releases page, for example (replacing the URL with the URL of the deb file for the latest release):
Install the new release, for example (replacing the filename with the name of the file you downloaded):
sudo dpkg -i datapackager_X.Y.Z-I_amd64.deb
sudo service apache2 restart
Note: You’ll need a build machine capable of 64-bit virtualization to do this.
To build a new version of the Data Packager Ubuntu package containing the latest code:
ckan-packaging git repo, checking out the
git clone -b datapackager https://github.com/ckan/ckan-packaging.git
Change to the directory that you cloned
Update the version number in
--version argument to
fpm command). We use Semantic Versioning to
decide the version numbers.
Create and boot the virtual machine:
This will create a virtual machine and run the Vagrantfile in the
ckan-packaging directory, which in turn runs an Ansible playbook which
builds the Data Packager package.
You’ll be prompted for a build ‘iteration’. This should normally be
but if for some reason you have to publish a new version of the same package
(i.e. nothing in the code changed, but something went wrong in the packaging
so the package has to be rebuilt), then you should increase the iteration
number. When building the first package for a new version of Data Packager,
the iteration number should be reset to
When typing the iteration number, your input will not be output to the console as there is a current bug with Ansible and Vagrant.
The build process may take some time. Once it has completed, there will be a
datapackager_X.Y.Z-I_amd64.deb in the
Once you’ve followed the above process once, you can re-use the same virtual
machine to rebuild the package (using the latest code from each git repo).
Just update the version number in
ckan-packaging/package.yml, then do:
If you want to hack on Data Packager, follow these instructions to install a development version.
Data Packager is implemented as a CKAN extension (ckanext-datapackager).
To install the latest development version on Ubuntu 12.04, first install CKAN 2.2 from source and then (assuming you installed CKAN in the default location):
sudo apt-get install build-essential # Install packages needed to build ckanext-datapackager . /usr/lib/ckan/default/bin/activate # Activate your CKAN virtualenv cd /usr/lib/ckan/default/src/ckan git checkout datapackager # ckanext-datapackager currently requires this custom CKAN branch pip install -e 'git+https://github.com/ckan/ckanext-datapackager.git#egg=ckanext-datapackager' pip install -r ../ckanext-datapackager/requirements.txt
pip install -r command may take a while to run, because it installs
pandas which requires a lot of compiling.
datapackager to the
ckan.plugins setting in your CKAN config file, and
restart your web server.
You’ll need to install the dev requirements to run the tests:
pip install -r dev-requirements.txt
To run the tests:
nosetests --nologcapture --ckan --with-pylons=test.ini
Note that ckanext-datapackager’s
test.ini file assumes that the relative path from it
test-core.ini file is
../ckan/test-core.ini, i.e. that you have
CKAN and ckanext-datapackager installed next to each other in the same directory. This
would normally be the case if you’ve done development installs of CKAN and
To run the tests with test coverage reporting, first install
coverage in your
pip install coverage
Then run nosetests like this from the top-level
nosetests --nologcapture --ckan --with-pylons=test.ini --with-coverage --cover-package=ckanext.datapackager --cover-inclusive --cover-erase .
To get a nice, HTML-formatted coverage report in
cover/index.html add the