The Open Knowledge Data Packager is a web app for quickly creating and publishing Tabular Data Packages. For more information, see the blog post on CKAN.org and the about page on datapackager.okfn.org.
To install Data Packager:
Get an Ubuntu 12.04, 64-bit server or virtual machine, ssh into it and follow the steps below.
Make sure Ubuntu’s package index is up to date:
sudo apt-get update
Install the Ubuntu packages that Data Packager requires:
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5 postgresql solr-jetty
Download the latest version of the Data Packager package from the releases page, for example (replacing the URL with the URL of the deb file for the latest release):
wget 'https://github.com/ckan/ckanext-datapackager/releases/download/vX.Y.Z/datapackager_X.Y.Z-I_amd64.deb'
(If you get wget: command not found
then do sudo apt-get install wget
and try again.)
Install the package, for example (replacing the filename with the name of the file you downloaded):
sudo dpkg -i datapackager_X.Y.Z-I_amd64.deb
If you get this error:
Syntax error on line 1 of /etc/apache2/sites-enabled/ckan_default:
Invalid command 'WSGISocketPrefix', perhaps misspelled or defined by a module not included in the server configuration
Action 'configtest' failed.
The Apache error log may have more information.
...fail!
It means that for some reason the Apache WSGI module was not enabled. Enable it by running these commands:
sudo a2enmod wsgi
sudo service apache2 restart
Then try again.
Follow these instructions to setup Solr for CKAN: http://docs.ckan.org/en/ckan-2.2/install-from-source.html#setting-up-solr
Follow these instructions to setup PostgreSQL for CKAN:
http://docs.ckan.org/en/ckan-2.2/install-from-source.html#postgres-setup,
then edit the sqlalchemy.url
setting in your
/etc/ckan/default/production.ini
file and set the correct password,
database and database user.
Initialize your CKAN database:
sudo ckan db init
Follow these instructions to enable file uploads: http://docs.ckan.org/en/ckan-2.2/filestore.html#setup-file-uploads
Restart Apache and Nginx:
sudo service apache2 restart
sudo service nginx restart
You’re done! Data Packager should be running on port 80 on your server.
If you’ve already installed Data Packager and now you want to upgrade to a new release, follow these steps:
Download the new release from the releases page, for example (replacing the URL with the URL of the deb file for the latest release):
wget 'https://github.com/ckan/ckanext-datapackager/releases/download/vX.Y.Z/datapackager_X.Y.Z-I_amd64.deb'
Install the new release, for example (replacing the filename with the name of the file you downloaded):
sudo dpkg -i datapackager_X.Y.Z-I_amd64.deb
Restart Apache:
sudo service apache2 restart
Note: You’ll need a build machine capable of 64-bit virtualization to do this.
To build a new version of the Data Packager Ubuntu package containing the latest code:
Install Vagrant 1.4, Virtualbox 4.3, Git and Ansible.
Clone the ckan-packaging
git repo, checking out the datapackager
branch:
git clone -b datapackager https://github.com/ckan/ckan-packaging.git
Change to the directory that you cloned ckan-packaging
into:
cd ckan-packaging
Update the version number in package.yml
(the --version
argument to
the fpm
command). We use Semantic Versioning to
decide the version numbers.
Create and boot the virtual machine:
vagrant up
This will create a virtual machine and run the Vagrantfile in the
ckan-packaging
directory, which in turn runs an Ansible playbook which
builds the Data Packager package.
You’ll be prompted for a build ‘iteration’. This should normally be 0
,
but if for some reason you have to publish a new version of the same package
(i.e. nothing in the code changed, but something went wrong in the packaging
so the package has to be rebuilt), then you should increase the iteration
number. When building the first package for a new version of Data Packager,
the iteration number should be reset to 0
again.
When typing the iteration number, your input will not be output to the console as there is a current bug with Ansible and Vagrant.
The build process may take some time. Once it has completed, there will be a
file called datapackager_X.Y.Z-I_amd64.deb
in the ckan-packaging
directory.
Once you’ve followed the above process once, you can re-use the same virtual
machine to rebuild the package (using the latest code from each git repo).
Just update the version number in ckan-packaging/package.yml
, then do:
vagrant provision
If you want to hack on Data Packager, follow these instructions to install a development version.
Data Packager is implemented as a CKAN extension (ckanext-datapackager).
To install the latest development version on Ubuntu 12.04, first install CKAN 2.2 from source and then (assuming you installed CKAN in the default location):
sudo apt-get install build-essential # Install packages needed to build ckanext-datapackager
. /usr/lib/ckan/default/bin/activate # Activate your CKAN virtualenv
cd /usr/lib/ckan/default/src/ckan
git checkout datapackager # ckanext-datapackager currently requires this custom CKAN branch
pip install -e 'git+https://github.com/ckan/ckanext-datapackager.git#egg=ckanext-datapackager'
pip install -r ../ckanext-datapackager/requirements.txt
The final pip install -r
command may take a while to run, because it installs
pandas which requires a lot of compiling.
Then add datapackager
to the ckan.plugins
setting in your CKAN config file, and
restart your web server.
You’ll need to install the dev requirements to run the tests:
pip install -r dev-requirements.txt
To run the tests:
nosetests --nologcapture --ckan --with-pylons=test.ini
Note that ckanext-datapackager’s test.ini
file assumes that the relative path from it
to CKAN’s test-core.ini
file is ../ckan/test-core.ini
, i.e. that you have
CKAN and ckanext-datapackager installed next to each other in the same directory. This
would normally be the case if you’ve done development installs of CKAN and
ckanext-datapackager.
To run the tests with test coverage reporting, first install coverage
in your
virtualenv:
pip install coverage
Then run nosetests like this from the top-level ckanext-datapackager
directory:
nosetests --nologcapture --ckan --with-pylons=test.ini --with-coverage --cover-package=ckanext.datapackager --cover-inclusive --cover-erase .
To get a nice, HTML-formatted coverage report in cover/index.html
add the
--cover-html
option.