This repository is currently unmaintained, and as a result we are unable to provide more than guidance in using it. It should be feasible to use CKAN in combination with the many available extensions to obtain the same result.
This repo provides scripts to install a copy of data.gov.uk’s website to your own server. Rebrand it and you have a fully-featured government open data portal.
NB This used to be the ‘togo’ branch, but that has been removed now - use master.
The UK Government has contributed Data.gov.uk To Go to Github to kick-start the use and development of common open data portal software, beyond the basic CKAN. The UK wants to develop it in partnership with other providers of Open Data portals, through the usual Open Source / Github model of forking, pull requests, issues etc. that everyone is encouraged to contribute to.
If you have a question or an issue with installing, please check the open Github issues before creating a new one: https://github.com/datagovuk/dgu-vagrant-puppet/issues
Here are some useful docs: data.gov.uk guidance
David Read [email protected]
Here is an overview of the install process:
data.gov.uk runs on a single machine specified as follows:
We’ve not needed to make it work on a lesser machine, but no doubt it could.
For single-user testing, you can certainly run it in less. e.g. we run it on dev VMs with 8 GB RAM.
There are two options - you can either use Vagrant to create a virtual machine, or you can use an Ubuntu machine that already exists. Either way, Puppet will be used to do basic set-up of users, install packages and CKAN itself.
NB We have had issues running this in VMWare and suggest you stick with (free) VirtualBox, using 4.3.14 or later.
NB This setup does not work with a Windows host machine (since it relies on symbolic links).
Before creating the virtual machine, clone this repo to the host machine:
git clone https://github.com/datagovuk/dgu-vagrant-puppet
cd dgu-vagrant-puppet
Use the script to clone all the CKAN source repos onto your host machine:
cd src
./git_clone_all.sh
cd ..
Using Vagrant and Puppet, launch a fully provisioned Virtual Machine as described in this repo:
vagrant up
Now a great deal should happen. Expect these key stages:
You can generally ignore these warnings if they come up:
At this point the shell text goes green and it does the “provision”. If this does not start automatically, start it manually (from the host box):
vagrant provision
The provision is:
install_puppet_dependancies.sh - install git, update all Ubuntu packages, install ruby and librarian-puppet
Provisioning will take a while, and you can ignore warnings that are listed in the section of this document titled ‘Puppet warnings’. If you should suffer errors, please see the section below ‘Puppet errors’.
NB If there is an error and you want to restart the provisioning, from the host box you should do:
vagrant provision
Now you can log into the new VM (the “guest” machine):
vagrant ssh
The prompt will change to show your terminal is connected to the VM, and you will be logged in as the vagrant user. All further steps are run from this ssh session on the VM, after you have changed your user to ‘co’ with:
sudo su co
Instead of using a virtual machine, it is a perfectly fine alternative to use a non-virtual machine, freshly installed with Ubuntu 12.04. The Puppet scripts assume the name of the machine is ‘ckan’, so you need to log in to it and rename it:
sudo hostname ckan
sudo vim /etc/hosts
# ^ add "127.0.0.1 ckan" to hosts...
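If you would rather script the hosts edit than do it by hand in vim, here is a minimal sketch. The helper name ensure_hosts_entry is hypothetical (it is not part of this repo); it only appends the line when no existing non-comment line already maps the name, so it is safe to run more than once.

```shell
# Append "IP NAME" to a hosts file unless a line already maps NAME.
# ensure_hosts_entry is a hypothetical helper, not part of this repo.
ensure_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Look for NAME as a whole field (after the IP) on any non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

To apply it to the real file you would need root, for example by sourcing the function in a root shell and calling ensure_hosts_entry /etc/hosts 127.0.0.1 ckan.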
Puppet will assume the home user is called ‘co’, so create it with some particular options:
sudo adduser co -u 510 --group sudo
sudo su co
All further steps are to be carried out from the ssh session under the user ‘co’ on this target machine.
You need to install some dependencies. Firstly git:
sudo apt-get install git
Now install ruby and ‘librarian-puppet’:
curl -L get.rvm.io | bash -s stable
source ~/.rvm/scripts/rvm
rvm requirements
rvm install 1.8.7
sudo gem install puppet -v 2.7.19
sudo gem install highline -v 1.6.1 # need this older version for librarian compatibility with this version of ruby
sudo gem install librarian-puppet -v 1.0.3
Clone this repo to the machine in /vagrant (to match the vagrant install):
sudo mkdir /vagrant
sudo chown co /vagrant
sudo chgrp co /vagrant
cd /vagrant
git clone https://github.com/datagovuk/dgu-vagrant-puppet
cd /vagrant/dgu-vagrant-puppet
Use the script to clone all the CKAN source repos.
ln -s /vagrant/dgu-vagrant-puppet/src /vagrant/src
ln -s /vagrant/dgu-vagrant-puppet/puppet/ /vagrant/puppet
ln -s /vagrant/dgu-vagrant-puppet/pypi /vagrant/pypi
ln -s /vagrant/src /src
cd /src
./git_clone_all.sh
Puppet is used to install and configure the main software packages (Apache, Postgres, SOLR etc) and set up Linux users.
To provision an existing machine, install the puppet modules:
sudo /vagrant/puppet/install_puppet_dependancies.sh
and then execute the site manifest now at /etc/puppet/:
sudo puppet apply /vagrant/puppet/manifests/site.pp
Provisioning will take a while, and you can ignore warnings that are listed in the section of this document titled ‘Puppet warnings’. If you should suffer errors, please see the section below ‘Puppet errors’.
To automatically activate your CKAN python virtual environment on log-in, it is recommended to add this line to your .bashrc:
source ~/ckan/bin/activate && cd /src/ckan
and also add this line for the ruby to work properly:
source ~/.rvm/scripts/rvm
(This extra setup will be usefully puppetized in the future)
For the auth-theming used by the harvesters you need to install this corpus:
/home/co/ckan/bin/python -m nltk.downloader stopwords
Harvester needs a backend, and the default is Redis (installed by puppet).
You need to create the gather and fetch queues by running the consumers briefly:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-harvest harvester gather_consumer --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-harvest harvester fetch_consumer --config=/var/ckan/ckan.ini
The queues should be left running, either in screen sessions, or preferably using supervisord.
Meanwhile you need the harvester run cron job to run every 10 minutes:
*/10 * * * * www-data /home/co/ckan/bin/paster --plugin=ckanext-harvest harvester run --config=/var/ckan/ckan.ini
To enable the resource cache, broken link checker and 5 star checker:
Unless you’re just testing the site locally, change the ckan.cache_url_root setting in /var/ckan/ckan.ini to reflect the domain where you will host your site. e.g. for data.gov.uk we have:
ckan.cache_url_root = http://data.gov.uk/data/resource_cache/
Keep these two processes running in the background, using screen or ideally supervisord:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan celeryd run concurrency=1 --queue=priority --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan celeryd run concurrency=4 --queue=bulk --config=/var/ckan/ckan.ini
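If you go the supervisord route, the two commands above can be wrapped in program stanzas. The following is a sketch only: the program names (ckan_celery_priority, ckan_celery_bulk) and the log file paths are assumptions made for illustration, and the function simply prints the stanzas so you can review them before installing them.

```shell
# Print a supervisord [program:...] stanza for one Celery worker.
# Program names and log paths are hypothetical; adjust them to your setup.
celery_program() {
    queue=$1
    concurrency=$2
    cat <<EOF
[program:ckan_celery_${queue}]
command=/home/co/ckan/bin/paster --plugin=ckan celeryd run concurrency=${concurrency} --queue=${queue} --config=/var/ckan/ckan.ini
user=www-data
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/ckan/supervisor_celery_${queue}.log

EOF
}

# Emit both stanzas, matching the concurrency values used above.
celery_program priority 1
celery_program bulk 4
```

Redirect the output to a file in the directory your supervisord includes (on Ubuntu's supervisor package this is usually /etc/supervisor/conf.d/), then run sudo supervisorctl reread and sudo supervisorctl update to pick it up.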
Trigger the weekly refreshes using this cron setting:
0 22 * * 5 www-data /home/co/ckan/bin/paster --plugin=ckanext-archiver archiver update --config=/var/ckan/ckan.ini
The Archiver and QA extensions are explained later on in this guide.
IMPORTANT You must activate the CKAN virtual environment when working on the VM, e.g.:
source ~/ckan/bin/activate
And make sure you run paster commands as the co user from the /src/ckan or /vagrant/src/ckan directory.
After running puppet, a fresh database is created for you. If you need to create it again then you can do it like this:
createdb -O dgu ckan --template template_postgis
Now you need to create the tables for the various extensions:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-packagezip packagezip init --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-issues issues init_db --config=/var/ckan/ckan.ini
Sample data is provided to demonstrate CKAN. It comprises 5 sample datasets and is loaded like this:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-dgu create-test-data --config=/var/ckan/ckan.ini
The sample data looks like this:
At data.gov.uk we transfer the database by first creating a dump (using pg_dump and gzip) and then copying it to a test server or local machine for development. Here is an example transfer - adapt the commands to transfer your own database dumps from your own server.
mkdir -p /vagrant/db_backup
rsync --progress [email protected]:/var/ckan/backup/ckan.2014-09-18.pg_dump.gz /vagrant/db_backup/
Then load the dump in (ensure you are logged in as the co user):
export CKAN_DUMP_FILE=`ls /vagrant/db_backup/ -t |head -n 1` && echo $CKAN_DUMP_FILE
sudo apachectl stop
dropdb ckan
createdb -O dgu ckan --template template_postgis
pv /vagrant/db_backup/$CKAN_DUMP_FILE | funzip \
| PGPASSWORD=pass psql -h localhost -U dgu -d ckan
sudo apachectl start
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan db upgrade --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan search-index rebuild --config=/var/ckan/ckan.ini
Note: expect the pv command to produce a number of non-fatal errors and warnings. At the start there are several pages of errors before it starts creating tables:
...
ERROR: must be owner of type public.geometry or type bytea
ERROR: must be owner of type public.geometry or type public.geography
ERROR: must be owner of type public.geometry or type text
ERROR: must be owner of type text or type public.geometry
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
...
A few more errors are to be expected later on, each possibly several times:
ERROR: relation "geometry_columns" already exists
ERROR: must be owner of relation geometry_columns
ERROR: relation "spatial_ref_sys" already exists
ERROR: must be owner of relation spatial_ref_sys
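Because the expected errors can bury a genuine one, it may help to filter a saved copy of the load output. This is a sketch under the assumption that you captured psql's error output to a file; the helper name unexpected_errors is hypothetical, and the patterns cover only the messages listed above.

```shell
# Print only the ERROR lines that are NOT among the expected ones listed above.
# unexpected_errors is a hypothetical helper; it reads a saved psql log on stdin.
unexpected_errors() {
    grep '^ERROR' \
      | grep -v -E 'must be owner of type |must be owner of relation (geometry_columns|spatial_ref_sys)|relation "(geometry_columns|spatial_ref_sys)" already exists' \
      || true   # grep exits non-zero when nothing is left; that is the good case
}
```

For example, add 2> /tmp/ckan_load.log (a hypothetical path) to the end of the psql command in the load step, then run unexpected_errors < /tmp/ckan_load.log. No output means every error was one of the known harmless ones.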
For test purposes you can add a CKAN admin user. Remember to reset the password before making the site live.
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan user add admin email=admin@ckan password=pass --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan sysadmin add admin --config=/var/ckan/ckan.ini
You can test CKAN on the command-line:
curl http://localhost/data/search
And try connecting to the machine with a browser. If it’s running in Vagrant then the address (from the Vagrantfile) will be: http://192.168.11.11/data/search
You should get CKAN HTML. It’s worth checking the logs for errors too:
less /var/log/ckan/ckan-apache.error.log
Working correctly you should see something like this:
[Fri Sep 19 13:43:49 2014] [error] 2014-09-19 13:43:49,484 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
[Fri Sep 19 13:43:49 2014] [error] 2014-09-19 13:43:49,491 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
[Fri Sep 19 13:43:49 2014] [error] 2014-09-19 13:43:49,502 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
[Fri Sep 19 13:43:49 2014] [error] 2014-09-19 13:43:49,505 DEBUG [ckanext.harvest.model] Harvest tables already exist
[Fri Sep 19 13:43:50 2014] [error] 2014-09-19 13:43:50,025 CRITI [ckan.lib.uploader] Please specify a ckan.storage_path in your config
[Fri Sep 19 13:43:50 2014] [error] for your uploads
For Drupal you will need to complete the configuration of the LAMP stack and get a working drush installation, as explained below. For more detailed requirements, please refer to https://drupal.org/requirements .
For more details about installation of Drush, see here: https://github.com/drush-ops/drush
First get Composer:
curl -sS https://getcomposer.org/installer | php
sudo mv composer.phar /usr/local/bin/composer
Now install the latest Drush:
composer global require drush/drush
And add it to the path:
sed -i '$a\export PATH="$HOME/.composer/vendor/bin:$PATH"' $HOME/.bashrc
source $HOME/.bashrc
You can install the DGU Drupal Distribution with the following commands:
sudo mkdir /var/www/drupal
sudo chown co:www-data /var/www/drupal
cd /src/dgu_d7/
drush make distro.make /var/www/drupal/dgu
mysql -u root --execute "CREATE DATABASE dgu;"
mysql -u root --execute "CREATE USER 'co'@'localhost' IDENTIFIED BY 'pass';"
mysql -u root --execute "GRANT ALL PRIVILEGES ON *.* TO 'co'@'localhost';"
cd /var/www/drupal/dgu
drush --yes --verbose site-install dgu --db-url=mysql://co:pass@localhost/dgu --account-name=admin --account-pass=admin --site-name='something creative'
This will install Drupal, download all the required modules and configure the system. In the site-install command you can ignore two errors at the end about sending e-mails, due to sendmail being missing. E-mail functionality will need to be fixed for a production system.
After this step completes successfully, you should enable some modules:
drush --yes en dgu_app dgu_blog dgu_consultation dgu_data_set dgu_data_set_request dgu_footer dgu_forum dgu_glossary dgu_idea dgu_library dgu_linked_data dgu_location dgu_moderation dgu_notifications dgu_organogram dgu_print dgu_reply dgu_search dgu_services dgu_user ckan
You will need to configure Drupal with the URL of your CKAN instance. We use the following drush commands:
drush vset ckan_url 'http://data.gov.uk/api/';
drush vset ckan_apikey 'xxxxxxxxxxxxxxxxxxxxx';
You may also check and modify these settings in the admin menu: configuration->system->ckan.
Now fix permissions:
sudo chown -R co:www-data /var/www/drupal/dgu/sites/default/files
Otherwise you’ll get messages such as “The specified file temporary://fileKrLiDX could not be copied, because the destination directory is not properly configured. This may be caused by a problem with file or directory permissions. More information is available in the system log.”
Drupal uses a second SOLR core for the search. The configuration of this is to be provided soon.
Those evaluating this distribution will probably want to use the sample content, which creates some sample blog posts, apps etc. This is installed like this:
zcat /src/dgu_d7/sample/dgud7_default_db.sql.gz | mysql -u root dgu
NB This will delete all other Drupal content and users.
You can now log in by executing ‘drush uli’ in the Drupal root folder. This command generates a one-time login link. You can change the admin password once logged in.
If you get the message “The website encountered an unexpected error. Please try again later.” please see the section below “Debugging Drupal”.
For a live deployment it is important to change the passwords from the sample ones. The passwords to change are:
Drupal accounts, particularly the admin and ‘jason’ users (if using the sample database). Log in as admin and edit the users here: /admin/people
CKAN admin account. Change it with:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan user setpass admin --config=/var/ckan/ckan.ini
HTTP Basic Auth around Drupal services. Change the password CKAN uses to contact the Drupal services API by editing /var/ckan/ckan.ini and setting dgu.xmlrpc_password to a new password:
dgu.xmlrpc_password = newpassword
And then set that same password to be the one accepted by the API using:
sudo htpasswd /var/www/api_users ckan
and restart Apache:
sudo apachectl restart
MySQL database, for both the root and co users. Use these commands:
mysql -u root --execute "SET PASSWORD = PASSWORD('new root password');"
mysql -u root -p --execute "SET PASSWORD FOR 'co'@'localhost' = PASSWORD('new co password');"
And change the password in your Drupal settings file /var/www/drupal/dgu/sites/default/settings.php and restart Apache:
sudo apachectl restart
Postgres database:
sudo -u postgres psql -c "ALTER USER Postgres WITH PASSWORD 'new postgres password';"
sudo -u postgres psql -c "ALTER USER co WITH PASSWORD 'new co password';"
And change the password in your CKAN sqlalchemy setting in /var/ckan/ckan.ini:
sqlalchemy.url = postgresql://dgu:pass@localhost/ckan
and restart Apache:
sudo apachectl restart
SSH authentication. The install provides ssh access to the data.gov.uk team, and clearly this should be changed for other organizations. Remove the irrelevant people’s lines from this file:
/home/co/.ssh/authorized_keys
Drupal needs to get data from CKAN for forms creating Data Requests and Apps (for example).
It is suggested that this data is synchronized hourly with a cron.
To install the dependencies for the syncing:
cd /var/www/drupal/dgu
drush composer-rebuild
cd /var/www/drupal/dgu/sites/default/files/composer
composer install
You need to create a sysadmin user in CKAN that Drupal can use to get the data:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan user add frontend [email protected] password=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1` --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckan sysadmin add frontend --config=/var/ckan/ckan.ini
Note the apikey from the output of the first command e.g.:
'apikey': u'17a4a2fa-edf9-479e-bd71-1c0620fe457d'
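The backtick pipeline in the user add command above generates a random 32-character alphanumeric password. If you want to reuse that idea elsewhere (for instance when replacing the sample passwords), it can be wrapped in a small function. This is a sketch; gen_password is a hypothetical name, and it reads a fixed 4096 bytes of random data, which is ample for passwords up to a few hundred characters.

```shell
# Generate a random alphanumeric password of the requested length (default 32).
# gen_password is a hypothetical helper name.
gen_password() {
    length=${1:-32}
    # Read a bounded chunk of random bytes, keep only letters and digits,
    # then cut the result down to the requested length.
    # LC_ALL=C makes tr treat the input as plain bytes.
    head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'a-zA-Z0-9' | cut -c "1-${length}"
}
```

Usage would look like password=$(gen_password 32). Note that a password passed on the command line can end up in your shell history, so treat it accordingly.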
Now configure how Drupal contacts CKAN. Browse to /admin/config/system/ckan (on Vagrant it is http://192.168.11.11/admin/config/system/ckan) and configure the URL for CKAN (adding /api/) and the apikey from the previous step, e.g.:
CKAN API URL = http://192.168.11.11/api/
API key = 17a4a2fa-edf9-479e-bd71-1c0620fe457d
CKAN editor role = data publisher
CKAN admin role = data publisher
(NB: leave the revision options the same)
To (re)sync all publishers you can execute:
drush ckan_resync_publisher all
These sync commands create a lock to avoid parallel execution. If you stop the command (Ctrl+C) the lock is not removed. To remove it, append --kill to the command:
drush ckan_resync_publisher all --kill
You can also resync a single publisher:
drush ckan_resync_publisher 041e93f9-bf4e-48ec-b779-6bda9588ef55
There is also a similar command for syncing datasets:
drush ckan_resync_dataset
and for datasets and publishers in one go:
drush ckan_resync_all
(NB If you have no dataset in CKAN, then you’ll get an SQL error when syncing them.)
It is likely that you’ll want to set-up caching in front of Apache, to massively speed up common requests. This can be achieved with Varnish or Nginx in front of Apache. We suggest:
(flags|SESS[a-z0-9]+|NO_CACHE|auth_tkt|ckan|session_api_[a-z]+)
SESS[a-z0-9]+
/assets/css/datagovuk.min.css?1411377399236, so whenever Grunt runs, a new number is given and the cache will be bypassed because of the new number.
The Google Analytics data is shown here: http://data.gov.uk/data/site-usage To set this up, you need to:
Setup Google Analytics account & tracking - see: https://github.com/datagovuk/ckanext-ga-report/blob/master/README.md#setup-google-analytics
Add the configuration to your ckan.ini, customizing the values for the first 2 options:
googleanalytics.id = UA-1010101-1
googleanalytics.account = Account name (e.g. data.gov.uk, see top level item at https://www.google.com/analytics)
googleanalytics.token.filepath = /var/ckan/ga_auth_token.dat
ga-report.period = monthly
ga-report.bounce_url = /data/search
Create the database tables:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-ga-report initdb --config=/var/ckan/ckan.ini
Enable the extension by adding it to the list of ckan.plugins in ckan.ini:
ckan.plugins = ... ga-report
Generate an OAUTH token using the instructions: https://github.com/datagovuk/ckanext-ga-report/blob/master/README.md#authorization The paster command is:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-ga-report getauthtoken --config=/var/ckan/ckan.ini
mv token.dat /var/ckan/ga_auth_token.dat
Now you can load the GA data into CKAN. Run it the first time on the command-line to check it works:
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-ga-report loadanalytics latest --config=/var/ckan/ckan.ini
Then you can add it as a cron job. e.g. add it to /etc/cron.d/ckan
0 22 * * * www-data /home/co/ckan/bin/paster --plugin=ckanext-ga-report loadanalytics latest --config=/var/ckan/ckan.ini
When running CKAN paster commands, you should ensure that:
You can see that the virtual environment is activated by the presence of the (ckan) prefix in the prompt. e.g.:
(ckan)co@precise64:/src/ckan$
Note that you do need to specify --config: although ckan now gets it from the CKAN_INI environment variable (a recently introduced change to ckan), that variable is not available when you use sudo.
Examples:
sudo -u www-data /home/co/ckan/bin/paster search-index rebuild --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster user user_d1 --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-dgu create-test-data --config=/var/ckan/ckan.ini
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-dgu celeryd run concurrency=1 --queue=priority --config=/var/ckan/ckan.ini
You can add --help to list the commands and to find out more about a particular one. Full details of the CKAN paster commands are here:
http://docs.ckan.org/en/ckan-2.2/paster.html
The ckan config file is /var/ckan/ckan.ini. If you change any options, for them to take effect in the web interface you need to restart apache:
sudo /etc/init.d/apache2 graceful
The main CKAN log file is: /var/log/ckan/ckan.log
Errors go to: /var/log/ckan/ckan-apache.error.log
The log levels are set in /var/ckan/ckan.ini, so to get the debug logging from ckan you can change the level in the logger_ckan section.
i.e. change it to:
[logger_ckan]
level = DEBUG
handlers = console, file
qualname = ckan
propagate = 0
(and obviously restart apache to take effect)
The Celery queue workers (Archiver & QA) log to:
/var/log/ckan/celeryd.log
You may occasionally see CKAN return ‘500 Internal Server Error’, and when looking at the log /var/log/ckan/ckan.log you see this error:
IOError: [Errno 13] Permission denied: '/var/log/ckan/ckan.log'
This can happen when running paster commands and forgetting to run them as the www-data user as directed. Normally the CKAN logfile is created and written to by apache and hence is owned by user www-data. However, when paster commands are run as the co user they also write to the log, and if the log happens to roll over at that time then the co user will own the new logfile. To rectify this, change the ownership:
sudo chown www-data:www-data /var/log/ckan/ckan.log
The fix for this issue is in the pipeline.
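Until that fix lands, a quick ownership check can tell you whether this is the cause of a 500 error before you change anything. The following is a minimal sketch; check_log_owner is a hypothetical helper name, and it relies on GNU stat as shipped with Ubuntu.

```shell
# Report whether a file is owned by the expected user (uses GNU stat).
# check_log_owner is a hypothetical helper name.
check_log_owner() {
    file=$1
    expected=$2
    # stat fails if the file is missing; pass that on as exit status 2.
    owner=$(stat -c '%U' "$file") || return 2
    if [ "$owner" = "$expected" ]; then
        echo "ok: $file is owned by $owner"
    else
        echo "wrong owner: $file is owned by $owner, expected $expected"
        return 1
    fi
}
```

On the CKAN machine you would call it as check_log_owner /var/log/ckan/ckan.log www-data and run the chown above only if it reports a wrong owner.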
Data.gov.uk uses Grunt to do pre-processing of Javascript and CSS scripts as well as images and it writes timestamps to help with cache versioning.
Puppet will have installed a recent version of NodeJS (0.10.32+) and npm (1.4.28+) plus Grunt. There are two repos with assets; if you change either of them, you need to run Grunt before the changes will be used by CKAN.
Grunt runs on puppet provision, and you can manually run it like this:
cd /vagrant/src/ckanext-dgu
grunt
cd /vagrant/src/shared_dguk_assets
grunt
There is more about Grunt use here: https://github.com/datagovuk/shared_dguk_assets/blob/master/README.md
The reports at /data/report should be pre-generated nightly using a cron. e.g.:
0 6 * * * www-data /home/co/ckan/bin/paster --plugin=ckanext-report report generate --config=/var/ckan/ckan.ini
For harvesting to work you need a cron running every few minutes to put the latest jobs onto the gather queue:
*/10 * * * * www-data /home/co/ckan/bin/paster --plugin=ckanext-harvest harvester run --config=/var/ckan/ckan.ini
The ‘Archiver’ extension downloads all the data files and notes if the link is ‘broken’ or not. The ‘QA’ extension examines the downloaded data files, mainly to determine the format, and give the dataset a rating against the 5 Stars of Openness (“Openness Score”).
The ‘Archiver’ is triggered when a dataset is created or modified, and that in turn triggers the ‘QA’. In addition, because links can go rotten at a later date, it is sensible to trigger the Archival (and thus QA) on a weekly basis using a cron job.
Archiver and QA work asynchronously from the rest of CKAN. Jobs for them are put onto a celery queue, and by ‘running’ the queue the Archiver and QA carry out their jobs. So for the Archiver and QA to work, you need to have two Celery processes running all the time, either in a screen session or preferably using supervisord.
The list of jobs in the queue is stored in Redis (previously the jobs were stored in the kombu_message table in the database - if this is still being used you need to add the [app:celery] section to your ckan config - see ckan.ini.erb).
In fact there are two queues for the jobs - ‘priority’ deals with the trickle of new and updated datasets and ‘bulk’ deals with the weekly refresh and other longer updates.
To see how many jobs are on a queue:
redis-cli -n 1 LLEN priority
redis-cli -n 1 LLEN bulk
To clean a queue (delete all of its queued jobs):
redis-cli -n 1 DEL priority
redis-cli -n 1 DEL bulk
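For routine monitoring, the two LLEN calls can be combined into one report. This is a sketch built only on the redis-cli invocation shown above; queue_report and the REDIS_CLI override variable are hypothetical names introduced here, the latter so the function can be exercised without a running Redis.

```shell
# Print the number of queued jobs on each Celery queue held in Redis DB 1.
# queue_report and REDIS_CLI are hypothetical names; REDIS_CLI defaults to
# the real redis-cli and exists only so the command can be swapped out.
queue_report() {
    for queue in priority bulk; do
        depth=$("${REDIS_CLI:-redis-cli}" -n 1 LLEN "$queue")
        echo "$queue: $depth"
    done
}
```

A steadily growing ‘priority’ count would suggest that the Celery worker for that queue is not running.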
To schedule a dataset to be archived (and then QA’d):
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-archiver archiver update cabinet-office-energy-use --config=$CKAN_INI
or to archive all of a publisher’s datasets (goes onto bulk queue):
sudo -u www-data /home/co/ckan/bin/paster --plugin=ckanext-archiver archiver update cabinet-office --config=$CKAN_INI
You can follow the logs of the Archiver & QA in /var/log/ckan/celeryd.log.
The gov_daily.py script performs a number of nightly jobs including creating backups and getting the Site Analytics Google Analytics info. Read through and see if you need it in all or part. You can specify a parameter to just do the backup for example. It could be scheduled in the cron:
0 23 * * * root /home/co/ckan/bin/python /vagrant/src/ckanext-dgu/ckanext/dgu/bin/gov_daily.py backup /var/ckan/ckan.ini
When developing CKAN it is often helpful to use the pdb debugging tool. For this to work, you need to run CKAN in paster (instead of apache).
Run CKAN in paster:
stty echo; sudo -u www-data /home/co/ckan/bin/paster serve /var/ckan/ckan.ini --reload
In the code insert your pdb breakpoint (e.g. in the data controller):
import pdb; pdb.set_trace()
In your browser access the site via port 5000 (e.g. for vagrant):
http://192.168.11.11:5000/data/search
Occasionally when working with pdb you will find it goes into a mode where nothing you type appears on the screen. The solution without having to start a new terminal is to type on the command-line (blind):
stty echo
You can get a python shell which has the database loaded:
sudo -u www-data /home/co/ckan/bin/paster --plugin=pylons shell /var/ckan/ckan.ini
The core ckan tests can be run, but need to use the core ckan solr schema, for which you need to set-up a new solr core.
sed 's/8983\/solr/8983\/solr\/ckan-2.2/g' test-core.ini > test-core-dread.ini
TBC
To find out what the error is behind this web error page, as long as it is not a public machine you can increase the debug level using this command:
cd /var/www/drupal/dgu
drush vset -y error_level 2
and request the page again.
These messages may be seen during provisioning with Puppet, and are harmless:
warning: Could not retrieve fact fqdn
stdin: is not a tty
dpkg-preconfigure: unable to re-open stdin: No such file or directory
warning: Scope(Class[Python]): Could not look up qualified variable '::python::install::valid_versions'; class ::python::install has not been evaluated at /etc/puppet/modules/python/manifests/init.pp:73
warning: Scope(Class[Python]): Could not look up qualified variable '::python::install::valid_versions'; class ::python::install has not been evaluated at /etc/puppet/modules/python/manifests/init.pp:73
The directory '/home/vagrant/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
duplicated key at line 165 ignored: :queue_type
==> default: /home/co/ckan/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
==> default: SNIMissingWarning
==> default: /home/co/ckan/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
==> default: InsecurePlatformWarning
Despite aiming to keep these scripts working without error, ‘Puppet apply’ might possibly fail.
If ‘puppet apply’ fails (e.g. during ‘provision’) then you will see it end with this red text:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
At this point you will usually see lots of yellow warnings “Skipping because of failed dependencies” peppered amongst the blue lines. The art of finding out the cause of the failure is to scroll up to find the first of these yellow warnings and look for the error in the line or two above this.
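If you saved the provisioning output to a file, the scrolling described above can be approximated with a short awk filter. This is a sketch: first_puppet_failure is a hypothetical helper name, and it assumes the log contains the literal text “Skipping because of failed dependencies”.

```shell
# Print the first "Skipping because of failed dependencies" line of a saved
# Puppet log, together with the two lines that precede it.
# first_puppet_failure is a hypothetical helper name.
first_puppet_failure() {
    awk '
        /Skipping because of failed dependencies/ {
            if (prev2 != "") print prev2
            if (prev1 != "") print prev1
            print
            exit
        }
        # Remember the two most recent lines seen so far.
        { prev2 = prev1; prev1 = $0 }
    ' "$1"
}
```

For example, capture the output with vagrant provision 2>&1 | tee provision.log (the file name is an assumption) and then run first_puppet_failure provision.log. The error is usually in the one or two lines printed before the warning.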
It is always worth trying running puppet again (either with vagrant provision or puppet apply - see below) in case it was a one-off problem.
Depending on the order in which puppet installs the python packages, you may well get an error to do with installing Pylons, PasteScript and PasteDeploy. e.g.:
err: /Stage[main]/Dgu_ckan/Dgu_ckan::Pip_package[Pylons==0.9.7]/Exec[pip_install_Pylons==0.9.7]/returns: change from notrun to 0 failed: /home/co/ckan/bin/pip install --no-index --find-links=file:///vagrant/pypi --log-file /home/co/ckan/pip.log Pylons==0.9.7 returned 1 instead of one of [0] at /etc/puppet/modules/dgu_ckan/manifests/pip_package.pp:23
It is a known problem and can usually be solved if you simply rerun the ‘puppet apply’ / ‘vagrant provision’ step. You can also solve it manually on the box:
/home/co/ckan/bin/pip install --no-index --find-links=file:///vagrant/pypi PasteScript==1.7.5
/home/co/ckan/bin/pip install --no-index --find-links=file:///vagrant/pypi Pylons==0.9.7
We’ve seen an issue where SOLR doesn’t work properly the first time and when puppet tries to run ‘paster db init’ style commands you see this error:
WARNI [ckan.lib.search] Problems were found while connecting to the SOLR server
ERROR [ckan.lib.search.common] HTTP code=503, reason=Service Unavailable
This can usually be fixed by restarting SOLR, via its java environment ‘jetty’:
sudo service jetty restart
and check whether the start-up log:
less /usr/share/solr/solr-4.3.1/example/logs/solr.log
is full of errors or succeeds with something like:
Started [email protected]:8983
When tinkering with the Puppet configuration and rerunning it, it can be frustrating that vagrant provision takes several minutes to run. Much of the time there is no need to have librarian check the puppet module dependencies, and in this case there is a shortcut.
You can manually install an updated Puppet CKAN module like this (on the guest):
sudo -u vagrant rsync -r /vagrant/puppet/modules/dgu_ckan/ /etc/puppet/modules/dgu_ckan/
And run ‘puppet apply’ as the vagrant user like this:
sudo FACTER_fqdn=ckan.home puppet apply --modulepath=/etc/puppet/modules /vagrant/puppet/manifests/site.pp