DDI extension for CKAN for the Worldbank.
Features:
Requirement: This extensions runs on CKAN 2.3 or higher.
Use pip
to install this plugin. This example installs it in
/home/www-data
source /home/www-data/pyenv/bin/activate
pip install -e git+https://github.com/liip/ckanext-ddi.git#egg=ckanext-ddi --src /home/www-data
cd /home/www-data/ckanext-ddi
pip install -r requirements.txt
python setup.py develop
Make sure to add ddi_schema
, ddi_theme
and nada_harvester
to
ckan.plugins
in your config file. If you don’t want to use the
frontend-part of this extension, you can omit the ddi_theme
.
To use the nada_harvester
, make sure the ckanext-harvest
extension is installed as
well.
Four options are available:
ckanext.ddi.config_file = /path/to/my/config.yml
ckanext.ddi.default_license = CC0-1.0
ckanext.ddi.allow_duplicates = True
ckanext.ddi.override_datasets = False
The config_file
is simply the path to the DDI-specific configuration
of this extension (see below). The default_license
allows a user to
configure a license that is used for all DDI imports, if the license is
not specified explicitly. The allow_duplicates
option is used to
determine, if duplicate datasets are allowed or not. Duplicates are
determined by the unique id_number
attribute (defaults to False
).
With override_datasets
you can specify, if you import a dataset that
already exists, if a new dataset should be created or if the existing
one should be overridden (defaults to False
).
The display and structure of the DDI fields can be configured individually. A separate YAML config file is used for that.
There are 3 sections:
sections
: describes different section, used to group together
fields
vocabularies
: describes availables controlled vocabularies, that
can be referencedfields
: describes all fields, their type and how they are
displayed. Only the predefined fields can be used:
id
name
title
url
author
author_email
maintainer
maintainer_email
license_id
copyright
version
version_notes
notes
tags
abbreviation
study_type
series_info
id_number
description
production_type
production_date
abstract
kind_of_data
unit_of_analysis
description_of_scope
country
geographic_coverage
time_period_covered
universe
primary_investigator
other_producers
funding
sampling_procedure
data_collection_dates
access_authority
conditions
citation_requirement
contact_persons
contact_persons_email
Example:
fields:
identification:
title:
type: text
visible: False
display: Title
url:
type: url
display_field: title
visible: True
display: Source
overview:
abstract:
type: markdown
visible: True
display: Abstract
kind_of_data:
type: vocabulary
visible: True
display: Kind of Data
contact:
contact_persons:
type: text
visible: False
display: Contact Persons
contact_persons_email:
type: email
display_field: contact_persons
visible: True
display: Contact person
internal_display: Contact Email
vocabularies:
kind_of_data:
- Sample survey data [ssd]
- Census/enumeration data [cen]
- Administrative records data [adm]
- Aggregate data [agg]
- Clinical data [cli]
- Event/transaction data [evn]
- Observation data/ratings [obs]
- Process-produced data [pro]
sections:
identification: Identification
overview: Overview
contact: Contact information
Based on this configuration the web UI is generated:
Data can be imported either via command line or using the web interface.
If you are logged in and you have the appropriate permissions, you find a new button “Import Dataset from DDI/XML” on the dataset page.
This buttons leads you to an import page, where a DDI XML can either be uploaded or specified as URL.
Instead of importing the DDI data, you can manually add datasets just like you would on any CKAN instance. The “Add Dataset” form is modified, so you can find all the fields from your DDI configuration (see above).
This plugin provides the possibility to import DDI XML files using a paster command.
source /home/www-data/pyenv/bin/activate
paster --plugin=ckanext-ddi ddi import <path_or_url> [<license>] -c <path to config file>
<path_or_url>
is a required parameter and - as the name implies -
it can can either be a local file or a publicly accessible URL.<license>
is an optional parameter to specify the license of the
dataset. Ideally this is a value from the configured license group
file.To add a harvester for a
NADA instance, you
should be logged in and visit /harvest
on your CKAN installation (e.g.
http://my.ckaninstance.org/harvest).
There you can add a new harvest source with the type “NADA harvester for
DDI”.
In the URL field, specify the base URL of your NADA instance. If the start page of your NADA instance is my.nada-instance.org/index.php/home, then please specify http://my.nada-instance.org as the URL for the harvester.
You can specify the configuration as JSON:
user
: the CKAN user to perform the harvesting (default: harvest
)license
: A default license to apply to all harvested datasets
(default: empty). If this is not specified the config value
ckanext.ddi.default_license
is used (see above).access_type
: Parameter for NADA to specify the the data access
type of the datasets, that should be harvester (default:
public_use
)Possible values for access_type
:
""
(empty string, i.e. all data access types are allowed)"direct_access"
"public_use"
(default)"licensed"
"data_enclave"
"data_external"
"no_data_available"
"open_data"
JSON example:
{"user": "harvest", "access_type": "public_use", "license": "CC-BY-SA-4.0"}
This CKAN extensions uses flake8 to ensure basic code quality.
You can add a pre-commit hook when you have installed flake8:
flake8 --install-hook
Travis CI is used to check the code for all PRs.
This module was developed with support from the World Bank to provide a solution for National Statistical Offices (NSOs) that need to publish data on CKAN platforms.