Despite good intentions, data sharing often feels like an added task in many organisations. The issue gets more complicated when the data in question is statistical in nature. Statistical data is inherently complex: it is highly structured and comes with a related set of metadata. Sharing such data requires an agreement between the parties involved, which essentially means following a pre-defined standard for data exchange, primarily to minimise possible confusion. It is complicated!
Statistics offices know this.
So, in late 2011, when Rwanda was selected as one of the project countries to share Millennium Development Goals (MDGs) indicator data with the United Nations Statistics Division (UNSD), the National Institute of Statistics of Rwanda (NISR) was pleased to note that the exchange would adhere to SDMX (Statistical Data and Metadata eXchange), a standard for the exchange and processing of statistical data and metadata.
One of the key objectives of this UNSD-DfID supported project was to highlight the discrepancies, and the possible reasons for them, between data reported by countries and the international agencies' internal estimates for the MDG indicators. For example, for the indicator ‘People living with HIV’, as shown below, the value reported by the country (Rwanda) for 2010 is 1%, whereas the figure from the international agency is 3%. The explanation (“Why is there a difference?”) is provided in the following image. [By the way, these images are from the CountryData web portal – part of the UNdata platform – developed for this project. The pages showing Rwanda-specific data and metadata can be accessed here and here].
Such analysis of discrepancies required participating countries in the project to provide data and metadata regularly.
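To make the idea of a discrepancy check concrete, here is a minimal sketch in Python. It is not how the CountryData portal is implemented; the `Observation` class, the `source` labels and the `find_discrepancies` function are all hypothetical names chosen for illustration. Only the two example values (1% and 3% for 2010) come from the case described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    indicator: str
    year: int
    source: str   # hypothetical labels: "country" or "international"
    value: float  # percent


def find_discrepancies(observations, tolerance=0.0):
    """Pair country and international values by (indicator, year) and
    return the pairs whose absolute difference exceeds `tolerance`."""
    by_key = {}
    for obs in observations:
        by_key.setdefault((obs.indicator, obs.year), {})[obs.source] = obs.value

    discrepancies = []
    for (indicator, year), values in sorted(by_key.items()):
        if "country" in values and "international" in values:
            diff = values["international"] - values["country"]
            if abs(diff) > tolerance:
                discrepancies.append(
                    (indicator, year, values["country"], values["international"], diff)
                )
    return discrepancies


# The example from the text: country-reported 1% vs. agency estimate 3% in 2010.
observations = [
    Observation("People living with HIV", 2010, "country", 1.0),
    Observation("People living with HIV", 2010, "international", 3.0),
]

for row in find_discrepancies(observations):
    print(row)
```

A check like this only flags that a difference exists. Explaining why it exists still depends on the metadata that accompanies each figure, which is why both data and metadata had to be shared.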
I learned early in the project that adhering to a standard for data sharing was one thing; automating that sharing was another. Fortunately, the recommended information architecture was well suited to the capacity situation at the NISR, especially the need to meet the additional reporting obligations. One part of that architecture became the cornerstone of the project: an SDMX registry, which provides a unique space on the Internet where anyone interested and equipped can automatically discover the data and metadata that the NISR publishes.
Thanks to Abdulla Gozalov from the UNSD, we quickly set up the SDMX registry. We experimented with two platforms: Fusion Registry from Metadata Technology, and DevInfo SDMX Registry, which comes integrated with the DevInfo database and which improved a lot during the project period. The final choice of the latter was driven mainly by the extensive use of the DevInfo database within the NISR.
In fact, since 2009, the NISR has been using the ‘DevInfo Rwanda’ database – an adaptation of the DevInfo database – to disseminate the MDG indicators data. Supporting the organisation, storage and dissemination of data structured by indicators, time periods and geographic areas and containing extensive metadata, this database was the perfect fit for the project. The information architecture finally looked like this.
One artefact of SDMX worth noting is the Data Structure Definition (DSD), which standardises shared data and hence facilitates a clear understanding of it. A DSD is a logical description of a collection of data, classified according to several properties of interest (dimensions). Within this project, data exchange was governed by the ‘CountryData’ DSD, developed explicitly for the project by the UNSD and based on the MDG DSD developed by the Interagency and Expert Group for MDG Indicators.
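For readers who think in code, a DSD can be pictured as a list of dimensions, each with a set of allowed codes, against which every observation is checked. The sketch below is a simplified illustration in Python, not the SDMX information model itself and not the actual CountryData DSD. The dimension identifiers (`INDICATOR`, `REF_AREA`, `TIME_PERIOD`) and the codes are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    id: str
    codes: frozenset = frozenset()  # allowed codes; empty means "any value"


@dataclass(frozen=True)
class DataStructureDefinition:
    id: str
    dimensions: tuple

    def validate(self, key):
        """Return a list of problems with an observation key (empty if valid)."""
        problems = []
        known = {dim.id for dim in self.dimensions}
        for dim in self.dimensions:
            if dim.id not in key:
                problems.append(f"missing dimension {dim.id}")
            elif dim.codes and key[dim.id] not in dim.codes:
                problems.append(f"invalid code {key[dim.id]!r} for {dim.id}")
        for extra in sorted(set(key) - known):
            problems.append(f"unknown dimension {extra}")
        return problems


# Hypothetical dimensions and codes, for illustration only.
EXAMPLE_DSD = DataStructureDefinition(
    id="EXAMPLE_DSD",
    dimensions=(
        Dimension("INDICATOR", frozenset({"HIV_PREVALENCE"})),
        Dimension("REF_AREA", frozenset({"RW"})),
        Dimension("TIME_PERIOD"),
    ),
)

good = {"INDICATOR": "HIV_PREVALENCE", "REF_AREA": "RW", "TIME_PERIOD": "2010"}
bad = {"INDICATOR": "HIV_PREVALENCE", "REF_AREA": "XX"}

print(EXAMPLE_DSD.validate(good))
print(EXAMPLE_DSD.validate(bad))
```

Because both sender and receiver agree on the same structure in advance, an observation either fits the definition or is rejected with a clear reason, which is where much of the "minimising confusion" comes from.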
This, together with the fact that the DevInfo database was an integral part of the information architecture, necessitated the development of a tool for mapping the DevInfo database structures to the CountryData DSD. With its user-friendly interfaces and its support for reference metadata in addition to data, this tool greatly lowered the barrier to entry to the SDMX paradigm.
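Conceptually, such a mapping translates each record organised by indicator, time period and geographic area into a key expressed in the codes of the target DSD. The following Python sketch shows the idea only. It does not reproduce the actual mapping tool, and the lookup tables, field names and codes (`HIV_PREVALENCE`, `RW`, `OBS_VALUE` and so on) are hypothetical.

```python
# Hypothetical lookup tables from database labels to DSD codes.
INDICATOR_CODES = {"People living with HIV": "HIV_PREVALENCE"}
AREA_CODES = {"Rwanda": "RW"}


def map_record(record):
    """Translate one indicator/time/area record into a DSD-style key.

    Raises ValueError listing every label that has no mapping, so that
    gaps in the mapping are found before any data is published.
    """
    unmapped = []
    indicator = INDICATOR_CODES.get(record["indicator"])
    if indicator is None:
        unmapped.append(f"indicator {record['indicator']!r}")
    area = AREA_CODES.get(record["area"])
    if area is None:
        unmapped.append(f"area {record['area']!r}")
    if unmapped:
        raise ValueError("no mapping for " + ", ".join(unmapped))

    return {
        "INDICATOR": indicator,
        "REF_AREA": area,
        "TIME_PERIOD": str(record["time_period"]),
        "OBS_VALUE": record["value"],
    }


record = {
    "indicator": "People living with HIV",
    "area": "Rwanda",
    "time_period": 2010,
    "value": 1.0,
}
print(map_record(record))
```

Failing loudly on unmapped labels is a deliberate choice in this sketch: a silent gap in the mapping would otherwise surface only later, as missing data on the receiving side.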
In the end, we are pleased to have our data available to a larger audience (including machines, of course!), and maintaining it is relatively easy, leaving our scarce resources free for other pressing needs.
That leaves me wondering: with the Sustainable Development Goals (SDGs) just around the corner, could this approach be scaled up for timely and comprehensive monitoring of the aggregate data and associated metadata of the SDGs too?