In many organizations, despite good intentions, data sharing can feel like an added task. The issue gets more complicated when the data in question is statistical in nature. Statistical data is inherently complex: it is highly structured and comes with a related set of metadata. Sharing such data requires an agreement between the parties involved, which essentially means following a pre-defined standard for data exchange, primarily to minimize possible confusion. It is complicated!
Statistics offices know this.
So, in late 2011, when Rwanda was selected as one of the project countries to share Millennium Development Goals (MDGs) indicators data with the United Nations Statistics Division (UNSD), the National Institute of Statistics of Rwanda (NISR) was pleased to note that the exchange would adhere to SDMX (Statistical Data and Metadata eXchange), a standard for the exchange and processing of statistical data and metadata.
One of the key objectives of this UNSD-DfID supported project was to highlight discrepancies, and their possible reasons, between data reported by countries and estimates produced internally by international agencies for the MDG indicators. For example, for the indicator 'People living with HIV', as shown below, the country (Rwanda) reported a value of 1% for 2010, whereas the figure from the international agency is 3%. The explanation ("Why is there a difference?") is provided in the subsequent image. [By the way, these images are from the CountryData web portal, part of the UNdata platform, developed for this project. The pages showing Rwanda-specific data and metadata can be accessed here and also here.]
Such analysis of discrepancies required the countries participating in the project to provide data and metadata on a regular basis.
I learned quite early on in the project that adhering to a standard for data sharing was one thing; automating that sharing was another. Helpfully, the recommended information architecture was well suited to the capacity situation at the NISR, especially the need to meet the additional reporting obligations. Part of that architecture, an SDMX registry, became the cornerstone of the project: it provides a unique space on the Internet where anyone interested and suitably equipped can automatically discover the data and metadata that the NISR publishes.
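To give a flavour of what "automatic discovery" means in practice: an SDMX registry serves structure documents (SDMX-ML) that a client can fetch and parse to find out which dataflows a publisher exposes. The sketch below parses a hand-written, heavily simplified stand-in for such a document; the namespace, dataflow ID, and agency ID are invented for illustration and do not reflect the actual registry contents.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical stand-in for a registry's structure response.
# Real SDMX-ML documents use the official SDMX namespaces and are richer.
REGISTRY_RESPONSE = """\
<Structure xmlns:str="urn:example:sdmx-structure">
  <str:Dataflows>
    <str:Dataflow id="DF_MDG_RWANDA" agencyID="NISR">
      <str:Name>Rwanda MDG indicators</str:Name>
    </str:Dataflow>
  </str:Dataflows>
</Structure>
"""

NS = {"str": "urn:example:sdmx-structure"}

def discover_dataflows(xml_text):
    """Return (id, agency, name) for each dataflow in a structure document."""
    root = ET.fromstring(xml_text)
    flows = []
    for df in root.findall(".//str:Dataflow", NS):
        name = df.find("str:Name", NS).text
        flows.append((df.get("id"), df.get("agencyID"), name))
    return flows

print(discover_dataflows(REGISTRY_RESPONSE))
# -> [('DF_MDG_RWANDA', 'NISR', 'Rwanda MDG indicators')]
```

In a real deployment the client would fetch this document over HTTP from the registry's query interface rather than from an inline string.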
Thanks to Abdulla Gozalov from the UNSD, we quickly set up the SDMX registry. We experimented with two platforms: Fusion Registry from Metadata Technology, and the DevInfo SDMX Registry, which came integrated with the DevInfo database and improved considerably during the project period. The final choice of the latter was driven mainly by the extensive use of the DevInfo database within the NISR.
In fact, since 2009, the NISR has been using the 'DevInfo Rwanda' database, an adaptation of the DevInfo database, to disseminate MDG indicators data. Supporting the organization, storage and dissemination of data structured by indicators, time periods and geographic areas, and containing extensive metadata, this database was a perfect fit for the project. The information architecture finally looked like this.
One SDMX artifact worth noting, because it standardizes, and hence facilitates a clear understanding of, shared data, is the Data Structure Definition (DSD). A DSD is a logical description of a collection of data, classified according to several properties of interest (dimensions). Within this project, data exchange was governed by the 'CountryData' DSD, developed specifically for the project by the UNSD, based on the MDG DSD developed by the Inter-agency and Expert Group on MDG Indicators.
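The idea behind a DSD can be sketched in a few lines: every observation is keyed by a fixed, ordered set of dimensions, so sender and receiver interpret each data point identically. The dimension names and codes below are hypothetical simplifications, not the actual CountryData DSD; the values echo the HIV example above.

```python
# Hypothetical, simplified dimensions; a real DSD also defines code lists,
# attributes, and the measure, and is expressed in SDMX-ML.
DIMENSIONS = ("SERIES", "REF_AREA", "TIME_PERIOD", "SOURCE_TYPE")

def make_key(**values):
    """Build an ordered observation key from dimension values."""
    missing = [d for d in DIMENSIONS if d not in values]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return tuple(values[d] for d in DIMENSIONS)

# Two observations for the same indicator, area and year, distinguished
# only by the source dimension: country-reported vs. agency estimate.
data = {
    make_key(SERIES="HIV_PREVALENCE", REF_AREA="RW",
             TIME_PERIOD="2010", SOURCE_TYPE="COUNTRY"): 1.0,
    make_key(SERIES="HIV_PREVALENCE", REF_AREA="RW",
             TIME_PERIOD="2010", SOURCE_TYPE="AGENCY"): 3.0,
}

key = make_key(SERIES="HIV_PREVALENCE", REF_AREA="RW",
               TIME_PERIOD="2010", SOURCE_TYPE="COUNTRY")
print(data[key])  # -> 1.0
```

Because both parties agree on the dimensions in advance, a discrepancy like the 1% vs. 3% above becomes a well-defined comparison of two keyed observations rather than a source of confusion.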
This, and the fact that the DevInfo database was an integral part of the information architecture, necessitated the development of a tool for mapping DevInfo database structures to the CountryData DSD. This tool greatly lowered the barrier to entry into the SDMX paradigm, as it handled reference metadata in addition to data, through user-friendly interfaces.
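Conceptually, the mapping step translates DevInfo-style records (indicator, area, time period) into observations coded against the target DSD's dimensions. The code lists and field names below are invented for illustration; the actual tool managed such correspondences through its user interface rather than hand-written tables.

```python
# Hypothetical correspondence tables from DevInfo labels to DSD codes.
INDICATOR_MAP = {"People living with HIV": "HIV_PREVALENCE"}
AREA_MAP = {"Rwanda": "RW"}

def map_record(record):
    """Map one DevInfo-style record to a DSD-coded observation."""
    return {
        "SERIES": INDICATOR_MAP[record["indicator"]],
        "REF_AREA": AREA_MAP[record["area"]],
        "TIME_PERIOD": record["time_period"],
        "OBS_VALUE": record["value"],
    }

devinfo_record = {
    "indicator": "People living with HIV",
    "area": "Rwanda",
    "time_period": "2010",
    "value": 1.0,
}

obs = map_record(devinfo_record)
print(obs)
# -> {'SERIES': 'HIV_PREVALENCE', 'REF_AREA': 'RW',
#     'TIME_PERIOD': '2010', 'OBS_VALUE': 1.0}
```

Once records are in this coded form, serializing them as SDMX messages for the registry becomes a mechanical step.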
In the end, we are pleased to have made our data available to a larger audience (including, of course, machines!), and maintaining it is relatively easy, leaving our scarce resources free for other pressing needs.
That just leaves me wondering, with the Sustainable Development Goals (SDGs) just around the corner, whether this approach could be scaled up for timely and comprehensive monitoring of the aggregate data and associated metadata of the SDGs too.