[Summary: In this blog post, I argue that there are three fundamental ways to improve data dissemination through digital platforms in National Statistics Offices (NSOs): instituting comprehensive data curation processes, automating data transfers from aggregated tables to the online statistical data dissemination platforms, and integrating those platforms seamlessly with Open Data portals.]
Dissemination is considered the last leg of the entire process of statistical work in a typical National Statistics Office (NSO). Over the years at the National Institute of Statistics of Rwanda (NISR), I have watched the dissemination process evolve and undergo significant changes, and yet I can't help but think of the challenges that remain.
The paradigm shift, I think, had started long before that, marked by the moment data stored on digital media was first made available to data seekers. Until then, the only medium was paper-based publications. In other words, the big change in dissemination was the move from paper-based publications to digital media, underpinned first by the prospect of new content (e.g. microdata) and later by online, Web-based data delivery mechanisms shaped by the Internet, which had hitherto not been possible.
In fact, I believe it was only after statistical data started being accessed on the Web that NSOs realized the enormity of this change. [Before that, takeaway CD-ROMs, flash memory sticks (or other digital storage media), and even desktop versions of data dissemination software applications were in use, in addition to, or in lieu of, paper-based publications. These were digital, but they still carried some of the old pain points deeply associated with physical distribution, such as logistical complexity and higher costs.]
At NISR, providing data access on the Web started with making PDF copies of paper-based publications available online (a format often criticized by open data proponents because it constrains reuse of the data contained in it). However, the game-changing potential of combining digital media with the Web was properly realized only after the introduction of online statistical data dissemination platforms (such as DevInfo, Prognoz, NADA, Redatam (IMIS) and others), which offer machine-readable data through online databases.
The platforms won many accolades, but challenges soon started to emerge.
For 'indicators' type data (aggregated data, as opposed to microdata) made available through the localized adaptations of the online platforms DevInfo and Prognoz, the immediate challenge was maintaining harmony between the online data and the data contained in the paper-based publications!
When data is entered manually (or, in some cases, semi-manually), voluminous data is difficult to enter. With this limitation, the online platforms consequently failed to hold all the data produced, giving rise to a practice of selective data entry (based on the expressed or perceived needs of data seekers). Manual entry also introduced sporadic human errors, and the staggered pace of data entry adversely affected the timeliness of online updates. Moreover, the simultaneous presence of two distinct platforms meant the data had to be entered twice, once for each platform, multiplying the human effort required to keep them in sync with each other.
Thankfully, in parallel with these online platforms, PDF copies of the paper-based publications containing the comprehensive data were also available online, serving data seekers, albeit rather inconveniently.
Developed over a long period of time, the traditional dissemination regime in NSOs, underpinned by paper-based publications, has had a robust due-diligence mechanism in place, enabling many eyes to review the results before final publication. In other words, these reports go through draft stages where errors are flagged and corrected before release and wider dissemination. With the new set of IT tools for data dissemination, however, the old processes are struggling to provide the same quality assurance and to build trust in the data made available through the online platforms.
These processes need to be updated to reflect the changed scenario.
The situation, in fact, offers an opportunity to analyse the entire data management process; appropriate changes to it hold the potential to fully utilise the functionalities offered by modern tools and technologies, helping NSOs meet their data dissemination obligations efficiently and effectively.
In this context, I would argue that introducing data curation as a distinct function is an imminent step for NSOs: it would seek to manage data through its "lifecycle of interest and usefulness", maintaining quality, adding value, and enabling online re-use through discovery and retrieval.
Digital means (software such as SPSS, Stata and R) have long been used for 'data analysis', ultimately producing statistical tables and the indicators contained in them as 'born digital' data. Now that 'data dissemination' has also gone digital, through distribution channels such as DevInfo and Prognoz, these two processes can and should be integrated digitally.
The image below describes the current process flow (As-is), showing where the aggregated data or indicators (contained in statistical tables) that populate the online statistical data dissemination platforms (DevInfo and Prognoz) come from. It also illustrates the alternative (To-be) flow of feeding data to the online platforms directly from the same 'source' as the printed reports.
In the current flow (As-is), for the indicators contained in statistical tables to appear in the online databases, data must be entered manually from the printed reports into DevInfo and Prognoz, as illustrated below by 'manual data entry' between stage 1 and stage 2.
An automated system that ingests indicators directly from the statistical tables would therefore not only reduce the time taken to comprehensively populate the online statistical data dissemination platforms, but also eliminate (or at least minimize) the errors currently caused by manual data entry.
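To make the idea concrete, here is a minimal sketch of such an ingestion step: unpivoting a wide aggregated table (one column per year, as tables often appear in report annexes) into the long-format indicator records a dissemination platform could bulk-load. The table layout, field names and figures are my own illustrative assumptions, not the actual NISR formats or any platform's real import API.

```python
import csv
import io

def table_to_records(csv_text, dimension_cols):
    """Unpivot a wide statistical table (one column per year) into
    long-format indicator records ready for bulk loading.

    The layout and field names here are illustrative assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        for col, value in row.items():
            # Skip dimension columns and empty cells; every other
            # column is assumed to be a year holding an observation.
            if col in dimension_cols or value == "":
                continue
            records.append({
                "indicator": row["indicator"],
                "area": row["area"],
                "year": int(col),
                "value": float(value),
            })
    return records

# A tiny aggregated table, as it might appear in a report annex
# (figures made up for illustration).
sample = """indicator,area,2012,2013
Literacy rate (%),Rwanda,68.3,70.5
Literacy rate (%),Kigali City,79.1,80.2
"""
records = table_to_records(sample, dimension_cols={"indicator", "area"})
```

Once tables are captured in a machine-readable form like this at the analysis stage, the same records can feed every downstream platform without retyping.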
There is another leg in the dissemination chain that deserves mention here: government-sponsored Open Data portals. Many indicators produced by NSOs are typically candidates for the Open Data portals as well. However, a gap remains here too, illustrated in the image above by 'manual data entry' between stage 2 and stage 3.
In most instances, the data transfer from online statistical data dissemination platforms to Open Data platforms is currently not seamless. This is another leapfrogging opportunity, with the potential to address two key issues deeply associated with some Open Data portals: the lack of regular data updates (e.g. the case of Kenya's Open Data portal) and missing metadata.
Complications remain, though. The two kinds of platform organise indicators differently. In DevInfo and Prognoz, indicators sit in a multidimensional structure (a 'data cube') defined by a set of dimensions and observation values, which makes interactions with these tools complex in varying degrees. In contrast, a typical Open Data portal (in most cases developed using tools such as CKAN, DKAN, OGPL, Junar or Socrata) houses only a 'slice' of the data cube, requiring independent, ad hoc updates.
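The cube-versus-slice mismatch can be sketched in a few lines: fixing some dimensions of a cube of observations yields the flat tabular dataset a typical Open Data portal expects. The dimension names and figures below are illustrative assumptions only, not the internal model of DevInfo, Prognoz or any portal.

```python
def slice_cube(observations, fixed_dims):
    """Flatten one 'slice' of a multidimensional data cube by fixing
    some of its dimensions, producing the simple tabular shape a
    typical Open Data portal dataset takes.

    Dimension names and values here are illustrative assumptions."""
    rows = []
    for obs in observations:
        if all(obs.get(dim) == val for dim, val in fixed_dims.items()):
            # Drop the fixed dimensions; in a portal they would
            # become dataset-level metadata rather than columns.
            rows.append({k: v for k, v in obs.items()
                         if k not in fixed_dims})
    return rows

# A miniature cube: indicator x sex x year (figures made up).
cube = [
    {"indicator": "Under-5 mortality", "sex": "Both", "year": 2013, "value": 52.0},
    {"indicator": "Under-5 mortality", "sex": "Female", "year": 2013, "value": 49.0},
    {"indicator": "Under-5 mortality", "sex": "Both", "year": 2014, "value": 50.0},
]
flat = slice_cube(cube, {"indicator": "Under-5 mortality", "sex": "Both"})
```

An automated bridge would essentially run such slicing on a schedule, re-publishing each slice (with its metadata) whenever the underlying cube changes, instead of relying on someone remembering to update the portal by hand.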
Bridging the data transfer gap between these two sets of tools is going to be challenging, but hugely rewarding for sure!
As we prepare to meet the challenges of the 'data revolution' in the post-2015 development debate, these 'solutions' are, I think, more than just means; they could be development imperatives in their own right under the broader theme of improving the availability of, and access to, data and statistics (see the 'data, monitoring and accountability' section of Goal 17 in the proposal of the Open Working Group for Sustainable Development Goals).
NSOs with comprehensive data curation processes, online data dissemination platforms fed 'directly' from aggregated tables, and seamless data access through 'Open Data' interfaces will be well placed to improve statistical data dissemination in the digital realm.