Getting data used

In late 2003, when I was looking for content and services for the telecentres (or ICT kiosks) in Orissa, India, one of the things that kept coming up in discussions within our team was ensuring their usage. That meant, among many other things, making certain that the content and services provided were relevant to the community the telecentre was intended to serve.

Usage was the key driving force! It shaped our work around the telecentres.

We identified many obstacles to it, some on the demand side and some on the supply side, and we concentrated our efforts on mitigating them as much as possible.

Today, almost a decade later, while working with official statistics, one of my concerns is to ensure their wider reach and development impact, and I’m faced with the same question: how to ensure their usage?

Recently, I read an article in The Guardian (online) about how a student used open data to beat National Rail Enquiries at its own game. The story is about a graduate student who created a user-friendly rail enquiry service based on data recently made open by the Association of Train Operating Companies in the UK. However, what caught my attention was a comment below the article from a gentleman in the Netherlands. He highlighted a similar situation where a student built a very user-friendly iPhone app from the data of the Dutch National Railways, but in this case the student had to “scrape the data off their site”.

The difference between the main story and the comment lies in the ease of reuse of the data!

In the main article, the data was made available in a format that can easily be used to develop new applications. In the case of the comment, ‘scraping’ happened: the process of extracting data from sources that are not really machine readable, such as web pages or PDFs.

In both cases the data got reused, but in one case it was easy and in the other, difficult.
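To make the contrast concrete, here is a minimal sketch in Python. The timetable, train numbers and text layout are all invented for illustration and have nothing to do with the actual UK or Dutch data. It reads the same two rows once from a CSV file (an open, machine-readable format) and once from text of the kind you get after copying it out of a PDF or web page.

```python
import csv
import io
import re

# Hypothetical timetable published as CSV (machine readable).
CSV_DATA = """train,departs,arrives
IC101,08:15,09:40
IC205,10:30,12:05
"""

# The same hypothetical timetable as text lifted from a PDF or web page.
SCRAPED_TEXT = """
Timetable (page 1 of 1)
Train IC101   dep. 08:15   arr. 09:40
Train IC205   dep. 10:30   arr. 12:05
For enquiries contact the station.
"""


def from_csv(text):
    """Parse the open-format file; the column names travel with the data."""
    return list(csv.DictReader(io.StringIO(text)))


def from_scraped(text):
    """Recover the rows by pattern matching; this breaks if the layout changes."""
    pattern = re.compile(
        r"Train\s+(\S+)\s+dep\.\s+(\d{2}:\d{2})\s+arr\.\s+(\d{2}:\d{2})"
    )
    return [
        {"train": train, "departs": departs, "arrives": arrives}
        for train, departs, arrives in pattern.findall(text)
    ]


rows_open = from_csv(CSV_DATA)
rows_scraped = from_scraped(SCRAPED_TEXT)
```

Both functions end up with the same rows, but the second one depends on a hand-written pattern that has to be rewritten whenever the publisher changes the page layout, while the first needs no knowledge of the layout at all.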

And there are many more examples illustrating that just getting data out does not ensure its use or reuse.

Lately at NISR, the realization that the official statistics we produce were disseminated mostly as PDFs made us introspect.

Discussions ensued. In one such session, I made a presentation to the statisticians and the management, encouraging them to produce ‘application developer friendly’ data and to actively engage educational institutions, civil society and the private sector in developing user-friendly information products based on attractive visualizations, mashups and APIs (not just data in tables and charts).

This and many other things led us to identify and clearly understand the types of data that NISR produces and the best ways to package them and make them available for easy access and use.

The identified data types are: Publications (in print and as PDFs), Indicators or time series data, and Microdata.

For disseminating ‘Publications’, we decided to use the inbuilt functionality of the NISR website, which is based on the open source content management system Drupal. For ‘Indicators’ or ‘time series data’, we decided to use specific tools such as DevInfo and Prognoz, both of which are designed to handle this kind of data and ensure that it is available in open standards and machine-readable formats. Lastly, for ‘Microdata’, we decided to use IMIS (a system built on Redatam+SP) and NADA (a web-based survey cataloging system), where datasets are available in NESSTAR and other machine-readable formats.

I believe that dissemination of data for public use must follow the principles of open data, where the data made available is (also) in open standards and in machine-readable formats! And there’s no better place to start doing this than at a national statistics office.
