Getting data used

In late 2003, when I was looking for content and services for the telecentres (or ICT kiosks) in Orissa, India, one thing kept coming up in our team's discussions: how to ensure they were used. Among many other things, that meant making certain that the content and services provided were relevant to the community each telecentre was intended to serve.

Usage was the key driving force! It shaped our work around the telecentres.

We identified many obstacles to usage, some on the demand side and some on the supply side, and we concentrated our efforts on mitigating them as much as possible.

Today, almost a decade later, I work with official statistics, and one of my concerns is ensuring their wider reach and development impact. I find myself facing the same question: how do we ensure they are used?

Recently, I read an article in The Guardian (online) about how a student used open data to beat National Rail Enquiries at its own game. The story is about a graduate student who created a user-friendly rail enquiry service based on data recently made open by the Association of Train Operating Companies in the UK. What caught my attention, however, was a comment below the article from a gentleman in the Netherlands. He described a similar case in which a student built a very user-friendly iPhone app from Dutch National Railways data, but that student had to “scrape the data off their site”.

The difference between the main story and the comment lies in how easily the data could be re-used.

In the main article, the data was made available in a format that can easily be used to develop new applications. In the case described in the comment, the data had to be ‘scraped’, that is, culled out of formats that are not machine-readable, such as web pages and PDFs.

In both cases the data got re-used, but in one case it was easy and in the other it was difficult.
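To make the difference concrete, here is a minimal Python sketch. The timetable, station names and page markup are all hypothetical, invented for illustration; they are not the ATOC or Dutch National Railways data. Reading the machine-readable version takes a single standard-library call, while scraping needs a pattern tied to the exact wording and markup of a page written for human readers.

```python
import csv
import io
import re

# Hypothetical timetable published as CSV (machine-readable).
CSV_FEED = """origin,destination,departs
Amsterdam,Utrecht,08:15
Utrecht,Rotterdam,09:40
"""

# The same hypothetical timetable embedded in a web page meant for people.
HTML_PAGE = """
<html><body>
<p>Train from <b>Amsterdam</b> to <b>Utrecht</b> leaves at 08:15.</p>
<p>Train from <b>Utrecht</b> to <b>Rotterdam</b> leaves at 09:40.</p>
</body></html>
"""


def read_csv_feed(text):
    """Re-use machine-readable data: the header row names every field."""
    return list(csv.DictReader(io.StringIO(text)))


# This pattern depends on the page's exact wording and markup;
# it silently stops matching if either one changes.
ROW_PATTERN = re.compile(
    r"Train from <b>(\w+)</b> to <b>(\w+)</b> leaves at (\d{2}:\d{2})"
)


def scrape_html_page(html):
    """Scrape the same records out of presentation-oriented HTML."""
    return [
        {"origin": origin, "destination": destination, "departs": departs}
        for origin, destination, departs in ROW_PATTERN.findall(html)
    ]


if __name__ == "__main__":
    from_csv = read_csv_feed(CSV_FEED)
    from_html = scrape_html_page(HTML_PAGE)
    # Same records either way; the difference is the effort and fragility.
    print(from_csv == from_html)
```

Both functions end up with the same records, but only the CSV reader keeps working when the page is redesigned.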

There are many more examples showing that simply getting data out does not ensure its use or re-use.

Lately at NISR, realizing that our official statistics were disseminated mostly as PDFs prompted some introspection.

Discussions followed. In one session, I gave a presentation to the statisticians and management encouraging two things: producing ‘application-developer-friendly’ data, and actively engaging educational institutions, civil society and the private sector in developing user-friendly information products built on attractive visualizations, mashups and APIs (not just data in tables and charts).

This, among other things, led us to clearly identify the types of data NISR produces and the best ways to package and release them for easy access and use.

We identified three data types: Publications (in print and as PDFs), Indicators or time-series data, and Microdata.

For ‘Publications’, we decided to use the built-in functionality of the NISR website, which runs on the open-source content management system Drupal. For ‘Indicators’ or ‘time-series data’, we chose DevInfo and Prognoz, both tools designed specifically to handle this kind of data and to make it available in open standards and machine-readable formats. Lastly, for ‘Microdata’, we chose IMIS (a system built on Redatam+SP) and NADA (a web-based survey cataloguing system), where datasets are available in NESSTAR and other machine-readable formats.
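As a rough illustration of what ‘machine-readable formats’ means for an indicator series, the Python sketch below writes one series out as CSV and as JSON using only the standard library. It does not use or imitate DevInfo or Prognoz, and the indicator code, unit and values are placeholders invented for the example, not NISR statistics.

```python
import csv
import io
import json

# Placeholder indicator series: the code, unit and values are hypothetical,
# not real NISR figures.
SERIES = {
    "indicator": "EXAMPLE_INDICATOR",
    "unit": "percent",
    "observations": [
        {"year": 2010, "value": 10.0},
        {"year": 2011, "value": 11.5},
    ],
}


def to_csv(series):
    """Flatten the series into CSV: one row per observation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["indicator", "unit", "year", "value"]
    )
    writer.writeheader()
    for obs in series["observations"]:
        # Repeat the indicator code and unit on each row so every row
        # can be understood on its own.
        writer.writerow(
            {"indicator": series["indicator"], "unit": series["unit"], **obs}
        )
    return buffer.getvalue()


def to_json(series):
    """Serialize the series as JSON, keeping its nested structure."""
    return json.dumps(series, indent=2)


if __name__ == "__main__":
    print(to_csv(SERIES))
    print(to_json(SERIES))
```

Either output can be loaded by a developer's program directly, with no scraping step in between.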

I believe that data disseminated for public use must follow the principles of open data, which means making it available (also) in open standards and machine-readable formats. And there is no better place to start than the national statistics office.
