Liberating Data: Challenges and Opportunities

An array of presenters at the Open Data Camp discussed the challenges and opportunities of open data and the compromises required to balance private and public interest in the coming data deluge.

Open data and data liberation were the themes for a day-long camp held at the Indian School of Business (ISB) on 22 June 2013. Organised by the Asia Analytics Lab @ ISB to bring academia and practitioners together, the event was dedicated to all aspects of “open” data – access, affordability, security, sanitization, visualization and impact.

According to the event organisers, open data holds the potential to “create more accountability and transparency to improve policies and implementation of projects across industry, citizen sector organisations, and governance,” by putting it in the hands not only of “technologists and researchers” but also “journalists, government workers, designers, and NGOs”. In sum, open data is the movement for putting and deploying data in the public domain, maximizing its benefits for all.

An alternative to this ideal is the notion of “data liberation”. Enterprises, organizations and individuals who compile databases are often reluctant to “open” them up or release them into the public domain for reasons of confidentiality or business interest. This is where online data visualisation tools, such as those deployed at the ISB portal http://liberation.isb.edu, can help address the dilemmas of information sharing, making the fruits of the data available to the public even while keeping the data itself in the private domain.

A varied set of speakers from industry and academia, including social entrepreneurs, data scientists and bureaucrats, shared their experiences and knowledge from diverse domains such as Google Maps, Google Trends, World Bank data and ISB Dataverse, among others.

Varieties of openness

Suren Ruhela, Director, Programme Management, Google India, opened the day with a primer on GIS data, or as he defined it: “data with location as an essential attribute”. While government repositories and data collection exercises generate such data in abundance, issues of accessibility, usability and affordability limit its “openness”. For instance, data is often not available in digital or non-proprietary formats, and significant investment is required to process it. Against this backdrop, initiatives such as the National Land Records Modernization Programme (NLRMP) are attempting to overcome these challenges.

Ruhela also offered a comprehensive case study of applications made possible through Google Maps: infrastructure planning and management, tracking environmental effects on monuments, directing ambulances to the nearest hospital with the relevant specialty, and mapping nearby police stations for better jurisdiction management, among others. A unique feature of the Google Maps approach to open data, according to Ruhela, was the use of crowd-sourcing to create the data in the first place. Google also promoted open standards and application programming interfaces (APIs) to encourage broad dissemination.

Galit Shmueli, Professor, ISB, provided a cautionary note to the euphoria around the data deluge in the first of her two talks, “Social Media: Promises and Challenges”. To the 3Vs of big data – volume, velocity, variety – she added a fourth V: “volatility”. Though channels such as YouTube or Google Trends have seen a veritable explosion of data, Shmueli’s case study (with Sandeep Khurana, also of the ISB) of the social media presence of the popular 2012 show Satyamev Jayate highlighted how much of a challenge integrating these data sources remains. Often even the meaning of the data diverges significantly across different sites, for instance in the definition of what comprises a “view”. Data availability is also dependent on limited disclosure practices, archive policies and technology limitations. Technologies such as APIs, web crawlers or merging software provide no more than partial fixes in the face of such fundamental incompatibilities.

Apoorva Srivastava from Amity Business School offered an orientation to cloud computing or “IT on demand”. He highlighted the technologies available for data sanitization in the face of concerns over cloud security. Ritesh Tiwari of the Indian School of Business examined the uses of open data in academic research.

The innovative CGNet Swara project was the topic for Arjun Venkatraman from the Mojolab Foundation. CGNet Swara provides a voice-based portal, freely accessible via mobile phone, that allows anyone to report and listen to stories of local interest in the most remote adivasi areas of Chhattisgarh and Andhra Pradesh. Reported stories are moderated by journalists and become available for playback online as well as over the phone. According to Venkatraman, the project was not only an inspiring example of the deployment of new technologies to unleash the benefits of civic participation but also an interesting test case of media diffusion theories such as Gartner’s Hype Cycle.

The promise of big data

The presentation by S Venkadesan, Director, Learning Resource Centre (LRC), ISB, was aimed at introducing the platform ‘DataVerse’, which allows anyone to create his/her own data source and publish it to a larger audience. Along the way, however, Venkadesan also offered remarkable nuggets about the rise of big data. He noted, for instance, that the annual rate of growth of data was 40 times the human population growth rate. Even in 2010, this data comprised 1.2 Zettabytes or the equivalent of 7.5 billion fully loaded iPads, reaching 339 miles into the sky, way past earth’s atmosphere. In this new data ecosystem, Venkadesan argued, the role of the librarian had shifted to data “curation”. He accordingly saw the LRC’s role as bridging the gap between data providers and data users.

Another enlightening presentation by Shalmraj of the World Bank showcased how institutional commitment to open data at the Bank has unlocked a veritable treasure trove of research materials and tools for social scientists. In addition, the World Bank’s social media presence actively attempts to engage researchers in its open data initiatives, including through an open data competition.

A number of presentations focused on data analysis and visualization tools. Anirban Sinha and Amol Mundhra from Dell, India presented the business software “Polyvista”, which has streamlined the early warning system in Dell dispatch. Neependra Khare of Red Hat led a tour of the Shiny visualization package in the free statistical software environment R.

Reema Verma shared tricks for digging into the world’s biggest open knowledge mine – Wikipedia. Nimmi Rangaswamy, then of Microsoft Research and now with the Indian Institute of Technology, Hyderabad, offered perhaps the most unusual slant on big data with an “ethnographic perspective”, connecting data to the conditions of its generation.

It was, however, Galit Shmueli who offered the last word. Using the example of cancer data, she showed how a public cancer registry dashboard can help generate insights for doctors, public health practitioners and patients alike. Though not as lofty as the open data ideal, data liberation may nonetheless be the shape of the future.

Khemchand Sakaldeepi and Deepak Agrawal of the Centre for Investment, ISB contributed to this report for ISBInsight.