Not So Open Data

April 2017 - Information and its Discontents, posted on March 31, 2017 12:24 PM

Hong Kong’s smart city goals thwarted by lack of access to open, user-friendly data

By Rivers Zhang, Brianna To

Anyone who has tried to work out how to get to a new destination will realise there is no one single smartphone application in Hong Kong that will tell you what public transport to take, how long it will take you, and how much it will cost. Despite all the talk about developing Hong Kong into a smart city, the lack of easily accessible and usable open data remains an impediment to that goal.

Charles Mok, the legislative councillor representing the IT functional constituency, says the government should be leading the way when it comes to promoting and providing open data. “Usually the first step [to open up data] should be done by the government, because they possess the largest amount of data,” says Mok.

Legislator for the IT sector, Charles Mok

Mok says that the government has the most data because it is also the major service provider and is responsible for oversight of data sources such as the Hong Kong Observatory and various public transport franchises and operations.

Four years ago, the government took the initiative to open up data sets through the platform Data One, which has now been renamed data.gov.hk. But Mok says the data sets on the site are neither sufficient nor user-friendly.

According to the Open Data Handbook, published by Open Knowledge International, open data should be free for anyone to use, re-use and redistribute. The format of the data sets also matters: the most significant feature of open data is that it is “machine-readable”.

Mart Van de Ven is a co-founder of Open Data Hong Kong, a group of volunteers who advocate for greater availability and quality of open data. Van de Ven explains that in order to be machine-readable, data should be in a format which can be understood and processed by a computer. If, for example, data is presented in PDF format, it first has to be extracted before it can be used, which is costly and time-consuming.

Van de Ven says the government’s annual budget is an example of a document which is published in PDF format and is not machine-readable. “You can imagine a government budget, if the table has many different columns and many different rows, to get the right level in the right row becomes very problematic,” he says.

As the conversion process is complex and errors are inevitably made during the process, Van de Ven adds, “you will lose confidence about [whether] the data you use is correct”.

Van de Ven says some of the preferred formats for data are XML, CSV and Excel, as well as data served through an Application Programming Interface (API). Data in these forms can be used without conversion in different programming systems. On the other hand, PDFs and even pictures produced by scanning hard copies present problems to those who want to use the data because they require conversion.
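To illustrate what machine-readable means in practice, here is a minimal Python sketch that reads a small CSV table and adds up one column. The department names, column names and figures are invented for illustration and are not taken from any government data set; the helper functions `load_rows` and `total_for_year` are likewise hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for a budget table published as CSV.
# Department names and figures are invented for illustration only.
SAMPLE_CSV = """department,year,amount_hkd_million
Education,2016,100
Education,2017,120
Transport,2016,80
Transport,2017,90
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_for_year(rows, year):
    """Sum the amount column for every row that belongs to the given year."""
    return sum(
        int(row["amount_hkd_million"])
        for row in rows
        if row["year"] == str(year)
    )


rows = load_rows(SAMPLE_CSV)
print(total_for_year(rows, 2017))  # prints 210
```

Because every value already sits in a named column, no extraction step is needed. The same table locked inside a PDF would first have to be converted, which is where the errors Van de Ven describes tend to creep in.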

He also emphasises the importance of obtaining raw data. Raw data refers to data that has not been converted or processed after it is collected. Processed data can refer to data that has undergone calculations such as summarising and averaging. Taking housing prices in Hong Kong as an example, Van de Ven says web and software developers can often only get processed data sets. That is, they usually get summaries of housing prices for areas as large as Kowloon, the New Territories or Hong Kong Island.

“Ideally, what we want to have is, this house sold for this much on this date and this address, because then you have thousands of data points, and you can make your model, and you can do your own analysis.”
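The difference between raw and processed data can be sketched in a few lines of Python. The sale records below are made up for illustration, and the `summarise` helper is a hypothetical name; the point is only that anyone holding the individual records can produce a summary at whatever level they choose, while someone given only the regional average cannot work back to the individual sales.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (district, region, sale date, price in HKD million).
# All values are invented for illustration.
RAW_SALES = [
    ("Mong Kok", "Kowloon", "2017-01-05", 6.0),
    ("Mong Kok", "Kowloon", "2017-01-19", 8.0),
    ("Kowloon Tong", "Kowloon", "2017-01-12", 22.0),
    ("Sha Tin", "New Territories", "2017-01-08", 5.0),
    ("Sha Tin", "New Territories", "2017-01-21", 7.0),
]


def summarise(records, key_index):
    """Average the sale price after grouping records by the chosen field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[3])
    return {key: mean(prices) for key, prices in groups.items()}


# Processed view: one average per region, similar to what developers often receive.
by_region = summarise(RAW_SALES, 1)

# With the raw records, the same data can be regrouped at a finer level.
by_district = summarise(RAW_SALES, 0)

print(by_region)
print(by_district)
```

In this invented sample the Kowloon average hides the gap between the two Kowloon districts, which only becomes visible when the individual sales are available.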
