Lab Manager

Open Data: Challenges and Opportunities

The challenges and opportunities of open data for the future of science and technology

by Bernard B. Tulsi

The growing prominence of data as a valuable and steadily increasing worldwide resource has been heralded by a number of tech leaders, including Michael Bloomberg, the champion of smart cities, and Ginni Rometty, the head of IBM and its big data powerhouse, Watson. With no sign of letting up anytime soon, the influence of data has spread through almost every aspect of society over the past few decades.

This is particularly evident in the case of open government data, especially during the past eight years. In its Open Data Transition Report—An Action Plan for the Next Administration, the Center for Open Data Enterprise notes that open government data, which are freely accessible for broad application, reuse, and publication, represent "a major public resource for government, citizens, the scientific research community and the private sector."


Among the myriad uses of government open data, the report highlighted their potential to “empower researchers to advance scientific discovery and drive innovation.” The nonpartisan, informational report was based on comprehensive information gathering. “We consulted with about 70 different experts working in and out of government and identified the areas where open data can be valuable and included them in the report,” says Katherine Garcia, the Center’s director of communications, and project manager for the Open Data Transition Report. Overall, the report evaluated the current open government data setup, and offered ideas on how to improve data programs and policies for the future.


The report states, “The Obama administration has championed open data as an essential part of open, transparent government since the president’s first day in office. Over the past eight years, the federal government has launched a range of new programs showing that open data is more than a tool for good government—it is a critical national resource.” Noting future commitments by the White House on the use of open data to improve government and bolster global data efforts, the report prioritized a number of goals and recommendations that could be leveraged in the future.

“The idea of open data has always been important,” says Garcia. “The premise of open data did not start with the Obama administration, but they made it popular and relevant, and took advantage of the available technology to boost data access, sharing, and circulation— the timing was perfect for open data policies and the mandate to release open data.”

Garcia says that Data.gov, the repository of data from all federal agencies, was initiated in 2009. The site enables the release of data in machine-readable formats and facilitates timely updates. During this early period, the Obama administration created two key new positions, chief information officer and chief technology officer, along with instituting several structural and organizational changes, according to Garcia.

In support of the open data initiative, in March 2009, the National Science and Technology Council (NSTC) released a plan from its science committee in conjunction with the White House's Office of Science and Technology Policy (OSTP). According to Debbie Brodt-Giles, digital assets manager and energy data innovator for the National Renewable Energy Laboratory (NREL), the OSTP has been instrumental in coordinating federal entities that are opening up their data or seeking to do so. "They coordinate efforts and share best practices in the federal government, and shine a brighter light on data accessibility in general," she says. To be sure, a number of other federal agencies have active open data programs. The National Institutes of Health (NIH), for example, has a Public Access policy covering the research it funds. The Centers for Disease Control and Prevention (CDC) also has initiatives to allow greater access to the digital scientific data generated from CDC-funded projects. Furthermore, approaches are being explored to allow more access to de-identified data from clinical studies conducted by industry and academia, especially those done with federal support.

Joel Gurin, president of the Center for Open Data Enterprise, says that the mission of the Center, which has been operational for two years now, is to maximize the value of open data as a public resource. “Our belief is that open government data are tremendously valuable not only for transparency and accountability but also as resources that can be used by citizens and the government itself. The best way to realize that value is by talking to potential users to find out what they need.”

He considers as open all data that are widely available, usable, easily accessible, downloadable, and in applicable formats. "We focus primarily on open data from government, such as data from federally funded research, and explore ways in which experimental data can be shared openly. With respect to digital scientific data, the general requirement for openness is that they be shared freely without stipulations and encumbrances," says Gurin.


“Part of the question around open data is not only access to data from research at federal agencies but also federally funded research conducted at independent institutions,” says Gurin. He explains that some federal agencies have done a great job of making their data available, “but the goal is to make that universal among federal research agencies and also to have the same principles apply to university research or other federally funded research.”

Turning to benefits from the open data approach, Gurin says, "With respect to scientific data, the gold standard would be the Human Genome Project, in which all the parties agreed to share data so that everyone benefited." He notes that a study by the Massachusetts Institute of Technology showed that the shared approach accelerated progress with human genome data considerably. This is also being seen with the Cancer Moonshot initiative, which was funded by Congress, as well as with satellite data and with data sets from NASA, which are used around the world for research on climate science, geospatial studies, weather analysis, and other forms of earth observation, he says.

Gurin continues, “Open data systems have progressed rapidly. If you go back just six or seven years, a lot of agencies were skeptical and hesitant to release data. They had concerns that their data would be misinterpreted or misused.

“That is much less the case now. There is much more general recognition that data are a real resource and that governments should make them available because of the potential benefits,” he says.

“Scientific data are trickier. Scientists still have real incentives to not share their data. The tenure system, professional recognition, and rewards are built around the publication of peer-reviewed articles, which draw on data created in labs.

“We are not where we want to be with the publication of scientific data. I think that is partly because the incentives have not yet aligned and institutional changes are needed to bring this about,” says Gurin.

Explaining key aspects of practical open data operations in a national lab, Brodt-Giles, who manages a team of about 10 to 15 software developers as part of NREL's Strategic Energy Analysis Center, says, "Back in 2009, our team was created to do open energy information (OpenEI). We knew that open data was important, and were one of the first open data platforms. We were cutting edge." The Department of Energy (DOE) used the platform to satisfy the government initiative launched by President Obama, which urged everyone to be transparent and open with their data, according to Brodt-Giles.

She says that over time many other open data projects were established. "So, today you don't need to go to OpenEI to access all the data out there, as the DOE has a lot of open data platforms.

"We are still trying to be the main DOE open data platform, so at least the data sets and data interfaces are catalogued in our system. In addition, we have also built some other data repositories that serve the same open-access role as OpenEI.

“There was a need by some of the other projects, particularly the geothermal program and the marine hydrokinetic program. The DOE was providing grant money to industry for some of these projects, and needed a mechanism to collect data from them. So we created repositories for them that were geared to collect and store data and make them available to the public when feasible.”

She says that over the years her team has been able to hold sensitive data securely until they could be released to the public. "It was meant to be something that federal agencies and national labs can access, but also open to the general public." She says that they make their data available through APIs, "so that people can access machine-readable versions of the data to use with their applications."
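To illustrate what "machine-readable" access means in practice, the sketch below parses the kind of JSON payload an open-data API might return and aggregates one field. The endpoint shape, field names, and sample records here are hypothetical, invented for illustration; they are not NREL's actual schema.

```python
import json

# Hypothetical JSON payload of the kind an open-data API might return;
# the field names and values are illustrative only.
payload = """
{
  "metadata": {"source": "example-open-data-portal", "format": "json"},
  "records": [
    {"utility": "Example Power Co.", "residential_rate_usd_per_kwh": 0.12},
    {"utility": "Sample Energy Inc.", "residential_rate_usd_per_kwh": 0.15}
  ]
}
"""

def average_rate(raw: str) -> float:
    """Parse a machine-readable payload and average one numeric field."""
    data = json.loads(raw)
    rates = [r["residential_rate_usd_per_kwh"] for r in data["records"]]
    return sum(rates) / len(rates)

print(round(average_rate(payload), 3))
```

Because the data arrive in a structured format rather than as a report or spreadsheet meant for human eyes, any application can consume them programmatically, which is what makes the utility-rates-style data sets useful inside other tools and business workflows.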

Turning to benefits from the open data approach, Brodt-Giles says, "Lots of our data are used by entrepreneurs, analysts, and researchers for business purposes. One of our data sets, for example, is the utility rates database. There are plenty of platforms, models, and tools out there that use these data in their daily applications and business operations.

“We try to make our data as openly available as possible to facilitate the needs of businesses and entrepreneurs as well.”

She says, "Prior to being open, these data were difficult to access. Sometimes people did not even know that the data existed. We wanted to change that, because the government invested a lot of money in this research. In the past, to access these data, researchers had to actually know about the research, or know how to contact the researchers, and then request access. But today, we make all those data available from the get-go, for everyone to use, work with, and build on."


Brodt-Giles says, “Today more people are participating in the open data environment. It has become more accepted, versus 2009 when people were nervous about getting their data out. Now, people are seeing how data sharing is more beneficial. I am hopeful that these initiatives will continue even after the presidential transition, because shared open data can provide important benefits for society.

“I think people are becoming more willing to participate in open shared data, at least at a high level. I also think that this approach helps companies to get high-level or aggregated data out there for the world to see. This might even help drive business to them if they put something out to whet the appetites of the users, who may in turn request or even be willing to pay for more robust data sets.”

“Looking out over the next decade or so,” Gurin says, “I believe there is good reason to think that open data will remain a bipartisan issue as it has been in the past with the Open Government Data Act. We are continuing to see growing bipartisan support for this initiative, which is encouraging.”