Managing the Coming Data Flood

Lab managers will face an increasing challenge in the next decade: managing a rising tide of data. To realize the full benefit and value of these diverse and voluminous data requires effective data management techniques, institutional arrangements, and policies.

Written by John K. Borchardt | 7 min read


Lab managers will face an increasing challenge in the next decade—managing a rising tide of data. The amount of data generated annually is forecast to double every two years for the next decade as the cost of computing and networking declines.1 Laboratories will contribute to this data flood as the number of people and data-generating instruments connected to the Internet increases. George Gilder referred to this phenomenon as the “exaflood.” (Gilder is senior fellow at the Discovery Institute and chairman of George Gilder Fund Management, LLC.) An exabyte is equal to one billion gigabytes, or approximately 50,000 times the contents of the U.S. Library of Congress. From the laboratory perspective, some research projects routinely generate terabytes and even petabytes of data. (A terabyte is 1 trillion bytes; a petabyte is 1,000 terabytes.) Many other projects result in smaller, heterogeneous collections with valuable attributes. To realize the full benefit and value of these diverse and voluminous data requires effective data management techniques, institutional arrangements, and policies.
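
Since the forecast above amounts to simple compound doubling, the arithmetic can be illustrated with a short Python sketch (not from the article). The one-petabyte starting volume is a hypothetical example; the two-year doubling period and ten-year horizon come from the forecast cited above.

# Sketch of the data-growth arithmetic described above: volume doubling
# every two years over a decade, expressed in terabytes, petabytes, and exabytes.
# The 1 PB starting volume is a hypothetical example, not a figure from the article.

TERABYTE = 10**12            # 1 trillion bytes
PETABYTE = 1000 * TERABYTE   # 1,000 terabytes
EXABYTE = 10**9 * 10**9      # one billion gigabytes

def projected_volume(start_bytes, years, doubling_period=2):
    """Project data volume assuming it doubles every `doubling_period` years."""
    return start_bytes * 2 ** (years / doubling_period)

start = 1 * PETABYTE  # hypothetical laboratory archive today
for year in range(0, 11, 2):
    volume = projected_volume(start, year)
    print(f"Year {year:2d}: {volume / PETABYTE:6.1f} PB ({volume / EXABYTE:.6f} EB)")
# After ten years of doubling every two years, the archive is 2**5 = 32 times larger.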

Factors promoting the exaflood

From the laboratory perspective, factors promoting the exaflood include globalization of research, open innovation and the increasing use of supercomputers. Global corporations perform research in countries scattered across the globe. For example, Dow Chemical, the largest U.S. chemical maker, has research facilities in the U.S., China, India, Saudi Arabia and other countries. Global corporations often have shared research projects performed at facilities thousands of miles apart. Open-innovation projects performed at two or more organizations require sharing data and reports.2 U.S. federal laboratories license technology to the private sector. These and other activities contribute to the increasing volume of scientific and engineering data moving over the Internet.


IBM Roadrunner Supercomputer. Photograph courtesy of Los Alamos National Laboratory.

Supercomputers enable the creation and processing of large volumes of data. Supercomputing resources available on the Internet are listed at http://userpages.umbc.edu/~jack/supercomputer-resources.html. The federal government’s national laboratories, including Oak Ridge National Laboratory and Sandia National Laboratories, are home to supercomputers that can perform astonishing numbers of calculations at amazing speed. For example, Oak Ridge’s Jaguar, a Cray supercomputer, often achieves sustained performance of over a petaflop (a quadrillion mathematical calculations per second).


National Oceanic and Atmospheric Administration (NOAA) Supercomputer. Photograph courtesy of NOAA.

About the Author

  • Dr. Borchardt is a consultant and technical writer. The author of the book “Career Management for Scientists and Engineers,” he writes often on career-related subjects.
