While cloud computing is merely a metaphor signifying the abstraction of technology, resources and locations, the possibility that your laboratory could miss out on the biggest technological leap yet is real.
There has been a lot of hype over the last few years concerning the cloud. In the consulting world, it was interesting to watch clients respond with awe to cloud-speak (discussions centered on cloud technology). Scott Adams captured the atmosphere with a soon-to-be-classic Dilbert strip that read, slightly paraphrased, “[Manager] I hired a consultant to help us evolve our products to cloud computing. [Consultant] Blah blah cloud, blah blah cloud, blah blah platform. [Manager] It’s as if you’re a technologist and a philosopher all in one!”1 The comic was funny, but it was also true. What companies have been marveling at is, at its core, a relatively old idea: mainframe computing. Mainframe technology peaked between 1959 and 1973, when massive, centralized computing power carried out critical operations for government and large corporations, handling tasks such as census statistics and enterprise resource planning (ERP). That phase was followed by the advent of the personal computer, which shifted the landscape to a data-local paradigm until 1995.
From the mid-90s on, we transitioned into a Web 2.0, mobile and cloud environment. With history repeating itself, we again have centralized data storage and massive computing power, now with the ability to access information from anywhere. The “dumb terminal” access points have been replaced by 4G- and Wi-Fi-connected phones, tablets and laptops. A few key aspects have been optimized for redundancy, security and speed, but under the covers it is still a mainframe-to-terminal concept.
So why, if left unleveraged, does this cyclical idea of the cloud have the power to put your laboratory at a serious disadvantage? The answer is data. The truly disruptive technology beginning to emerge from the cloud is data—lots of it—now coined Big Data. Big Data is the offspring of the cloud’s main advantage: collaboration. The collaborative potential of the cloud has woven together relationships like nothing the world has ever seen, throwing off an hourly exhaust of terabits of data. Big Data may be defined as the analytical crunching of this massive volume of data into meaningful business productivity. The result: a true competitive edge.
Even the slightest change in business habits, from safety policies to customer service, can have enormous effects on productivity and profitability. There are many examples of outsized impact when small changes were made in precisely the right direction. An untested CEO took over one of the largest companies in America; his first order of business was to attack a single pattern among his employees—how they approached worker safety—and the firm, Alcoa, soon became the top performer in the Dow Jones. Procter & Gamble came close to canning the blockbuster product Febreze until a small pattern emerged in the consumer data and prompted a change in its marketing. Habits can be changed if we understand how they work.2 Thus, for the first time, the answers needed to explain habits and advance productivity, product quality and profitability now exist. Big Data will drive the new era.
Data has become a torrent flowing into every area of the global economy.3 The relationship between productivity and IT investments is well established; however, exploration of the link between productivity and data is just beginning. The use of Big Data will become a key basis of competition and growth for laboratories. From the standpoint of competitiveness and the potential capture of value, laboratories need to take Big Data seriously. In most industries, established competitors and new entrants alike will leverage data-driven strategies to innovate and capture value from deep and real-time information.
In order to weather the cloud storm and leverage the truths hidden in Big Data, we should draw upon what history has taught us about IT and its relationship to productivity gains, while understanding that there are two essential preconditions. The first condition is capital deepening—in other words, IT investments gave workers better and faster tools to do their jobs. The second condition is investment in organizational change—i.e., managerial innovations that complemented the IT investments in order to drive true productivity gains. The same preconditions that explain IT’s impact in enabling historical productivity growth currently exist for Big Data.4
Let’s review a few early examples of success that Big Data has yielded, then list a few tools and some potential areas of application for your laboratory.
The Italian Medicines Agency collects and analyzes clinical data on the experience of expensive new drugs as part of a national cost-effectiveness program. The agency can impose “conditional reimbursement” status on new drugs and then reevaluate prices and market-access conditions in light of the results of its clinical data studies.
The California-based integrated managed-care consortium Kaiser Permanente connected clinical and cost data early on, thus providing the crucial data set that led to the discovery of Vioxx’s adverse drug effects and the subsequent withdrawal of the drug from the market.5
To handle Big Data analysis, aggregation and management, a growing number of tools and general technologies are available. Here are a few key tools that have shaped the landscape thus far:
- Bigtable — Proprietary distributed database built by Google on the Google File System
- Business Intelligence (BI) — A type of application software designed to report, analyze and present data. BI tools are often used to read data that have been previously stored in a data warehouse or data mart. BI tools can also be used to create standard reports that are generated on a periodic basis, or to display information on real-time management dashboards like a GC or HPLC operational status grid.
- Cassandra — An open-source (free) database-management system designed to handle large amounts of data on a distributed system. Developed originally at Facebook, it is now managed as part of the Apache Software Foundation.
- Cloud computing — A computing paradigm in which highly scalable computing resources, often configured as a distributed system, are delivered as a service over a network and accessed through an app or Web browser. Many organizations now offer B2B collaborative functionality, from remote sample scheduling to global inventory management.
- Dynamo — Proprietary distributed data storage system developed by Amazon.
- Mashup — An application that uses and combines data presentation or functionality from two or more sources to create new services. These applications are often made available on the Web and frequently use data accessed through open application programming interfaces or from open data sources.
- Google File System — Proprietary distributed file system developed by Google.6
- Visualization — Technologies used for creating images, diagrams or 3D charting to highlight trends in massive amounts of data that are not easily identified in tabular formats (a minimal charting sketch follows this list).
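To make the visualization point concrete, here is a minimal Python sketch using matplotlib and invented HPLC purity numbers. It plots a run-by-run result alongside its rolling mean; the slow drift the chart reveals would be easy to miss in a tabular report:

```python
# A minimal charting sketch with matplotlib. The purity values are simulated
# (they are not real instrument data); the point is that a gradual drift is
# obvious in a trend plot but hard to spot in a results table.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
runs = np.arange(1, 101)
purity = 99.2 - 0.004 * runs + rng.normal(0, 0.08, runs.size)  # slow downward drift

rolling = np.convolve(purity, np.ones(7) / 7, mode="valid")  # 7-run rolling mean

plt.plot(runs, purity, ".", alpha=0.5, label="individual runs")
plt.plot(runs[3:-3], rolling, lw=2, label="7-run rolling mean")
plt.xlabel("Run number")
plt.ylabel("Purity (%)")
plt.legend()
plt.title("Drift visible in the trend, easy to miss in a table")
plt.show()
```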
In addition to Big Data tools, there are key techniques that laboratories should consider to increase business and improve client satisfaction for the future:
Association rule learning — A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules. One application is market basket analysis, in which a laboratory can determine which test outcomes frequently occur together. The laboratory can use this information to help clients with predictive modeling and, through internal marketing campaigns, to garner more business.
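As a minimal sketch of the market basket idea, the Python snippet below computes support and confidence for pairs of tests requested together. The order data and the 0.3/0.6 thresholds are invented for illustration; a production system would run a full algorithm such as Apriori over the laboratory’s actual order history.

```python
# Toy market-basket analysis in plain Python: count how often pairs of tests
# are requested on the same sample, then report pairs whose support and
# confidence clear (hypothetical) thresholds.
from itertools import combinations
from collections import Counter

orders = [  # each entry: the set of tests requested on one sample (invented)
    {"pH", "conductivity", "TOC"},
    {"pH", "conductivity"},
    {"pH", "TOC"},
    {"conductivity", "TOC", "turbidity"},
    {"pH", "conductivity", "TOC"},
]

pair_counts = Counter()
item_counts = Counter()
for order in orders:
    item_counts.update(order)
    pair_counts.update(combinations(sorted(order), 2))

n = len(orders)
for (a, b), count in pair_counts.items():
    support = count / n                   # how often the pair appears together
    confidence = count / item_counts[a]   # P(b is ordered | a is ordered)
    if support >= 0.3 and confidence >= 0.6:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```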
Natural language processing (NLP) — A set of techniques from a subspecialty of computer science (within a field historically called “artificial intelligence”) and linguistics that uses computer algorithms to analyze human (natural) language. Many NLP techniques are types of machine learning. One application of NLP is using sentiment analysis on social media to determine how prospective clients are reacting to a new product marketing campaign or function within customer service.
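As a toy illustration only, the sketch below scores invented social media posts against small hand-made word lists. Real sentiment analysis would rely on a trained model or an NLP library, but the shape of the task, turning free text into a number a marketer can track, is the same.

```python
# A crude keyword-based sentiment scorer, not a production NLP pipeline.
# The lexicons and example posts are invented for illustration.
POSITIVE = {"fast", "accurate", "helpful", "reliable", "excellent"}
NEGATIVE = {"slow", "late", "wrong", "confusing", "poor"}

def sentiment(text: str) -> int:
    """Return positive word hits minus negative word hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Turnaround was fast and the report was accurate.",
    "Results were late again and the portal is confusing.",
]
for post in posts:
    print(sentiment(post), post)
```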
Optimization — A portfolio of numerical techniques used to redesign complex systems and processes to improve performance according to one or more objective measures (e.g., cost, speed or reliability). Examples of applications include improving operational processes such as scheduling, test routing and laboratory floor layout, and making strategic decisions such as product range strategy and linked client analysis. Another great example is inventory management, where full transparency at the SKU level, combined with bar-code systems linked to automated replenishment processes, reduces the incidence of stock-outs.
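The sketch below shows one way a scheduling question might be framed as a linear program with SciPy. The assay revenues, hour requirements and weekly capacities are hypothetical; the point is how an allocation decision becomes an objective plus constraints.

```python
# A minimal linear-programming sketch using scipy.optimize.linprog.
# All numbers are invented; runs are treated as continuous for simplicity.
from scipy.optimize import linprog

# Decision variables: x0 = runs of assay A, x1 = runs of assay B per week
revenue = [-120, -200]        # negated, because linprog minimizes

A_ub = [[1.0, 2.5],           # analyst hours consumed per run
        [0.5, 2.0]]           # instrument hours consumed per run
b_ub = [40, 30]               # analyst and instrument hours available

result = linprog(revenue, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("runs of A, B:", result.x)
print("max weekly revenue:", -result.fun)
```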
Regression — A set of statistical techniques to determine how the value of the dependent variable changes when one or more independent variables are modified. Regression is often used for forecasting or prediction. Examples of applications include forecasting test volumes based on various market and economic variables and determining what measurable manufacturing parameters most influence customer satisfaction.
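A minimal regression sketch, with invented monthly figures and two hypothetical drivers (active client count and marketing spend), might look like the following: fit test volumes by least squares and use the coefficients to produce a forecast.

```python
# Least-squares regression with NumPy on invented data; not a validated
# forecasting model, just the mechanics of fit-then-predict.
import numpy as np

# Columns: intercept, active clients, marketing spend (k$)
X = np.array([
    [1, 42, 5.0],
    [1, 45, 6.5],
    [1, 50, 6.0],
    [1, 48, 7.5],
    [1, 55, 8.0],
])
y = np.array([410, 445, 480, 470, 530])   # tests performed per month

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
next_month = np.array([1, 58, 8.5])       # assumed values for next month
print("forecast test volume:", next_month @ coef)
```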
Time series analysis — A set of techniques from both statistics and signal processing for analyzing sequences of data points to extract meaningful characteristics from the data. Examples include tracking the hourly value of an in-process test as a reaction runs to completion and trending a result component for a given condition from day to day. One common approach is to decompose a series into trend, seasonal and residual components, which is useful for identifying cyclical patterns in the data. Applications include forecasting against production control limits and alerting stakeholders when a result is trending the wrong way.
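A minimal sketch of that idea, using pandas and simulated daily assay results, separates a crude trend from the noise and flags any points outside simple 3-sigma limits:

```python
# Trend extraction and control-limit alerting on simulated data. A centered
# rolling mean stands in for the trend component; the limits are plain
# 3-sigma bounds, not a validated control-charting scheme.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=60, freq="D")
values = 98.5 + 0.02 * np.arange(60) + rng.normal(0, 0.3, 60)  # slow upward drift
series = pd.Series(values, index=days, name="assay_result")

trend = series.rolling(window=7, center=True).mean()   # crude trend component
residual = series - trend                               # what remains after de-trending
print("residual std:", round(residual.std(), 3))

mean, sigma = series.mean(), series.std()
upper, lower = mean + 3 * sigma, mean - 3 * sigma
alerts = series[(series > upper) | (series < lower)]
print("out-of-limit points:")
print(alerts if not alerts.empty else "none")
```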
Last, simply making Big Data more easily accessible to relevant stakeholders in a timely manner can create tremendous value. In the public sector, for example, making relevant data more accessible across otherwise separated departments can sharply reduce search and processing time. In manufacturing, integrating data from R&D, engineering and manufacturing units to enable concurrent engineering can significantly cut time to market and improve quality.
In summary, data is becoming a key competitive asset; laboratory leaders must therefore understand their data assets, equip themselves with the right tools and identify the data gaps that exist. Laboratories should conduct an inventory of their own proprietary data and systematically catalog other data to which they could potentially gain access, including publicly available data (e.g., government data and other data released into the public domain) and data that can be purchased from data aggregators, along with the other tools and techniques in the data value chain.
1. Scott Adams. Dilbert. January 7, 2011. http://dilbert.com/strips/comic/2011-01-07
2. Charles Duhigg. 2012. The Power of Habit. Random House, Inc., New York.
3. “The digital universe decade—Are you ready?” May 2010. Available at www.emc.com/leadership/programs/digital-universe.htm
4. Dale Jorgenson, Mun Ho, and Kevin Stiroh. 2008. “A Retrospective Look at the U.S. Productivity Growth Resurgence.” Journal of Economic Perspectives.
5. Merck was granted FDA approval to market Vioxx (rofecoxib) in May 1999. In the five years that elapsed before Merck withdrew Vioxx from the market, an estimated 80 million patients took the drug, making it a “blockbuster” with more than $2 billion per year in sales. Despite statistical evidence in a number of small-scale studies (analyzed later in a metastudy), it took more than five years before the cardiovascular risks of Vioxx were proven. In August 2004, a paper at an International Pharmacoepidemiology meeting in Bordeaux, France, reported the results of a study involving a large Kaiser Permanente database that compared the risk of adverse cardiovascular events for users of Vioxx against the risk for users of Pfizer’s Celebrex. The study concluded that more than 27,000 myocardial infarction (heart attack) and sudden cardiac deaths occurred between 1999 and 2003 that could have been avoided.
6. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. October 2003. “The Google File System.” 19th ACM Symposium on Operating Systems Principles. Lake George, NY. http://research.google.com/archive/gfs.html