Launching a Big Data Project

Big data is currently one of the hottest buzzwords in data analysis. Laboratory managers leading such a project do not necessarily need the expertise to implement it themselves, but they should have a general understanding of the process so that they can set realistic goals and timetables.

Written by John Joyce, PhD

Start Small and Take Your Time

Keep in mind that a big data project is not the same as a business intelligence project. Although the differences are more complex than this, Eric Brown's¹ thumbnail description illustrates the main difference in focus.

A good place to start is by examining what the term "big data" means. Unfortunately, this is somewhat like stepping off solid ground into quicksand. If you ask people in different organizations what they mean by big data, you are likely to get radically different answers from each of them; there is no real consensus on what the term means. Reading through a variety of papers on the subject is much like reading Lewis Carroll's Through the Looking-Glass, where Humpty Dumpty states, "When I use a word, it means just what I choose it to mean—neither more nor less."

Part of the reason for this is that big data is such a broad umbrella term that it encompasses projects with very different goals. For the purposes of this article, a big data project is one in which the following criteria apply:

  • Data is generally complex and unstructured.
  • Data is frequently dirty and must be cleaned.
  • Data is difficult to process with existing tools.

While exploring the possible benefits of big data, keep in mind that a big data solution is a technology, whereas a data warehouse is an architecture. When talking with vendors, you may come across some who claim that a data warehouse is unnecessary if you have a big data solution. Because the two terms refer to different kinds of things, having one does not eliminate the need for the other, although there are a number of ways in which they can be used together.² The purpose of a data warehouse is generally to ensure that everyone in the company is using the same data.
