After finishing his postdoctoral studies at the University of Washington, where he was a Moore/Sloan Data Science Fellow and a WRF Innovation Postdoctoral Fellow in Data Science, Michael Fire envisioned building a lab “dedicated to solving real-world problems using open data and data science tools across different domains, with an emphasis on making social good,” he explains. After becoming an assistant professor in the Software and Information Systems Engineering Department at Ben-Gurion University of the Negev (BGU), Fire “made good” on his vision and founded the Data Science for Social Good Lab. “Since then, thanks to data science's growing popularity, our lab has rapidly grown. The lab currently has over 20 members that conduct active research encompassing a wide range of research domains,” says Fire.
The team’s work covers a variety of real-world issues and challenges, including inequality and diversity, public health, smart cities, and sustainability, to name a few. Specific projects range from investigating gender biases in clinical trials, to developing an approach for monitoring palm trees for insect infestation, to analyzing global dietary habits and their economic effects. All of these projects are driven by collecting and analyzing big data.
One of the team’s most recent endeavors, a collaboration with Dr. Talia Meital Schwartz-Tayri of the university’s social work faculty, aims to help at-risk youth “by utilizing historical data from documented records of governmental family services to identify factors that influence their life for the better or worse,” says Fire. “For example, historical records can be used to better predict the types of help services that are more likely to positively impact their lives.”
Fire and fellow lab leaders Galit Fuhrmann Alpert, PhD, and Dima Kagan, a PhD student, explain that the large amounts of data they collect for their projects can be either structured (large tables with billions of records) or unstructured (texts, images, and sounds). “To analyze different data types, we use state-of-the-art data science tools to manipulate the data and extract information,” say Fire, Fuhrmann Alpert, and Kagan. “For example, in our recent work with the Palm Trees Infestation, we used deep learning algorithms to automatically detect palm trees in Google Street View Images.”
The team can carry out these data-intensive projects thanks to the lab’s many high-capacity servers. The lab also has access to the university’s high-performance computing clusters and uses Amazon Web Services and Google Cloud when needed.
Free access for all
The Data Science for Social Good Lab openly shares its data and code on its website to help researchers in all disciplines with their studies. The public large-scale datasets featured on the site include online social network datasets from Facebook, Google+, Academia.edu, and Reddit communities; time-series datasets; and the largest public network evolution dataset, with more than 20,000 networks and over a million real-world graphs.
To the layman, the concept of big data can sound overwhelming. So, in addition to sharing data and code with fellow scientists, the lab team makes it a priority to connect with the general public. “We do our best to communicate our research to the public by creating short videos, infographics, and publishing blog posts. By doing this, we hope to increase awareness of issues that we believe are important. Our tools and code are highly transferable to different fields, and we hope [they] will be used by the broad community,” says the team.
Sharing knowledge with the next generation
While the lab’s capacity to collect, analyze, and share massive amounts of data for social good is impressive, “the human capital is what makes our lab unique,” says Fire. “We have a fantastic group of researchers from very diverse backgrounds.” Fire brings his background in computer science and theoretical mathematics to the team, while Fuhrmann Alpert comes from the field of computational neuroscience, and Kagan has a strong background in software engineering. This diversity of expertise broadens the lab’s skill set and contributes to innovative problem-solving.
As the lab team keeps growing, so does its ability to reach and advise students. The lab offers three avenues of mentorship for students interested in big data and related fields. Senior-year undergraduate students can find mentors on the team for their engineering and research projects, and MSc and PhD students can receive help with their data science research. Most recently, Fire and the team developed a course that teaches students how to use state-of-the-art data science tools. “One of the lab's most important goals is to train students in the field of applied data science. For this purpose, we designed and are teaching a unique course titled ‘The Art of Analyzing Big Data—The Data Scientist's Toolbox,’” explains Fire. Currently, about 100 students enroll in the course each year, but the team hopes to eventually make it publicly available to reach thousands more.
Fire, Fuhrmann Alpert, and Kagan are witnessing first-hand how big data research is gaining momentum, and they expect it to continue making a positive impact on society. “Big data and data science are currently changing the world,” says the team. “For example, the Large Hadron Collider generates petabytes of data. Another example, the Gaia project, stores a large amount of data to map our galaxy. In these enormous piles of data, there can be things that will change the world and make our lives better, if we ask and study the right questions.”