The Role of Data Hasn’t Changed, but Our Grasp on It Has
May 28, 2020 | By Peter Wang
As the COVID-19 pandemic sweeps the globe, many business leaders have been left scrambling to figure out what it means for their industry, and how they should brace for impact. Few business organizations, if any, are exempt from the challenges posed by the novel coronavirus. Even startups in the prosperous tech sector are shaking in their boots—despite the fact that many have built up vast stockpiles of cash, and are well positioned to operate remotely. However, while tech companies operating in fields like transportation and hospitality prepare for sharp drops in demand, there’s at least one segment of the industry that remains as busy as ever: data science.
Even in the midst of unprecedented disruption, today’s enterprise leaders are still relying on data science to guide them as they develop strategies to help them weather the tumultuous weeks and months that lie ahead. Meanwhile, the data scientists themselves are as productive as ever—no surprise given that most are well-accustomed to the ins and outs of telecommuting and remote collaboration. In fact, many are adding even more work to their plates by pitching in on pandemic response efforts. Although data analytics and statistics have always been central in responding to public health crises, COVID-19—as the first global pandemic of the smartphone era—is the first one that humanity is fighting with real-time data science and machine learning.
Why The Coronavirus Pandemic Puts Data Science Center Stage
For decades, the field of data science has grown not only in sophistication, but also in influence. It increasingly shapes the way we manage our governments, our businesses, our families, and our daily routines. Yet, for non-experts, the role of data in everyday life is all but invisible. When they think about big data, they think of endless spreadsheets and impenetrable mathematical functions, rather than the smartwatch on their wrist or the automated prompt in their inbox. The COVID-19 pandemic has changed that dramatically.
Since day one, data science has been at the forefront of the COVID-19 response. All around the world, healthcare workers, scientists, epidemiologists and policymakers are aggregating and sharing incident data, using near real-time COVID-19 trackers to make more informed decisions about fighting the virus. Abstract graph theory concepts have suddenly become terribly concrete, taking the form of global social distancing measures. Meanwhile, complex statistical concepts like R0 and “flattening the curve” have invaded the popular imagination, spawning everything from memes to musical parodies.
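For readers who want to see how R0 and "flattening the curve" relate, here is a minimal sketch using the textbook SIR (susceptible, infected, recovered) model. It is not any official COVID-19 model: the R0 values, the 10-day infectious period, and the population size are illustrative assumptions, and `simulate_sir` is a hypothetical helper written for this example.

```python
def simulate_sir(r0, infectious_days=10.0, population=1_000_000,
                 initial_infected=10, days=365, steps_per_day=10):
    """Run a basic SIR model with small Euler steps; return infected counts per day."""
    gamma = 1.0 / infectious_days   # recovery rate per day (assumed)
    beta = r0 * gamma               # transmission rate implied by R0
    dt = 1.0 / steps_per_day

    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    daily_infected = [infected]

    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * susceptible * infected / population * dt
            new_recoveries = gamma * infected * dt
            susceptible -= new_infections
            infected += new_infections - new_recoveries
        daily_infected.append(infected)
    return daily_infected


# Illustrative values only: a higher R0 versus a lower one, as might
# result from social distancing reducing contacts.
baseline = simulate_sir(r0=2.5)
distanced = simulate_sir(r0=1.5)

for label, curve in [("R0 = 2.5", baseline), ("R0 = 1.5", distanced)]:
    peak = max(curve)
    print(f"{label}: peak of {peak:,.0f} infected on day {curve.index(peak)}")
```

In this toy model, lowering R0 produces a peak that is both smaller and later, which is the "flatter curve" the phrase refers to.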
How Data Science is Helping Combat The Pandemic
For the first time in human history, we are all fighting a common enemy together using the power of data and mathematics. In recent weeks, our team at the data science company Anaconda has been especially heartened to see how scientists all around the world are using our beloved Python programming language to pitch in, helping public health experts and government officials determine how best to respond to the novel coronavirus. These teams are using open source collaboration and enormous public data sets to help paint a clearer picture of how the virus is spreading, and those insights are making a tangible difference in communities contending with the virus today.
Take the open source project Nextstrain, for example. One of Nextstrain’s most valuable initiatives has been an interactive web app called Auspice, which is designed to visualize how the virus behind COVID-19 is mutating. Like most viruses, it mutates over time as small errors are made during replication. With data science, genomic epidemiologists can track these mutations to see how the virus has traveled. Using data from the Global Initiative for Sharing All Influenza Data (GISAID), a global database of viral genome sequences, the Nextstrain team’s Auspice app displays detailed phylogenetic trees, a visual tool scientists use to illustrate the way a virus mutates over time.
This work may be highly academic, but it has already made a direct impact on the real world. In February, public health officials in Washington state used Nextstrain’s data to guide a number of important policy responses to the coronavirus, which had only recently begun to appear in the States. The Nextstrain team analyzed two viral samples collected in Washington state six weeks apart and found that both contained a mutation rarely observed in samples taken from China. This provided strong evidence that community transmission of COVID-19 was occurring in the state, prompting both the governor and the mayor of Seattle to declare a state of emergency, a decision that likely saved hundreds of lives.
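The reasoning behind that finding can be illustrated with a toy sketch: compare each sample to a reference sequence, list the differences, and look for mutations that two samples share. This is not Nextstrain's actual pipeline, which works on full aligned genomes and builds phylogenetic trees. The sequences below are invented for illustration, and `find_mutations` is a hypothetical helper.

```python
def find_mutations(reference, sample):
    """Return the set of (position, reference_base, sample_base) differences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    }


# Made-up, pre-aligned toy sequences (real viral genomes are ~30,000 bases).
reference = "ACGTACGTACGT"
samples = {
    "local_sample_1": "ACGTACGAACGT",
    "local_sample_2": "ACGTACGAACGC",
    "overseas_sample": "ACCTACGTACGT",
}

mutations = {name: find_mutations(reference, seq) for name, seq in samples.items()}

# A mutation carried by both local samples but absent elsewhere hints
# that the two infections belong to the same local chain of transmission.
shared_locally = mutations["local_sample_1"] & mutations["local_sample_2"]
print("Shared by both local samples:", shared_locally)
print("Also seen overseas:", shared_locally & mutations["overseas_sample"])
```

Here the two local samples share one change that the overseas sample lacks, the same kind of signal, in miniature, that pointed to community transmission.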
Data Science Enters The Mainstream
Meanwhile, on the other end of the spectrum, everyday citizens are quickly transforming into amateur data scientists as they contend with the complicated science that now dictates much of our day-to-day lives. There’s little doubt that the coronavirus has had profound and pervasive effects on the language we use in even our most casual conversations. For evidence of this influence, look no further than the Merriam-Webster dictionary: This year, the Merriam-Webster organization made the fastest emergency update in its 200-year history with the term “COVID-19,” which was announced by the WHO on February 11 and added to the dictionary mere weeks later on March 16.
Since then, interest in the technical language surrounding the coronavirus has surged, and the internet has been filled with coronavirus glossaries produced by dozens of press outlets and various scientific organizations. Many of the terms that have become most prevalent are those rooted in data science and statistical analysis. Just a few months ago, the average person might have responded to the proper pronunciation of R0 (“R naught”) with confusion or a cheeky “are too!” Now, the once-obscure variable is growing increasingly infamous thanks to in-depth news coverage and widespread discussion on social media.
Elsewhere, seemingly dry, mundane expressions taken from the fields of public health and mathematics have become the subject of lengthy, meditative essays by celebrated writers. In The New Yorker, author Karen Russell writes a beautiful passage about the comfort she’s found in phrases like “flatten the curve,” a simple concept rooted in epidemiological modeling, which she says has helped her turn “fear into action.” In the LA Times, reporter Mark Z. Barabak explores the ways that words and phrases like “social distancing” are tools that not only keep us informed, but also keep us safe.
The language and mathematics of coronavirus are also clearly keeping us engaged, if recent headlines are any indication. In the COVID-19 era, analyses generated by math and data science have become the subject of “Breaking News” alerts. We read articles about the coronavirus math that we should really be worried about, and watch videos of experts “doing the math” on the virus’ spread. Arguments over the interpretation of data can balloon into full-on political scandals, and we find ourselves searching through the latest data updates in hopes of finding reasons to remain calm.
Data Science in The Post-Coronavirus Era
Will the increased prominence of and interest in data science, mathematics, and public health topics continue once the coronavirus has passed? Not likely, or at least not with the same intensity. However, it is possible that the increasingly high standing of data science will have some long-term effects. As many have observed in recent decades, Americans have become famous for their mistrust of experts and academics. Fortunately, that mistrust has been declining for several years, and more recent Pew Research Center polling indicates broad public trust in institutions amid the pandemic. Hopefully, this trust will only increase as more Americans learn about the scientific insights that have been our greatest tools in the effort to combat the coronavirus, and about the hardworking scientists putting them to use.
If you want to hear more about this topic, I'm sitting down with Alex Woodie from Datanami during our AnacondaCON Day 2 Keynote to discuss data's role in the new normal. You can register for the free, virtual conference at anacondacon.io.