Data remains crucial to understanding the impact and spread of COVID-19 around the globe. Jean-Claude Thill, a Knight Distinguished Professor in the Department of Geography and Earth Sciences, and Rajib Paul, an associate professor in the Department of Public Health Science, discuss an effort by UNC Charlotte's interdisciplinary School of Data Science to leverage data to slow the spread of COVID-19 and future pandemics.
I’m Jeffrey Jones, Director of Executive Education and Professional Development at UNC Charlotte, and this is Charlotte Business Buzz.
Connecting the Queen City’s business community … From UNC Charlotte’s Belk College of Business … This is Charlotte Business Buzz.
Data remains crucial to understanding the impact and spread of COVID-19 around the globe. Government agencies, institutions and organizations stood up websites and interfaces to showcase raw data around the pandemic.
As the United States battles a drastic resurgence in coronavirus cases, a group of UNC Charlotte researchers aggregated, analyzed and visualized publicly available COVID-19 data to produce an interactive dashboard to better understand the rapid spread of the virus. UNC Charlotte’s interdisciplinary School of Data Science facilitated this project with a twenty-seven thousand dollar grant from a Fortune-500 financial services company that has a strong presence in the Charlotte community.
Joining us to discuss are two of the project’s collaborators: Jean-Claude Thill, a Knight Distinguished Professor in the Department of Geography and Earth Sciences, and Rajib Paul, an associate professor in the Department of Public Health Science.
Jean-Claude and Rajib, welcome to the program.
Thank you for having us. Thank you, Jeff.
What did you originally envision for the project?
We had mainly two goals on this project. One is looking at the basic epidemiologic measures and how they evolve over time as well as over space. The other is ensemble forecasting. From a business perspective, forecasts of the stock market are very popular, and forecasts of the weather are very popular, and now on the health side forecasting is picking up. One of my colleagues, Dr. Sungjne, would probably call it ‘nowcasting’ instead of forecasting, because with the pandemic we are not predicting over a long period of time but over a shorter one. There are several models running on this forecast, and one famous statement by George Box was: “all models are wrong, but some are useful”. One goal of the project is to put all of them together, coming up with a consensus estimate of the forecast, and then to assess where these models agree, where they disagree, and why, so that we understand the inherent uncertainties associated with them and that again leads to more informed decisions. The other goal, as I said, was looking at the hot spots, the clusters of the disease.
The point that Rajib just made about the hot spots is very interesting to me as a geographer. Geographers look at where things happen and why they happen at certain locations. When you look at a map of incidence, or a map of mortality or fatality rates resulting from the pandemic, it is not a flat surface. It is very spiky, as a matter of fact, and those spikes move over time.
So understanding why they occur at a particular location, and why those spikes and valleys move to different locations, is something that I as a geographer have been particularly interested in trying to make sense of. If we follow the news and reporting on the pandemic, we hear that government entities, public health officials, governors and so on are trying to anticipate where the pandemic is going and how it is evolving, but largely their measures and interventions, such as lockdowns and confinement measures, are reactive. This is something that is very interesting about a pandemic like this. It is a little bit scary, unfortunately, to have to say that it is interesting to us as researchers, but ultimately we are living through a natural experiment in many ways, and that is what makes it extremely interesting to see how space and time come together to explain the evolution of this process.
Of course, we need to keep in mind that we have a virus that is propagating over time, but we also have individuals, and the behavior of individuals is what we need to come back to. The way individuals respond to it, or fail to modify their behavior in response to it, is what may make the pandemic much more powerful and much more effective in spreading its effect on society. The same goes for the way organized structures, meaning government agencies and organizations, respond to it. For instance, to this day many businesses have deliberately chosen not to call their workers back, in order to contain the pandemic and keep its effects, in terms of infections or casualties, with death ultimately being the worst-case scenario, as low as possible, even though we are obviously hitting some high numbers, among the highest in the world at this stage.
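The consensus idea described in the answer above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual method: the model names and numbers are made up, the median is just one possible way to form a consensus, and the per-day standard deviation is one simple way to show where models agree or disagree.

```python
from statistics import median, pstdev


def consensus_forecast(forecasts):
    """Combine several model forecasts into one consensus series.

    forecasts: dict mapping a (hypothetical) model name to a list of
    predicted daily case counts, all covering the same days.
    Returns (consensus, spread): the per-day median across models and
    the per-day standard deviation, a rough measure of disagreement.
    """
    horizons = {len(values) for values in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same number of days")

    consensus, spread = [], []
    # zip(*...) groups the predictions of all models day by day.
    for day_values in zip(*forecasts.values()):
        consensus.append(median(day_values))
        spread.append(pstdev(day_values))
    return consensus, spread


# Made-up numbers for illustration only.
example = {
    "model_a": [100, 110, 120],
    "model_b": [90, 115, 150],
    "model_c": [110, 105, 90],
}
consensus, spread = consensus_forecast(example)
print(consensus)  # per-day median across the three models
print(spread)     # larger values mean the models disagree more
```

In this toy example the models diverge most on the third day, which is the kind of disagreement the speakers suggest examining before drawing conclusions from any single forecast.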
With each of the key researchers coming to this problem from a different perspective, in what ways were you able to collaborate in the creation of the dashboard?
Yes. One thing I can say is that this team, while it came together fairly quickly, was very effective because it covered the core expertise necessary to move forward. Given my own background, I certainly would not have been able to work on this project alone. A lot of its dimensions tie directly to public health, and while I have some knowledge of it, I certainly don't have core expertise in it, and I'm certainly not a computer scientist. So bringing together those very different facets has been a tremendously enjoyable and rewarding experience in terms of bringing the project to fruition in a short amount of time. And the students have been fabulous, working tirelessly to develop a product that is extremely polished and effective in conveying the substance of the message that we wanted to share with the community.
What did you learn along the way as you developed the dashboard?
If I talk about the data, the formats changed every day, and how the states and the counties were reporting evolved. We have to keep up with those changes when we update our website and make sure that our code is updated when our data sources change, so that we don't stretch ourselves. I think our students also get an important lesson through this work: data has limitations. We have to understand what data can say, what data cannot say, and where we have to stop. There is also a lag when counties report their numbers. In general there are epidemiologic measures like incidence, the number of new cases per day, but with COVID-19 it is actually very difficult to go with incidence because not all counties are reporting at the same time, so there could be a lag. A seven-day moving average is probably a better measure. It is less noisy: it filters out the noise and the uncertainties that are inherent in the data, and it shows how the data look when you take an average over a seven-day period. Those are the kinds of things we learned from this data: how to handle the data, how to come up with meaningful conclusions from the data, what the data tell us and where the limitations exist.
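To make the seven-day moving average concrete, here is a minimal Python sketch. The function name and the daily counts are made up for illustration, and a trailing window (the current day plus the six days before it) is an assumption; the dashboard may define its window differently.

```python
def trailing_average(daily_cases, window=7):
    """Trailing moving average of daily case counts.

    Each value is the mean of the current day and the (window - 1)
    days before it. Days without a full window yet are returned as None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    averages = []
    running_total = 0
    for i, count in enumerate(daily_cases):
        running_total += count
        if i >= window:
            # Drop the day that just fell out of the window.
            running_total -= daily_cases[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
        else:
            averages.append(None)
    return averages


# Made-up counts: a county reports nothing on day 6, then reports
# two days' worth of cases on day 7.
reported = [10, 10, 10, 10, 10, 0, 20, 10, 10]
smoothed = trailing_average(reported)
print(smoothed)
```

In the raw series the reporting lag shows up as a zero followed by a spike, while the averaged series stays level once a full week of data is available.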
What I would like to add is that this is not only something we learned, but also something we wanted to communicate to the public. I remember some of the early conversations we had about which time series we wanted to report: were we going to report the raw data, or were we going to transform it and report the seven-day moving average as well? We felt that it was very important to do so, because quite often when you listen to the news outlets, on the web or wherever, it is the raw data that are being reported for the most part. While those are facts, facts are also subject to interpretation, and I think it's important to put them in proper context. As Rajib was saying, some agencies don't report the information every day, so there is a bit of a lag. All of a sudden you may see a spike in the number of cases being reported, or you may see a zero incidence rate, which is actually just a result of the fact that the information is not necessarily reported instantaneously. So the seven-day average is very important to understand as a concept, and I think it's very important to have a smoother, but also more faithful, depiction of the reality. In a way, you have the disease that leads to a certain number of infections and a number of deaths, then you have the reporting of the information, and ultimately this is filtered by the human process of testing, data collection, assembly and reporting. And then there's us: we basically wanted to abstract away the entire process, with those steps that take place at the public health department or in other agencies before the data are made available to us and before we report them.
So that in itself is an important lesson, and I think it's important for anyone who wants to make use of data to understand that data has errors as well, data can be delayed in the reporting, and data sometimes needs to be corrected and cleaned up. As a matter of fact, a few days ago we discovered some issues of our own. Since at this point we have an automated process for capturing the information from our data sources, we realized that some information was not filtered properly, not properly vetted, and that we need to go back over our data and make sure that, even if the data are corrected in our data sources several days after the official reporting date, we capture that information. So the nature of the data is important, and understanding data quality is essential if you want to make informed decisions.
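One way to picture the late-correction problem described above is a small merge step that re-reads recent days from the source and records which stored values changed. The sketch below is a hypothetical illustration, not the team's actual pipeline: the function name, the data layout and the county name are all assumptions, and the counts are made up.

```python
def apply_revisions(stored, latest):
    """Merge a fresh download into previously stored counts.

    Both arguments map (county, iso_date) to a case count. Values in
    `latest` win, so corrections published after the original reporting
    date are picked up. Returns (merged, revised), where `revised` maps
    each changed key to its (old, new) values for later review.
    """
    merged = dict(stored)
    revised = {}
    for key, count in latest.items():
        if key in merged and merged[key] != count:
            revised[key] = (merged[key], count)
        merged[key] = count
    return merged, revised


# Made-up example: the source later splits one day's spike across two days.
stored = {
    ("County A", "2021-01-10"): 0,
    ("County A", "2021-01-11"): 950,
}
latest = {
    ("County A", "2021-01-10"): 430,
    ("County A", "2021-01-11"): 520,
    ("County A", "2021-01-12"): 480,
}
merged, revised = apply_revisions(stored, latest)
print(revised)
```

Keeping the list of revised entries, rather than silently overwriting, makes it possible to vet corrections before they reach a public chart.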
What did you consider on the other side of this - so we talked a lot about the data, the data sources, cleaning the data, and so when you think about your options for presenting it or actually visualizing it and creating the dashboard look and feel - what considerations and learnings did you have as you were putting that together?
If someone wakes up in the morning and wants to see how their home county is doing in terms of the infection rate, we want them to have that answer handy. When they open our website, the default view is the state; then they can click on their county and it will give them the seven-day moving average, so they'll be able to see whether infection rates are increasing or decreasing and where they stand. So it is about how to communicate: it's not only writing papers or publishing journal articles; here we are also trying to communicate in a way that helps our community get help from these websites.
For us it was ultimately very important to report the data but also to communicate in a synthetic fashion, and that's why we communicate through various visual tools. Certainly the map is very important, being an extremely powerful way to communicate what is going on in people's community, in their backyard basically, and not necessarily just Mecklenburg County but the broader community. One of the tools that we implemented gives any user the ability to look at a number of counties in the direct vicinity, so that they can compare the places where relatives live, where they socialize, where they recreate, where they do some shopping and where they work, and understand the broader context, which is typically smaller than the entire state but may well reach across the state border. Being able to look across the state border, and not be bound by a border that in many ways splits our community in half, is very powerful, and so is being able to visualize things over time, essentially the evolution of the incidence of the pandemic. So we have multiple ways to visualize the information, including bar charts that convey the magnitude of the impact of the pandemic in the different communities adjacent to a particular county. We were very intent on communicating that information as meaningfully and as powerfully as possible. As Dr. Paul indicated, with our tool we can look at the reality of the pandemic at the state level as well as at the county level, and you can zoom in and out. This may be quite revealing in terms of the life experiences and the interventions that our local public officials may have put in place, but also the differential behavior that residents of different communities exhibit in the face of this pandemic and the adversity that results from it.
So if you could extend or improve the dashboard in one way, how would you do that?
Yes, if I could, I definitely would add something about the impacts that the pandemic has. Certainly there are some websites, often separate from the reporting of the pandemic's consequences from a public health perspective, that cover the loss of income and the loss of jobs in communities. I would basically mesh that information together, so that people can visualize the direct relationship that exists between the severity of the pandemic from a public health perspective and the impact on store closures, business closures and bankruptcy rates in the community. I think all of that would be an extremely powerful tool, not just for public officials but also for citizens, who would have a better understanding of the fact that a pandemic is not just an issue that needs to be studied by public health officials but also by policy decision makers, by political leaders, and ultimately by anyone who is in a position of leadership in the community, because it is an event that has repercussions and implications throughout society. Bringing together a broader set of causes and consequences of all of this would be very interesting. And as the vaccination stage is underway right now, it is important to see the impact that vaccination rates, as this information becomes progressively available, may have on the number of cases observed, including the death and mortality rates, in different communities.
We’ll be right back with Jean-Claude and Rajib in just a moment on Charlotte Business Buzz.
UNC Charlotte has created the first School of Data Science in the Carolinas. Through a combination of research, industry and community engagement, the School of Data Science is proud to offer the only data science bachelor’s in the state - and several master’s programs - providing students with the educational opportunities to become innovative leaders in the field. Learn more at datascience.uncc.edu.
Welcome back - we're continuing our conversation with Jean-Claude and Rajib on visualizing the COVID-19 pandemic through data science.
So I’m curious about how data science might be applied to help solve other complex, large-scale problems in healthcare, business, or society. Can you speak to that a little bit?
Data science is obviously a very intriguing perspective for doing research, for us as researchers, but also in terms of how it interfaces with society at large. There's a profusion of data, and I think the ability that we now have, with the tools available, not just to store the information but also to tap into it, share it with the public, and leverage it to make the proper policy decisions and put in place the proper interventions is very important. In a place like Charlotte, where we talk a lot about social mobility and upward mobility and the lack thereof, many of us are in a position to bring a significant amount of knowledge to this question and hopefully to help resolve some of that issue and develop some interventions. But before putting interventions in place, I think it's important to understand the root causes of the problem and the multifaceted nature of what lies underneath, and I think data is extremely beneficial for that, whether we're talking about social mobility, a lack of affordable housing, a lack of mobility, or criminality, the fact that we have some fairly significant hot spots of crime in certain areas. All of those are extremely important. So having the data is very useful, but being overwhelmed with data is counterproductive as well. Data science is an approach to society and to the management of our social relationships that can be extremely constructive in terms of identifying meaningful answers to all of those problems.
So yes, it is definitely much broader than making sure that businesses can operate more efficiently and more productively. Data science can contribute to the greater good of society. There is the phrase ‘data science for social good’, and I think that is something very important that we should not forget. For us in academia, it is very important to remind people that yes, data can be very useful for generating more income and more profit for businesses, but it is also a tool for the public sector, and for businesses that have some consideration for the fact that they live and operate in society and that their own long-term profitability is conditioned by the ability of that community to function constructively, productively and equitably. Data science can be extremely useful in that respect, in the many different areas that I indicated.
Rajib, what do you see for the future of data science?
Looking at the future of data science from the health care perspective, artificial intelligence is increasingly being used. These days images are used extensively in healthcare, and imaging is probably one of the non-invasive techniques for knowing what's going on inside someone's body, so artificial intelligence plays a big role. Say you create an algorithm to detect whether someone's tumor is benign or malignant, and you present that algorithm to a radiologist. When the radiologist uses the algorithm and also looks at the images visually, they can compare what the algorithm is telling them with their own decision. That leads to more confidence in the decision and probably more accurate decision making, so data science definitely plays a big role here. If you look at stock markets, predictive models are used extensively, and in weather and climate you can just put in your zip code and see the predictions: the chances of rain, the chances of snow. But we have not yet created that type of platform for predictions of influenza or of infectious diseases. This is a new area and it's evolving, so data science will definitely play a big role here, and I would put more emphasis on the predictive modeling side.
That would certainly be interesting to be able to predict influenza and other diseases like we see the weather forecast for the day. What is UNC Charlotte's role at the forefront of the future of data science?
If you look at the role of UNC Charlotte, then as faculty members we address this through our teaching. Among the students who were involved in the project, there are health analytics students who are learning how to analyze data, the importance of data, and how to interpret data accurately. There are also master's in public health programs focusing on epidemiology, and we have health services research and public health science Ph.D. programs where data-driven approaches are used extensively. I have also been on several geography students' committees, so I know how extensively they look at data and use data for their research. So through advising and through mentoring, UNC Charlotte can definitely play a big role in data science, in what we call evidence-based decision making. Research is obviously what we are all doing, and I think much of the research done at UNC Charlotte is data driven, with a lot of emphasis put on how to analyze the data, how to store the data in a secure place if there is sensitive information, and data security. In those areas I think UNC Charlotte definitely plays a big role in this urban setting. And if you look at the community, at Atrium Health, Novant, and the county health departments, they need a lot of help. They do have expertise, but the need is always bigger than the resources. So UNC Charlotte can also offer help to them with analyzing data and collaborate with them, so that good research on health and health care can be done.
I think UNC Charlotte is very nicely positioned given its very early involvement in data science, offering a number of academic programs at the bachelor's level now and the master's level, and moving on to offering a number of very focused programs at the Ph.D. level as well. Fundamentally, what we try to do at UNC Charlotte is train students in evidence-based research and in articulating theory and practice, and we bring to the community a lot of knowledge of the data, an understanding of the social relationships, and an understanding of the environmental context as well. Whether you work in the private sector or in the public sector has a bearing on the kind of decisions that need to be implemented, and on the ability to cast in the proper context the process that is taking place and the decisions that may have to be adopted in order to do better and to enhance society, quality of life and the profitability of businesses. Ultimately UNC Charlotte teaches members of the community, whether young or older, an understanding of those relationships and of the importance of evidence-based research. We are very proud of doing this, we believe that we're doing a good job, and we may make a mark on the local community as well as beyond.
Thanks so much for your time today, Jean-Claude and Rajib.
It was a pleasure to be with you today. Thank you.
The UNC Charlotte School of Data Science COVID-19 dashboard is publicly available online at sdscovid.uncc.edu.
Learn more or listen to previous episodes at belkcollege.uncc.edu/buzz B-U-Z-Z
This is Charlotte Business Buzz ... Connecting Charlotte business through one-on-one interviews with UNC Charlotte faculty, staff, alumni and industry partners… presented by the Belk College of Business and produced in association with University Communications.