COVID-19, a Data Story


Coronavirus was first reported in the Chinese city of Wuhan in December 2019. The disease it causes, known as COVID-19, has since caused hundreds of thousands of deaths worldwide, and millions of people have been infected. Because we still know very little about COVID-19 conclusively – what causes it to spread, who is most at risk and how we can best protect them – we’ve closely watched the data and the scientific stories that have interpreted it.

As the disease engulfed Italy and moved west to Spain, the world watched. Behind the scenes, tentative learnings were amassed, but the data picture still stalled. We learned new terms, such as ‘R-rate’ and ‘shielding’, and references to data and science multiplied. But without firm criteria for valuing or reporting the data, no clear picture emerged.

A pandemic was declared in March 2020 and, around the world, governments embarked on their own localised initiatives in an attempt to stop the spread. As well as reducing infection by keeping people in isolation, we hoped this would buy enough time to learn about the disease, so that decisions could be based on evidence rather than on a hunch or a possibility.

To ‘flatten the curve’ has been the collective, news-friendly objective. But phrases like this have hidden the complexity of the thought leadership that we needed. To effect change, decisions needed to be made; but when faced with the unknown, where could we look other than to the data itself?


Numbers continue to rise, and around the world a plateau is yet to be reached. Officials in various countries say social distancing and the wearing of PPE are helping to slow the rate of spread. But the data on which this is based is not clear. And despite the growing volume of available data and the knowledge that experts around the world have amassed, the slow-down is nowhere near a stop.

COVID-19 numbers have dominated the global news for most of 2020, but without a true understanding of what they mean. We’ve yet to move on to discussions about resilience and preparation, and how much value the data will have if it is used to prepare for what may lie ahead. The numbers themselves – infection rates, safe social distances, deaths, re-infections, cause of death – have all come under scrutiny. But what else do we need to know, and what open source data will help?

We have access to obesity, location, employment and travel data. But these do not feature on any media maps, even though the knowledge they hold together may help to flatten the curve.

No two perspectives seem to be the same and, along with the constant debate over the numbers, the narrative about data and common denominators rumbles on. Is the global BAME community at higher risk? Is obesity a factor? Are children super-spreaders? Can we predict the hotspots and identify where the next localised outbreak is likely to be?

Johns Hopkins University in the US has been keeping a tally of cases and fatalities. The most recent figures (02-07-20) are 18,404,898 cases worldwide and 97,847 deaths. But nothing else is reported; this is only a numbers game.


We don’t know conclusively whether certain people are more likely than others to catch COVID-19, or why some people are more adversely affected than others. Nor do we know why symptoms vary so widely from person to person. Initially, age was thought to be a significant factor. However, the virus itself does not appear to discriminate by age: many young, healthy people and children have tested positive. Some who have tested positive have shown COVID-19 symptoms; some have not.

It is possible that many people have had coronavirus and don’t know it. A study conducted by researchers in Singapore and published by the US Centers for Disease Control and Prevention is the latest to estimate that around 10% of new coronavirus infections may be sparked by people who were infected with the virus, but did not experience symptoms.

In response to this data, the CDC changed how it defined the risk of infection for Americans. It essentially says that anyone may be considered a carrier, whether they have symptoms or not.


We do know that the virus spreads easily from person to person, but without robust data tracking systems to report cases, symptoms, diagnoses and deaths, this knowledge has no value. We need open source, geospatial data to make sense of it and to build a picture.

If PlanSpatial were to build a concise yet comprehensive data picture that could be used to allocate resources in anticipation of a further spike, we would look at a range of open source and big data sets. Population, age and ethnicity data are a given, but the addition of childhood health and obesity numbers, COPD clinics and major local employers (factory, food processing, healthcare, education) would be valuable. Those who already struggle with chronic medical conditions also seem to be vulnerable, so clinic and pharmacy data could be applied too.
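
As a rough illustration of how layers like these might be combined, here is a minimal Python sketch. It is not a description of any existing PlanSpatial tool. The area names, every figure and the weights are invented for illustration, and treating a higher value of each indicator as "more vulnerable" is itself an assumption that real analysis would need to test. Each indicator is rescaled to a 0–1 range across the areas, weighted, and summed into a single score that could be used to rank areas for attention.

```python
# Hypothetical area-level indicators. Every figure here is invented for illustration.
AREAS = {
    "Area A": {"pct_over_65": 0.21, "obesity_rate": 0.31, "copd_clinics": 2, "large_employers": 5},
    "Area B": {"pct_over_65": 0.15, "obesity_rate": 0.24, "copd_clinics": 1, "large_employers": 2},
    "Area C": {"pct_over_65": 0.18, "obesity_rate": 0.28, "copd_clinics": 4, "large_employers": 9},
}

# Illustrative weights only (assumption): not derived from any epidemiological evidence.
WEIGHTS = {
    "pct_over_65": 0.3,
    "obesity_rate": 0.3,
    "copd_clinics": 0.2,
    "large_employers": 0.2,
}


def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_areas(areas, weights):
    """Return (area, score) pairs sorted from highest to lowest weighted score."""
    names = sorted(areas)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        scaled = min_max_scale([areas[name][indicator] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


RANKING = rank_areas(AREAS, WEIGHTS)
for name, score in RANKING:
    print(f"{name}: {score:.2f}")
```

The point of the sketch is the shape of the exercise, not the numbers: decide which questions matter, choose the data layers that speak to them, and only then combine them into something that can guide where resources go.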

In its raw state, COVID-19 data has held no value, and its end use and reporting have been highly subjective. Data has helped inform insights, while headlines have screamed numbers and percentages – and induced fear. But how, within this whole story, can data be used for good? We’re at a crucial time now, moving away from isolation. Countries are watching each other, paying close attention to the numbers. But rather than ask what the numbers are telling us, we should be asking what we need to know, and using these questions as criteria to analyse the data.