Big Data Analysis of COVID-19

Data Analysis of COVID-19: Comparing Dates

We have never needed good data analysis more than we do now. As you may know, I love analysing data in all its forms, and COVID-19 is a good opportunity to drill down into the data.

So, to understand the changes in the death rate in each country, let’s take the death data for two dates (2 April and 9 April) and look at the changes [here]:

COVID-19 Deaths on 9 April 2020 against 2 April 2020 [here]

I have plotted 2 April 2020 on the x-axis and 9 April 2020 on the y-axis. The red line is the ordinary least squares (OLS) fit across all countries, so any country above the red line is increasing faster than the average, and any country below it is increasing more slowly than the average. Thus the US, France and UK are above the average, and China, Italy and Iran are below. Spain sits right on the red line. If you are interested, here is the OLS analysis for the red line, where the R-squared value is 0.666:

'08/04/2020 = 2.768 * 25/03/2020 + ', '160.791'
                            OLS Regression Results
==============================================================================
Dep. Variable:              25/03/2020   R-squared:                      0.666
Model:                             OLS   Adj. R-squared:                 0.665
Method:                  Least Squares   F-statistic:                    371.7
Date:                 Thu, 09 Apr 2020   Prob (F-statistic):          3.21e-46
Time:                         18:46:37   Log-Likelihood:               -1363.2
No. Observations:                  187   AIC:                            2728.
Df Residuals:                      186   BIC:                            2732.
Df Model:                            1
Covariance Type:             nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
08/04/2020      0.237      0.012     19.278      0.000       0.213       0.261
==============================================================================
Omnibus:                       168.058   Durbin-Watson:                  1.808
Prob(Omnibus):                   0.000   Jarque-Bera (JB):           12988.897
Skew:                            2.694   Prob(JB):                        0.00
Kurtosis:                       43.472   Cond. No.                        1.00
==============================================================================

The formula for the red line (the average trend) is thus:

[8 April 2020 Deaths] = 2.768 * [25 March 2020 Deaths] + 160.791
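To make the red line concrete, here is a minimal sketch of how a line of the form y = slope * x + intercept can be fitted by ordinary least squares, and how a country can then be placed above or below it. It uses only the Python standard library. The country labels and counts are made up for illustration, and the names `fit_line` and `position` are hypothetical helpers, not the code that produced the output above.

```python
# Hypothetical cumulative death counts per country:
# (count on the earlier date, count on the later date).
# These numbers are illustrative only, not real figures.
deaths = {
    "A": (100, 450),
    "B": (1000, 2600),
    "C": (3000, 3300),
    "D": (500, 1700),
    "E": (50, 300),
}


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(list(deaths.values()))


def position(country):
    """Report whether a country lies above or below the fitted line."""
    x, y = deaths[country]
    expected = slope * x + intercept  # value the line gives for this x
    return "above" if y > expected else "below"


for name in deaths:
    print(name, position(name))
```

A country reported as "above" has a later count higher than the line gives for its earlier count, which is the reading used for the plots on this page.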

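So, on average with this analysis, the death count on the later date is around 2.768 times the count on the earlier date (276.8% of it, or an increase of about 177%), plus an offset of about 161. The dates in the formula are 25 March and 8 April, so this multiplier covers that two-week interval. If we look back two weeks, we see that Spain was on the left-hand side of the red line [here]: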

COVID-19 Deaths on 9 April 2020 against 26 March 2020 [here]

And then if we plot from three weeks back, we see Spain moving well above and to the left of the red line [here]:

COVID-19 Deaths on 9 April 2020 against 19 March 2020 [here]

Now, let’s change the data set to new cases of COVID-19 in each country. If we plot the past two weeks, we see that Spain is reducing, but the US is increasing at a higher rate than most [here]:

New cases from 9 April 2020 against 26 March 2020 [here]

And here is a machine learning prediction using three days to predict the death rate on 9 April [here]:

Prediction for 9 April 2020 using Random Forests [here]

Here is the machine learning code using Random Forests:
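A minimal sketch of this kind of model follows. It is not the code behind the plot above. It assumes scikit-learn's `RandomForestRegressor`, synthetic cumulative counts rather than real data, and that "three days" means the three preceding daily totals are used as features. The helper `make_windows` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic cumulative counts (illustrative only, not real data):
# one row per hypothetical country, one column per day.
rng = np.random.default_rng(0)
start = rng.integers(1, 50, size=40)        # day-0 totals
growth = rng.uniform(1.05, 1.30, size=40)   # daily growth factors
days = np.arange(8)
counts = np.round(start[:, None] * growth[:, None] ** days[None, :])


def make_windows(series, width=3):
    """Turn each row into (previous `width` days -> next day) samples."""
    X, y = [], []
    for row in series:
        for t in range(width, len(row)):
            X.append(row[t - width:t])
            y.append(row[t])
    return np.array(X), np.array(y)


# Train on every window that ends before the final day ...
X_train, y_train = make_windows(counts[:, :-1])
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# ... then predict the final day from the three days before it.
X_last = counts[:, -4:-1]
predicted = model.predict(X_last)
actual = counts[:, -1]
print("mean absolute error:", np.mean(np.abs(predicted - actual)))
```

One design point to keep in mind: a random forest averages target values seen in training, so it cannot predict a count higher than any it was trained on. For a fast-growing series it may work better to predict day-on-day ratios or differences instead of raw totals.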

Conclusions

A key part of our analysis of COVID-19 must be the rates of change, and where a country is ahead of or behind others. The plots on this page show in detail how things are changing over time. You can analyse the data here:

and for new cases: