Predicting hospitalizations and positive cases in Belgium




V. Verardi
Vincenzo Verardi (UNAMURCRED, FNRS, ULB).
As explained in a previous post, COVIDSIR, epidemiologists have powerful models for understanding how an epidemic evolves. Probably the best known is the so-called SIR model, which makes it easy to model the evolution of an epidemic from the reproduction number \(R_0\) and the infectious period of a pathogen. A great epidemic calculator is freely available at http://gabgoh.github.io/COVID/index.html. This is a wonderful tool for simulating how the Covid-19 epidemic might spread according to the characteristics of a population. The reproduction number \(R_0\) is the average number of infections produced by a single case in a population where everyone is susceptible to infection. It has been estimated at around 2.2 for the Covid-19 pandemic. In his article Coronavirus: The Hammer and the Dance, Thomas Pueyo illustrates how this value could lead to dramatic outcomes if nothing is done.
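To make the roles of \(R_0\) and the infectious period concrete, here is a minimal sketch of a textbook SIR simulation using simple forward-Euler steps. It is not the code behind the calculator linked above. Only \(R_0 = 2.2\) comes from the text; the 5-day infectious period, the population size and the number of initial cases are illustrative assumptions.

```python
def simulate_sir(r0, infectious_days, population, initial_infected, days, dt=0.1):
    """Integrate the SIR equations with forward-Euler steps.

    Returns a list with one (susceptible, infected, recovered) tuple per day.
    """
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values: R_0 = 2.2 is taken from the text, everything else is assumed.
POPULATION = 1_000_000
history = simulate_sir(r0=2.2, infectious_days=5.0, population=POPULATION,
                       initial_infected=10, days=300)

# Day on which the number of simultaneously infected individuals is highest.
peak_day = max(range(len(history)), key=lambda day: history[day][1])
# Share of the population that has been infected once the epidemic is over.
final_share_infected = history[-1][2] / POPULATION

print(f"peak of infections on day {peak_day}")
print(f"share infected if nothing is done: {final_share_infected:.2f}")
```

In this sketch the number of infected individuals peaks when the susceptible share falls to roughly \(1/R_0\), which is the point at which each case produces, on average, fewer than one new infection.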
The effective reproduction number \(R_e\) (i.e. the product of \(R_0\) and the fraction of the population that is susceptible to contamination) can be reduced through social distancing. With greater social distancing, the share of the population susceptible to contamination decreases, and so does \(R_e\). As soon as \(R_e\) falls below one, the epidemic starts fading away, or at least slowing down. To summarize, social distancing can slow the spread of the epidemic and hopefully prevent the health sector from being overwhelmed. After exponential growth in the number of cases at the beginning of the epidemic, the inflection point of the curve could be reached much earlier than if nothing were done. A very simple and naïve model for estimating the total number of contaminated individuals can then give a broad idea of how the epidemic is evolving: look for the sigmoidal (\(S\)-shaped) function of time that best approximates the number of observed cases. Probably the simplest candidate is the logistic function, which can be written as:

\[ f(t) = \frac{m}{1 + e^{-(t - t_{inf})/s}} \]

where \(m\) is the asymptote (the maximum number of cases), \(t_{inf}\) is the inflection point (the moment when the number of cases grows fastest) and \(s\) is the scale (related to the steepness of the curve). Ideally, one would like to predict how many individuals will be affected in any given period and to understand whether, by the end of the epidemic, enough people will have been infected to reach so-called herd (or collective) immunity.
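The three parameters are easier to read with the function in hand. The sketch below assumes the usual three-parameter logistic form, \(m / (1 + e^{-(t - t_{inf})/s})\), and also returns its derivative, the expected number of new cases per unit of time. The function names and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def logistic(t, m, t_inf, s):
    """Cumulative number of cases at time t under the logistic curve."""
    return m / (1.0 + math.exp(-(t - t_inf) / s))


def logistic_rate(t, m, t_inf, s):
    """Derivative of the logistic curve: expected new cases per unit of time."""
    share = logistic(t, 1.0, t_inf, s)   # fraction of the asymptote reached at t
    return m * share * (1.0 - share) / s


# Hypothetical parameters, not estimates: asymptote 5000, inflection on day 30, scale 4 days.
m, t_inf, s = 5000.0, 30.0, 4.0
print(logistic(t_inf, m, t_inf, s))        # half of the asymptote
print(logistic_rate(t_inf, m, t_inf, s))   # maximal growth, equal to m / (4 * s)
```

At the inflection point the curve has reached exactly half of its asymptote and grows at its maximal rate, \(m / (4s)\); a smaller \(s\) therefore means a steeper curve.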
In the very specific case of Covid-19, only some of the infected individuals are identified through testing (at least for now), and many individuals with milder symptoms go unnoticed. As the number of tests performed increases over time, a time-series analysis based on the number of positive cases is questionable: that number will most likely rise simply because testing increases, even if the epidemic slows down. Furthermore, international comparisons would make no sense, since the number of tests performed varies dramatically from country to country, and so does the number of positive cases identified.
What we suggest is to rely on the number of hospitalized patients, which is probably more informative. Even if this number will (fortunately) not coincide with the total number of positive cases, the shape of the prediction function should be similar. The total number of cases could then be estimated from the predicted number of hospitalizations with a simple proportion, relying on the percentage of infected individuals who are hospitalized, as estimated in the literature. The maximum likelihood estimate of the logistic curve (broadly speaking, the most likely model given the available data) is easy to compute. In the graph below we present the observed number of hospitalizations (dots), the best-fitting logistic curve in blue (representing the expected cases) and its derivative in orange (representing the expected change in the number of hospitalizations).
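As a rough sketch of such a fit, the code below estimates the three parameters of the logistic curve by least squares, which coincides with maximum likelihood under the additional assumption of independent Gaussian errors with constant variance. This is not necessarily the estimation procedure used for the graph. The data are synthetic (generated from known parameters plus noise), and the 5% hospitalization share used for the final proportion is a placeholder, not an estimate from the literature.

```python
import math
import random


def logistic_share(t, t_inf, s):
    """Fraction of the asymptote reached at time t."""
    return 1.0 / (1.0 + math.exp(-(t - t_inf) / s))


def fit_logistic(times, counts, t_inf_grid, s_grid):
    """Least-squares fit of m * logistic_share(t, t_inf, s) by grid search.

    For a fixed (t_inf, s) the best asymptote m has a closed form, so only
    two parameters need to be searched. Returns (m, t_inf, s, sum_of_squares).
    """
    best = None
    for t_inf in t_inf_grid:
        for s in s_grid:
            g = [logistic_share(t, t_inf, s) for t in times]
            m = sum(y * gi for y, gi in zip(counts, g)) / sum(gi * gi for gi in g)
            sse = sum((y - m * gi) ** 2 for y, gi in zip(counts, g))
            if best is None or sse < best[3]:
                best = (m, t_inf, s, sse)
    return best


# Synthetic hospitalization counts, NOT real data: true m = 6000, t_inf = 20, s = 4.
rng = random.Random(42)
days = list(range(41))
observed = [6000.0 * logistic_share(t, 20.0, 4.0) + rng.gauss(0.0, 25.0) for t in days]

t_inf_grid = [10.0 + 0.5 * k for k in range(41)]   # candidate inflection days 10..30
s_grid = [2.0 + 0.25 * k for k in range(25)]       # candidate scales 2..8
m_hat, t_inf_hat, s_hat, sse = fit_logistic(days, observed, t_inf_grid, s_grid)

# Simple proportion; the hospitalized share is a hypothetical placeholder value.
HOSPITALIZED_SHARE = 0.05
estimated_total_cases = m_hat / HOSPITALIZED_SHARE

print(f"m = {m_hat:.0f}, t_inf = {t_inf_hat}, s = {s_hat}")
print(f"implied total number of cases: {estimated_total_cases:.0f}")
```

A grid search keeps the sketch dependency-free; in practice a nonlinear least-squares routine would be the more usual choice. The estimate of \(m\) becomes much less reliable when the observations stop before the inflection point, since the data then say little about where the curve will flatten.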