Background

Note: Nowcasts may be less reliable if reporting delay patterns differ from previous periods, e.g. to a high burden on the reporting system. Also note that the share of persons entering into the hospitalization incidence whose primary reason for hospitalization is actually COVID-19 may change over time (e.g., due to the presumably less severe omicron variant).

FAQ

For a summary of the project see also this article in the CODAG Report series (in German).

What is the goal of this platform?

The main goal of this project is to estimate the 7-day hospitalization incidence for Germany and its states reliably and to assess recent trends based on incomplete data. The frequently referred to most recent values of the raw hospitalization incidence underestimate the true number of hospitalizations (see also answers to the other questions). Nowcasting corrections of these numbers allow for a better assessment of the current epidemic situation.
At the same time we have a scientific interest to compare different nowcasting methods and to assess whether the combination of different nowcasts yields more reliable results.

What is the 7-day hospitalization incidence?

We follow the definition by Robert Koch Institute. Today’s seven-day hospitalization incidence is the number of hospitalized cases of COVID-19 (in absolute numbers or per 100.000 population) whose Meldedatum, i.e. the date when the respective infection was electronically recorded at the local health authorities was in the 7 preceding days. It is thus not equivalent to the number of new hospitalizations during the last 7 days. Also, the 7-day hospitalization incidence does not take into account whether the reason of hospitalization was COVID-19 or not (see next question).
Further information on the 7-day hospitalization incidence (in German) are available on the GitHub site of Robert Koch-Institute.
In this context we note that some German states (Bundesländer) also publish their own hospitalization incidences which follow different definitions (e.g., temporal aggregation by date of hospitalization rather than Meldedatum of the infection; see this news item by NDR)). We focus exclusively on the indicator provided by RKI.

Is there a distinction between hospitalizations with COVID-19 as the main reason and hospitalizations of infected persons due to other reasons?

The hospitalization incidence as defined by RKI does not take into account the main reason of hospitalization (“with or because of COVID”). Our analyses build directly upon this indicator and concern purely the problem of delays and incomplete most recent data points (see next question). The difficulty that only part of the hospitalizations are due to illness from COVID thus remains. To our knowledge there are currently no reliable national-level data on this question. Given the presumably lower average severity of infections with the omicron variant it is possible that the share of hospitalizations where COVID is not the reason of hospitalization increases. This can limit the comparability of incidence values across different phases of the pandemic and needs to be taken into account in their interpretation. However, it should be noted that hospitalizations with COVID as a secondary diagnosis nonetheless imply additional efforts for hospitals, e.g. in terms of isolation measures.
A more detailed account on this question for the states of Rheinland-Pfalz and Baden-Württemberg can be found in news items by German public broadcaster SWR (in German). According to these articles, approximately on third of the hospitalization incidence was due to cases where COVID-19 was the primary reason of hospitalization in Rheinland-Pfalz, while the number was close to two thirds in Baden-Württemberg. Analyses on the states of Bavaria and Saarland are available in CODAG report no. 26 from LMU Munich (in German).

Why are the reported values for the most recent days unreliable and why is that a problem?

The Meldedatum and the date when a hospitalization first appears the data set at Robert Koch Institut can be several days or even weeks apart. Several aspects play a role here. Firstly, an infected person may not be in a state which requires hospitalization on the Meldedatum, but reach such a state at a later point. In this case, the number of hospitalizations for the respective Meldedatum will be retrospectively increased by one. Secondly, there can be reporting delays between the actual date of hospitalization and the appearance of the hospitalization in the RKI data.
The daily values of the hospitalization incidence aggregated by Meldedatum are thus usually corrected upwards during the following days and weeks. Most additions occur within a few days, so that the values for the last few days are most strongly affected. Oftentimes they considerably underestimate the true number of hospitalizations (see e.g. this news item by the public broadcaster NDR or CODAG-Report Nr. 21 by LMU Munich; both in German). In particular, preliminary data can create the impression of a decreasing hospitalization incidence even if it is actually on the rise.

What does that mean for the thresholds which have been defined for the hospitalization incidence?

The thresholds of 1.5 and 5 suggested by RKI in September 2021 (Fig 1) explicitly refer to preliminary values as available on the respective date (see “time series of frozen values” below). They thus take into account the incompleteness of the data. The governmental decision from 18 November 2021 (Paragraph 9) does not address this distinction explicitly, but according to media coverage the thresholds of 3, 6 and 9 are equally applied to the preliminary values.

What does nowcasting mean and how should nowcasts be interpreted?

Nowcasting is a statistical tool which, based on preliminary data, estimates which value a given quantity will take once measurements are complete. In our application we estimate how many hospitalizations will be reported in total for a given Meldedatum. To this end, information on the hospitalizations already reported up to the current date and data on past reporting delays are used.
The term nowcasting typically refers to events which have already occurred, but have not been measured or reported completely. For instance, nowcasting methods can be used to estimate how many cases of COVID-19 have been detected on a given day before these information have been aggregated into a central data set. This is not exactly the case for the hospitalization incidence as it is possible that hospitalizations linked to a given Meldedatum have not even occurred yet at the time of nowcasting. We nonetheless use the term nowcasting as it has become common for this type of analysis.
The nowcasts presented here should be interpreted as probability statements. An exact estimation of the true number of hospitalizations is not feasible and nowcasts can merely provide a range of probable values (see below).

Why are several different nowcasts shown? What is an ensemble nowcast?

Nowcasts are always based on a number of assumptions. Moeover, different models may include different additional data sources. Therefore, results based on different approaches often vary, and it is reasonable to compare different nowcasts to get an idea of the range of predictions. Moreover it can be helpful to combine several nowcasts into a so-called ensemble nowcast to achieve a more robust estimation. This approach has shown benefits for instance in weather forecasting, but also in epidmimological applications.

Why is it important to provide uncertainty intervals?

No model is perfect and the exact number of hospitalizations for a given Meldedatum canot be predicted exactly. The nowcasts displayed here therefore explicitly quantify their own uncertainty, i.e. they state how reliable they consider their own estimation. This is done via intervals which are supposed to contain the true value with a given probability (50% or 95%).

How reliable are the nowcasts?

The evaluation of the different methods is an important goal of this project (see below). However, it is in the nature of things that assessing the quality of nowcasts is only feasible after a some time has passed. The visualization tool enables the user to return to past data versions and nowcasts issued in the past to get an idea of their reliability. E.g., nowcasts for the state of Saxony from 18 November 2021 were considerably below the ultimately observed values, which was likely due to the overstrain of the reporting system (see next question).

What are possible difficulties and weaknesses? When are results to be interpreted with special care?

The central assumption on which most nowcasts are built is that the delays between the Meldedatum and the apppearance of a hospitalization in the RKI data set will follow similar patterns in the future as they did in the past. If this is not the case, e.g., due to major changes in testing strategies or an overload of the health system, the quality of nowcasts may suffer. For instance, if average delays get loner, nowcasts may tend to underestimate the true values. It must be assumed that this is currently (December 2021) the case for certain federal states, or may be the case in weeks to come.

How often are nowcasts updated?

Nowcasts are updated daily at around 1pm. As long as a team have not updated their nowcast yet, their nowcast from the previous day (or the most recent nowcasts which is not older than seven days) is displayed.
Occasionally there may be delays, e.g. if input data from RKI become available later than usually. In this case we try to update nowcasts in a timely manner, but some models require a certain time to compute.

Why is there a switch to remove nowcasts for the current and previous day?

For the last two days a particularly large number of additional reports must be expected. Therefore, nowcasts for these days may be less reliable than for days which are further in the past. For this reason we did not show nowcasts for the last two days by default during the first few weeks of our project. In the meantime we have found that, if one takes into account the wider uncertainty intervals, the nowcasts for these two days are not actually less reliable than the remaining ones. We have therefore decided to show them by default.

What does the “time series by appearance in RKI data” show?

An alternative to nowcasting the hospitalization incidence by Meldedatum (i.e. the day when the respective infection was electronically recorded by the local health authority, see above) is to aggregate hospitalizations by the day when they first appeared in the RKI data set. These numbers do not change over the following days, meaning that trends are more straightforward to interpret. Owing to reporting delays, the resulting curve is typically shifted to the right compared to the seven-day hospitalization incidence by Meldedatum.

What does the “time series of frozen values” show?

Another alternative to nowcasting is to show the 7-day hospitalization incidence for each Meldedatum based on the data version from the respective date. This way, all shown values will be similarly incomplete and more comparable over time. A downside of this approach is that it only exploits part of the information already available.

What is meant by “retrospective nowcasts”?

The main goal of this project is to provide nowcasts in real time to allow for an improved assessment of the current situation. However, in order to systematically compare different modelling approaches, we also collect nowcast which have been created retrospectively and evaluate how they would have performed in the past. For a fair comparison it is crucial that models only use data which would already have been available at the respective time of nowcasting.

Contributing research groups

This platform is run by members of the Chair of Statistics and Econometrics at Karlsruhe Institute of Technology and the Computational Statistics Group at Heidelberg Institute for Theoretical Studies. Several other independent groups from academia and media contribute nowcasts (see also metadata files in our GitHub repository):

Moreover we display the most current adjusted hospitalization incidences from Robert Koch Institute.

Methods descriptions

ILM-prop (TU Ilmenau)

These Nowcasts are based on the proportion of the 7-day incidence of COVID-19 cases that appear in the hospitalization incidence after one, two etc. weeks. This model then multiplies todays known 7-day incidence by these weekly proportions and predicts todays 7-day hospitalization incidence by summing these up. The uncertainty is based on a log-normal (within age-groups) or normal (sum over all age groups) distribution where the dispersion is estimated based on retrospective application of the model.

KIT-simple_nowcast (Karlsruher Institut für Technologie)

Nowcasting is based on a simple estimation of the distribution of delays between the Meldedatum and appearance in the RKI data set (based on the last 60 days). From these, multiplication factors are derived and used for an upward correction of incomplete observations. To assess the nowcast uncertainty, the same corrected values are generated for past time points (based on the information available at the respective time) and compared to the subsequently observed values. To this end we assume a negative binomial distribution, where the dispersion parameter is a function of the time difference between Meldedatum and date of nowcast. Estimation of the dispersion parameter ist done via a maximum likelihood approach.
This method is purposefully kept simple and has the role of a reference/baseline model in our comparative evaluation study (see below). The central assumption is that delay distributions are temporally stable. Weekday effects and recent developments in case numbers are not accounted for.

LMU_StaBLab-GAM_nowcast (LMU München)

Nowcasts are based on a generalized additive model and the sequential multinomial structure of the time delay. The model is a slightly adapted version of the method by Schneble et al. (2020) for nowcasting of fatal infections.

RIVM-KEW (RIVM (Rijksinstituut voor Volksgezondheid en Milieu), Bilthoven, Niederlande)

This model is a simplified version of the model presented by van de Kassteele, Eilers and Wallinga (Epidemiology, 2019). The reported counts by date and delay are described by a negative binomial distribution. The expected values are modelled by a two-dimensional P-spline surface and other covariates. This surface is extrapolated for all dates and delays outside the reporting triangle. The nowcast is obtained by summing all counts over the delays by date. Prediction intervals are obtained by Monte Carlo simulation from the predictive distribution. This simplified model is without the calender time x delay interaction, the unimodality and boundary constraints. Model fitting is done efficiently using the mgcv package in R.

RKI-weekly_report (Robert Koch Institut)

These are the corrected seven-day hospitalization incidences first published on Thursdays in the weekly reports of the Robert Koch Institute and now updated on a daily basis e.g. in the COVID-19 Trends Dashboard. These are based on a modified version of the nowcasting method for 7-day case incidences.

SU-hier_bayes (Stockholm University)

Nowcasts are based on a modified version of the model from Günther et al 2020.

SZ-hosp_nowcast (Süddeutsche Zeitung)

SZ estimates the nowcasting values for the hospitalization incidence based on differences between the daily published and retrospectively corrected values resulting from later reports. To this end the archived data sets of Robert-Koch-Institute (https://github.com/robert-koch-institut/COVID-19-Hospitalisierungen_in_Deutschland) from the last 60 days are analysed. For each of the 25 days before the last date in the data set we compute by how many percent the later corrected value differs from the originally reported value. Quantiles of the resulting multiplication factors are computed, and the currently reported hospitalization incidence is multiplied with these quantiles to estimate the total incidence. Finally, the results are smoothed over a three-day window to remove unrealistic fluctuations.

epiforecasts-independent (London School of Hygiene and Tropical Medicine / epiforecasts)

A semi-parametric nowcasting method for right censored hospitalisations by date of positive test. Hospitalisations are modelled using a random walk on the log scale. Reporting delays are then modelled parametrically using a lognormal distribution with the log mean and log standard deviation each modelled using a weekly random walk with a pooled standard deviation. Report date effects are modelling using a random effect for day of the week with public holidays assumed to be reported like Sundays. Age groups and locations are nowcast independently (thus the name of the model). The model is implemented using the epinowcast R package. The analysis code is available here. Note: There is a second version of this model currently not displayed in which the different time series are modelled in a hierarchical manner (Epiforecasts-hierarchical).

NowcastHub-MeanEnsemble

This ensemble is created by taking quantile-wise means of the available member models providing complete nowcasts (28 days through 0 days before the current date).

NowcastHub-MedianEnsemble

This ensemble is created by taking quantile-wise medians of the available member models providing complete nowcasts (28 days through 0 days before the current date).

GitHub Repository and re-use of results

All nowcasts are available under open licences in a public GitHub repository. They can be re-used for a broad range of purposes provided that the source is acknowledged (check the respective license files in the repository for details). You are invited to briefly get in touch with the organizing team or the creators of the respective nowcasts when re-using them publicly.
Nowcasts from our platform are currently used e.g. by Zeit Online, Neue Zürcher Zeitung, Norddeutscher Rundfunk and Südwestrundfunk.

Evaluation study

We are planning to compare the different nowcasting methods in a prospective evaluation study. To this end we have deposited a public study protocol (similar to the protocol of a recent study on short-term predictions).

Collection of hospitalization incidences by Bundesländer

Some German states (Bundesländer) publish hospitalization data which differ from the RKI numbers. These may be less affected by delays or contain additional information on the distinction between hospitalization with oe because of COVID-19 (see details on the respective sites): Berlin, Bremen, Mecklenburg-Vorpommern, Niedersachsen, Nordrhein-Westphalen, Rheinland-Pfalz, Sachsen.