STAR Satellite Rainfall Estimates
Validation Statistics - Satellite Rainfall Validation over the CONUS
Bulk statistics are a way to compare algorithms in a reasonably compact manner, though a lot of information is lost in the process. The following statistics are used on this site:
Root Mean Squared Error (RMSE):
where n is the total number of estimates (e) and corresponding observations (o). The total error expressed by RMSE can be considered to have components due to bias and due to the lack of correspondence between the estimates and observations.
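Written out in standard notation, with e_i and o_i the i-th estimate–observation pair:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( e_i - o_i \right)^2}
```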
Pearson Correlation Coefficient:
The latter is expressed by the Pearson Correlation Coefficient:
which can be expressed as the covariance of the observations and estimates divided by the product of their respective standard deviations, which is shown in shorthand in the middle portion of the equation and in longhand on the right (where the overbar indicates mean values).
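In the notation just described (covariance over the product of the standard deviations, with overbars denoting means):

```latex
r = \frac{\mathrm{cov}(o, e)}{\sigma_o \, \sigma_e}
  = \frac{\sum_{i=1}^{n} (o_i - \bar{o})(e_i - \bar{e})}
         {\sqrt{\sum_{i=1}^{n} (o_i - \bar{o})^2}\,
          \sqrt{\sum_{i=1}^{n} (e_i - \bar{e})^2}}
```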
The other component of total error can be expressed by the bias, the systematic tendency of the estimates to run larger or smaller than the observations.
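One standard formulation of this bias component, assumed here, is the ratio of the mean estimate to the mean observation:

```latex
\mathrm{Bias} = \frac{\bar{e}}{\bar{o}}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} e_i}{\frac{1}{n}\sum_{i=1}^{n} o_i}
```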
However, while these statistics offer a first impression of relative statistical performance, many details are lost in the process, including whether one algorithm's superior performance over another holds unconditionally or occurs only intermittently. Furthermore, the highly skewed nature of rainfall distributions means that performance for low rainfall amounts has a much greater relative impact on the overall statistics than performance for heavier amounts--the exact opposite of what is desired by hydrologic and extreme-event monitors and forecasters.
Consequently, the analysis here includes not only a scatter plot (which is color coded to indicate the frequency of points within a certain range of the joint distribution of o and e), but also plots of selected binary scores as a function of observed precipitation amount. This allows various aspects of algorithm performance to be evaluated at different levels of intensity.
Given the definitions of A, B, C, and D in the following contingency table (where a value of 1 for o or e represents any nonzero value, not necessarily 1.0), the following binary scores can be defined:

            o = 1    o = 0
  e = 1       A        B
  e = 0       C        D
Probability of Detection (POD):
The POD is the fraction of non-zero observations that are correctly detected by the estimate:
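Under the common convention that A counts hits (o = 1, e = 1) and C counts misses (o = 1, e = 0), this is:

```latex
\mathrm{POD} = \frac{A}{A + C}
```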
False Alarm Rate (FAR):
The False Alarm Rate is the fraction of non-zero estimates that were matched with zero observations (i.e. "false alarms"):
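With A counting hits and B counting false alarms (o = 0, e = 1), this is:

```latex
\mathrm{FAR} = \frac{B}{A + B}
```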
Bias:
Bias is the ratio of total non-zero estimates to total non-zero observations:
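In terms of the contingency counts (again with A = hits, B = false alarms, C = misses):

```latex
\mathrm{Bias} = \frac{A + B}{A + C}
```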
Heidke Skill Score (HSS):
The Heidke Skill Score is a skill measure for discrimination. A value of 1 indicates perfectly correct discrimination; a value of 0 indicates no better skill than chance; a negative value indicates less skill than chance.
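The standard 2x2 form of the HSS, assuming the contingency counts above, is:

```latex
\mathrm{HSS} = \frac{2\,(AD - BC)}{(A + C)(C + D) + (A + B)(B + D)}
```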
In the plots presented on this web site, these scores are calculated for multiple threshold values of observed precipitation in order to depict how the performance of each algorithm varies with precipitation intensity. In some cases, an algorithm with relatively poorer overall scores may perform better for intense precipitation events than one with better overall scores, due to the skewed distribution of precipitation mentioned above.
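The threshold-dependent scoring described above can be sketched as follows. This is a minimal illustration, not the site's actual processing code; the array values, threshold choices, and the convention A = hits, B = false alarms, C = misses, D = correct negatives are assumptions.

```python
def binary_scores(obs, est, threshold):
    """Return POD, FAR, Bias, and HSS for a given precipitation threshold.

    A value counts as an "event" when it is >= threshold.
    Convention: A = hits, B = false alarms, C = misses, D = correct negatives.
    """
    a = b = c = d = 0
    for o, e in zip(obs, est):
        o_event, e_event = o >= threshold, e >= threshold
        if o_event and e_event:
            a += 1          # hit
        elif e_event:
            b += 1          # false alarm
        elif o_event:
            c += 1          # miss
        else:
            d += 1          # correct negative
    pod = a / (a + c) if (a + c) else float("nan")
    far = b / (a + b) if (a + b) else float("nan")
    bias = (a + b) / (a + c) if (a + c) else float("nan")
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    hss = 2 * (a * d - b * c) / denom if denom else float("nan")
    return pod, far, bias, hss

# Illustrative matched observation/estimate pairs (mm), scored at
# several observed-precipitation thresholds:
obs = [0.0, 0.2, 1.5, 4.0, 12.0, 0.0, 7.5, 0.1]
est = [0.0, 0.0, 2.1, 3.2, 9.8, 1.1, 6.0, 0.0]
for thr in (0.1, 1.0, 5.0):
    pod, far, bias, hss = binary_scores(obs, est, thr)
    print(f"thr={thr:4.1f}  POD={pod:.2f} FAR={far:.2f} "
          f"Bias={bias:.2f} HSS={hss:.2f}")
```

Plotting each score against the threshold value yields curves like those shown on this site, making it easy to see where one algorithm's detection skill falls off with intensity.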