\(PRESS\) is a measure of the quality of a regression model based on its residuals. It is a validation-type estimator that uses the deleted residuals to estimate the prediction error. When comparing alternative regression models, selecting the model with the lowest \(PRESS\) is a good approach because that equation produces the least error when making new predictions (see Helsel et al., 2020).
It is particularly valuable in assessing multiple forms of multiple linear regressions, but it is also useful for simply comparing different options for a single explanatory variable in single-variable regression models.
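The role of deleted residuals can be illustrated with a minimal base-R sketch. It assumes an ordinary least-squares fit from lm(), for which the deleted residual equals the ordinary residual divided by one minus its leverage. The helper press_loo() and the two candidate models are hypothetical and are not part of this package.

```r
# Hypothetical helper (not part of the documented API): leave-one-out PRESS
# for an lm() fit, using the identity e_(i) = e_i / (1 - h_ii).
press_loo <- function(fit) {
  deleted <- residuals(fit) / (1 - hatvalues(fit))
  sum(deleted^2)
}

# Two candidate models for the same response (built-in mtcars data)
fit_one <- lm(mpg ~ wt, data = mtcars)
fit_two <- lm(mpg ~ wt + hp, data = mtcars)

# The model with the lower value would be preferred under this criterion
c(one = press_loo(fit_one), two = press_loo(fit_two))
```

Both models share the same response variable in the same units, which is the situation in which the comparison is meaningful.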
Usage
press(data, ...)
# S3 method for class 'data.frame'
press(data, truth, estimate, na_rm = TRUE, ...)
press_vec(truth, estimate, na_rm = TRUE, ...)
Arguments
- data
A data.frame containing the columns specified by the truth and estimate arguments.
- ...
Not currently used.
- truth
The column identifier for the true results (that is numeric). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names). For _vec() functions, a numeric vector.
- estimate
The column identifier for the predicted results (that is also numeric). As with truth, this can be specified in different ways, but the primary method is to use an unquoted variable name. For _vec() functions, a numeric vector.
- na_rm
A logical value indicating whether NA values should be stripped before the computation proceeds.
Value
A tibble with columns .metric, .estimator, and .estimate and 1 row of values.
For grouped data frames, the number of rows returned will be the same as the number of groups.
For press_vec(), a single numeric value (or NA).
Details
\(PRESS\) is only meaningful when compared with other regression models that have the same response variable in the same units (Rasmussen et al., 2009).
It is computed as follows: $$ PRESS = \sum_{i=1}^{n}{(sim_i - obs_i)^2} $$
where:
\(sim_i\) is the model simulation (predicted value) at time step \(i\)
\(obs_i\) is the observation at time step \(i\)
\(n\) is the number of observations
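The formula above can be checked by hand with a short base-R sketch. The function press_manual() is a hypothetical stand-in written for illustration, not the package source, and its handling of missing values (dropping incomplete pairs) is an assumption.

```r
# Hypothetical base-R equivalent of the formula above; not the package source.
press_manual <- function(obs, sim, na_rm = TRUE) {
  stopifnot(length(obs) == length(sim))
  # A pair with a missing value yields NA, which sum() drops when na_rm = TRUE
  sum((sim - obs)^2, na.rm = na_rm)
}

obs <- c(2.0, 3.5, 5.0, 7.5)
sim <- c(2.5, 3.0, 5.0, 8.5)

press_manual(obs, sim)  # 0.25 + 0.25 + 0 + 1 = 1.5
```

For the result to estimate prediction error as described, sim would presumably hold predictions made for observations that were left out of the fit, rather than ordinary fitted values.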
Note
The \(PRESS\) statistic is not appropriate for comparing models that use different transformations of the response variable, e.g. linear regression and log-transformed linear regression (Helsel et al., 2020).
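A small sketch of why this matters, using made-up deterministic data and a hypothetical press_loo() helper based on lm() leverages (neither comes from this package): the two values below are expressed in different units, so their magnitudes say nothing about which model predicts better.

```r
# Deterministic toy data with a strictly positive response (illustrative only)
x <- 1:20
y <- 50 * exp(0.1 * x) * (1 + 0.05 * (-1)^x)

fit_lin <- lm(y ~ x)       # response in original units
fit_log <- lm(log(y) ~ x)  # response in log units

# Hypothetical helper: leave-one-out PRESS from residuals and leverages
press_loo <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# The first value is in squared units of y, the second in squared log units,
# so the two numbers cannot be ranked against each other.
c(linear = press_loo(fit_lin), log = press_loo(fit_log))
```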
References
Rasmussen, P. P., Gray, J. R., Glysson, G. D. & Ziegler, A. C. Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data. in U.S. Geological Survey Techniques and Methods book 3, chap. C4 53 (2009) https://pubs.usgs.gov/tm/tm3c4/.
Helsel, D. R., Hirsch, R. M., Ryberg, K. R., Archfield, S. A. & Gilroy, E. J. Statistical Methods in Water Resources. 484 (2020) doi:10.3133/tm4A3.