The tidyhydro package provides a set of metrics commonly used in hydrology (such as NSE, KGE, pBIAS) for use within the tidymodels infrastructure. Originally inspired by the yardstick and hydroGOF packages, it is written mainly in C++ and computes goodness-of-fit criteria quickly.

Additionally, the package includes C++ implementations of lesser-known yet powerful metrics used in reports from the United States Geological Survey (USGS) and in the National Environmental Monitoring Standards (NEMS) guidelines. Examples include PRESS (Prediction Error Sum of Squares), SFE (Standard Factorial Error), and MSPE (Model Standard Percentage Error), among others. These are based on the equations from Helsel et al. (2020), Rasmussen et al. (2008), Hicks et al. (2020), and others (see the documentation for details).

Example

The tidyhydro package follows the philosophy of yardstick and provides S3 class methods for vectors and data frames. For example, one can estimate KGE, NSE or pBIAS for a data frame like this:

library(tidyhydro)
str(avacha)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    365 obs. of  3 variables:
#>  $ date: Date, format: "2022-01-01" "2022-01-02" ...
#>  $ obs : num  76.2 76.2 76.3 76.3 76.4 76.4 76.5 76.5 76.6 76.6 ...
#>  $ sim : num  84.8 84.3 84 83.7 83.4 ...

kge(avacha, obs, sim)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 kge     standard       0.947

or create a metric_set and estimate several metrics at once like this:

hydro_metrics <- yardstick::metric_set(nse, pbias)

hydro_metrics(avacha, obs, sim)
#> # A tibble: 2 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 nse     standard      0.895 
#> 2 pbias   standard      0.0540

Sometimes a qualitative interpretation of model performance is needed, so some functions accept a performance argument. When performance = TRUE, the metric interpretation is returned according to Moriasi et al. (2015).

hydro_metrics(avacha, obs, sim, performance = TRUE)
#> # A tibble: 2 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>      <chr>    
#> 1 nse     standard   Excellent
#> 2 pbias   standard   Excellent
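
For vector input, the same metrics can be called without a data frame. The sketch below uses nse_vec(), the vector function that appears in the Benchmarking section, and assumes that nse() accepts a data frame in the same way as kge() above and that both interfaces return the same value. The object names nse_value and nse_tbl are illustrative only.

```r
library(tidyhydro)

# Vector interface: pass observed and simulated values directly.
# avacha is the example dataset shown earlier in this section.
nse_value <- nse_vec(truth = avacha$obs, estimate = avacha$sim)

# Data-frame interface (assumed to mirror the kge() call above)
nse_tbl <- nse(avacha, obs, sim)

# The two results are expected to agree
all.equal(nse_value, nse_tbl$.estimate)
```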

Installation

You can install the development version of tidyhydro from GitHub with:

# install.packages("pak")
pak::pak("atsyplenkov/tidyhydro")

Benchmarking

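Since the package uses Rcpp in the background, it runs faster than base R and other R packages (see benchmarks). This is particularly noticeable with large datasets; in the run below on 10^6 values, the median time of the base R version is about 11 times longer, and that of hydroGOF about 19 times longer: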

set.seed(12234)
x <- runif(10^6)
y <- runif(10^6)

# Base R reference implementation of NSE
nse <- function(truth, estimate, na_rm = TRUE) {
  #fmt: skip
  1 - (sum((truth - estimate)^2, na.rm = na_rm) /
        sum((truth - mean(truth, na.rm = na_rm))^2, na.rm = na_rm))
}

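bench::mark(
  tidyhydro = tidyhydro::nse_vec(truth = x, estimate = y),
  hydroGOF = hydroGOF::NSE(sim = y, obs = x),
  baseR = nse(truth = x, estimate = y),
  check = TRUE, # verify that all three expressions return equal results
  relative = TRUE, # report each column relative to its smallest value
  filter_gc = FALSE, # keep iterations in which garbage collection ran
  iterations = 50L
)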
#> # A tibble: 3 × 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 tidyhydro   1       1       22.7        NaN      NaN
#> 2 hydroGOF   15.2    19.1      1          Inf      Inf
#> 3 baseR       8.66   10.6      2.44       Inf      Inf
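
As a quick sanity check of the base R reference function used in the benchmark, the sketch below evaluates it on a small hand-made vector. The function is copied from the benchmark under the hypothetical name nse_base (to avoid masking tidyhydro::nse), and the values in obs are made up for illustration. By definition, NSE equals 1 for a perfect simulation, 0 when the simulation is no better than the observed mean, and is negative when it is worse.

```r
# Base R reference implementation, copied from the benchmark above
# (renamed nse_base for this illustration)
nse_base <- function(truth, estimate, na_rm = TRUE) {
  1 - (sum((truth - estimate)^2, na.rm = na_rm) /
        sum((truth - mean(truth, na.rm = na_rm))^2, na.rm = na_rm))
}

# Made-up observations
obs <- c(1, 2, 3, 4, 5)

# A perfect simulation reproduces the observations exactly
nse_perfect <- nse_base(truth = obs, estimate = obs)

# Predicting the observed mean everywhere is the zero-skill baseline
nse_mean <- nse_base(truth = obs, estimate = rep(mean(obs), length(obs)))

# A simulation that is worse than the observed mean
nse_poor <- nse_base(truth = obs, estimate = rev(obs))

print(c(perfect = nse_perfect, mean = nse_mean, poor = nse_poor))
```

Here nse_poor evaluates to -3, because the squared errors sum to 40 while the squared deviations of obs from its mean sum to 10.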

Code of Conduct

Please note that the tidyhydro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

See also

  • hydroGOF - Goodness-of-fit functions for comparison of simulated and observed hydrological time series.
  • yardstick - Tidy methods for assessing model performance.