Package 'ldc'

Title: Calculate and Plot Pollutant Load Duration Curves
Description: Load duration curves are a method for visualizing pollutant loads in freshwater streams based on the assumed relationship between streamflow and load. Functions are provided for calculating exceedance probabilities, pollutant loading, and plotting load duration curves.
Authors: Michael Schramm [aut, cre], Texas Water Resources Institute [cph]
Maintainer: Michael Schramm <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0
Built: 2024-08-11 04:51:16 UTC
Source: https://github.com/TxWRI/ldc

Calculate annualized load duration curve

Description

Calculates the median annual load duration curve (LDC) with confidence intervals.

Usage

calc_annual_ldc(
  .tbl,
  Q = NULL,
  C = NULL,
  Date = NULL,
  allowable_concentration = NULL,
  breaks = c(1, 0.8, 0.4, 0),
  labels = c("High Flows", "Medium Flows", "Low Flows"),
  conf_level = 0.9,
  estimator = 6,
  n = 500
)

Arguments

.tbl

data frame with at least three columns Q (discharge or flow), C (associated pollutant concentration), and Date.

Q

variable name in .tbl for discharge or flow. This must be of class 'units', typically with a units value of "ft^3/s".

C

variable name in .tbl for associated pollutant concentration at a given flow value. This must be of class 'units', typically with a units value of "mg/L" or "cfu/100mL".

Date

variable name in .tbl for the event Date. This variable must be of class 'Date'.

allowable_concentration

an object of class units specifying the allowable pollutant concentration.

breaks

a numeric vector of break points for flow categories. Its length must equal length(labels) + 1. Defaults to c(1, 0.8, 0.4, 0).

labels

labels for the categories specified by breaks.

conf_level

numeric, confidence level (default 0.9) of the interval around the median streamflow at each exceedance probability.

estimator

one of c(5,6,7,8,9,"hd"). 6 is the default method, corresponding to the Weibull plotting position. See stats::quantile() for details of types 5 to 9. "hd" uses the Harrell-Davis distribution-free quantile estimator (see hdquantile).

n

numeric, the number of probability points to generate. A larger n may give a slightly smoother curve at the cost of longer processing time. The probability points are used to compute the continuous sample quantiles (types 5 to 9; see stats::quantile()).
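The estimator and n arguments can be illustrated with a short base-R sketch. It assumes, without confirmation from this documentation, that the n probability points are evenly spaced strictly inside (0, 1); the helper names fdc_quantiles() and hd_quantile() are hypothetical and are not part of ldc. For the "hd" option the sketch writes out the Harrell-Davis weights directly with pbeta() instead of calling hdquantile().

```r
# Hypothetical helper: Harrell-Davis quantile estimate for probs in (0, 1).
# Each order statistic x_(i) is weighted by the Beta(p(n+1), (1-p)(n+1))
# probability mass falling in ((i-1)/n, i/n].
hd_quantile <- function(x, probs) {
  x <- sort(x)
  n <- length(x)
  vapply(probs, function(p) {
    w <- diff(pbeta(seq(0, n) / n, p * (n + 1), (1 - p) * (n + 1)))
    sum(w * x)
  }, numeric(1))
}

# Hypothetical helper: flow exceeded with probability P_Exceedance, evaluated
# on n points. The evenly spaced grid is an assumption for illustration.
fdc_quantiles <- function(flow, n = 500, estimator = 6) {
  p <- seq_len(n) / (n + 1)  # exceedance probabilities strictly inside (0, 1)
  probs <- 1 - p             # exceedance p corresponds to the (1 - p) quantile
  q <- if (identical(estimator, "hd")) {
    hd_quantile(flow, probs)
  } else {
    quantile(flow, probs = probs, type = as.integer(estimator), names = FALSE)
  }
  data.frame(P_Exceedance = p, Flow = q)
}
```

In this sketch a larger n only refines the grid on which the same estimator is evaluated, which is why it mainly affects smoothness and run time rather than the shape of the curve.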

Details

The median annual LDC is calculated by first computing the flow duration curve for each individual year in the dataset. Exceedance probabilities are calculated from the daily flows ranked in descending order. By default, the Weibull plotting position is used:

$$p = P(Q > q_i) = \frac{i}{n+1}$$

where $q_i$, $i = 1, 2, \ldots, n$, is the $i$-th sorted streamflow value.

At each exceedance probability, the median streamflow across years and its confidence interval (at conf_level) are calculated. The load duration curve is then obtained by multiplying the median streamflow by the allowable concentration and applying the appropriate unit conversions.
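The per-year curves and the median with its confidence interval can be sketched in base R. This is not the package's code: annual_median_fdc() is a hypothetical helper, the evenly spaced probability grid is an assumption, and the interval shown is one distribution-free option (order statistics chosen from a Binomial(k, 0.5) distribution over k years), not necessarily the interval ldc computes.

```r
# Hypothetical helper (not part of ldc): median annual flow duration curve
# with a distribution-free confidence interval for the median. n >= 2.
annual_median_fdc <- function(flow, year, n = 99, conf_level = 0.9, type = 6) {
  p <- seq_len(n) / (n + 1)  # assumed grid of exceedance probabilities
  # Matrix with one column per year: flow exceeded with probability p
  fdc <- vapply(split(flow, year),
                function(q) quantile(q, probs = 1 - p, type = type, names = FALSE),
                numeric(n))
  k <- ncol(fdc)  # number of years
  # Order-statistic ranks from Binomial(k, 0.5): [x_(lo), x_(hi)] covers the
  # median with probability >= conf_level; with very few years the rank is
  # clamped to 1, i.e. the full range across years is used
  lo <- max(qbinom((1 - conf_level) / 2, k, 0.5), 1)
  hi <- k - lo + 1
  summ <- t(apply(fdc, 1, function(x) {
    x <- sort(unname(x))
    c(Median = median(x), Lower = x[lo], Upper = x[hi])
  }))
  data.frame(P_Exceedance = p, summ)
}
```

For example, with ten years of data and conf_level = 0.9, the interval runs from the 2nd to the 9th smallest annual value at each probability. Multiplying the Median column by the allowable concentration, with unit conversions, gives the annual LDC described above.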

Value

a list of two tibbles (Q and C). Includes the variables in .tbl plus Daily_Flow_Volume (discharge volume), Daily_Load (daily pollutant load), P_Exceedance (exceedance probability), and Flow_Category (as defined by breaks and labels).

References

Vogel, Richard M., and Neil M. Fennessey. "Flow-duration curves. I: New interpretation and confidence intervals." Journal of Water Resources Planning and Management 120, no. 4 (1994): 485-504. doi:10.1061/(ASCE)0733-9496(1994)120:4(485)

Examples

# Basic example using built-in Tres Palacios data
library(ldc)
library(dplyr)
library(units)
# Format data
install_unit("cfu")
df <- as_tibble(tres_palacios) %>%
  ## filter data so this runs quicker
  filter(!is.na(Indicator_Bacteria)) %>%
  ## flow must have units, here it is in cfs
  mutate(Flow = set_units(Flow, "ft^3/s")) %>%
  ## pollutant concentration must have units
  mutate(Indicator_Bacteria = set_units(Indicator_Bacteria, "cfu/100mL"))
# Calculate LDC

## specify the allowable concentration
allowable_concentration <- 126
## set the units
units(allowable_concentration) <- "cfu/100mL"
df_ldc <- calc_annual_ldc(df,
                          Q = Flow,
                          C = Indicator_Bacteria,
                          Date = Date,
                          allowable_concentration = allowable_concentration,
                          estimator = 5,
                          n = 1000)
df_ldc$Q

## cleanup
remove_unit("cfu")

Calculate load duration curve

Description

Calculates the period-of-record load duration curve from a data frame of mean daily flows and associated point measurements of pollutant concentration.

Usage

calc_ldc(
  .tbl,
  Q = NULL,
  C = NULL,
  allowable_concentration = NULL,
  breaks = c(1, 0.8, 0.4, 0),
  labels = c("High Flows", "Medium Flows", "Low Flows"),
  estimator = 6
)

Arguments

.tbl

data frame with at least two columns Q (discharge or flow) and C (associated pollutant concentration).

Q

variable name in .tbl for discharge or flow. This must have a unit set, typically "ft^3/s".

C

variable name in .tbl for associated pollutant concentration at a given flow value. This must have a unit set, typically "mg/L" or "cfu/100mL".

allowable_concentration

an object of class units specifying the allowable pollutant concentration.

breaks

a numeric vector of break points for flow categories. Its length must equal length(labels) + 1. Defaults to c(1, 0.8, 0.4, 0).

labels

labels for the categories specified by breaks.

estimator

numeric, one of c(5,6,7,8,9). 6 is the default method, corresponding to the Weibull plotting position. Further details are provided in stats::quantile().
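To see how the default breaks and labels pair up, here is a sketch assuming the categories are assigned with base R's cut() (an assumption; the documentation does not say how ldc applies them). cut() sorts the breaks into ascending order, so a low exceedance probability, which means a high flow, lands in "High Flows".

```r
# Exceedance probabilities for a few hypothetical samples
p <- c(0.05, 0.40, 0.55, 0.80, 0.97)
# cut() sorts the breaks to (0, 0.4, 0.8, 1), so the labels map to the
# intervals (0, 0.4], (0.4, 0.8] and (0.8, 1] respectively
flow_category <- cut(p,
                     breaks = c(1, 0.8, 0.4, 0),
                     labels = c("High Flows", "Medium Flows", "Low Flows"))
flow_category
```

Here 0.05 and 0.40 fall in "High Flows", 0.55 and 0.80 in "Medium Flows", and 0.97 in "Low Flows"; the intervals are closed on the right.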

Details

The exceedance probability is calculated from the daily flows ranked in descending order. By default, the Weibull plotting position is used:

$$p = P(Q > q_i) = \frac{i}{n+1}$$

where $q_i$, $i = 1, 2, \ldots, n$, is the $i$-th sorted streamflow value.
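The plotting-position formula translates directly into a few lines of base R. weibull_exceedance() is a hypothetical helper, and breaking ties by order of appearance is an assumption; the package may handle tied flows differently. The sketch also assumes there are no missing flows.

```r
# Hypothetical helper: Weibull plotting position. Flows are ranked in
# descending order so the largest flow gets i = 1, then p = i / (n + 1).
# Ties are broken by order of appearance (an assumption).
weibull_exceedance <- function(flow) {
  i <- rank(-flow, ties.method = "first")
  i / (length(flow) + 1)
}

weibull_exceedance(c(10, 30, 20))
```

With three flows, n + 1 = 4, so the largest flow (30) gets p = 0.25 and the call returns 0.75, 0.25, 0.50.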

Value

object of class tibble. Includes the variables in .tbl plus Daily_Flow_Volume (discharge volume), Daily_Load (daily pollutant load), P_Exceedance (exceedance probability), and Flow_Category (as defined by breaks and labels).
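Because Q and C carry units, the conversions behind Daily_Flow_Volume and Daily_Load can be delegated to the units package. As a sketch of the arithmetic only, the base-R version below hard-codes the constants for flow in ft^3/s and concentration in cfu/100mL. It assumes the daily volume is one day of flow at the reported rate and the load is concentration times that volume; the helper names are hypothetical.

```r
# Hypothetical constants and helpers (not the ldc API)
ML_PER_FT3 <- 28316.846592  # millilitres in one cubic foot (exact)
SEC_PER_DAY <- 86400

# Volume passing in one day at a constant flow, in mL
daily_flow_volume_ml <- function(flow_cfs) flow_cfs * SEC_PER_DAY * ML_PER_FT3

# Load in cfu/day: concentration per 100 mL times daily volume in mL, / 100
daily_load_cfu <- function(flow_cfs, conc_cfu_100ml) {
  conc_cfu_100ml * daily_flow_volume_ml(flow_cfs) / 100
}

daily_load_cfu(1, 126)
```

At 1 ft^3/s and the 126 cfu/100mL used in the examples, this gives about 3.08 × 10^9 cfu/day.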

Examples

# Basic example using built-in Tres Palacios data
library(ldc)
library(dplyr)
library(units)
# Format data
install_unit("cfu")
df <- as_tibble(tres_palacios) %>%
  ## filter data so this runs quicker
  filter(!is.na(Indicator_Bacteria)) %>%
  ## flow must have units, here it is in cfs
  mutate(Flow = set_units(Flow, "ft^3/s")) %>%
  ## pollutant concentration must have units
  mutate(Indicator_Bacteria = set_units(Indicator_Bacteria, "cfu/100mL"))
# Calculate LDC

## specify the allowable concentration
allowable_concentration <- 126
## set the units
units(allowable_concentration) <- "cfu/100mL"
df_ldc <- calc_ldc(df,
                   Q = Flow,
                   C = Indicator_Bacteria,
                   allowable_concentration = allowable_concentration)
df_ldc

## cleanup
remove_unit("cfu")

Draw a load duration curve

Description

Creates a load duration curve plot, as a ggplot object, from the outputs of calc_ldc() and summ_ldc().

Usage

draw_ldc(
  .tbl_calc,
  .tbl_summ,
  y_lab = NULL,
  ldc_legend_name = "Allowable Load at State Water Quality Standard",
  measurement_name = "Measurement Value",
  measurement_shape = 21,
  measurement_color = "dodgerblue",
  measurement_alpha = 1,
  summary_name = "Summarized Measured Load",
  summary_stat_shape = 12,
  summary_stat_color = "red",
  label_nudge_y = 0,
  label_font_family = "Arial",
  label_font_size = 3,
  label_break = TRUE
)

Arguments

.tbl_calc

data frame object created by calc_ldc

.tbl_summ

data frame object created by summ_ldc

y_lab

optional string for the y-axis label; units are appended automatically. Defaults to NULL.

ldc_legend_name

string, the name used in the legend for the allowable pollutant load line. Required.

measurement_name

string, the name used in the legend for measured load values. Required.

measurement_shape

aesthetic value passed to the layer plotting measured load values. Defaults to 21.

measurement_color

aesthetic value passed to the layer plotting measured load values. Defaults to "dodgerblue".

measurement_alpha

aesthetic value passed to the layer plotting measured load values. Defaults to 1.

summary_name

string, the name used in the legend for summary statistic values. Required.

summary_stat_shape

aesthetic value passed to the layer plotting summary statistic values. Defaults to 12.

summary_stat_color

aesthetic value passed to the layer plotting summary statistic values. Defaults to "red".

label_nudge_y

numeric value used to nudge the flow category labels vertically. If a log10-transformed scale is used, a log value is probably appropriate, for example log10(1000).

label_font_family

string specifying the font family used in flow category labels.

label_font_size

numeric value specifying the font size used in flow category labels.

label_break

logical; if TRUE, line breaks are added to flow category labels at spaces.

Value

ggplot object

Examples

# Basic example using built-in Tres Palacios data
library(ldc)
library(dplyr)
library(units)
library(ggplot2)
# Format data
install_unit("cfu")
df <- as_tibble(tres_palacios) %>%
        ## filter data so this runs quicker
        filter(!is.na(Indicator_Bacteria)) %>%
        ## flow must have units, here it is in cfs
        mutate(Flow = set_units(Flow, "ft^3/s")) %>%
        ## pollutant concentration must have units
        mutate(Indicator_Bacteria = set_units(Indicator_Bacteria, "cfu/100mL"))
# Calculate LDC

## specify the allowable concentration
allowable_concentration <- 126
## set the units
units(allowable_concentration) <- "cfu/100mL"
df_ldc <- calc_ldc(df,
                   Q = Flow,
                   C = Indicator_Bacteria,
                   allowable_concentration = allowable_concentration)

# Summarize LDC
df_sum <- summ_ldc(df_ldc,
                   Q = Flow,
                   C = Indicator_Bacteria,
                   Exceedance = P_Exceedance,
                   groups = Flow_Category,
                   method = "geomean")

# Create ggplot object
draw_ldc(df_ldc,
         df_sum,
         y_lab = expression(paste(italic("E. coli"))),
         label_nudge_y = log10(1000)) +
  scale_y_log10() +
  theme(legend.title = element_blank(),
        legend.direction = "vertical",
        legend.position = "bottom")

## cleanup
remove_unit("cfu")

Summarize load duration curve

Description

Calculates summary statistics of flow and pollutant concentration for each flow category, and estimates an "average" pollutant load per category as the average concentration times the median flow.

Usage

summ_ldc(.tbl, Q, C, Exceedance, groups, method = "geomean")

Arguments

.tbl

data frame, preferably the output from calc_ldc().

Q

variable name in .tbl for discharge or flow. This must have a unit set, typically "ft^3/s".

C

variable name in .tbl for associated pollutant concentration at a given flow value. This must have a unit set, typically "mg/L" or "cfu/100mL".

Exceedance

variable name in .tbl with flow/load exceedance probabilities.

groups

variable name in .tbl with categorized flow names.

method

string that describes the summary statistic used for the pollutant concentration. Must be one of c('geomean', 'mean', 'median').

Value

object of class tibble. Includes the flow category grouping variable, the median flow and exceedance values, the geometric mean, mean, or median pollutant concentration (depending on method), and the estimated average load, calculated as the median flow times the average pollutant concentration for each flow category.
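The per-category summary can be sketched in base R as follows. summarize_ldc_sketch() and its output column names are hypothetical, not the columns summ_ldc() returns, and unit conversions are omitted, so Load is simply median flow times the summarized concentration. The geometric mean assumes strictly positive concentrations.

```r
# Hypothetical helper (not part of ldc): per-category summary of flow,
# exceedance probability and concentration, plus a naive load estimate.
summarize_ldc_sketch <- function(flow, conc, p_exceed, category,
                                 method = c("geomean", "mean", "median")) {
  method <- match.arg(method)
  stat <- switch(method,
                 geomean = function(x) exp(mean(log(x), na.rm = TRUE)),  # x > 0
                 mean = function(x) mean(x, na.rm = TRUE),
                 median = function(x) median(x, na.rm = TRUE))
  groups <- split(seq_along(flow), category, drop = TRUE)
  out <- do.call(rbind, lapply(names(groups), function(g) {
    i <- groups[[g]]
    data.frame(Flow_Category = g,
               Median_Flow = median(flow[i]),
               Median_P = median(p_exceed[i]),
               Concentration = stat(conc[i]))
  }))
  out$Load <- out$Median_Flow * out$Concentration  # unit conversions omitted
  out
}
```

For instance, a "Low" category with flows 1 and 3 and concentrations 4 and 16 has a median flow of 2 and a geometric mean concentration of 8, giving a load of 16 in these unconverted units.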

Examples

# Basic example using built-in Tres Palacios data
library(ldc)
library(dplyr)
library(units)
# Format data
install_unit("cfu")
df <- as_tibble(tres_palacios) %>%
        ## filter data so this runs quicker
        filter(!is.na(Indicator_Bacteria)) %>%
        ## flow must have units, here it is in cfs
        mutate(Flow = set_units(Flow, "ft^3/s")) %>%
        ## pollutant concentration must have units
        mutate(Indicator_Bacteria = set_units(Indicator_Bacteria, "cfu/100mL"))
# Calculate LDC

## specify the allowable concentration
allowable_concentration <- 126
## set the units
units(allowable_concentration) <- "cfu/100mL"
df_ldc <- calc_ldc(df,
                   Q = Flow,
                   C = Indicator_Bacteria,
                   allowable_concentration = allowable_concentration)

# Summarize LDC
df_sum <- summ_ldc(df_ldc,
                   Q = Flow,
                   C = Indicator_Bacteria,
                   Exceedance = P_Exceedance,
                   groups = Flow_Category,
                   method = "geomean")
df_sum

## cleanup
remove_unit("cfu")

Mean daily flow and point E. coli bacteria measurements.

Description

A dataset containing the mean daily flow and E. coli bacteria concentrations on the Tres Palacios River from 2000 through 2020.

Usage

tres_palacios

Format

A data frame with 7671 rows and 4 variables:

site_no

USGS gage number

Date

Observation Date

Flow

Mean Daily Flow in cfs

Indicator_Bacteria

Bacteria concentration measured on the given day in MPN/100mL

Source

USGS NWIS https://waterdata.usgs.gov/nwis and TCEQ SWQM https://www.tceq.texas.gov/waterquality/monitoring