Package 'quantileCI' reference manual

Title:	Confidence Intervals for Quantiles
Description:	Computes exact and interpolated confidence intervals for population quantiles based on a single independent and identically distributed sample. The package implements different methods for computing the confidence intervals and also provides functionality to check the coverage of the methods.
Authors:	Michael Höhle [aut, cre]
Maintainer:	Michael Höhle
License:	GPL-3
Version:	0.1
Built:	2025-02-12 02:54:25 UTC
Source:	https://github.com/mhoehle/quantileCI

Water Monitoring Sample from Flint, Michigan, 2015

Description

The data correspond to the original January as well as the revised June version of the 2015 "Lead and Copper Report and Consumer Notice of Lead Result" report. Data are taken from the sources stated below, not the original report. Note that a lead level of zero means that the measurement was below the detection limit.

Usage

data(flint)

Format

A data frame with 71 rows and 2 variables:

lead: The measured lead concentration in the tap water in parts per billion (ppb = µg/L), as now also made explicit using the units package.
exclude: Logical indicating whether the observation was removed by the authorities. In total, two observations were removed, as explained in the references.

References

Langkjær-Bain, R. (2017), The murky tale of Flint's deceptive water data. Significance, 14: 16–21. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01016.x.

Wicklin, R. (2017), Quantiles and the Flint water crisis. http://blogs.sas.com/content/iml/2017/05/17/quantiles-flint-water-crisis.html

Michigan Radio, YouTube video explaining the computation of the quantile. https://www.youtube.com/watch?v=9pql00zr700

Höhle, M. (2017), Beware the Argument: The Flint Water Crisis and Quantiles. https://staff.math.su.se/hoehle/blog/2017/06/18/quantiles.html

Two-sided confidence interval for the median by the method of Hettmansperger & Sheather (1986)

Description

Two-sided confidence interval for the median by the method of Hettmansperger & Sheather (1986).

Usage

median_confint_hs(
  x,
  conf.level = 0.95,
  x_is_sorted = FALSE,
  interpolate = TRUE
)

Arguments

`x`	vector of observations
`conf.level`	A conf.level * 100% confidence interval is computed
`x_is_sorted`	Boolean (Default: FALSE) indicating if `x` is already sorted, which saves sorting it again. This is merely for speed reasons in situations where it is more efficient to only sort x once.
`interpolate`	Boolean (Default: TRUE) stating whether to interpolate the order statistics. If no interpolation is selected, this is just the standard exact procedure based on the order statistics. Note: this exact procedure is conservative, i.e. its coverage is usually larger than the nominal conf.level and hence the interval is in general wider than necessary.

Details

The interpolation procedure suggested by Hettmansperger and Sheather (1986) for the median is applied to the order statistics.

Value

A vector of length two containing the lower and upper limit of the confidence interval
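The interpolation idea can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function name `hs_median_ci_sketch` is hypothetical, continuous data are assumed (so that $[x_{(k)}, x_{(n-k+1)}]$ covers the median with probability $P(k \le B \le n-k)$, $B \sim \mathrm{Bin}(n, 1/2)$), and the interpolation weight $\lambda = (n-d)I/(d + (n-2d)I)$ is taken as the Hettmansperger and Sheather (1986) formula.

```r
# Hypothetical helper (not part of quantileCI): sketch of the interpolated
# median CI of Hettmansperger & Sheather (1986), assuming continuous data.
hs_median_ci_sketch <- function(x, conf.level = 0.95) {
  x <- sort(x)
  n <- length(x)
  # Coverage of [x_(k), x_(n-k+1)]: P(k <= B <= n - k), B ~ Bin(n, 1/2)
  cover <- function(k) pbinom(n - k, n, 0.5) - pbinom(k - 1, n, 0.5)
  ks <- seq_len(floor(n / 2))
  ok <- ks[cover(ks) >= conf.level]
  if (length(ok) == 0) stop("sample too small for the requested conf.level")
  d <- max(ok)  # innermost pair of order statistics still reaching conf.level
  if (d + 1 > n - d) stop("conf.level too low to interpolate")
  # Interpolate between pair d (coverage >= conf.level) and pair d + 1
  I_d <- (cover(d) - conf.level) / (cover(d) - cover(d + 1))
  lambda <- (n - d) * I_d / (d + (n - 2 * d) * I_d)
  c(lower = (1 - lambda) * x[d] + lambda * x[d + 1],
    upper = (1 - lambda) * x[n - d + 1] + lambda * x[n - d])
}

set.seed(123)
hs_median_ci_sketch(rnorm(25), conf.level = 0.95)
```

Comparing the output with median_confint_hs(x, interpolate=TRUE) on the same sample is an easy way to check whether the assumed formula matches the package.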

References

Hettmansperger TP and Sheather SJ (1986), Confidence intervals based on interpolated order statistics, Statistics and Probability Letters, 4, p. 75-79.

Examples

set.seed(123)
x <- rnorm(25)
median_confint_hs(x=x, conf.level=0.95, interpolate=TRUE)

Computing the coverage of different confidence interval methods for quantiles by Monte Carlo integration.

Description

Computing the coverage of different confidence interval methods for quantiles by Monte Carlo integration.

Usage

qci_coverage_one_sim(
  qci_fun,
  n,
  rfunc = rnorm,
  qfunc = qnorm,
  p = 0.5,
  conf.level = 0.95,
  ...
)

Arguments

`qci_fun`	Function which, given a sample x, p and conf.level, computes a set of different confidence intervals. It should return a matrix (or data frame) of dimension 2 x (no. of methods) containing the lower and upper bound of each confidence interval method.
`n`	Size of the sample to generate in the simulation
`rfunc`	Function for generating the samples
`qfunc`	Quantile function for computing the true quantile
`p`	The quantile of interest 0 <= p <= 1
`conf.level`	conf.level * 100% two-sided confidence intervals are computed
`...`	Additional arguments passed to `rfunc` and `qfunc`
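Based on the argument descriptions above, one simulation step presumably draws a sample with `rfunc`, computes the true quantile with `qfunc`, and checks whether each interval contains it. The base R sketch below illustrates that logic only; `coverage_one_sim_sketch` and `toy_qci` are hypothetical names, not package functions, and the two toy "methods" are chosen because their coverage is obvious (essentially 1 for the sample range, 0 for a zero-width interval at the sample median).

```r
# Hypothetical sketch (not the package source) of one Monte Carlo step as
# described by the arguments of qci_coverage_one_sim().
coverage_one_sim_sketch <- function(qci_fun, n, rfunc = rnorm, qfunc = qnorm,
                                    p = 0.5, conf.level = 0.95, ...) {
  x <- rfunc(n, ...)       # simulate a sample of size n
  true_q <- qfunc(p, ...)  # true p-quantile of the same distribution
  ci <- as.matrix(qci_fun(x = x, p = p, conf.level = conf.level))
  ci[1, ] <= true_q & true_q <= ci[2, ]  # one TRUE/FALSE per method (column)
}

# Toy qci_fun with two deliberately extreme "methods" (illustration only)
toy_qci <- function(x, p, conf.level) {
  cbind(range = range(x), point = rep(median(x), 2))
}

set.seed(42)
hits <- replicate(200, coverage_one_sim_sketch(toy_qci, n = 100, p = 0.5))
rowMeans(hits)  # Monte Carlo estimate of the coverage of each method
```

Extra distribution parameters travel through `...` to both functions, e.g. `coverage_one_sim_sketch(toy_qci, n = 50, rfunc = rexp, qfunc = qexp, rate = 2)`.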

Value

A vector of Booleans of length (no. of methods) stating for each method whether its confidence interval contains the true quantile.

Examples

## Function computing several CI methods on the same sample x
quantile_confints <- function(x, p, conf.level, x_is_sorted=FALSE) {
  if (!x_is_sorted) { x <- sort(x) }

  ## Compute the confidence intervals of the different methods on the sorted x
  res <- data.frame(
    nyblom_exact=quantileCI::quantile_confint_nyblom(x=x, p=p, conf.level=conf.level,
                                                 x_is_sorted=TRUE, interpolate=FALSE),
    nyblom_interp=quantileCI::quantile_confint_nyblom(x=x, p=p, conf.level=conf.level,
                                                 x_is_sorted=TRUE, interpolate=TRUE),
    boot=quantileCI::quantile_confint_boot(x, p=p, conf.level=conf.level, R=999)
  )
  ## The Hettmansperger-Sheather interval is only available for the median
  if (p == 0.5) {
    res$hs_interp <- quantileCI::median_confint_hs(x=x, conf.level=conf.level,
                                                 x_is_sorted=TRUE, interpolate=TRUE)
  }
  return(res)
}

## One run of the simulation function
quantileCI::qci_coverage_one_sim(qci_fun=quantile_confints, n=100, p=0.5, conf.level=0.95)

## Several runs; the row means estimate the coverage of each method
res <- sapply(1L:10L, function(i) {
  quantileCI::qci_coverage_one_sim(qci_fun=quantile_confints, n=100, p=0.5, conf.level=0.95)
})
rowMeans(res)

Confidence interval method for a given quantile based on the basic bootstrap and using the percentile method.

Description

Confidence interval method for a given quantile based on the basic bootstrap and using the percentile method.

Usage

quantile_confint_boot(x, p, conf.level = 0.95, R = 999, type = 7)

Arguments

`x`	vector of observations
`p`	quantile of interest
`conf.level`	A conf.level * 100% confidence interval is computed
`R`	number of replications to use in the bootstrap (Default: 999)
`type`	Type of empirical quantile estimation procedure; see the `type` argument of the `quantile` function.

Details

Basic bootstrap with the confidence interval computed based on the percentile method.
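The percentile method can be sketched in a few lines of base R. This is one common variant shown for illustration: `boot_quantile_ci_sketch` is a hypothetical name, and taking the bootstrap percentiles with `quantile()` is an assumption, so quantile_confint_boot() may select the percentiles of the replicates differently.

```r
# Hypothetical helper (not part of quantileCI): percentile bootstrap CI
# for the p-quantile; one common variant, shown for illustration only.
boot_quantile_ci_sketch <- function(x, p, conf.level = 0.95, R = 999, type = 7) {
  n <- length(x)
  # Empirical p-quantile of each of the R resamples drawn with replacement
  theta_star <- replicate(R, quantile(x[sample.int(n, n, replace = TRUE)],
                                      probs = p, type = type, names = FALSE))
  alpha <- 1 - conf.level
  # Percentile method: alpha/2 and 1 - alpha/2 quantiles of the replicates
  quantile(theta_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
x <- rnorm(25)
boot_quantile_ci_sketch(x, p = 0.8, conf.level = 0.95, R = 999)
```

Indexing with `sample.int()` rather than calling `sample(x)` directly avoids R's special case where a length-one numeric `x` is treated as `1:x`.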

Value

A vector of length two containing the lower and upper limit of the two-sided confidence interval.

Examples

set.seed(123)
x <- rnorm(25)
quantile_confint_boot(x=x, p=0.8, conf.level=0.95, R=999)

Standard exact two-sided quantile confidence interval based on the binomial distribution

Description

Standard exact two-sided quantile confidence interval based on the binomial distribution
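The link to the binomial distribution can be made explicit. Assuming a continuous distribution with true $p$-quantile $x_p$, the number $B$ of observations at or below $x_p$ is binomially distributed, so for order statistics $X_{(r)}$ and $X_{(s)}$ with $r < s$:

```latex
P\left(X_{(r)} \le x_p \le X_{(s)}\right)
  = P(r \le B \le s-1)
  = \sum_{k=r}^{s-1} \binom{n}{k} p^k (1-p)^{n-k},
  \qquad B \sim \mathrm{Bin}(n, p).
```

An exact interval picks $r$ and $s$ such that this probability is at least `conf.level`; because $B$ is discrete, the attained coverage is usually strictly larger.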

Usage

quantile_confint_exact(
  x,
  p,
  conf.level = 0.95,
  x_is_sorted = FALSE,
  fix_interval = TRUE
)

Arguments

`x`	vector of observations
`p`	quantile of interest, $0 \leq p \leq 1$
`conf.level`	A `conf.level` * 100% confidence interval is computed
`x_is_sorted`	Boolean (Default: FALSE) indicating if `x` is already sorted, which saves sorting it again. This is merely for speed reasons in situations where it is more efficient to only sort x once.
`fix_interval`	Boolean (Default: TRUE) For the case with no interpolation, try to extend the interval upwards if its coverage is too low.

Details

This function simply calls quantile_confint_nyblom with interpolate=FALSE.

Value

A vector of length two containing the lower and upper limit of the confidence interval
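A minimal base R sketch of the textbook equal-tailed choice of the order statistics $x_{(l)}$ and $x_{(u)}$ is shown below. It is an illustration only: `exact_quantile_ci_sketch` is a hypothetical name, continuous data are assumed, and since quantile_confint_exact() delegates to the Nyblom function (including the fix_interval adjustment), the package may select different order statistics.

```r
# Hypothetical helper (not part of quantileCI): textbook equal-tailed exact
# CI [x_(l), x_(u)] for the p-quantile, assuming continuous data.
exact_quantile_ci_sketch <- function(x, p, conf.level = 0.95) {
  x <- sort(x)
  n <- length(x)
  alpha <- 1 - conf.level
  # B ~ Bin(n, p) counts the observations at or below the true p-quantile
  l <- qbinom(alpha / 2, n, p)          # smallest l with P(B <= l) >= alpha/2
  u <- qbinom(1 - alpha / 2, n, p) + 1  # P(B <= u - 1) >= 1 - alpha/2
  if (l < 1 || u > n) stop("sample too small for this p and conf.level")
  ci <- c(lower = x[l], upper = x[u])
  # Exact coverage of [x_(l), x_(u)]: P(l <= B <= u - 1) >= conf.level
  attr(ci, "coverage") <- pbinom(u - 1, n, p) - pbinom(l - 1, n, p)
  ci
}

set.seed(123)
exact_quantile_ci_sketch(rnorm(25), p = 0.8, conf.level = 0.95)
```

The returned `coverage` attribute shows how conservative the exact interval is for the given n and p.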

References

Nyblom J (1992), Note on interpolated order statistics, Statistics and Probability Letters, 14, p. 129-131.

Examples

set.seed(123)
x <- rnorm(25)
quantile_confint_exact(x=x, p=0.8, conf.level=0.95)

Two-sided quantile confidence interval based on interpolating the order statistics as suggested in Nyblom (1992)

Description

Two-sided quantile confidence interval based on interpolating the order statistics as suggested in Nyblom (1992).

Usage

quantile_confint_nyblom(
  x,
  p,
  conf.level = 0.95,
  x_is_sorted = FALSE,
  interpolate = TRUE,
  fix_interval = TRUE
)

Arguments

`x`	vector of observations
`p`	quantile of interest, $0 \leq p \leq 1$
`conf.level`	A conf.level * 100% confidence interval is computed
`x_is_sorted`	Boolean (Default: FALSE) indicating if `x` is already sorted and, hence, it is not necessary to sort it again. This is merely for speed reasons in situations where it is more efficient to only sort x once.
`interpolate`	Boolean (Default: TRUE) stating whether to interpolate the order statistics. If no interpolation is selected, this is just the standard exact procedure based on the order statistics. Note: this exact procedure is conservative, i.e. its coverage is usually larger than the nominal conf.level and hence the interval is in general wider than necessary.
`fix_interval`	Boolean (Default: TRUE) For the case with no interpolation, try to extend the interval upwards if its coverage is too low.

Details

The interpolation procedure suggested by Nyblom (1992), which extends the work of Hettmansperger and Sheather (1986) for the median, is applied to the order statistics.

Value

A vector of length two containing the lower and upper limit of the confidence interval

References

Nyblom J (1992), Note on interpolated order statistics, Statistics and Probability Letters, 14, p. 129-131.

Examples

set.seed(123)
x <- rnorm(25)
quantile_confint_nyblom(x=x, p=0.8, conf.level=0.95, interpolate=TRUE)
