Title: | Confidence Intervals for Quantiles |
---|---|
Description: | Computes exact and interpolated confidence intervals for population quantiles based on a single independent and identically distributed sample. The package implements different methods for computing the confidence intervals and also provides functionality to check the coverage of the methods. |
Authors: | Michael Höhle [aut, cre] |
Maintainer: | Michael Höhle <[email protected]> |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-11-14 03:59:05 UTC |
Source: | https://github.com/mhoehle/quantileCI |
The data correspond to the original January as well as the revised June version of the 2015 "Lead and Copper Report and Consumer Notice of Lead Result" report. Data are taken from the sources stated below, not the original report. Note that a lead level of zero means that the measurement was below the detection limit.
data(flint)
A data frame with 71 rows and 2 variables:
The measured lead concentration in the tap water in parts per billion (ppb = µg/L), as now also made explicit using the units package.
Logical indicating whether the observation was removed by the authorities. In total, two observations were removed, as explained in the references.
Langkjær-Bain, R. (2017), The murky tale of Flint's deceptive water data. Significance, 14: 16–21. https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01016.x.
Wicklin, R. (2017), Quantiles and the Flint water crisis. http://blogs.sas.com/content/iml/2017/05/17/quantiles-flint-water-crisis.html
Michigan Radio, YouTube video explaining the computation of the quantile. https://www.youtube.com/watch?v=9pql00zr700
Höhle, M. (2017), Beware the Argument: The Flint Water Crisis and Quantiles. https://staff.math.su.se/hoehle/blog/2017/06/18/quantiles.html
Two-sided confidence interval for the median using the method of Hettmansperger and Sheather (1986)
median_confint_hs( x, conf.level = 0.95, x_is_sorted = FALSE, interpolate = TRUE )
x |
vector of observations |
conf.level |
A conf.level * 100% confidence interval is computed |
x_is_sorted |
Boolean (Default: FALSE) indicating that x is already sorted, so that sorting can be skipped. This is merely for speed in situations where it is more efficient to sort x only once. |
interpolate |
Boolean (Default: TRUE) stating whether to interpolate the order statistics. If no interpolation is selected, this is just the standard exact procedure based on the order statistics. Note: The exact procedure is conservative (i.e. coverage is usually larger than the nominal conf.level, and hence the interval is in general wider than necessary). |
The interpolation procedure suggested by Hettmansperger and Sheather (1986) for the median is applied to the order statistic.
A vector of length two containing the lower and upper limit of the confidence interval
Hettmansperger TP and Sheather SJ (1986), Confidence intervals based on interpolated order statistics, Statistics and Probability Letters, 4, p. 75-79.
set.seed(123)
x <- rnorm(25)
median_confint_hs(x=x, conf.level=0.95, interpolate=TRUE)
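The interpolation between two neighbouring exact intervals can be sketched in base R as follows. This is an illustration of the Hettmansperger and Sheather (1986) idea as commonly stated, not the package's code: the function name `median_ci_hs_sketch` is hypothetical, and the weight `lambda` is written from the published formula, so it should be checked against the paper before being relied on.

```r
# Hypothetical base-R sketch of the interpolated median interval.
# Assumption: not the quantileCI implementation; names are illustrative.
median_ci_hs_sketch <- function(x, conf.level = 0.95) {
  x <- sort(x)
  n <- length(x)
  # Coverage of the exact interval (x[d], x[n - d + 1]) for the median
  cover <- function(d) 1 - 2 * pbinom(d - 1, size = n, prob = 0.5)
  # Largest d whose exact interval still reaches the nominal level
  d_grid <- seq_len(floor(n / 2))
  ok <- d_grid[cover(d_grid) >= conf.level]
  if (length(ok) == 0) stop("sample too small for the requested conf.level")
  d <- max(ok)
  if (d + 1 > n - d) stop("no narrower neighbouring interval to interpolate with")
  gamma_d  <- cover(d)      # coverage >= conf.level
  gamma_d1 <- cover(d + 1)  # coverage <  conf.level
  # Relative position of the nominal level between the two coverages
  I <- (gamma_d - conf.level) / (gamma_d - gamma_d1)
  # Interpolation weight (assumed form, see lead-in)
  lambda <- (n - d) * I / (d + (n - 2 * d) * I)
  c(lower = (1 - lambda) * x[d] + lambda * x[d + 1],
    upper = (1 - lambda) * x[n - d + 1] + lambda * x[n - d])
}

set.seed(123)
x <- rnorm(25)
ci <- median_ci_hs_sketch(x, conf.level = 0.95)
ci
```

By construction the result lies between the conservative exact interval and the next narrower one, which is why the interpolated interval is shorter than the exact interval at the same nominal level.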
Computing the coverage of different confidence interval methods for quantiles by Monte Carlo integration.
qci_coverage_one_sim( qci_fun, n, rfunc = rnorm, qfunc = qnorm, p = 0.5, conf.level = 0.95, ... )
qci_fun |
Function which, given a sample x, p and conf.level, computes a set of different confidence intervals. It should return a matrix of dimension 2 x (no. of methods) containing the lower and upper bound of each confidence interval method. |
n |
Size of the sample to generate in the simulation |
rfunc |
Function for generating the samples |
qfunc |
Quantile function for computing the true quantile |
p |
The quantile of interest 0 <= p <= 1 |
conf.level |
conf.level * 100% two-sided confidence intervals are computed |
... |
Additional arguments passed to |
A vector of Booleans of length (no. methods) stating if each method contains the true value or not
## Function to compute different methods on the same x.
quantile_confints <- function(x, p, conf.level, x_is_sorted=FALSE) {
  if (!x_is_sorted) { x <- sort(x) }

  ## Compute the various confidence intervals as above
  res <- data.frame(
    nyblom_exact=quantileCI::quantile_confint_nyblom(x=x, p=p, conf.level=conf.level, x_is_sorted=TRUE, interpolate=FALSE),
    nyblom_interp=quantileCI::quantile_confint_nyblom(x=x, p=p, conf.level=conf.level, x_is_sorted=TRUE, interpolate=TRUE),
    boot=quantileCI::quantile_confint_boot(x, p=p, conf.level=conf.level, R=999)
  )
  if (p == 0.5) {
    res$hs_interp <- quantileCI::median_confint_hs(x=x, conf.level=conf.level, x_is_sorted=TRUE, interpolate=TRUE)
  }
  return(res)
}

## One run of the simulation function
quantileCI::qci_coverage_one_sim(qci_fun=quantile_confints, n=100, p=0.5, conf.level=0.95)

## Several runs, calculate row means to get coverage by sampling
res <- sapply(1L:10L, function(i) {
  quantileCI::qci_coverage_one_sim(qci_fun=quantile_confints, n=100, p=0.5, conf.level=0.95)
})
res
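The Monte Carlo idea behind the coverage check can also be reproduced without the package. The following base-R sketch mirrors the documented interface (a `qci_fun` returning a 2 x (no. of methods) matrix, plus `rfunc` and `qfunc`), but every function in it (`rank_ci`, `two_methods`, `one_sim_sketch`) is hypothetical and the two rank-based interval methods are simple stand-ins, not the methods implemented in quantileCI.

```r
# Interval from order statistics with ranks l and u; ranks outside 1..n are
# mapped to -Inf / Inf. Hypothetical helper, not part of quantileCI.
rank_ci <- function(x_sorted, l, u) {
  n <- length(x_sorted)
  padded <- c(-Inf, x_sorted, Inf)   # rank r sits at index r + 1
  l <- min(max(l, 0), n + 1)
  u <- min(max(u, 0), n + 1)
  c(padded[l + 1], padded[u + 1])
}

# Two stand-in methods: ranks from binomial quantiles and from a normal
# approximation to the binomial. Returns a 2 x 2 matrix (lower, upper).
two_methods <- function(x, p, conf.level) {
  x <- sort(x)
  n <- length(x)
  alpha <- 1 - conf.level
  z <- qnorm(1 - alpha / 2)
  half <- z * sqrt(n * p * (1 - p))
  cbind(
    binom  = rank_ci(x, qbinom(alpha / 2, n, p), qbinom(1 - alpha / 2, n, p) + 1),
    normal = rank_ci(x, floor(n * p - half), ceiling(n * p + half))
  )
}

# One simulation run: draw a sample, compute all intervals, and record for
# each method whether the true quantile is covered.
one_sim_sketch <- function(qci_fun, n, rfunc = rnorm, qfunc = qnorm,
                           p = 0.5, conf.level = 0.95) {
  x <- rfunc(n)
  truth <- qfunc(p)
  cis <- as.matrix(qci_fun(x, p = p, conf.level = conf.level))
  cis[1, ] <= truth & truth <= cis[2, ]
}

set.seed(123)
hits <- replicate(500, one_sim_sketch(two_methods, n = 100, p = 0.5))
coverage <- rowMeans(hits)   # proportion of runs covering the true median
coverage
```

The row means estimate the coverage of each method; with only a few hundred runs the Monte Carlo error is around one percentage point, so more replications are needed to compare methods whose coverage is close.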
Confidence interval method for a given quantile based on the basic bootstrap and using the percentile method.
quantile_confint_boot(x, p, conf.level = 0.95, R = 999, type = 7)
x |
vector of observations |
p |
quantile of interest |
conf.level |
A conf.level * 100% confidence interval is computed |
R |
number of replications to use in the bootstrap (Default: 999) |
type |
Type of empirical quantile estimation procedure, @seealso the |
Basic bootstrap with the confidence interval computed based on the percentile method.
A vector of length two containing the lower and upper limit of the two-sided confidence interval.
set.seed(123)
x <- rnorm(25)
quantile_confint_boot(x=x, p=0.8, conf.level=0.95, R=999)
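For readers who want to see the percentile method spelled out, a minimal hand-rolled version in base R could look as follows. It is a sketch under stated assumptions: `boot_quantile_ci_sketch` is a hypothetical name, the `type` argument is simply forwarded to `stats::quantile` in this sketch, and the limits are taken as plain empirical quantiles of the resampled statistics, so the numbers need not match those of the package function.

```r
# Hypothetical percentile-bootstrap interval for the p-quantile (base R only).
boot_quantile_ci_sketch <- function(x, p, conf.level = 0.95, R = 999, type = 7) {
  # Empirical p-quantile of each of R resamples drawn with replacement
  q_star <- replicate(R, quantile(sample(x, replace = TRUE), probs = p,
                                  type = type, names = FALSE))
  alpha <- 1 - conf.level
  # Percentile method: cut off alpha / 2 in each tail of the bootstrap values
  quantile(q_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
x <- rnorm(25)
ci_boot <- boot_quantile_ci_sketch(x, p = 0.8, conf.level = 0.95, R = 999)
ci_boot
```

Because every resampled quantile lies within the range of the observed data, the resulting limits can never extend beyond `min(x)` and `max(x)`, which is one reason bootstrap intervals for extreme quantiles in small samples tend to undercover.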
Standard exact two-sided quantile confidence interval based on the binomial distribution
quantile_confint_exact( x, p, conf.level = 0.95, x_is_sorted = FALSE, fix_interval = TRUE )
x |
vector of observations |
p |
quantile of interest, |
conf.level |
A conf.level * 100% confidence interval is computed |
x_is_sorted |
Boolean (Default: FALSE) indicating that x is already sorted, so that sorting can be skipped. This is merely for speed in situations where it is more efficient to sort x only once. |
fix_interval |
Boolean (Default: TRUE). For the case with no interpolation, try to extend the interval upwards if the coverage is too low. |
This function is a pure call-through to the Nyblom function with interpolate=FALSE.
A vector of length two containing the lower and upper limit of the confidence interval
Nyblom J (1992), Note on interpolated order statistics, Statistics and Probability Letters, 14, p. 129-131.
set.seed(123)
x <- rnorm(25)
quantile_confint_exact(x=x, p=0.8, conf.level=0.95)
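The link between order statistics and the binomial distribution can be made concrete with a short base-R sketch. For a continuous distribution, the number of observations at or below the true p-quantile is Binomial(n, p), so the interval between the order statistics of ranks l and u covers the quantile with probability P(l <= B <= u - 1). The equal-tail choice of ranks below is one common convention and is an assumption of this sketch; the package may select ranks differently (for example through `fix_interval`), and `exact_quantile_ci_sketch` is a hypothetical name.

```r
# Hypothetical exact (distribution-free) interval for the p-quantile.
exact_quantile_ci_sketch <- function(x, p, conf.level = 0.95) {
  x <- sort(x)
  n <- length(x)
  alpha <- 1 - conf.level
  l <- qbinom(alpha / 2, size = n, prob = p)          # lower rank, may be 0
  u <- qbinom(1 - alpha / 2, size = n, prob = p) + 1  # upper rank, may be n + 1
  padded <- c(-Inf, x, Inf)                           # rank r sits at index r + 1
  list(
    ci = c(padded[l + 1], padded[u + 1]),
    ranks = c(l, u),
    # Actual coverage probability of the chosen pair of order statistics
    coverage = pbinom(u - 1, n, p) - pbinom(l - 1, n, p)
  )
}

set.seed(123)
x <- rnorm(25)
res_exact <- exact_quantile_ci_sketch(x, p = 0.8, conf.level = 0.95)
res_exact
```

The returned `coverage` is at least `conf.level` and typically noticeably above it, because only integer ranks are available; this gap is what the interpolated methods try to close.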
Two-sided quantile confidence interval based on interpolating the order statistics as suggested in Nyblom (1992)
quantile_confint_nyblom( x, p, conf.level = 0.95, x_is_sorted = FALSE, interpolate = TRUE, fix_interval = TRUE )
x |
vector of observations |
p |
quantile of interest, |
conf.level |
A conf.level * 100% confidence interval is computed |
x_is_sorted |
Boolean (Default: FALSE) indicating if |
interpolate |
Boolean (Default: TRUE) stating whether to interpolate the order statistics. If no interpolation is selected, this is just the standard exact procedure based on the order statistics. Note: The exact procedure is conservative (i.e. coverage is usually larger than the nominal conf.level, and hence the interval is in general wider than necessary). |
fix_interval |
Boolean (Default: TRUE). For the case with no interpolation, try to extend the interval upwards if the coverage is too low. |
The interpolation procedure suggested by Nyblom (1992), which extends the work of Hettmansperger and Sheather (1986) for the median to arbitrary quantiles, is applied to the order statistics.
A vector of length two containing the lower and upper limit of the confidence interval
Nyblom J (1992), Note on interpolated order statistics, Statistics and Probability Letters, 14, p. 129-131.
set.seed(123)
x <- rnorm(25)
quantile_confint_nyblom(x=x, p=0.8, conf.level=0.95, interpolate=TRUE)