Title: | Tools for Exploring Regression Models with 'iNZight' |
---|---|
Description: | Provides a suite of functions to use with regression models, including summaries, residual plots, and factor comparisons. Used as part of the Model Fitting module of 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. |
Authors: | Tom Elliott [aut, cre] |
Maintainer: | Tom Elliott <[email protected]> |
License: | GPL-3 |
Version: | 1.3.4 |
Built: | 2025-02-28 02:46:59 UTC |
Source: | https://github.com/inzightvit/inzightregression |
Obtain a quick model comparison matrix for a selection of models
compare_models(x, ...)

## Default S3 method:
compare_models(x, ...)

## S3 method for class 'svyglm'
compare_models(x, ...)
Argument | Description |
---|---|
x | a regression model (lm, glm, svyglm, ...) |
... | other models |
an 'inzmodelcomp' object containing model comparison statistics
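The exact statistics stored in an 'inzmodelcomp' object are not listed here. As a rough point of comparison only, the sketch below assumes that information criteria are a reasonable stand-in and tabulates them for the same three models using base R's AIC() and BIC(). It is not the compare_models() implementation.

```r
# Three nested models of increasing complexity (same as the examples)
m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# AIC()/BIC() given several models return one row per model;
# 'df' counts the estimated parameters, including the residual variance
cmp <- cbind(AIC(m0, m1, m2), BIC = BIC(m0, m1, m2)$BIC)
print(cmp)  # lower values indicate a better fit/complexity trade-off
```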
compare_models(default): default method
compare_models(svyglm): method for survey GLMs
Tom Elliott
m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
compare_models(m0, m1, m2)
Computes confidence intervals for the pairwise differences between levels of a factor, based on stats::TukeyHSD.
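For a model with a single factor, the stats::TukeyHSD computation that factorComp() is based on can be run directly in base R. This only illustrates the underlying idea: factorComp() also accepts lm/glm/svyglm fits, possibly with other covariates, which this sketch does not handle.

```r
# One-way ANOVA of sepal length on species
fit_aov <- aov(Sepal.Length ~ Species, data = iris)

# Tukey's honest significant differences: pairwise differences between
# factor levels, with family-wise 95% confidence intervals and adjusted p-values
tk <- TukeyHSD(fit_aov, "Species", conf.level = 0.95)
print(tk$Species)  # columns: diff, lwr, upr, p adj
```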
factorComp(fit, factor)

## S3 method for class 'inzfactorcomp'
print(x, ...)
Argument | Description |
---|---|
fit | an lm/glm/svyglm object |
factor | the name of the factor to compare |
x | an |
... | extra arguments for print (ignored) |
a factor level comparison object with estimates, CIs, and (adjusted) p-values
print(inzfactorcomp): print method for objects of class inzfactorcomp
Tom Elliott
f <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
factorComp(f, "Species")
Produces an array of histograms to compare against the histogram of residuals for a fitted linear model.
histogramArray(x, n = 7, env = parent.frame())
Argument | Description |
---|---|
x | an |
n | the number of additional histograms to plot alongside the original. |
env | environment for finding data to bootstrap |
The histogram of residuals from the model x appears in the top-left position. For each of the other histograms, the fitted values of x are taken and normal random errors are added to them. These errors have a standard deviation equal to the estimated residual standard error of x. The model is then refitted to this simulated data, and a histogram of its residuals is produced.
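The simulation described above can be sketched in base R. The helper sim_resids() is hypothetical (not part of the package), and the sketch assumes the response is a plain column of the model frame, so a transformed response such as log(y) would need extra handling.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Hypothetical helper: simulate a new response from the fitted values plus
# normal errors with SD equal to the residual standard error, refit the
# same model, and return the residuals of the refitted model
sim_resids <- function(fit) {
  d <- model.frame(fit)
  d[[1]] <- fitted(fit) + rnorm(nrow(d), mean = 0, sd = sigma(fit))
  resid(lm(formula(fit), data = d))
}

set.seed(1)
sims <- replicate(7, sim_resids(fit), simplify = FALSE)

# Original residuals in the top-left panel, simulated ones alongside
op <- par(mfrow = c(2, 4))
hist(resid(fit), main = "Original model", xlab = "Residuals")
for (i in seq_along(sims)) {
  hist(sims[[i]], main = paste("Simulation", i), xlab = "Residuals")
}
par(op)
```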
No return value, called to generate plot.
David Banks, Tom Elliott
histogramArray(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
Produces a sample of QQ-plots based on the fitted values, overlaid by a QQ-plot of the original data.
iNZightQQplot(x, n = 5, env = parent.frame())
Argument | Description |
---|---|
x | an |
n | the number of sampled QQ plots to produce beneath the QQ plot of |
env | environment for finding data to bootstrap |
Multiple bootstrap models are generated from the fitted values of the model, each with different random normal errors whose standard deviation equals the estimated residual standard error of the original model. The QQ plots of these models are drawn first and then overlaid by the QQ plot from the original data.
This plot can be used to assess the assumption of normality in the
residuals for a linear regression model.
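A base-R sketch of this overlay follows. It assumes the simulation works as described (fitted values plus normal errors with SD equal to the residual standard error, then a refit); the colours and plotting details are illustrative, not the package's.

```r
fit <- lm(Volume ~ Height + Girth, data = trees)
set.seed(2)

# Normal QQ coordinates of residuals from 5 models refitted to simulated data
sim_qq <- lapply(1:5, function(i) {
  d <- model.frame(fit)
  d[[1]] <- fitted(fit) + rnorm(nrow(d), sd = sigma(fit))
  qqnorm(resid(lm(formula(fit), data = d)), plot.it = FALSE)
})
orig_qq <- qqnorm(resid(fit), plot.it = FALSE)

# Simulated QQ points first (grey), then the original model's on top
y_range <- range(orig_qq$y, unlist(lapply(sim_qq, `[[`, "y")))
plot(orig_qq$x, orig_qq$y, type = "n", ylim = y_range,
     xlab = "Theoretical quantiles", ylab = "Residuals")
for (q in sim_qq) points(q$x, q$y, col = "grey70")
points(orig_qq$x, orig_qq$y, pch = 19)
```

If the black points lie within the grey cloud, the residuals look consistent with normal errors of the estimated size.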
No return value, called to produce plot.
David Banks, Tom Elliott
fit <- lm(Volume ~ Height + Girth, data = trees)
iNZightQQplot(fit)
The iNZight summary improves upon the base R summary output for fitted regression models: more information is provided, and it is displayed in a more intuitive format. This function creates, prints, and returns a summary object.
iNZightSummary( x, method = "standard", reorder.factors = FALSE, digits = max(3, getOption("digits") - 3), symbolic.cor = x$symbolic.cor, signif.stars = getOption("show.signif.stars"), exclude = NULL, exponentiate.ci = FALSE, ... )
Argument | Description |
---|---|
x | an object of class |
method | one of either |
reorder.factors | logical, if |
digits | the number of significant digits to use when printing. |
symbolic.cor | logical, if |
signif.stars | logical, if |
exclude | a character vector of names of variables to be excluded from the summary output (e.g., confounding variables). |
exponentiate.ci | logical, if |
... | further arguments passed to and from other methods. |
This summary function provides more information in the following ways:
Factor headers are now given. The base level of each factor is also listed, with an estimate of 0. This makes the base level explicit, rather than leaving the reader to deduce it from the levels that are printed.
The p-value of each factor is now given; this is the output from Anova (car package), which calculates the p-value using Type III sums of squares, rather than sequentially as done by anova.
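The difference between sequential and marginal tests can be seen in base R. The sketch below swaps in drop1(), which tests each term after all the others. For a model without interactions these marginal F tests match the Type III tests that car::Anova(type = "III") reports, but this is a stand-in, not the iNZightSummary() code.

```r
m <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Sequential (Type I) tests: each term is adjusted only for the terms
# listed before it, so the result depends on the order in the formula
seq_tab <- anova(m)

# Marginal tests: each term is adjusted for all other terms in the model
marg_tab <- drop1(m, test = "F")

print(seq_tab)
print(marg_tab)
```

Species is the last term, so both tables give it the same F statistic, whereas Sepal.Width, entered first, is tested differently.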
The label and p-value of each factor level are indented by 2 characters, to distinguish the levels of a factor from the factor itself.
The labels for each level of an interaction are now just the levels of the factors (separated by a .), rather than also being prefixed with the factor names.
An object of class summary.lm, summary.glm, or summary.svyglm.
If any level of a factor is not observed, no p-values are printed for any factor, because Type III sums of squares cannot be calculated in this case.
The fitted model currently requires that the data be stored in a data frame, supplied via the data argument to lm (or equivalent).
Simon Potter, Tom Elliott.
The model fitting functions lm, glm, and summary.
svyglm in the survey package.
Function coef will extract the matrix of coefficients with standard errors, t-statistics and p-values.
To calculate p-values for factors, use Anova with type III sums of squares.
m <- lm(Sepal.Length ~ ., data = iris)
iNZightSummary(m)
# exclude confounding variables whose coefficients
# you don't need to see:
iNZightSummary(m, exclude = "Sepal.Width")
inzplot method
Diagnostic Plots for Regression Models
## S3 method for class 'glm'
inzplot(x, ..., env = parent.frame())

## S3 method for class 'lm'
inzplot(
  x,
  which = c("residual", "scale", "leverage", "cooks", "normal", "hist"),
  show.bootstraps = nrow(x$model) < 1e+05,
  label.id = 3L,
  col.smooth = "orangered",
  col.bs = "lightgreen",
  cook.levels = c(0.5, 1),
  col.cook = "pink",
  ...,
  bs.fits = NULL,
  env = parent.frame()
)
Argument | Description |
---|---|
x | a regression model |
... | additional arguments |
env | the environment for evaluating things (e.g., bootstraps) |
which | the type of plot to draw |
show.bootstraps | logical, if |
label.id | the number of extreme points to label (with their row ids) |
col.smooth | the colour of smoothers |
col.bs | the colour of bootstrap smoothers |
cook.levels | levels of the Cook's distance at which to draw contours. |
col.cook | the colour of Cook's distance contours |
bs.fits | a list of bootstrapped datasets |
A ggplot object with a plot method that will show the plot in the graphics device
inzplot(glm): method for GLMs
There are several plot types available:
residual versus fitted
scale-location
residual versus leverage
Cook's distance
normal Q-Q
histogram array
forest plot
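The Cook's distance contours on the residual versus leverage plot (cook.levels and col.cook above) follow from the relationship between Cook's distance D, standardized residual r, leverage h, and the number of coefficients p: D = r^2 h / (p (1 - h)). Solving for r gives the contour r = ±sqrt(D p (1 - h) / h). The base-R sketch below checks this identity and draws the contours the way stats::plot.lm does; it assumes, rather than confirms, that inzplot() uses the same construction, and that no coefficients are aliased (so p equals length(coef(fit))).

```r
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

h <- hatvalues(fit)       # leverages
r <- rstandard(fit)       # standardized residuals
p <- length(coef(fit))    # number of estimated coefficients
D <- cooks.distance(fit)

# Cook's distance expressed through leverage and standardized residual
D_formula <- r^2 * h / (p * (1 - h))

# Residual vs leverage plot with contours at D = 0.5 and D = 1
plot(h, r, xlab = "Leverage", ylab = "Standardized residuals")
hh <- seq(min(h), max(h), length.out = 100)
for (level in c(0.5, 1)) {
  bound <- sqrt(level * p * (1 - hh) / hh)
  lines(hh, bound, lty = 2, col = "pink")
  lines(hh, -bound, lty = 2, col = "pink")
}
```

For a well-behaved model like this one the contours may lie entirely outside the plotted range, which itself indicates that no point is highly influential.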
Tom Elliott
iris_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
inzplot(iris_fit)
inzplot(iris_fit, which = "residual", show.bootstraps = FALSE)
inzsummary method
Summary method for linear models
## S3 method for class 'lm'
inzsummary(x, ..., env = parent.frame())
Argument | Description |
---|---|
x | an |
... | additional arguments passed to |
env | the environment for evaluating things (e.g., bootstraps) |
An object of class summary.lm, summary.glm, or summary.svyglm.
iNZightSummary
This function draws partial residual plots for continuous explanatory variables in a given model.
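A partial (component-plus-residual) plot for a variable x_j shows the model residuals plus the fitted contribution of x_j, plotted against x_j. The base-R sketch below uses residuals(type = "partial") for this; it shows only the standard definition and omits the bootstrap smoothers that partialResPlot() can add.

```r
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)

# One column per term: residuals + (centred) term contribution
pr <- residuals(m, type = "partial")

x <- iris$Sepal.Width
plot(x, pr[, "Sepal.Width"],
     xlab = "Sepal.Width", ylab = "Partial residuals")
# A least-squares line through the partial residuals has slope equal to
# the coefficient of Sepal.Width in the full model
partial_fit <- lm(pr[, "Sepal.Width"] ~ x)
abline(partial_fit, col = "orangered")
```

Curvature of the points around this line suggests that Sepal.Width may need a non-linear term.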
partialResPlot(
  fit,
  varname,
  showBootstraps = nrow(fit$model) >= 30 & nrow(fit$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame()
)

allPartialResPlots(fit, ...)
Argument | Description |
---|---|
fit | an |
varname | character, the name of an explanatory variable in the model |
showBootstraps | logical, if |
use.inzightplots | logical, if |
env | environment where the data is stored for bootstrapping |
... | additional arguments passed to 'partialResPlot' |
No return value, called for side-effect of producing a plot.
allPartialResPlots(): cycle through all partial residual plots
David Banks, Tom Elliott.
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
partialResPlot(m, "Sepal.Width")
allPartialResPlots(lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris))
These plots are an extension of the original plots provided by plot.lm.
Six plots are currently available: residuals versus fitted values, a Scale-Location plot against fitted values, residuals against leverages, Cook's distance, a Normal Q-Q plot, and a histogram of residuals.
Also provided is the summary plot which shows all diagnostic plots
arranged in a 2 by 3 grid. By default, this is shown first, then each
of the individual plots in turn.
plotlm6( x, which = 1:6, panel = if (add.smooth) panel.smooth else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, cook.levels = c(0.5, 1), add.smooth = getOption("add.smooth", TRUE), label.pos = c(4, 2), cex.caption = 1, showBootstraps = nrow(x$model) >= 30 && nrow(x$model) < 4000, use.inzightplots = FALSE, env = parent.frame(), ... )
Argument | Description |
---|---|
x | an |
which | numeric, if a subset of the plots is required, specify a subset of the numbers |
panel | panel function. The useful alternative to |
sub.caption | common title. Above the figures if there are more than one; used as |
main | title to each plot, in addition to |
ask | logical, if |
id.n | number of points to be labelled in each plot, starting with the most extreme. |
labels.id | vector of labels, from which the labels for extreme points will be chosen. |
cex.id | magnification of point labels. |
qqline | logical, if |
cook.levels | levels of the Cook's distance at which to draw contours. |
add.smooth | logical, if |
label.pos | positioning of labels, for the left half and right half of the graph respectively, for plots 1–3. |
cex.caption | controls the size of |
showBootstraps | logical, if |
use.inzightplots | logical, if set to |
env | environment for performing bootstrap simulations (i.e., where to find the dataset) |
... | other arguments to be passed through to plotting functions. |
For the residuals versus fitted values plot, we add bootstrapped smoothers to illustrate the variability of the smoother. The smoother is also added to the Scale-Location plot.
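A sketch of bootstrapped smoothers on a residual versus fitted plot follows. It assumes case resampling (rows drawn with replacement, model refitted) and uses lowess() as the smoother; the package may resample or smooth differently. The colours mirror the col.bs and col.smooth defaults of inzplot().

```r
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
d <- model.frame(m)
set.seed(3)

# Refit the model to 30 bootstrap resamples of the rows and smooth
# each refit's residuals against its fitted values
boot_smooths <- lapply(1:30, function(b) {
  idx <- sample(nrow(d), replace = TRUE)
  fb <- lm(formula(m), data = d[idx, ])
  lowess(fitted(fb), resid(fb))
})

plot(fitted(m), resid(m), xlab = "Fitted values", ylab = "Residuals")
for (s in boot_smooths) lines(s, col = "lightgreen")
# Smoother for the original fit drawn on top
lines(lowess(fitted(m), resid(m)), col = "orangered", lwd = 2)
abline(h = 0, lty = 3)
```

The spread of the green curves indicates how much the red smoother could vary from sample to sample, which helps judge whether an apparent trend in the residuals is real.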
The Normal Q-Q and histogram plots are taken from the normcheck function in the s20x package.
No return value; called for the side-effect of producing a plot.
Simon Potter, David Banks, Tom Elliott.
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
plotlm6(m, which = 1)
# the summary grid:
plotlm6(m, which = 7)
# the default cycles through all 6 plots
plotlm6(m)
A modified 'poly()' function that allows for missing values.
Poly(x, degree = 1, coefs = NULL, raw = FALSE, ...)
Argument | Description |
---|---|
x | variable to convert to matrix |
degree | degree of polynomial |
coefs | passed to the poly() function |
raw | passed to the poly() function |
... | more arguments for the poly() function |
Credit goes to whoever posted this online first (google search if you must find it!)
a matrix, with NAs in the missing rows
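The idea behind an NA-tolerant poly() can be sketched as: compute the polynomial basis on the non-missing values only, then place it back into a matrix whose missing rows are NA. poly_na() is a hypothetical name, not the package's Poly(), and unlike stats::poly() its result drops the 'coefs' attribute, so it is unsuitable for predict() on new data.

```r
# Hypothetical NA-tolerant wrapper around stats::poly()
poly_na <- function(x, degree = 1, ...) {
  ok <- !is.na(x)
  out <- matrix(NA_real_, nrow = length(x), ncol = degree)
  # Basis computed from the observed values only
  out[ok, ] <- stats::poly(x[ok], degree = degree, ...)
  out
}

x <- c(1, NA, 3, 4, 5)
pm <- poly_na(x, degree = 2)
print(pm)  # row 2 is NA; the other rows hold the orthogonal basis
```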
Tom Elliott
Poly(rnorm(100), degree = 2L)

# handles missing values:
iris.na <- iris
iris.na$Sepal.Length[c(5, 10)] <- NA
lm(Sepal.Width ~ Poly(Sepal.Length, 2L), data = iris.na)

# stats::poly() produces an error in this case:
# lm(Sepal.Width ~ poly(Sepal.Length, 2L), data = iris.na)