| Title: | Tools for Exploring Regression Models with 'iNZight' |
|---|---|
| Description: | Provides a suite of functions to use with regression models, including summaries, residual plots, and factor comparisons. Used as part of the Model Fitting module of 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. |
| Authors: | Tom Elliott [aut, cre] (ORCID: <https://orcid.org/0000-0002-7815-6318>), Ken Deng [aut], Simon Potter [aut], David Banks [aut], Danny Chang [ctb] |
| Maintainer: | Tom Elliott <[email protected]> |
| License: | GPL-3 |
| Version: | 1.3.5 |
| Built: | 2026-05-21 10:09:31 UTC |
| Source: | https://github.com/inzightvit/inzightregression |
Checks for high correlation between predictor variables using Variance Inflation Factors (VIF). Handles both continuous predictors (standard VIF) and categorical predictors (Generalised VIF).
check_linear_independence(model, show_plot = c("pairs", "none"))
model |
An object of class |
show_plot |
Character string indicating whether to show a pairs plot.
Options: |
A list of class "inzcheck" containing:
Name of the check ("Linear Independence").
The maximum VIF score detected in the model.
"OK" or "FAILED".
Suggestion to remove correlated variables if failed.
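For continuous predictors, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The base-R sketch below computes this by hand on the built-in `trees` data. It illustrates the statistic only: it is not the package's internal code, and it does not cover the Generalised VIF used for categorical predictors.

```r
# Illustrative only: hand-computed VIFs for continuous predictors.
fit <- lm(Volume ~ Height + Girth, data = trees)

# Design matrix without the intercept column
X <- model.matrix(fit)[, -1, drop = FALSE]

# Regress each predictor on the remaining ones and convert R^2 to a VIF
vif <- sapply(colnames(X), function(v) {
  others <- X[, setdiff(colnames(X), v), drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix
vif_check <- diag(solve(cor(X)))
print(vif)
```

Both routes give the same values, which is a quick sanity check when comparing against the maximum VIF reported by the check.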
Verifies that the relationship between predictors and the response is linear. Uses the Ramsey RESET test for statistical verification and a Residuals vs Fitted plot for visual inspection.
check_linearity(model, test = c("reset"), show_plot = c("resid", "none"))
model |
An object of class |
test |
Character string indicating the test method.
Default is |
show_plot |
Character string indicating visualisation preference.
Options: |
A list of class "inzcheck" containing:
Name of the check ("Linearity").
Name of the test performed.
P-value from the selected test.
"OK" or "FAILED".
Suggestion to add polynomial terms if failed.
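The idea behind a RESET-style test can be sketched in base R: add powers of the fitted values to the model and test whether they improve the fit. The choice of squared and cubed terms below is an assumption made for illustration, and the package may compute the test differently.

```r
# Illustrative only: a RESET-style test built from base R.
fit <- lm(Volume ~ Height + Girth, data = trees)

# Add squared and cubed fitted values as extra regressors
d <- transform(trees, yhat2 = fitted(fit)^2, yhat3 = fitted(fit)^3)
aug <- lm(Volume ~ Height + Girth + yhat2 + yhat3, data = d)

# F-test of the original model against the augmented one
reset <- anova(fit, aug)
p_value <- reset[["Pr(>F)"]][2]
print(p_value)
```

A small p-value indicates that the extra terms explain structure the linear model missed, which is the situation where adding polynomial terms is suggested.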
A generic wrapper that detects the model type (Linear vs. Generalised Linear) and dispatches the appropriate diagnostic checks.
check_model(model, ...)
model |
A model object (e.g., from |
... |
Additional arguments passed to the specific model checker
(e.g., |
A list of results from the performed checks (invisible).
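One detail worth keeping in mind when telling model types apart is that `glm` objects also inherit from `lm`, so the more specific class has to be tested first. The helper below is hypothetical (the name `model_kind` is not part of the package) and only sketches that ordering.

```r
# Hypothetical helper, not part of the package: classify a model for dispatch.
model_kind <- function(model) {
  # Test "glm" first because glm objects also inherit from "lm"
  if (inherits(model, "glm")) {
    "glm"
  } else if (inherits(model, "lm")) {
    "lm"
  } else {
    stop("unsupported model class: ", class(model)[1])
  }
}

lm_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
glm_fit <- glm(breaks ~ wool, data = warpbreaks, family = poisson)
kinds <- c(model_kind(lm_fit), model_kind(glm_fit))
print(kinds)
```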
Placeholder function for future GLM diagnostics.
check_model_glm(model, ...)
model |
A model object (e.g., from |
... |
Additional arguments passed to the specific model checker
(e.g., |
The master wrapper function that runs a full diagnostic suite on a linear model. It checks assumptions in a hierarchical order:
1. Linear Independence (Multicollinearity)
2. Linearity
3. Constant Variance
4. Normality of Residuals
check_model_lm(
  model,
  checks = c("all", "linear_independence", "linearity", "variance", "normality")
)
model |
An object of class |
checks |
Character vector indicating which checks to run.
Choices: |
If an earlier check fails (e.g., Linearity), the process stops immediately to prevent misleading results in subsequent checks.
A list of results from all performed checks (invisible).
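The stop-at-first-failure behaviour can be sketched as a loop over named checks. Everything in this block is hypothetical (`run_in_order` and the stand-in checks are invented for illustration). It shows only the control flow, assuming each check returns a list with a `status` element of "OK" or "FAILED".

```r
# Hypothetical sketch: run named checks in order and stop at the first failure.
run_in_order <- function(checks) {
  results <- list()
  for (name in names(checks)) {
    results[[name]] <- checks[[name]]()
    if (identical(results[[name]]$status, "FAILED")) break
  }
  invisible(results)
}

# Stand-in checks that only return a status
demo <- list(
  linear_independence = function() list(status = "OK"),
  linearity           = function() list(status = "FAILED"),
  variance            = function() list(status = "OK")
)
res <- run_in_order(demo)
print(names(res))
```

Here the variance check never runs, because the linearity check before it failed.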
Assesses whether the residuals of a linear model follow a normal distribution. It combines a statistical test (Shapiro-Wilk or Kolmogorov-Smirnov) with diagnostic plots (Q-Q Plot or Histogram).
check_normality(
  model,
  test = c("shapiro", "ks"),
  show_plot = c("qq", "hist", "both")
)
model |
An object of class |
test |
Character string indicating which statistical test to use.
Options: |
show_plot |
Character string indicating visualisation preference.
Options: |
A list of class "inzcheck" containing:
Name of the check ("Normality").
Name of the test performed.
P-value from the selected test.
"OK" or "FAILED" based on user decision.
Suggested fix action if the check fails.
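Both tests named above are available in base R and can be applied directly to model residuals. The sketch below is a stand-alone illustration, not the package's implementation. In particular, comparing against a normal distribution with mean 0 and the estimated residual standard deviation is an assumption about how the Kolmogorov-Smirnov test would be set up.

```r
# Illustrative only: normality tests on the residuals of a linear model.
fit <- lm(Volume ~ Height + Girth, data = trees)
res <- residuals(fit)

# Shapiro-Wilk test
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against N(0, sigma^2); because sigma is estimated
# from the same data, this p-value is only approximate
ks <- ks.test(res, "pnorm", mean = 0, sd = sigma(fit))

p_values <- c(shapiro = sw$p.value, ks = ks$p.value)
print(p_values)
```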
Checks if the variance of the error terms is constant across all levels of the independent variables. It combines a statistical test (Breusch-Pagan or White) with diagnostic plots (Residuals vs Fitted or Scale-Location plot). If the check fails, it suggests a Box-Cox transformation.
check_variance(
  model,
  test = c("bp", "white"),
  show_plot = c("residual", "scale", "none")
)
model |
An object of class |
test |
Character string indicating the test method.
Options: |
show_plot |
Character string indicating visualisation preference.
Options: |
A list of class "inzcheck" containing:
Name of the check ("Constant Variance").
Name of the test performed.
P-value from the selected test.
"OK" or "FAILED".
Specific Box-Cox suggestion (e.g., "Log Transformation") if failed.
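A Breusch-Pagan style test can be written in a few lines of base R: regress the squared residuals on the predictors and compare n times the auxiliary R^2 with a chi-squared distribution. The sketch below uses the studentised (Koenker) form. Whether the package uses this exact variant is not stated in the source, so treat it as an illustration.

```r
# Illustrative only: studentised Breusch-Pagan test by hand.
fit <- lm(Volume ~ Height + Girth, data = trees)

# Auxiliary regression of squared residuals on the same predictors
d <- transform(trees, e2 = residuals(fit)^2)
aux <- lm(e2 ~ Height + Girth, data = d)

# Test statistic n * R^2, chi-squared with one df per predictor
lm_stat <- nrow(d) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 2, lower.tail = FALSE)
print(c(statistic = lm_stat, p.value = p_value))
```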
Obtain a quick model comparison matrix for a selection of models.
compare_models(x, ...)

## Default S3 method:
compare_models(x, ...)

## S3 method for class 'svyglm'
compare_models(x, ...)
x |
a regression model (lm, glm, svyglm, ...) |
... |
other models |
an 'inzmodelcomp' object containing model comparison statistics
compare_models(default): default method
compare_models(svyglm): method for survey GLMs
Tom Elliott
m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
compare_models(m0, m1, m2)
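For orientation, a comparable table can be assembled from base R information criteria. The source does not list which statistics the 'inzmodelcomp' object contains, so the AIC and BIC columns below are an assumption about the kind of comparison involved, not a description of the package's output.

```r
# Illustrative only: a base-R comparison table using AIC and BIC.
m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# AIC() and BIC() accept several models and return one row per model
comparison <- cbind(AIC(m0, m1, m2), BIC = BIC(m0, m1, m2)$BIC)
print(comparison)
```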
Computes confidence intervals for the pairwise differences between levels
of a factor, based on stats::TukeyHSD.
factorComp(fit, factor)

## S3 method for class 'inzfactorcomp'
print(x, ...)
fit |
a lm/glm/svyglm object |
factor |
the name of the factor to compare |
x |
an |
... |
extra arguments for print (ignored) |
a factor level comparison object with estimates, CIs, and (adjusted) p-values
print(inzfactorcomp): print method for object of class inzfactorcomp
Tom Elliott
f <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
factorComp(f, "Species")
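Since the comparison is based on stats::TukeyHSD, it may help to see what that function returns on its own. The sketch below uses a one-way model without the Sepal.Width covariate, a simplification chosen because TukeyHSD works on `aov` fits of factors. It is not how factorComp handles adjusted models.

```r
# Illustrative only: pairwise level comparisons with base-R TukeyHSD.
one_way <- aov(Sepal.Length ~ Species, data = iris)
tukey <- TukeyHSD(one_way, "Species", conf.level = 0.95)

# One row per pair of levels: difference, CI bounds, adjusted p-value
pairs_tab <- tukey$Species
print(pairs_tab)
```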
Produces an array of histograms to compare against the histogram of residuals for a fitted linear model.
histogramArray(x, n = 7, env = parent.frame())
x |
an |
n |
the number of additional histograms to plot alongside the original. |
env |
environment for finding data to bootstrap |
The histogram of residuals for the model x appears in the top-left
position. For each of the other histograms, the fitted values of
x are taken and normal random errors are added to them. These
errors have standard deviation equal to the
estimated residual standard error of x. A model is then fitted
to this altered data and a histogram of its residuals is produced.
No return value, called to generate plot.
David Banks, Tom Elliott
histogramArray(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
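The simulation described in the details can be sketched in base R as follows. This is a reading of the prose, not the package's code: the responses are rebuilt as fitted values plus normal noise, the model is refitted, and the residuals are binned. The seed and the use of `hist(plot = FALSE)` to avoid drawing are choices made for the illustration.

```r
# Illustrative only: simulate residual histograms under normal errors.
set.seed(1)
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
n_sim <- 7

# Refit the model to fitted values plus fresh normal errors
sim_resid <- lapply(seq_len(n_sim), function(i) {
  d <- iris
  d$Sepal.Length <- fitted(fit) + rnorm(nrow(d), sd = sigma(fit))
  residuals(lm(formula(fit), data = d))
})

# Bin the original residuals first, then the simulated sets
all_resid <- c(list(residuals(fit)), sim_resid)
hists <- lapply(all_resid, hist, plot = FALSE)
print(length(hists))
```

If the original histogram looks like the simulated ones, the normality assumption is plausible.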
Produces a sample of QQ-plots based on the fitted values, overlaid by a QQ-plot of the original data.
iNZightQQplot(x, n = 5, env = parent.frame())
x |
an |
n |
the number of sampled QQ plots to produce beneath the QQ plot of
|
env |
environment for finding data to bootstrap |
Multiple bootstrap models are generated from the fitted values of
the model, each with different random normal errors with standard
deviation equal to the estimated residual standard error from the
original model. These are plotted, and then overlaid by the QQ plot
from the original data.
This plot can be used to assess the assumption of normality in the
residuals for a linear regression model.
No return value, called to produce plot.
David Banks, Tom Elliott
fit <- lm(Volume ~ Height + Girth, data = trees)
iNZightQQplot(fit)
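The same idea can be approximated with base R's `simulate()` method for linear models, which draws new responses from the fitted values with normal errors. The sketch below computes the QQ coordinates without drawing anything. It is an illustration under that assumption and may differ from how iNZightQQplot generates its samples.

```r
# Illustrative only: QQ coordinates for simulated and original residuals.
fit <- lm(Volume ~ Height + Girth, data = trees)

# Five simulated response vectors drawn from the fitted model
sims <- simulate(fit, nsim = 5, seed = 1)

# Refit to each simulated response and compute normal QQ coordinates
qq_sim <- lapply(sims, function(y) {
  d <- trees
  d$Volume <- y
  qqnorm(residuals(lm(formula(fit), data = d)), plot.it = FALSE)
})

# QQ coordinates for the original residuals, to be overlaid on top
qq_orig <- qqnorm(residuals(fit), plot.it = FALSE)
print(length(qq_sim))
```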
The iNZight summary improves upon the base R summary output for fitted regression models. More information is provided and displayed in a more intuitive format. This function both creates and returns a summary object, as well as printing it.
iNZightSummary(
  x,
  method = "standard",
  reorder.factors = FALSE,
  digits = max(3, getOption("digits") - 3),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  exclude = NULL,
  exponentiate.ci = FALSE,
  ...
)
x |
an object of class |
method |
one of either |
reorder.factors |
logical, if |
digits |
the number of significant digits to use when printing. |
symbolic.cor |
logical, if |
signif.stars |
logical, if |
exclude |
a character vector of names of variables to be excluded from the summary output (i.e., confounding variables). |
exponentiate.ci |
logical, if |
... |
further arguments passed to and from other methods. |
This summary function provides more information in the following ways:
Factor headers are now given. The base level for a factor is also listed with an estimate of 0. This makes the base level explicit, so the reader does not have to deduce it from the levels that are printed.
The p-value of a factor is now given; this is the output from
Anova, which calculates the p-value based on
Type III sums of squares, rather than sequentially as done by
anova.
Each level of a factor is indented by 2 characters for its label and its p-value, to distinguish between a factor and the levels of a factor.
The labels for each level of an interaction are now just the levels of
the factor (separated by a .), rather than being prepended with
the factor name also.
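The difference between a per-factor p-value and the sequential tests from `anova` can be seen with base R alone. For a model without interactions, dropping each term in turn with `drop1()` gives marginal F-tests, used here as a stand-in for Type III tests. This is an approximation for illustration and not the Anova call the summary relies on.

```r
# Illustrative only: marginal versus sequential tests for each term.
m <- lm(Sepal.Length ~ ., data = iris)

# Marginal tests: each term is tested given all the others
marginal <- drop1(m, test = "F")

# Sequential tests: each term is tested given only the terms before it
sequential <- anova(m)

species_p <- marginal["Species", "Pr(>F)"]
print(species_p)
```

The single p-value for Species summarises the whole factor, rather than one p-value per level as in the coefficient table.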
An object of class summary.lm, summary.glm, or
summary.svyglm.
If any level of a factor is not observed, no p-values are printed for any factor, because Type III sums of squares cannot be calculated in that case.
The fitted model currently requires that the data are stored in a
dataframe, which is pointed at by the data argument to
lm (or equivalent).
Simon Potter, Tom Elliott.
The model fitting functions lm, glm, and
summary.
svyglm in the survey package.
Function coef will extract the matrix of coefficients
with standard errors, t-statistics and p-values.
To calculate p-values for factors, use Anova with
type III sums of squares.
m <- lm(Sepal.Length ~ ., data = iris)
iNZightSummary(m)

# exclude confounding variables for which you don't
# need to know about their coefficients:
iNZightSummary(m, exclude = "Sepal.Width")
inzplot method
Diagnostic Plots for Regression Models
## S3 method for class 'glm'
inzplot(x, ..., env = parent.frame())

## S3 method for class 'lm'
inzplot(
  x,
  which = c("residual", "scale", "leverage", "cooks", "normal", "hist"),
  show.bootstraps = nrow(x$model) < 1e+05,
  label.id = 3L,
  col.smooth = "orangered",
  col.bs = "lightgreen",
  cook.levels = c(0.5, 1),
  col.cook = "pink",
  ...,
  bs.fits = NULL,
  env = parent.frame()
)
x |
a regression model |
... |
additional arguments |
env |
the environment for evaluating things (e.g., bootstraps) |
which |
the type of plot to draw |
show.bootstraps |
logical, if |
label.id |
integer for the number of extreme points to label (with row id) |
col.smooth |
the colour of smoothers |
col.bs |
the colour of bootstrap (smoothers) |
cook.levels |
levels of the Cook's distance at which to draw contours. |
col.cook |
the colour of Cook's distance contours |
bs.fits |
a list of bootstrapped datasets |
A ggplot object with a plot method that will show the plot in the graphics device
inzplot(glm): Method for GLMs
There are several plot types available:
residual versus fitted
scale-location
residual versus leverage
Cook's distance
normal Q-Q
histogram array
forest plot (only supported on R >= 4.3)
Tom Elliott
iris_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
inzplot(iris_fit)
inzplot(iris_fit, which = "residual", show.bootstraps = FALSE)
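The quantities behind these diagnostic plots are all available from base R, which can be useful for inspecting labelled points numerically. The sketch below gathers them in a data frame and picks the three observations with the largest Cook's distance. Tying that choice to `label.id = 3L` is an assumption, since the source does not say which measure decides the labelled points.

```r
# Illustrative only: the numbers behind the standard diagnostic plots.
iris_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

diag_df <- data.frame(
  fitted    = fitted(iris_fit),
  std_resid = rstandard(iris_fit),
  leverage  = hatvalues(iris_fit),
  cooks     = cooks.distance(iris_fit)
)

# Scale-location uses the square root of absolute standardised residuals
diag_df$scale_loc <- sqrt(abs(diag_df$std_resid))

# Three most influential rows by Cook's distance
top3 <- head(diag_df[order(-diag_df$cooks), ], 3)
print(top3)
```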
inzsummary method
Summary method for linear models
## S3 method for class 'lm'
inzsummary(x, ..., env = parent.frame())
x |
an |
... |
additional arguments passed to |
env |
the environment for evaluating things (e.g., bootstraps) |
An object of class summary.lm, summary.glm, or
summary.svyglm.
iNZightSummary
This function draws partial residual plots for continuous explanatory variables in a given model.
partialResPlot(
  fit,
  varname,
  showBootstraps = nrow(fit$model) >= 30 & nrow(fit$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame()
)

allPartialResPlots(fit, ...)
fit |
an |
varname |
character, the name of an explanatory variable in the model |
showBootstraps |
logical, if |
use.inzightplots |
logical, if |
env |
environment where the data is stored for bootstrapping |
... |
additional arguments passed to 'partialResPlot' |
No return value, called for side-effect of producing a plot.
allPartialResPlots(): Cycle through all partial residual plots
David Banks, Tom Elliott.
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
partialResPlot(m, "Sepal.Width")

allPartialResPlots(lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris))
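Partial residuals themselves can be obtained from base R with `residuals(type = "partial")`. The sketch below shows the property that makes the plot useful: regressing the partial residuals for a variable on that variable recovers its coefficient from the full model. It uses base R only and is independent of how partialResPlot draws its output.

```r
# Illustrative only: partial residuals and the slope they recover.
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)

# Matrix with one column of partial residuals per model term
pr <- residuals(m, type = "partial")

# Slope of partial residuals against the variable itself
slope <- coef(lm(pr[, "Sepal.Width"] ~ iris$Sepal.Width))[2]
print(c(partial_slope = unname(slope), model_coef = unname(coef(m)["Sepal.Width"])))
```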
These plots are an extension of the original plots provided by
plot.lm.
Six plots are currently available: residuals versus fitted values, a
Scale-Location plot of sqrt(|residuals|) against
fitted values, residuals against leverages, Cook's distance, a Normal
Q-Q plot, and a histogram of residuals.
Also provided is the summary plot which shows all diagnostic plots
arranged in a 2 by 3 grid. By default, this is shown first, then each
of the individual plots in turn.
plotlm6(
  x,
  which = 1:6,
  panel = if (add.smooth) panel.smooth else points,
  sub.caption = NULL,
  main = "",
  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
  id.n = 3,
  labels.id = names(residuals(x)),
  cex.id = 0.75,
  qqline = TRUE,
  cook.levels = c(0.5, 1),
  add.smooth = getOption("add.smooth", TRUE),
  label.pos = c(4, 2),
  cex.caption = 1,
  showBootstraps = nrow(x$model) >= 30 && nrow(x$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame(),
  ...
)
x |
an |
which |
numeric, if a subset of the plots is required, specify a subset of
the numbers |
panel |
panel function. the useful alternative to |
sub.caption |
common title. Above the figures if there are more than one; used as
|
main |
title to each plot, in addition to |
ask |
logical, if |
id.n |
number of points to be labelled in each plot, starting with the most extreme. |
labels.id |
vector of labels, from which the labels for extreme plots will be
chosen. |
cex.id |
magnification of point labels. |
qqline |
logical, if |
cook.levels |
levels of the Cook's distance at which to draw contours. |
add.smooth |
logical, if |
label.pos |
positioning of labels, for the left half and right half of the graph respectively, for plots 1–3. |
cex.caption |
controls the size of |
showBootstraps |
logical, if |
use.inzightplots |
logical, if set to |
env |
environment for performing bootstrap simulations (i.e., to find the dataset!) |
... |
other arguments to be passed through to plotting functions. |
For the residuals versus fitted values plot, we add bootstrapped
smoothers to illustrate variance. The smoother is also added to the
Scale-Location plot.
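One way to produce bootstrapped smoothers is to resample rows, refit the model, and smooth each refit's residuals against its fitted values. The sketch below does this with `lowess()`. Case resampling, 20 replicates, and the seed are all assumptions made for illustration, as the source does not describe the package's bootstrap scheme.

```r
# Illustrative only: bootstrap smoothers for a residuals-vs-fitted plot.
set.seed(1)
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)

boot_smooth <- replicate(20, {
  # Resample rows with replacement and refit the same formula
  idx <- sample(nrow(iris), replace = TRUE)
  b <- lm(formula(m), data = iris[idx, ])
  # Smooth residuals against fitted values for this replicate
  lowess(fitted(b), residuals(b))
}, simplify = FALSE)

# To draw them:
# plot(fitted(m), residuals(m))
# for (s in boot_smooth) lines(s, col = "lightgreen")
print(length(boot_smooth))
```

The spread of the smoothers gives a visual sense of how much a trend in the residuals could vary from sample to sample.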
The Normal Q-Q and histogram plots are taken from the normcheck
function in the s20x package.
No return value; called for the side-effect of producing a plot.
Simon Potter, David Banks, Tom Elliott.
m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
plotlm6(m, which = 1)

# the summary grid:
plotlm6(m, which = 7)

# the default cycles through all 6 plots
plotlm6(m)
A modified 'poly()' function that allows for missing values.
Poly(x, degree = 1, coefs = NULL, raw = FALSE, ...)
x |
variable to convert to matrix |
degree |
degree of polynomial |
coefs |
pass to poly() function |
raw |
pass to poly() function |
... |
more arguments for the poly() function |
Credit goes to whoever posted this online first (google search if you must find it!)
a matrix, with NAs in the missing rows
Tom Elliott
Poly(rnorm(100), degree = 2L)

# handles missing values:
iris.na <- iris
iris.na$Sepal.Length[c(5, 10)] <- NA
lm(Sepal.Width ~ Poly(Sepal.Length, 2L), data = iris.na)

# stats::poly() produces an error in this case:
# lm(Sepal.Width ~ poly(Sepal.Length, 2L), data = iris.na)
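A minimal version of the idea is to compute the polynomial basis on the non-missing values only and place the result back into a matrix padded with NA rows. The function `poly_na` below is a hypothetical stand-in written for this illustration. It is not the package's Poly() and omits the `coefs` handling needed for prediction.

```r
# Hypothetical sketch: a poly() wrapper that tolerates missing values.
poly_na <- function(x, degree = 1, ...) {
  ok <- !is.na(x)
  # Start with all-NA rows, then fill in the rows that have data
  out <- matrix(NA_real_, nrow = length(x), ncol = degree)
  out[ok, ] <- poly(x[ok], degree = degree, ...)
  out
}

x <- c(1, 2, NA, 4, 5)
p <- poly_na(x, degree = 2)
print(dim(p))
```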