Package 'iNZightRegression' reference manual

Package 'iNZightRegression'

Title:	Tools for Exploring Regression Models with 'iNZight'
Description:	Provides a suite of functions to use with regression models, including summaries, residual plots, and factor comparisons. Used as part of the Model Fitting module of 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions.
Authors:	Tom Elliott [aut, cre] , Simon Potter [aut], David Banks [aut], Danny Chang [ctb]
Maintainer:	Tom Elliott <[email protected]>
License:	GPL-3
Version:	1.3.4
Built:	2025-03-30 02:56:35 UTC
Source:	https://github.com/inzightvit/inzightregression

Title:

Tools for Exploring Regression Models with 'iNZight'

Description:

Provides a suite of functions to use with regression models, including summaries, residual plots, and factor comparisons. Used as part of the Model Fitting module of 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions.

Authors:

Tom Elliott [aut, cre]

, Simon Potter [aut], David Banks [aut], Danny Chang [ctb]

Maintainer:

Tom Elliott <[email protected]>

License:

GPL-3

Version:

1.3.4

Built:

2025-03-30 02:56:35 UTC

Source:

https://github.com/inzightvit/inzightregression

Help Index

Compare regression models using AIC and BIC.

Description

Obtain a quick model comparison matrix for a selection of models

Usage

compare_models(x, ...)

## Default S3 method:
compare_models(x, ...)

## S3 method for class 'svyglm'
compare_models(x, ...)
compare_models(x, ...)

## Default S3 method:
compare_models(x, ...)

## S3 method for class 'svyglm'
compare_models(x, ...)

Arguments

`x`	a regression model (lm, glm, svyglm, ...)
`...`	other models

Value

an 'inzmodelcomp' object containing model comparison statistics

Methods (by class)

compare_models(default): default method
compare_models(svyglm): method for survey GLMs

Author(s)

Tom Elliott

Examples

m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
compare_models(m0, m1, m2)
m0 <- lm(Sepal.Length ~ 1, data = iris)
m1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
m2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
compare_models(m0, m1, m2)

Compare factor levels

Description

Computes confidence intervals for the pairwise differences between levels of a factor, based off of stats::TukeyHSD.

Usage

factorComp(fit, factor)

## S3 method for class 'inzfactorcomp'
print(x, ...)
factorComp(fit, factor)

## S3 method for class 'inzfactorcomp'
print(x, ...)

Arguments

`fit`	a lm/glm/svyglm object
`factor`	the name of the factor to compare
`x`	an `inzfactorcomp` object
`...`	extra arguments for print (ignored)

Value

a factor level comparison object with estimates, CIs, and (adjusted) p-values

Functions

print(inzfactorcomp): print method for object of class inzfactorcomp

Author(s)

Tom Elliott

Examples

f <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
factorComp(f, "Species")
f <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
factorComp(f, "Species")

Histogram Array

Description

Produces an array of histograms to compare against the histogram of residuals for a fitted linear model.

Usage

histogramArray(x, n = 7, env = parent.frame())
histogramArray(x, n = 7, env = parent.frame())

Arguments

`x`	an `lm` or `svyglm` object.
`n`	the number of additional histograms to plot alongside the original.
`env`	environment for finding data to bootstrap

Details

The histogram of the model x appears in the top-left position. For each of the other histograms, the fitted values of x are taken and normal random errors are added to these. The normal residual standard errors have standard error equal to the estimated residual standard error of x. A model is then fitted to this altered data and a histogram is produced.

Value

No return value, called to generate plot.

Author(s)

David Banks, Tom Elliott

Examples

histogramArray(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
histogramArray(lm(Sepal.Length ~ Sepal.Width + Species, data = iris))

iNZight QQ Plot

Description

Produces a sample of QQ-plots based on the fitted values, overlaid by a QQ-plot of the original data.

Usage

iNZightQQplot(x, n = 5, env = parent.frame())
iNZightQQplot(x, n = 5, env = parent.frame())

Arguments

`x`	an `lm` or `svyglm` object (with `family = "Gaussian"`.
`n`	the number of sampled QQ plots to produce beneath the QQ plot of `x`.
`env`	environment for finding data to bootstrap

Details

Multiple bootstrap models are generated from the fitted values of the model, each with different random normal errors with standard error equal to the estimated residual standard error from the original model. These are plotted, and then overlaid by the QQ plot from the original data.

This plot can be used to assess the assumption of normality in the residuals for a linear regression model.

Value

No return value, called to produce plot.

Author(s)

David Banks, Tom Elliott

Examples

fit <- lm(Volume ~ Height + Girth, data = trees)
iNZightQQplot(fit)

fit <- lm(Volume ~ Height + Girth, data = trees)
iNZightQQplot(fit)

Informative Summary Information for Regression Models

Description

The iNZight summary improves upon the base R summary output for fitted regression models. More information is provided and displayed in a more intuitive format. This function both creates and returns a summary object, as well as printing it.

Usage

iNZightSummary(
  x,
  method = "standard",
  reorder.factors = FALSE,
  digits = max(3, getOption("digits") - 3),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  exclude = NULL,
  exponentiate.ci = FALSE,
  ...
)
iNZightSummary(
  x,
  method = "standard",
  reorder.factors = FALSE,
  digits = max(3, getOption("digits") - 3),
  symbolic.cor = x$symbolic.cor,
  signif.stars = getOption("show.signif.stars"),
  exclude = NULL,
  exponentiate.ci = FALSE,
  ...
)

Arguments

`x`	an object of class `"lm"`, `"glm"` or `"svyglm"`, usually the result of a call to the corresponding function.
`method`	one of either `"standard"` or `"bootstrap"`. If `"bootstrap"`, then bootstrapped estimates and standard errors are calculated; otherwise, uses the standard estimates.
`reorder.factors`	logical, if `TRUE`, and there are factors present in the model, then the most common level of the factor is set to be the baseline.
`digits`	the number of significant digits to use when printing.
`symbolic.cor`	logical, if `TRUE`, print the correlations in a symbolic form (see `symnum`), rather than as numbers.
`signif.stars`	logical, if `TRUE`, ‘significance stars’ are printed for each coefficient.
`exclude`	a character vector of names of variables to be excluded from the summary output (i.e., confounding variables).
`exponentiate.ci`	logical, if `TRUE`, the exponential of the confidence intervals will be printed if appropriate (log/logit link or log transformed response)
`...`	further arguments passed to and from other methods.

Details

This summary function provides more information in the following ways:

Factor headers are now given. The base level for a factor is also listed with an estimate of 0. This is to make it clear what the base level of a factor is, rather than attempting to work out by deduction from what has already been printed.

The p-value of a factor is now given; this is the output from Anova, which calculates the p-value based off of Type III sums of squares, rather than sequentially as done by anova.

Each level of a factor is indented by 2 characters for its label and its p-value to distinguish between a factor, and levels of a factor.

The labels for each level of an interaction are now just the levels of the factor (separated by a .), rather than being prepended with the factor name also.

Value

An object of class summary.lm, summary.glm, or summary.svyglm.

Note

If any level is not observed in a factor, no p-values will be printed on all factors. This is because we cannot calculate Type III sums of squares when this is the case.

The fitted model currently requires that the data are stored in a dataframe, which is pointed at by the data argument to lm (or equivalent).

Author(s)

Simon Potter, Tom Elliott.

Examples

m <- lm(Sepal.Length ~ ., data = iris)
iNZightSummary(m)

# exclude confounding variables for which you don't
# need to know about their coefficients:
iNZightSummary(m, exclude = "Sepal.Width")
m <- lm(Sepal.Length ~ ., data = iris)
iNZightSummary(m)

# exclude confounding variables for which you don't
# need to know about their coefficients:
iNZightSummary(m, exclude = "Sepal.Width")

inzplot method

Description

inzplot method

Diagnostic Plots for Regression Models

Usage

## S3 method for class 'glm'
inzplot(x, ..., env = parent.frame())

## S3 method for class 'lm'
inzplot(
  x,
  which = c("residual", "scale", "leverage", "cooks", "normal", "hist"),
  show.bootstraps = nrow(x$model) < 1e+05,
  label.id = 3L,
  col.smooth = "orangered",
  col.bs = "lightgreen",
  cook.levels = c(0.5, 1),
  col.cook = "pink",
  ...,
  bs.fits = NULL,
  env = parent.frame()
)
## S3 method for class 'glm'
inzplot(x, ..., env = parent.frame())

## S3 method for class 'lm'
inzplot(
  x,
  which = c("residual", "scale", "leverage", "cooks", "normal", "hist"),
  show.bootstraps = nrow(x$model) < 1e+05,
  label.id = 3L,
  col.smooth = "orangered",
  col.bs = "lightgreen",
  cook.levels = c(0.5, 1),
  col.cook = "pink",
  ...,
  bs.fits = NULL,
  env = parent.frame()
)

Arguments

`x`	a regression model
`...`	additional arguments
`env`	the environment for evaluating things (e.g., bootstraps)
`which`	the type of plot to draw
`show.bootstraps`	logical, if `TRUE` bootstrap smoothers will be shown (defaults to `TRUE` if fewer than 100,000 observations)
`label.id`	integer for the number of extreme points to label (with row id)
`col.smooth`	the colour of smoothers
`col.bs`	the colour of bootstrap (smoothers)
`cook.levels`	levels of the Cook's distance at which to draw contours.
`col.cook`	the colour of Cook's distance contours
`bs.fits`	a list of bootstrapped datasets

Value

A ggplot object with a plot method that will show the plot in the graphics device

Functions

inzplot(glm): Method for GLMs

Plot types

There are several plot types available:

residual versus fitted
scale-location
residual versus leverage
Cook's distance
normal Q-Q
histogram array
forest plot

Author(s)

Tom Elliott

Examples

iris_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
inzplot(iris_fit)
inzplot(iris_fit, which = "residual", show.bootstraps = FALSE)
iris_fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
inzplot(iris_fit)
inzplot(iris_fit, which = "residual", show.bootstraps = FALSE)

inzsummary method

Description

inzsummary method

Summary method for linear models

Usage

## S3 method for class 'lm'
inzsummary(x, ..., env = parent.frame())
## S3 method for class 'lm'
inzsummary(x, ..., env = parent.frame())

Arguments

`x`	an `lm`, `glm`, or `svyglm` object
`...`	additional arguments passed to `iNZightSummary`
`env`	the environment for evaluating things (e.g., bootstraps)

Value

An object of class summary.lm, summary.glm, or summary.svyglm.

Partial residual plot of continuous variable

Description

This function draws partial residual plots for a continuous explanatory variables in a given model.

Usage

partialResPlot(
  fit,
  varname,
  showBootstraps = nrow(fit$model) >= 30 & nrow(fit$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame()
)

allPartialResPlots(fit, ...)
partialResPlot(
  fit,
  varname,
  showBootstraps = nrow(fit$model) >= 30 & nrow(fit$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame()
)

allPartialResPlots(fit, ...)

Arguments

`fit`	an `lm`, `glm` or `svyglm` object.
`varname`	character, the name of an explanatory variable in the model
`showBootstraps`	logical, if `TRUE`, bootstrap smoothers will overlay the graph. By default this is `TRUE` if there are between 30 and 4000 observations in the model, otherwise it is `FALSE`.
`use.inzightplots`	logical, if `TRUE`, the iNZightPlots package will be used for plotting.
`env`	environment where the data is stored for bootstrapping
`...`	additional arguments passed to 'partialResPlot'

Value

No return value, called for side-effect of producing a plot.

Functions

allPartialResPlots(): Cycle through all partial residual plots

Author(s)

David Banks, Tom Elliott.

Examples

m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
partialResPlot(m, "Sepal.Width")


allPartialResPlots(lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris))

m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
partialResPlot(m, "Sepal.Width")


allPartialResPlots(lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris))

Extended Plot Diagnostics for (g)lm Models

Description

These plots are an extension of the original plots provided by plot.lm.

Six plots are currently available: residuals versus fitted, Scale-Location of $\sqrt{| residuals|}$ against fitted values, residuals against leverages, Cook's distance, Normal Q-Q plot and histogram of residuals.

Also provided is the summary plot which shows all diagnostic plots arranged in a 2 by 3 grid. By default, this is shown first, then each of the individual plots in turn.

Usage

plotlm6(
  x,
  which = 1:6,
  panel = if (add.smooth) panel.smooth else points,
  sub.caption = NULL,
  main = "",
  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
  id.n = 3,
  labels.id = names(residuals(x)),
  cex.id = 0.75,
  qqline = TRUE,
  cook.levels = c(0.5, 1),
  add.smooth = getOption("add.smooth", TRUE),
  label.pos = c(4, 2),
  cex.caption = 1,
  showBootstraps = nrow(x$model) >= 30 && nrow(x$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame(),
  ...
)
plotlm6(
  x,
  which = 1:6,
  panel = if (add.smooth) panel.smooth else points,
  sub.caption = NULL,
  main = "",
  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
  id.n = 3,
  labels.id = names(residuals(x)),
  cex.id = 0.75,
  qqline = TRUE,
  cook.levels = c(0.5, 1),
  add.smooth = getOption("add.smooth", TRUE),
  label.pos = c(4, 2),
  cex.caption = 1,
  showBootstraps = nrow(x$model) >= 30 && nrow(x$model) < 4000,
  use.inzightplots = FALSE,
  env = parent.frame(),
  ...
)

Arguments

`x`	an `lm` object, typically the result of `lm` or `glm`. Can also take `svyglm` objects.
`which`	numeric, if a subset of the plots is required, specify a subset of the numbers `1:6`. `7` will produce a summary plot showing all of the plots arranged in a a grid. `1:6` will show the summary plot followed by each of the single plots one by one (default).
`panel`	panel function. the useful alternative to `points`, `panel.smooth` can be chosen by `add.smooth = TRUE`.
`sub.caption`	common title. Above the figures if there are more than one; used as `sub` (s.`title`) otherwise. If `NULL`, as by default, a possible abbreviated version of `deparse(x$call)` is used.
`main`	title to each plot, in addition to `caption`.
`ask`	logical, if `TRUE`, the user is asked before each plot, see `par(ask=.)`. Ignored when only one plot is being shown.
`id.n`	number of points to be labelled in each plot, starting with the most extreme.
`labels.id`	vector of labels, from which the labels for extreme plots will be chosen. `NULL` uses observation numbers.
`cex.id`	magnification of point labels.
`qqline`	logical, if `TRUE`, a `qqline()` is added to the normal QQ plot.
`cook.levels`	levels of the Cook's distance at which to draw contours.
`add.smooth`	logical, if `TRUE`, a smoother is drawn to the appropriate plots; see also `panel` above.
`label.pos`	positioning of labels, for the left half and right half of the graph respectively, for plots 1–3.
`cex.caption`	controls the size of `caption`.
`showBootstraps`	logical, if `TRUE`, bootstrap loess smoothers are drawn in the first 4 plots. By default, only drawn for sample sizes of at least 30.
`use.inzightplots`	logical, if set to `TRUE`, the iNZightPlots package will be used for plotting, rather than base R graphics.
`env`	environment for performing bootstrap simulations (i.e., to find the dataset!)
`...`	other arguments to be passed to through to plotting functions.

Details

For the residuals versus fitted values plot, we add bootstrapped smoothers to illustrate variance. The smoother is also added to the Scale-Location plot.

The Normal Q-Q and histogram plots are taken from the normcheck function in the s20x package.

Value

No return value; called for the side-effect of producing a plot.

Author(s)

Simon Potter, David Banks, Tom Elliott.

Examples


m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
plotlm6(m, which = 1)

# the summary grid:
plotlm6(m, which = 7)

# the default cycles through all 6 plots
plotlm6(m)

m <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
plotlm6(m, which = 1)

# the summary grid:
plotlm6(m, which = 7)

# the default cycles through all 6 plots
plotlm6(m)

Polynomial Matrix

Description

A modified 'poly()' function that allows for missing values.

Usage

Poly(x, degree = 1, coefs = NULL, raw = FALSE, ...)
Poly(x, degree = 1, coefs = NULL, raw = FALSE, ...)

Arguments

`x`	variable to convert to matrix
`degree`	degree of polynomial
`coefs`	pass to poly() function
`raw`	pass to poly() function
`...`	more arguments for the poly() function

Details

Credit goes to whoever posted this online first (google search if you must find it!)

Value

a matrix, with NAs in the missing rows

Author(s)

Tom Elliott

Examples

Poly(rnorm(100), degree = 2L)

# handles missing values:
iris.na <- iris
iris.na$Sepal.Length[c(5, 10)] <- NA
lm(Sepal.Width ~ Poly(Sepal.Length, 2L), data = iris.na)

# stats::poly() produces an error in this case:
# lm(Sepal.Width ~ poly(Sepal.Length, 2L), data = iris.na)

Poly(rnorm(100), degree = 2L)

# handles missing values:
iris.na <- iris
iris.na$Sepal.Length[c(5, 10)] <- NA
lm(Sepal.Width ~ Poly(Sepal.Length, 2L), data = iris.na)

# stats::poly() produces an error in this case:
# lm(Sepal.Width ~ poly(Sepal.Length, 2L), data = iris.na)

Package 'iNZightRegression'

Help Index

Compare regression models using AIC and BIC.

Description

Usage

Arguments

Value

Methods (by class)

Author(s)

Examples

Compare factor levels

Description

Usage

Arguments

Value

Functions

Author(s)

Examples

Histogram Array

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

iNZight QQ Plot

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Informative Summary Information for Regression Models

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

inzplot method

Description

Usage

Arguments

Value

Functions

Plot types

Author(s)

Examples

inzsummary method

Description

Usage

Arguments

Value

See Also

Partial residual plot of continuous variable

Description

Usage

Arguments

Value

Functions

Author(s)

Examples

Extended Plot Diagnostics for (g)lm Models

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Polynomial Matrix

Description