Package 'iNZightTools' reference manual

Title:	Tools for 'iNZight'
Description:	Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.
Authors:	Tom Elliott [aut, cre] , Daniel Barnett [aut], Yiwen He [aut], Zhaoming Su [aut], Lushi Cai [ctb], Akshay Gupta [ctb], Owen Jin [ctb], Christoph Knopf [ctb]
Maintainer:	Tom Elliott <[email protected]>
License:	GPL-3
Version:	2.0.1
Built:	2025-03-30 03:10:30 UTC
Source:	https://github.com/inzightvit/inzighttools

Add suffix to string

Description

When creating new variables or modifying the data set, we often add a suffix to distinguish the new name from the original one. However, if the same action is performed twice (for example, filtering a data set), the suffix is duplicated (data.filtered.filtered). This function avoids this by adding the suffix if it is not already present, and otherwise appending a counter (data.filtered2).

Usage

add_suffix(name, suffix)

Arguments

`name`	a character vector containing (original) names
`suffix`	the suffix to add, a length-one character vector

Value

character vector of names with suffix appended

Examples

add_suffix("data", "filtered")
add_suffix(c("data.filtered", "data.filtered.reshaped"), "filtered")
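
The behaviour described above can be sketched in base R. This is a hypothetical re-implementation for illustration only (the name `add_suffix_sketch` is not part of the package); it assumes the suffix is plain alphanumeric text and only looks for it at the end of the name.

```r
# Hypothetical sketch, not the package's implementation.
add_suffix_sketch <- function(name, suffix) {
    # Matches ".suffix" at the end of the name, optionally followed by a counter
    pattern <- paste0("\\.", suffix, "([0-9]*)$")
    vapply(name, function(x) {
        m <- regmatches(x, regexec(pattern, x))[[1]]
        if (length(m) == 0L) {
            # Suffix not present yet: append it
            return(paste(x, suffix, sep = "."))
        }
        # Suffix already present: start the counter at 2, or increment it
        count <- if (nzchar(m[2])) as.integer(m[2]) + 1L else 2L
        sub(pattern, paste0(".", suffix, count), x)
    }, character(1), USE.NAMES = FALSE)
}

add_suffix_sketch("data", "filtered")          # "data.filtered"
add_suffix_sketch("data.filtered", "filtered") # "data.filtered2"
```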

Aggregate data by categorical variables

Description

Summarizes non-categorical variables in a dataframe by grouping them based on specified categorical variables and returns the aggregated result along with the tidyverse code used to generate it.

Usage

aggregate_data(
  data,
  group_vars,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)

aggregate_dt(
  data,
  dt,
  dt_comp,
  group_vars = NULL,
  summaries,
  vars = NULL,
  names = NULL,
  quantiles = c(0.25, 0.75)
)

Arguments

`data`	A dataframe or survey design object to be aggregated.
`group_vars`	A character vector specifying the variables in `data` to be used as grouping factors.
`summaries`	An unnamed character vector or named list of summary functions to calculate for each group. If unnamed, the elements should be the names of summary functions (e.g., "mean", "sd", "iqr"), which are applied to the variables given in `vars`. If named, the names should correspond to the summary functions to be applied, and the elements should be names of variables in the dataset for which those summaries are calculated.
`vars`	(Optional) A character vector specifying the names of variables in the dataset for which summary statistics need to be calculated. This argument is ignored if `summaries` is a named list.
`names`	(Optional) A character vector or named list providing name templates for the newly created variables. See details for more information.
`quantiles`	(Optional) A numeric vector specifying the desired quantiles (e.g., c(0.25, 0.5, 0.75)). See details for more information.
`dt`	A character string representing the name of the date-time variable in the dataset.
`dt_comp`	A character string specifying the component of the date-time to use for grouping.

Details

The aggregate_data() function accepts any R function that returns a single-value summary (e.g., mean, var, sd, sum, IQR). By default, new variables are named {var}_{fun}, where {var} is the variable name and {fun} is the summary function used. The user can provide custom names using the names argument, either as a vector of the same length as vars, or as a named list where the names correspond to summary functions (e.g., "mean" or "sd").

The special summary "missing" can be included, which counts the number of missing values in the variable. The default name for this summary is {var}_missing.

If quantiles are requested, the function calculates the specified quantiles (e.g., 25th, 50th, 75th percentiles), creating new variables for each quantile. To customize the names of these variables, use {p} as a placeholder in the names argument, where {p} represents the quantile value. For example, using names = "Q{p}_{var}" will create variables like "Q0.25_Sepal.Length" for the 25th percentile.
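
The naming templates can be illustrated with a small base R sketch. The helper `fill_template` is hypothetical and only mimics the placeholder substitution described above; it is not how the package builds names internally.

```r
# Hypothetical helper: fill the {var}, {fun} and {p} placeholders in a template
fill_template <- function(template, var, fun = NULL, p = NULL) {
    out <- gsub("{var}", var, template, fixed = TRUE)
    if (!is.null(fun)) out <- gsub("{fun}", fun, out, fixed = TRUE)
    if (!is.null(p)) out <- gsub("{p}", as.character(p), out, fixed = TRUE)
    out
}

fill_template("{var}_{fun}", var = "Sepal.Length", fun = "mean") # "Sepal.Length_mean"
fill_template("Q{p}_{var}", var = "Sepal.Length", p = 0.25)      # "Q0.25_Sepal.Length"
```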

Value

An aggregated dataframe containing the summary statistics for each group, along with the tidyverse code used for the aggregation.

Functions

aggregate_dt(): Aggregate data by dates and times

Author(s)

Tom Elliott, Owen Jin, Zhaoming Su

Examples

aggregated <-
    aggregate_data(iris,
        group_vars = c("Species"),
        summaries = c("mean", "sd", "iqr")
    )
code(aggregated)
head(aggregated)

Append rows to a dataset

Description

Append rows to a dataset

Usage

append_rows(data, new_data, when_added = FALSE)

Arguments

`data`	The original dataset to which new rows will be appended.
`new_data`	The dataset containing the new rows.
`when_added`	Logical; indicates whether a `.when_added` column is required.

Value

A dataset with new rows appended below the original data.

Author(s)

Yiwen He, Zhaoming Su

Get Data's Code

Description

Used to grab code from a data.frame generated by this package.

Usage

code(data)

Arguments

`data`	dataset you want to extract the code from

Details

This is simply a helper function to grab the contents of the 'code' attribute contained in the data object.
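
A minimal base R sketch of the idea, assuming the code is stored as a plain string in an attribute named "code" (the string used here is made up for illustration):

```r
df <- head(iris)
attr(df, "code") <- "head(iris)"  # attach a code string by hand

attr(df, "code")    # "head(iris)"
attr(iris, "code")  # NULL, since no code is attached
```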

Value

The code used to generate the data.frame, if available (else NULL)

Author(s)

Tom Elliott

Collapse data by values of a categorical variable

Description

Collapse values in a categorical variable into one defined level

Usage

collapse_cat(data, var, levels, new_level, name = NULL)

Arguments

`data`	a dataframe to collapse
`var`	a string of the name of the categorical variable to collapse
`levels`	a character vector of the levels to be collapsed
`new_level`	a string for the new level
`name`	a name for the new variable

Value

the original dataframe containing a new column of the collapsed variable with tidyverse code attached

Author(s)

Zhaoming Su

Examples

collapsed <- collapse_cat(iris,
    var = "Species",
    c("versicolor", "virginica"),
    new_level = "V"
)
cat(code(collapsed))
tail(collapsed)
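
For comparison, the same kind of collapse can be written in base R. This is a sketch only; the new column name `Species.collapsed` is a hypothetical choice, and the code the package generates may look different.

```r
species_v <- iris$Species
# Assigning one label to several levels merges them into a single level
levels(species_v)[levels(species_v) %in% c("versicolor", "virginica")] <- "V"

collapsed_sketch <- iris
collapsed_sketch$Species.collapsed <- species_v
table(collapsed_sketch$Species.collapsed)
```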

Combine variables into one categorical variable

Description

Combines chosen variables of any class by concatenating them into one factor variable, and returns the result along with the tidyverse code used to generate it.

Usage

combine_vars(
  data,
  vars,
  sep = ":",
  name = NULL,
  keep_empty = FALSE,
  keep_na = TRUE
)

Arguments

`data`	a dataframe with the columns to be combined
`vars`	a character vector of the variables to be combined
`sep`	a character string to separate the levels
`name`	a name for the new variable
`keep_empty`	logical, if `FALSE` empty level combinations are removed from the factor
`keep_na`	logical, if `TRUE` the `<NA>` in the factors or `NA` in the characters will be turned into a level `"(Missing)"`; otherwise, the resulting entries will be `<NA>`

Value

original dataframe containing a new column with the new categorical variable, with tidyverse code attached

Author(s)

Owen Jin, Zhaoming Su

Examples

combined <- combine_vars(warpbreaks, vars = c("wool", "tension"), sep = "_")
cat(code(combined))
head(combined)
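
A rough base R equivalent of the concatenation step is shown below. It is a sketch under the assumption that no values are missing; the column name `wool_tension` is hypothetical.

```r
combined_sketch <- warpbreaks
# Paste the two variables together and store the result as a factor
combined_sketch$wool_tension <- factor(
    paste(combined_sketch$wool, combined_sketch$tension, sep = "_")
)
levels(combined_sketch$wool_tension)
```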

Convert variables to categorical variables

Description

Convert specified variables into factors

Usage

convert_to_cat(data, vars, names = NULL)

Arguments

`data`	a dataframe with the categorical column to convert
`vars`	a character vector of column names to convert
`names`	a character vector of names for the created variables

Value

original dataframe containing new columns of the converted variables with tidyverse code attached

Author(s)

Zhaoming Su

Examples

converted <- convert_to_cat(iris, vars = c("Petal.Width"))
cat(code(converted))
head(converted)

Convert variables to dates

Description

Convert variables to dates

Usage

convert_to_date(data, vars, ord = NULL, names = NULL)

Arguments

`data`	a dataframe with the variables to convert
`vars`	a character vector of column names to convert
`ord`	a character vector of date-time formats
`names`	a character vector of names for the created variables

Value

original dataframe containing new columns of the converted variables with tidyverse code attached

Author(s)

Zhaoming Su

Convert variables to date-time

Description

Convert variables to date-time

Usage

convert_to_datetime(data, vars, ord = NULL, names = NULL, tz = "")

Arguments

`data`	a dataframe with the variables to convert
`vars`	a character vector of column names to convert
`ord`	a character vector of date-time formats
`names`	a character vector of names for the created variables
`tz`	a time zone name (default: local time zone). See `OlsonNames`

Value

original dataframe containing new columns of the converted variables with tidyverse code attached

Author(s)

Zhaoming Su

Create variable name

Description

Convert a given string to a valid R variable name, converting spaces to underscores (_) instead of dots.

Usage

create_varname(x)

Arguments

`x`	a string to convert

Value

a string, which is also a valid variable name

Author(s)

Tom Elliott

Examples

create_varname("a new variable")
create_varname("8d4-2q5")
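
To show why underscores differ from the base R default, the sketch below contrasts `make.names()` with a hypothetical helper that replaces spaces first. The helper `varname_sketch` is an illustration only and may not match the package's output for other inputs.

```r
# Base R turns spaces into dots
make.names("a new variable")  # "a.new.variable"

# Hypothetical sketch: replace spaces with underscores before validating
varname_sketch <- function(x) make.names(gsub(" ", "_", x, fixed = TRUE))
varname_sketch("a new variable")  # "a_new_variable"
```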

Create new variables

Description

Creates new variables using valid R expressions and returns the result along with the tidyverse code used to generate it.

Usage

create_vars(data, vars = ".new_var", vars_expr = NULL)

Arguments

`data`	a dataframe to which new variables are added
`vars`	a character vector of the new variable names
`vars_expr`	a character vector of valid R expressions which can generate vectors of values

Value

original dataframe containing the new columns created from vars_expr with tidyverse code attached

Author(s)

Zhaoming Su

Examples

created <- create_vars(
    data = iris,
    vars = "Sepal.Length_less_Sepal.Width",
    "Sepal.Length - Sepal.Width"
)
cat(code(created))
head(created)

Delete variables

Description

Delete variables from a dataset

Usage

delete_vars(data, vars = NULL)

Arguments

`data`	dataset
`vars`	variable names to delete

Value

dataset without chosen variables

Author(s)

Zhaoming Su

Extract date component from a date-time variable

Description

This function extracts a specific date component from a date-time variable in a dataframe.

Usage

extract_dt_comp(data, var, comp, name = NULL)

Arguments

`data`	The dataframe containing the date-time variable.
`var`	The name of the date-time variable from which to extract the component.
`comp`	The date component wanted from the variable. See `iNZightTools:::inz_dt_comp` for the full list of components.
`name`	The name of the new column to store the extracted date component.

Value

A dataframe with the new date component column.

Author(s)

Zhaoming Su

Extract part of a datetime variable (DEPRECATED)

Description

This function has been replaced by 'extract_dt_comp' and will be removed in the next release.

Usage

extract_part(.data, varname, part, name)

Arguments

`.data`	dataframe
`varname`	name of the variable
`part`	part of the variable wanted
`name`	name of the new column

Value

see 'extract_dt_comp'

Filter

Description

Filter an `inzdf` object.

Usage

## S3 method for class 'inzdf_db'
filter(.data, ..., table = NULL, .preserve = FALSE)

Arguments

`.data`	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
`...`	<`data-masking`> Expressions that return a logical value, and are defined in terms of the variables in `.data`. If multiple expressions are included, they are combined with the `&` operator. Only rows for which all conditions evaluate to `TRUE` are kept.
`table`	name of the table to use, defaults to first in list
`.preserve`	ignored

Filter data by levels of categorical variables

Description

This function filters a dataframe or survey design object by keeping only the rows where a specified categorical variable matches one of the given levels. The resulting filtered dataframe is returned, along with the tidyverse code used to generate it.

Usage

filter_cat(data, var, levels)

Arguments

`data`	A dataframe or survey design object to be filtered.
`var`	The name of the column in `data` to be filtered by.
`levels`	A character vector of levels in `var` to keep.

Value

A filtered dataframe with the tidyverse code attached.

Author(s)

Owen Jin, Zhaoming Su

Examples

filtered <- filter_cat(iris,
    var = "Species",
    levels = c("versicolor", "virginica")
)
cat(code(filtered))
head(filtered)

Filter data by levels of numeric variables

Description

This function filters a dataframe or survey design object by applying a specified boolean condition to one of its numeric variables. The resulting filtered dataframe is returned, along with the tidyverse code used to generate it.

Usage

filter_num(data, var, op = c("<=", "<", ">=", ">", "==", "!="), num)

Arguments

`data`	A dataframe or survey design object to be filtered.
`var`	The name of the column in `data` to be filtered by.
`op`	A comparison operator to apply for the filtering condition. Valid options are: "<=", "<", ">=", ">", "==", or "!=".
`num`	The numeric value for which the specified `op` is applied.

Value

A filtered dataframe with the tidyverse code attached.

Author(s)

Owen Jin, Tom Elliott, Zhaoming Su

Examples

filtered <- filter_num(iris, var = "Sepal.Length", op = "<=", num = 5)
cat(code(filtered))
head(filtered)

library(survey)
data(api)
svy <- svydesign(~ dnum + snum,
    weights = ~pw, fpc = ~ fpc1 + fpc2,
    data = apiclus2
)
svy_filtered <- filter_num(svy, var = "api00", op = "<", num = 700)
cat(code(svy_filtered))
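
Passing the operator as a string can be mimicked in base R with `match.fun()`. This is a sketch of the idea for plain data frames, not the package's implementation.

```r
op <- "<="
num <- 5

# Look up the comparison function by name and apply it to the column
keep <- match.fun(op)(iris$Sepal.Length, num)
filtered_sketch <- iris[keep, ]
nrow(filtered_sketch)
```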

Fit a survey design

Description

Fit a survey design to an object

Usage

fitDesign(svydes, dataset.name)

Arguments

`svydes`	a design
`dataset.name`	a dataset name

Value

a survey object

Author(s)

Tom Elliott

Fit models

Description

Wrapper function for 'lm', 'glm', and 'svyglm'.

Usage

fitModel(
  y,
  x,
  data,
  family = "gaussian",
  link = switch(family, gaussian = "gaussian", binomial = "logit", poisson = "log",
    negbin = "log"),
  design = "simple",
  svydes = NA,
  surv_params = NULL,
  ...
)

Arguments

`y`	character string representing the response
`x`	character string of the explanatory variables
`data`	name of the object containing the data
`family`	gaussian, binomial, poisson (so far, no others will be added)
`link`	the link function to use
`design`	data design specification. one of 'simple', 'survey' or 'experiment'
`svydes`	a vector of arguments to be passed to the svydesign function, excluding data (defined above)
`surv_params`	a vector containing arguments for `survival::Surv()`
`...`	further arguments to be passed to lm, glm, svyglm, such as offset, etc.

Value

A model call formula (using lm, glm, or svyglm)

Author(s)

Tom Elliott

Form Class Intervals

Description

This function creates categorical intervals from a numeric variable in the given dataset.

Usage

form_class_intervals(
  data,
  variable,
  method = c("equal", "width", "count", "manual"),
  n_intervals = 4L,
  interval_width,
  format = "(a,b]",
  range = NULL,
  format_lowest = ifelse(isinteger, "< a", "<= a"),
  format_highest = "> b",
  break_points = NULL,
  name = sprintf("%s.f", variable)
)

Arguments

`data`	A dataset or a survey object.
`variable`	The name of the numeric variable to convert into intervals.
`method`	The method used to create intervals: 'equal' for equal-width intervals, 'width' for intervals of a specific width, 'count' for equal-count intervals, and 'manual' to specify break points manually.
`n_intervals`	For methods 'equal' and 'count', this specifies the number of intervals to create.
`interval_width`	For method 'width', this sets the width of the intervals.
`format`	The format for interval labels; use 'a' and 'b' to represent the min/max of each interval, respectively.
`range`	The range of the data; use this to adjust the labels (e.g., for continuous data, set this to the floor/ceiling of the min/max of the data to get prettier intervals). If range does not cover the range of the data, values outside will be placed into 'less than a' and 'greater than b' categories.
`format_lowest`	Label format for values lower than the min of range.
`format_highest`	Label format for values higher than the max of range.
`break_points`	For method 'manual', specify breakpoints here as a numeric vector.
`name`	The name of the new variable in the resulting data set.

Value

A dataframe with an additional column containing categorical class intervals.

Author(s)

Tom Elliott, Zhaoming Su

Examples

form_class_intervals(iris, "Sepal.Length", "equal", 5L)
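
For orientation, equal-width intervals in the "(a,b]" style can also be produced with base R's `cut()`. This sketch only illustrates the 'equal' method; break points and labels may differ from those chosen by form_class_intervals().

```r
# Five equal-width intervals over the range of Sepal.Length
intervals <- cut(iris$Sepal.Length, breaks = 5)
table(intervals)
```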

iNZight data frame object

Description

This object allows the data to be either a standard R data.frame or a connection to a database.

Usage

inzdf(x, name, ...)

## S3 method for class 'tbl_df'
inzdf(x, name, ...)

## S3 method for class 'data.frame'
inzdf(x, name, ...)

## S3 method for class 'SQLiteConnection'
inzdf(
  x,
  name = deparse(substitute(x)),
  schema = NULL,
  var_attrs = list(),
  dictionary = NULL,
  keep_con = FALSE,
  ...
)

Arguments

`x`	a data.frame or db connection
`name`	the name of the data
`...`	additional arguments passed to methods
`schema`	a list specifying the schema of the database (used for linking)
`var_attrs`	nested list of variables attributes for each table > variable
`dictionary`	an inzdict object
`keep_con`	if 'TRUE' data will remain in DB (use for very large data)

Details

TODO: It is possible to specify a linking structure between multiple datasets, and when variables are selected the dataset will be linked 'on-the-fly'. This, when used with databases, will significantly reduce the size of data in memory.

Value

an inzdf object

Is factor check

Description

This function checks if a variable is a factor.

Usage

is_cat(x)

Arguments

`x`	the variable to check

Value

logical, TRUE if the variable is a factor

Author(s)

Tom Elliott

Is datetime check

Description

This function checks if a variable is a date, time, or datetime.

Usage

is_dt(x)

Arguments

`x`	the variable to check

Value

logical, TRUE if the variable is a datetime

Author(s)

Tom Elliott

Is numeric check

Description

This function checks if a variable is numeric, or could be considered one. For example, dates and times can be treated as numeric, so the function returns TRUE for them.

Usage

is_num(x)

Arguments

`x`	the variable to check

Value

logical, TRUE if the variable is numeric

Author(s)

Tom Elliott

Is Preview

Description

Checks if the complete file was read or not.

Usage

is_preview(df)

Arguments

`df`	data to check

Value

logical

Check if object is a survey object (either standard or replicate design)

Description

Check if object is a survey object (either standard or replicate design)

Usage

is_survey(x)

Arguments

`x`	object to be tested

Value

logical

Author(s)

Tom Elliott

Check if object is a survey object (created by svydesign())

Description

Check if object is a survey object (created by svydesign())

Usage

is_svydesign(x)

Arguments

`x`	object to be tested

Value

logical

Author(s)

Tom Elliott

Check if object is a replicate survey object (created by svrepdesign())

Description

Check if object is a replicate survey object (created by svrepdesign())

Usage

is_svyrep(x)

Arguments

`x`	object to be tested

Value

logical

Author(s)

Tom Elliott

Join data with another dataset

Description

Join data with another dataset

Usage

join_data(
  data_l,
  data_r,
  by = NULL,
  how = c("inner", "left", "right", "full", "anti", "semi"),
  suffix_l = ".x",
  suffix_r = ".y"
)

Arguments

`data_l`	original data
`data_r`	imported dataset
`by`	a character vector of variables to join by
`how`	the method used to join the datasets
`suffix_l`	suffix for the original dataset (ignored for filter-joins)
`suffix_r`	suffix for the imported dataset (ignored for filter-joins)

Value

joined dataset
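
The join types listed for `how` can be sketched with base R on two small, made-up data frames. The filter-joins ("semi" and "anti") only keep or drop rows of the original data and add no columns, which is why the suffix arguments do not apply to them. This is an illustration only; join_data() itself returns tidyverse code.

```r
left <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

inner_sketch <- merge(left, right, by = "id")               # ids 2 and 3
left_sketch <- merge(left, right, by = "id", all.x = TRUE)  # ids 1, 2 and 3
semi_sketch <- left[left$id %in% right$id, ]                # rows of left with a match
anti_sketch <- left[!(left$id %in% right$id), ]             # rows of left without a match
```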

Author(s)

Zhaoming Su

Import linked data into an `inzdf` object

Description

Import linked data into an inzdf object

Usage

load_linked(
  x,
  schema,
  con,
  name = ifelse(missing(con), deparse(substitute(x)), deparse(substitute(con))),
  keep_con = FALSE,
  progress = FALSE,
  ...
)

Arguments

`x`	a linked specification file or vector of data set paths
`schema`	a list describing the schema/relationships between the files
`con`	a database connection to load the linked data into
`name`	the name of the data set collection
`keep_con`	if `TRUE` data will remain in DB (use for very large data)
`progress`	either `TRUE` or `FALSE` to enable/disable the default progress bar, or a list of three functions to `x <- create(from, to)`, `set(x, i)`, and `destroy(x)` a progress bar.
`...`	additional arguments passed to data reading function `smart_read()`

Value

an inzdf object

Load object(s) from an Rdata file

Description

Load object(s) from an Rdata file

Usage

load_rda(file)

Arguments

`file`	path to an rdata file

Value

list of data frames, plus code

Author(s)

Tom Elliott

Make unique variable names

Description

Helper function to create new variable names that are unique given a set of existing names (in a data set, for example). If a variable name already exists, a number will be appended.

Usage

make_names(new, existing = character())

Arguments

`new`	a vector of proposed new variable names
`existing`	a vector of existing variable names

Value

a vector of unique variable names

Author(s)

Tom Elliott

Examples

make_names(c("var_x", "var_y"), c("var_x", "var_z"))
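
Base R offers a related tool, `make.unique()`, which can be used to sketch the same idea. Note that its suffix format (".1") is base R's own and is not necessarily the format make_names() produces.

```r
existing <- c("var_x", "var_z")
new <- c("var_x", "var_y")

# Make all names unique, then keep only the entries for the new names
all_names <- make.unique(c(existing, new))
unique_new <- all_names[-seq_along(existing)]
unique_new  # "var_x.1" "var_y"
```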

Convert missing values to categorical variables

Description

Turns <NA> in categorical variables into "(Missing)". Numeric variables are converted to categorical variables in which observed values become "(Observed)" and NA becomes "(Missing)".

Usage

missing_to_cat(data, vars, names = NULL)

Arguments

`data`	a dataframe with the columns whose missing values are to be converted into categories
`vars`	a character vector of the variables in `data` for conversion of missing values
`names`	a character vector of names for the new variables

Value

original dataframe containing new columns of the converted variables for the missing values with tidyverse code attached

Author(s)

Zhaoming Su

Examples

missing <- missing_to_cat(iris, vars = c("Species", "Sepal.Length"))
cat(code(missing))
head(missing)
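
Since iris contains no missing values, the effect is easier to see on small vectors. The base R sketch below mimics the conversion described above on made-up data; it is not the code the package generates.

```r
num <- c(1.2, NA, 3.4)
cat_var <- factor(c("a", NA, "b"))

# Numeric: observed values and NA become two categories
num_cat <- factor(ifelse(is.na(num), "(Missing)", "(Observed)"))

# Categorical: NA becomes its own "(Missing)" level
cat_cat <- factor(ifelse(is.na(cat_var), "(Missing)", as.character(cat_var)))
```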

Open a New Graphics Device

Description

Opens a new graphics device

Usage

newdevice(width = 7, height = 7, ...)

Arguments

`width`	the width (in inches) of the new device
`height`	the height (in inches) of the new device
`...`	additional arguments passed to the new device function

Details

Depending on the system, different devices are better. The Windows device works fine (for now), so we only attempt to speed up the other devices that we are going to be using. We speed them up by getting rid of buffering.

Author(s)

Tom Elliott

Anti value matching

Description

Anti value matching

Usage

x %notin% table

Arguments

`x`	vector of values to be matched
`table`	vector of values to match against

Value

A logical vector of same length as 'x', indicating if each element does not exist in the table.
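
The operator is the negation of `%in%`. A stand-alone sketch, defined under a different (hypothetical) name so that it does not clash with the package's operator:

```r
`%notin_sketch%` <- function(x, table) !(x %in% table)

c("a", "b", "z") %notin_sketch% c("a", "b", "c")  # FALSE FALSE TRUE
```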

NULL or operator

Description

NULL or operator

Usage

a %||% b

Arguments

`a`	an object, potentially NULL
`b`	an object

Value

a if a is not NULL, otherwise b
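
The rule in the Value section amounts to a one-line function. The sketch below uses the hypothetical name `null_or` to avoid masking any existing operator.

```r
null_or <- function(a, b) if (is.null(a)) b else a

null_or(NULL, "default")     # "default"
null_or("value", "default")  # "value"
```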

Tidy-printing of the code attached to an object

Description

Tidy-printing of the code attached to an object

Usage

print_code(x, ...)

Arguments

`x`	a dataframe with code attached
`...`	additional arguments passed to tidy_all_code()

Value

Called for side-effect of printing code to the console.

Examples

iris_agg <- aggregate_data(iris, group_vars = "Species", summaries = "mean")
print_code(iris_agg)

Random sampling without replacement

Description

Takes a specified number of groups of observations with fixed group size by sampling without replacement, and returns the result along with the tidyverse code used to generate it.

Usage

random_sample(data, n, sample_size)

Arguments

`data`	a dataframe to sample from
`n`	the number of groups to generate
`sample_size`	the size of each group specified in `n`

Value

a dataframe containing the random samples with tidyverse code attached

Author(s)

Owen Jin, Zhaoming Su

Examples

rs <- random_sample(iris, n = 5, sample_size = 3)
cat(code(rs))
head(rs)
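
The sampling scheme can be sketched in base R as drawing `n * sample_size` distinct rows and labelling them in blocks. The column name `sample_id` is hypothetical and the package's output columns may differ.

```r
set.seed(1)
n <- 5
sample_size <- 3

# sample() draws without replacement by default
rows <- sample(nrow(iris), n * sample_size)
rs_sketch <- iris[rows, ]
rs_sketch$sample_id <- rep(seq_len(n), each = sample_size)
```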

Rank the data of numeric variables

Description

Ranks the values of numeric variables (for example, in descending order) and returns the result along with the tidyverse code used to generate it. See row_number and percent_rank.

Usage

rank_vars(data, vars, rank_type = c("min", "dense", "percent"))

Arguments

`data`	a dataframe with the variables to rank
`vars`	a character vector of numeric variables in `data` to rank
`rank_type`	either `"min"`, `"dense"` or `"percent"`, see `row_number`, `percent_rank`

Value

the original dataframe containing new columns with the ranks of the variables in vars with tidyverse code attached

Author(s)

Zhaoming Su

Examples

ranked <- rank_vars(iris, vars = c("Sepal.Length", "Petal.Length"))
cat(code(ranked))
head(ranked)
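
The three rank types differ in how they treat ties. The base R sketch below shows the usual definitions on a small made-up vector, in ascending order; whether rank_vars() ranks ascending or descending is not assumed here.

```r
x <- c(10, 20, 20, 30)

min_rank <- rank(x, ties.method = "min")          # 1 2 2 4 (gap after ties)
dense_rank <- match(x, sort(unique(x)))           # 1 2 2 3 (no gaps)
percent_rank <- (min_rank - 1) / (length(x) - 1)  # min rank rescaled to [0, 1]
```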

Data Dictionaries

Description

This function reads a data dictionary from a file and attaches it to a dataset. The attached data dictionary provides utility functions that can be used by other methods, such as plots, to automatically create axes and more.

Usage

read_dictionary(
  file,
  name = "name",
  type = "type",
  title = "title",
  description = "description",
  units = "units",
  codes = "codes",
  values = "values",
  level_separator = "|",
  ...
)

## S3 method for class 'dictionary'
print(x, kable = FALSE, include_other = TRUE, ...)

## S3 method for class 'dictionary'
x[i, ...]

apply_dictionary(data, dict)

has_dictionary(data)

get_dictionary(data)

Arguments

`file`	The path to the file containing the data dictionary.
`name`	The name of the column containing the variable name.
`type`	The name of the column containing the variable type.
`title`	The name of the column containing a short, human-readable title for the variable. If blank, the variable name will be used instead.
`description`	The name of the column containing the variable description.
`units`	The name of the column containing units (for numeric variables only).
`codes`	The name of the column containing factor codes (for categorical variables only).
`values`	The name of the column containing factor values corresponding to the codes. These should be in the same order as the codes.
`level_separator`	The separator used to separate levels in `codes` and `values` columns. The default separator is "|". Alternatively, you can provide a vector of length 2, where the first element is used for `codes` and the second element for `values`.
`...`	Additional arguments, passed to `smart_read`.
`x`	A `dictionary` object.
`kable`	If `TRUE`, the output will be formatted using kable.
`include_other`	If `TRUE`, additional variables will be included in the output.
`i`	Subset index.
`data`	A dataset (dataframe, tibble).
`dict`	A dictionary (created using `read_dictionary()`).

Value

The dataset with the attached data dictionary.
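
To illustrate how `codes`, `values` and `level_separator` relate, the base R sketch below splits two made-up dictionary cells and uses them to label a coded variable. It is an assumption-laden illustration, not the package's parsing code.

```r
# Cells as they might appear in the codes and values columns of a dictionary
codes <- strsplit("1|2|3", "|", fixed = TRUE)[[1]]
values <- strsplit("Low|Medium|High", "|", fixed = TRUE)[[1]]

# fixed = TRUE is needed because "|" is a regular-expression operator
raw <- c(2, 1, 3, 2)
labelled <- factor(raw, levels = codes, labels = values)
labelled  # Medium Low High Medium
```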

Read CSV with iNZight metadata

Description

This function will read a CSV file with iNZight metadata in the header. This allows plain text CSV files to be supplied with additional comments that describe the structure of the data to make import and data handling easier.

Usage

read_meta(file, preview = FALSE, column_types, ...)

Arguments

`file`	the plain text file with metadata
`preview`	logical, if `TRUE` only the first 10 rows are returned
`column_types`	optional column types
`...`	more arguments

Details

The main use case is defining factor levels for an integer variable in large data sets.

Value

a data frame

Author(s)

Tom Elliott

Read text as data

Description

Reads delimited text supplied as a character string into a data frame. The text can also be the value '"clipboard"', in which case 'readr::clipboard()' is used.

Usage

read_text(txt, delim = "\t", ...)

Arguments

`txt`	character string
`delim`	the delimiter to use, passed to 'readr::read_delim()'
`...`	additional arguments passed to 'readr::read_delim()'

Value

data.frame

Author(s)

Tom Elliott
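
The following is a minimal usage sketch based only on the arguments documented above. The strings `txt` and `csv_txt` are made-up illustrative data, and the exact column types of the result (for example, whether strings become factors) are not shown because they are not stated here.

```r
library(iNZightTools)

# Tab-delimited text held in a single string (illustrative data)
txt <- "name\tscore\nana\t10\nben\t12\n"

# The default delimiter is a tab
scores <- read_text(txt)
scores

# A different delimiter is passed through to readr::read_delim()
csv_txt <- "name,score\nana,10\nben,12\n"
scores_csv <- read_text(csv_txt, delim = ",")
scores_csv
```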

Remove rows from data by row numbers

Description

This function filters a dataframe or a survey design object by removing specified rows based on the provided row numbers. The resulting filtered dataframe is returned, along with the tidyverse code used to generate it.

Usage

remove_rows(data, rows)

Arguments

`data`	A dataframe or a survey design object to be filtered.
`rows`	A numeric vector of row numbers to be removed.

Value

A filtered dataframe with the tidyverse code attached.

Author(s)

Owen Jin, Zhaoming Su

Examples

data <- remove_rows(iris, rows = c(1, 4, 5))
cat(code(data))
head(data)

Rename the levels of a categorical variable

Description

Renames the levels of a categorical variable and returns the result along with the tidyverse code used to generate it.

Usage

rename_levels(data, var, tobe_asis, name = NULL)

Arguments

`data`	a dataframe with the column to be renamed
`var`	a character string naming the categorical variable whose levels are to be renamed
`tobe_asis`	a named list in which each name is a new level name and each value is the old level name it replaces, i.e. list('new level name' = 'old level name')
`name`	a name for the new variable

Value

original dataframe containing a new column of the renamed categorical variable with tidyverse code attached

Author(s)

Zhaoming Su

Examples

renamed <- rename_levels(iris,
    var = "Species",
    tobe_asis = list(set = "setosa", ver = "versicolor")
)
cat(code(renamed))
head(renamed)

Rename column names

Description

Rename columns of a dataset with desired names

Usage

rename_vars(data, tobe_asis)

Arguments

`data`	a dataframe with columns to rename
`tobe_asis`	a named list in which each name is a new column name and each value is the old column name it replaces, i.e. list('new column name' = 'old column name')

Value

the original dataframe with the specified columns renamed, with tidyverse code attached

Author(s)

Zhaoming Su

Examples

renamed <- rename_vars(iris, list(
    sepal_length = "Sepal.Length",
    sepal_width = "Sepal.Width",
    petal_length = "Petal.Length",
    petal_width = "Petal.Width"
))
cat(code(renamed))
head(renamed)

Reorder the levels of a categorical variable

Description

Reorder the levels of a categorical variable either manually or automatically

Usage

reorder_levels(
  data,
  var,
  new_levels = NULL,
  auto = c("freq", "order", "seq"),
  name = NULL
)

Arguments

`data`	a dataframe to reorder
`var`	a categorical variable to reorder
`new_levels`	a character vector of the new factor order; overrides `auto` if not `NULL`
`auto`	only meaningful if `new_levels` is `NULL`: the method to auto-reorder the levels, see `fct_inorder`
`name`	name for the new variable

Value

original dataframe containing a new column of the reordered categorical variable with tidyverse code attached

Author(s)

Zhaoming Su

Examples

reordered <- reorder_levels(iris,
    var = "Species",
    new_levels = c("versicolor", "virginica", "setosa")
)
cat(code(reordered))
head(reordered)

reordered <- reorder_levels(iris,
    var = "Species",
    auto = "freq"
)
cat(code(reordered))
head(reordered)
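
The three `auto` options can be illustrated with the 'forcats' functions that share their names. This is a sketch under the assumption that "freq", "order" and "seq" correspond to `fct_infreq()`, `fct_inorder()` and `fct_inseq()`; the manual only points to `fct_inorder`, so the mapping is not confirmed here. The factor `f` is made-up data.

```r
library(forcats)

# Illustrative factor whose levels look like numbers
f <- factor(c("3", "1", "1", "2", "1", "3"))

# Most frequent level first
by_freq <- levels(fct_infreq(f))   # "1" "3" "2"

# Order of first appearance in the data
by_order <- levels(fct_inorder(f)) # "3" "1" "2"

# Numeric value of the level labels
by_seq <- levels(fct_inseq(f))     # "1" "2" "3"
```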

Reshaping dataset from wide to long or from long to wide

Description

Reshaping dataset from wide to long or from long to wide

Usage

reshape_data(
  data,
  data_to = c("long", "wide"),
  cols,
  names_to = "name",
  values_to = "value",
  names_from = "name",
  values_from = "value"
)

Arguments

`data`	a dataset to reshape
`data_to`	whether the target dataset is `long` or `wide`
`cols`	columns to gather together (for wide to long)
`names_to`	name for new column containing old names (for wide to long)
`values_to`	name for new column containing old values (for wide to long)
`names_from`	column to spread out (for long to wide)
`values_from`	values to be put in the spread columns (for long to wide)

Value

reshaped dataset

Author(s)

Zhaoming Su
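
The argument names above match those of `tidyr::pivot_longer()` and `tidyr::pivot_wider()`. The sketch below uses those 'tidyr' functions directly to show what each direction of reshaping does; that `reshape_data()` wraps them is an assumption, and the `wide` data frame is made up for illustration.

```r
library(tidyr)

# One row per id, one column per year (illustrative data)
wide <- data.frame(id = c(1, 2), y2020 = c(10, 20), y2021 = c(11, 21))

# Wide to long: gather the year columns into name/value pairs
long <- pivot_longer(
    wide,
    cols = c("y2020", "y2021"),
    names_to = "year",
    values_to = "value"
)
long

# Long to wide: spread the pairs back out into one column per year
back <- pivot_wider(long, names_from = "year", values_from = "value")
back
```

With `reshape_data()`, the first step would presumably be requested with `data_to = "long"` together with `cols`, `names_to` and `values_to`, and the second with `data_to = "wide"` together with `names_from` and `values_from`.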

Save an object with, optionally, a (valid) name

Description

Save an object with, optionally, a (valid) name

Usage

save_rda(data, file, name)

Arguments

`data`	the data frame to save
`file`	where to save it
`name`	optional, the name the data will have in the rda file

Value

logical, should be TRUE, along with code for the save

Author(s)

Tom Elliott

Select

Description

Select

Select variables from a dataset

Description

Select a (reordered) subset of variables from a dataset.

Usage

select_vars(data, keep)

Arguments

`data`	the dataset
`keep`	vector of variable names to keep

Value

a data frame with tidyverse code attribute

Author(s)

Tom Elliott, Zhaoming Su

Examples

select_vars(iris, c("Sepal.Length", "Species", "Sepal.Width"))

Separate columns

Description

Separate columns

Usage

separate_var(data, var, by, names, into = c("cols", "rows"))

Arguments

`data`	dataset
`var`	name of variable to be separated
`by`	a string as delimiter between values (separate by delimiter) or integer(s) as number of characters to split by (separate by position), the length of `by` should be `1` unless `by` is integer and `into = "cols"`; if `by` is a non-integer numeric vector its values will be rounded down to the nearest integer
`names`	for `into = "cols"`, a character vector of output column names; use `NA` if there are components that you don't want to appear in the output; the number of non-`NA` elements determines the number of new columns in the result
`into`	whether to split into new rows or columns

Value

Separated dataset

Author(s)

Zhaoming Su
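
The difference between `into = "cols"` and `into = "rows"`, and the effect of `NA` in `names`, can be sketched with the 'tidyr' functions that have matching semantics. This uses `tidyr::separate_wider_delim()` and `tidyr::separate_longer_delim()` (tidyr 1.3.0 or later) directly; that `separate_var()` relies on them is an assumption, and the data frame `df` is made up.

```r
library(tidyr)

# Illustrative data: three components joined by "-"
df <- data.frame(id = 1:2, code = c("a-b-c", "d-e-f"))

# Split into columns; NA drops the middle component from the output
as_cols <- separate_wider_delim(
    df, code,
    delim = "-",
    names = c("first", NA, "third")
)
as_cols

# Split into rows; each component gets its own row
as_rows <- separate_longer_delim(df, code, delim = "-")
as_rows
```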

List available sheets within a file

Description

Useful when reading an Excel file to quickly check what other sheets are available.

Usage

sheets(x)

Arguments

`x`	a dataframe, presumably returned by `smart_read`

Value

vector of sheet names, or NULL if the file was not an Excel workbook

Author(s)

Tom Elliott

Examples

cas_file <- system.file("extdata/cas500.xls", package = "iNZightTools")
cas <- smart_read(cas_file)
sheets(cas)

Read a data file

Description

A simple function that imports a file without the user needing to specify information about the file type (see Details for more). The smart_read() function uses the file's extension to determine the appropriate function to read the data. Additionally, characters are converted to factors by default, mostly for compatibility with iNZight (https://inzight.nz).

Usage

smart_read(
  file,
  ext = tools::file_ext(file),
  preview = FALSE,
  column_types = NULL,
  ...
)

Arguments

`file`	the file path to read
`ext`	file extension, e.g. "csv" or "txt"; by default taken from `file`
`preview`	logical, if `TRUE` only the first few rows of the data will be returned
`column_types`	vector of column types (see ?readr::read_csv)
`...`	additional parameters passed to read_* functions

Details

Currently, smart_read() understands the following file types:

delimited (.csv, .txt)
Excel (.xls, .xlsx)
SPSS (.sav)
Stata (.dta)
SAS (.sas7bdat, .xpt)
R data (.rds)
JSON (.json)

Value

A dataframe with some additional attributes:

name is the name of the file
code contains the 'tidyverse' code used to read the data
sheets contains names of sheets if 'file' is an Excel file (can be retrieved using the sheets() helper function)

Reading delimited files

By default (delimiter = NULL), smart_read() detects the delimiter used in the file. If detection does not work, you can specify the delimiter explicitly:

smart_read('path/to/file', delimiter = '+')
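
A small self-contained sketch of the override, assuming the `delimiter` argument behaves as described above. The temporary file and its contents are made up for illustration.

```r
library(iNZightTools)

# Write a small semicolon-delimited file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1;a", "2;b"), path)

# Specify the delimiter explicitly rather than relying on detection
dat <- smart_read(path, delimiter = ";")
dat

# The code used to read the file is attached to the result
cat(code(dat))
```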

Author(s)

Tom Elliott

Sort data by variables

Description

Sorts a dataframe by one or more variables, and returns the result along with tidyverse code used to generate it.

Usage

sort_vars(data, vars, asc = rep(TRUE, length(vars)))

Arguments

`data`	a dataframe to sort
`vars`	a character vector of variable names to sort by
`asc`	logical, length of 1 or same length as `vars`. If `TRUE` (default), then sorted in ascending order, otherwise descending.

Value

data with tidyverse code attached

Author(s)

Owen Jin, Zhaoming Su

Examples

sorted <- sort_vars(iris,
    vars = c("Sepal.Width", "Sepal.Length"),
    asc = c(TRUE, FALSE)
)
cat(code(sorted))
head(sorted)

Standardize the data of a numeric variable

Description

Centre the values of a numeric variable, then divide by their standard deviation

Usage

standardize_vars(data, vars, names = NULL)

Arguments

`data`	a dataframe with the columns to standardize
`vars`	a character vector of the numeric variables in `data` to standardize
`names`	names for the created variables

Value

the original dataframe containing new columns of the standardized variables with tidyverse code attached

Author(s)

Zhaoming Su

Examples

standardized <- standardize_vars(iris, vars = c("Sepal.Width", "Petal.Width"))
cat(code(standardized))
head(standardized)
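
For reference, the conventional standardization (z-score) can be written in base R as below. This is a sketch of the usual definition, subtracting the mean and dividing by the sample standard deviation; whether `standardize_vars()` computes it in exactly this way internally is an assumption.

```r
# Standardize one numeric column by hand
x <- iris$Sepal.Width
z_manual <- (x - mean(x)) / sd(x)

# Base R's scale() applies the same centring and scaling
z_scale <- scale(x)[, 1]

# The two agree up to floating-point error
all.equal(z_manual, z_scale)

# The result has mean (approximately) 0 and standard deviation 1
mean(z_manual)
sd(z_manual)
```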

Interquartile range function for surveys

Description

Calculates the interquartile range from complex survey data. A wrapper for taking differences of svyquantile at 0.25 and 0.75 quantiles, and meant to be called from within srvyr::summarise.

Usage

survey_IQR(x, na.rm = TRUE)

Arguments

`x`	A variable or expression
`na.rm`	logical, if `TRUE` missing values are removed

Value

a vector of interquartile ranges

Author(s)

Tom Elliott

Examples

library(survey)
library(srvyr)
data(api)

dstrata <- apistrat %>%
    as_survey(strata = stype, weights = pw)

dstrata %>%
    summarise(api99_iqr = survey_IQR(api99))

iNZight Tidy Code

Description

Tidies code with correct indentation and limits lines to the specified width

Usage

tidy_all_code(x, width = 80, indent = 4, outfile, incl_library = TRUE)

Arguments

`x`	character string or file name of the file containing messy code
`width`	the width of a line
`indent`	how many spaces for one indent
`outfile`	the file name of the file containing formatted code
`incl_library`	logical, if `TRUE`, the output code will include the library name

Value

formatted code, optionally written to 'outfile'

Author(s)

Tom Elliott, Lushi Cai

Transform data of numeric variables

Description

Transform the values of numeric variables by applying a mathematical function

Usage

transform_vars(data, vars, fn, names = NULL)

Arguments

`data`	a dataframe with the variables to transform
`vars`	a character vector of the numeric variables in `data` to transform
`fn`	the name (a string) of a valid R function
`names`	the names of the new variables

Value

the original dataframe containing the new columns of the transformed variable with tidyverse code attached

Author(s)

Zhaoming Su

Examples

transformed <- transform_vars(iris,
    vars = "Petal.Length",
    fn = "log"
)
cat(code(transformed))
head(transformed)

Details of Validation Rule Results

Description

Generates the more detailed text required for the details section in iNZValidateWin.

Usage

validation_details(cf, v, var, id.var, df)

Arguments

`cf`	Confrontation object from `validate::confront()`
`v`	Validator that generated `cf`
`var`	Rule name to give details about
`id.var`	Variable name denoting a unique identifier for each observation
`df`	The dataset that was confronted

Value

A character vector giving each line of the summary detail text

Author(s)

Daniel Barnett

Validation Confrontation Summary

Description

Generates a summary of a confrontation which gives basic information about each validation rule tested.

Usage

validation_summary(cf)

Arguments

`cf`	Confrontation object from `validate::confront()`

Value

A data.frame with number of tests performed, number of passes, number of failures, and failure percentage for each validation rule.

Author(s)

Daniel Barnett
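
A minimal sketch of producing a confrontation object and summarising it. The rules are made up for illustration and use `validator()` and `confront()` from the 'validate' package; the exact column names of the returned data frame are not shown because they are not stated above.

```r
library(iNZightTools)
library(validate)

# Two illustrative rules for the iris data
rules <- validator(
    positive_length = Sepal.Length > 0,
    wide_sepal = Sepal.Width > 3
)

# Confront the data with the rules, then summarise each rule
cf <- confront(iris, rules)
rule_summary <- validation_summary(cf)
rule_summary
```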

Get variable type name

Description

Get variable type name

Usage

vartype(x)

Arguments

`x`	vector to be examined

Value

character vector of the variable's type

Author(s)

Tom Elliott

Get all variable types from data object

Description

Get all variable types from data object

Usage

vartypes(x)

Arguments

`x`	data object (data.frame or inzdf)

Value

a named vector of variable types
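
A short usage sketch for both helpers on a built-in dataset. The type labels themselves are not listed above, so no expected output is shown.

```r
library(iNZightTools)

# Type of a single vector
vartype(iris$Sepal.Length)
vartype(iris$Species)

# Types of every column, named by column
types <- vartypes(iris)
types
```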
