Title: | Scaled Logistic Regression |
---|---|
Description: | Maximum likelihood estimation of the scaled logit model parameters proposed in Dunning (2006) <doi:10.1002/sim.2282>. |
Authors: | Arseniy Khvorov [aut, cre] |
Maintainer: | Arseniy Khvorov <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.1 |
Built: | 2025-02-17 03:44:57 UTC |
Source: | https://github.com/khvorov45/sclr |
Fits the scaled logit model as well as logistic regression. Does a likelihood ratio test.
check_baseline( formula = NULL, data = NULL, fit_sclr = NULL, fit_lr = NULL, conf_lvl = 0.95, verbose = TRUE )
check_baseline( formula = NULL, data = NULL, fit_sclr = NULL, fit_lr = NULL, conf_lvl = 0.95, verbose = TRUE )
formula |
Formula to use for model fitting. |
data |
Optional dataframe. |
fit_sclr |
Fit object returned by |
fit_lr |
Fit object returned by |
conf_lvl |
Confidence level for the test |
verbose |
Whether to print message based on test result. |
A tibble
with a summary.
library(sclr) l1 <- sclr_ideal_data(n = 50, theta = 1e6, seed = 20191104) check_baseline(status ~ logHI, l1)
library(sclr) l1 <- sclr_ideal_data(n = 50, theta = 1e6, seed = 20191104) check_baseline(status ~ logHI, l1)
coef
returns MLE's.
vcov
returns the estimated variance-covariance matrix at MLE's.
confint
returns the confidence interval.
model.matrix
returns the model matrix (x).
model.frame
returns the model frame (x and y in one matrix).
## S3 method for class 'sclr' coef(object, ...) ## S3 method for class 'sclr' vcov(object, ...) ## S3 method for class 'sclr' confint(object, parm, level = 0.95, ...) ## S3 method for class 'sclr' model.matrix(object, ...) ## S3 method for class 'sclr' model.frame(formula, ...) ## S3 method for class 'sclr' logLik(object, ...)
## S3 method for class 'sclr' coef(object, ...) ## S3 method for class 'sclr' vcov(object, ...) ## S3 method for class 'sclr' confint(object, parm, level = 0.95, ...) ## S3 method for class 'sclr' model.matrix(object, ...) ## S3 method for class 'sclr' model.frame(formula, ...) ## S3 method for class 'sclr' logLik(object, ...)
object , formula
|
An object returned by |
... |
Not used. Needed to match generic signature. |
parm |
Parameter name, if missing, all parameters are considered. |
level |
Confidence level. |
The search engine behind get_protection_level
. Should not
usually be necessary to call this directly.
find_prot_titre_val( fit, var_name, newdata = NULL, prot_var_name = "prot_point", lvl = 0.5, ci_level = 0.95, tol = 10^(-7) )
find_prot_titre_val( fit, var_name, newdata = NULL, prot_var_name = "prot_point", lvl = 0.5, ci_level = 0.95, tol = 10^(-7) )
fit |
Object returned by |
var_name |
Name of the covariate for which the protection values should
be calculated. This name should appear in the formula of the call to
|
newdata |
A dataframe with all covariates except the one for which protection values should be calculated. |
prot_var_name |
A variable name among those returned by
|
lvl |
Protection level to find titre values for. Default is 0.5 (50%). |
ci_level |
Confidence level for the calculated interval. Default is 0.95. |
tol |
Tolerance. The values will be found numerically,
once the algorithm converges within |
A dataframe. Will have the same variables as newdata
with
the addition of the var_name
variable.
Calculates covariate values corresponding to a particular protection level.
Only accepts one covariate at a time, fixed values of all the others should
be provided. The search engine is find_prot_titre_val
.
get_protection_level( fit, var_name, newdata = NULL, lvl = 0.5, ci_level = 0.95, tol = 10^(-7) )
get_protection_level( fit, var_name, newdata = NULL, lvl = 0.5, ci_level = 0.95, tol = 10^(-7) )
fit |
Object returned by |
var_name |
Name of the covariate for which to find values corresponding
to a protection level. This name should appear in the formula in the call
to |
newdata |
A dataframe with all covariates except the one for which
protection values should be calculated. If there is only one covariate, can
be left as |
lvl |
Protection level to find covariate values for. Default is 0.5 (50%) |
ci_level |
Confidence level for the calculated interval. Default is 0.95. |
tol |
Tolerance. The values will be found numerically, once the
algorithm converges within |
A tibble
. Will have the same variables as
newdata
with the addition of the var_name
variable.
sclr
objectnew_sclr
creates the object sclr
returns.
is_sclr
checks if the object is of class sclr
.
new_sclr(fit, x, y, cl, mf, mt) is_sclr(fit)
new_sclr(fit, x, y, cl, mf, mt) is_sclr(fit)
fit |
A list returned by |
x |
Model matrix. |
y |
Model response. |
cl |
Call. |
mf |
Model frame. |
mt |
Model terms. |
sclr
object
A simulated dataset containing 5000 independent observations on antibody titres and the corresponding infection status. The data was simulated to resemble real influenza infection and haemagglutinin titre data.
one_titre_data
one_titre_data
A data frame with 5000 observations and 2 variables:
haemagglutinin-inhibiting (HI) titre. True simulated titre on a log scale.
influenza infection status. 1 - infected. 0 - not infected
The model behind the simulation was
Where
- Inverse logit function
= 0.5
= -5
= 2
Returns only the protection estimates. The only supported interval is a confidence interval (i.e. the interval for the estimated expected value).
## S3 method for class 'sclr' predict(object, newdata, ci_lvl = 0.95, ...)
## S3 method for class 'sclr' predict(object, newdata, ci_lvl = 0.95, ...)
object |
Object returned by |
newdata |
A dataframe with all covariates. Names should be as they
appear in the formula in the call to |
ci_lvl |
Confidence level for the calculated interval. |
... |
Not used. Needed to match generic signature. |
The model is
Where is the binary
outcome indicator, (e.g. 1 - infected, 0 - not infected).
-
covariate.
- number of covariates.
This function calculates
transformations at the covariate values found in
newdata
as well as the variance-covariance matrices of those
transformations. This is used to calculate the confidence intervals at the
given parameter values. The inverse logit transformation is then applied
to point estimates and interval bounds.
A tibble
obtained by adding the following
columns to newdata
:
prot_point_lin prot_l_lin prot_u_lin |
Point estimate, low and high bounds of the linear transformation. |
prot_sd_lin |
Estimated standard deviation of the linear transformation. |
prot_point prot_l prot_u |
Inverse logit-transformed point estimate, low and high bounds of the linear transformation. |
sclr
object.Summarises a sclr
object for printing. For a dataframe summary, see
tidy
.
## S3 method for class 'sclr' print(x, level = 0.95, ...) ## S3 method for class 'sclr' summary(object, level = 0.95, ...)
## S3 method for class 'sclr' print(x, level = 0.95, ...) ## S3 method for class 'sclr' summary(object, level = 0.95, ...)
x , object
|
An object returned by |
level |
Confidence level for the intervals. |
... |
Not used. Needed to match generic signature. |
Used to fit the scaled logit model from Dunning (2006).
sclr( formula, data = NULL, ci_lvl = 0.95, tol = 10^(-7), algorithm = c("newton-raphson", "gradient-ascent"), nr_iter = 2000, ga_iter = 2000, n_conv = 3, conventional_names = FALSE, seed = NULL )
sclr( formula, data = NULL, ci_lvl = 0.95, tol = 10^(-7), algorithm = c("newton-raphson", "gradient-ascent"), nr_iter = 2000, ga_iter = 2000, n_conv = 3, conventional_names = FALSE, seed = NULL )
formula |
an object of class "formula": a symbolic description of the model to be fitted. |
data |
a data frame. |
ci_lvl |
Confidence interval level for the parameter estimates. |
tol |
Tolerance. |
algorithm |
Algorithms to run. "newton-raphson" or "gradient-ascent". If a character vector, the algorithms will be applied in the order they are present in the vector. |
nr_iter |
Maximum allowed iterations for Newton-Raphson. |
ga_iter |
Maximum allowed iterations for gradient ascent. |
n_conv |
Number of times the algorithm has to converge (to work around local maxima). |
conventional_names |
If |
seed |
Seed for the algorithms. |
The model is logistic regression with an added parameter for the top
asymptote. That parameter is reported as theta
(or (Baseline)
if conventional_names = TRUE
). Note that it is reported on the logit
scale. See vignette("sclr-math")
for model specification,
log-likelihood, scores and second derivatives. The main default optimisation
algorithm is Newton-Raphson. Gradient ascent is used as a fallback by
default. Computing engine behind the fitting is sclr_fit
.
An object of class sclr
. This is a list with the following
elements:
parameters |
Maximum likelihood estimates of the parameter values. |
covariance_mat |
The variance-covariance matrix of the parameter estimates. |
algorithm |
Algorithm used. |
algorithm_return |
Everything the algorithm returned. |
n_converge |
The number of Newton-Raphson iterations (including resets) that were required for convergence. |
x |
Model matrix derived from |
y |
Response matrix derived from |
call |
The original call to |
model |
Model frame object derived from |
terms |
Terms object derived from model frame. |
ci |
Confidence intervals of the parameter estimates. |
log_likelihood |
Value of log-likelihood calculated at the ML estimates of parameters. |
formula |
Passed formula. |
data |
Passed data. |
Methods supported: print
,
vcov
, coef
,
model.frame
,
model.matrix
,
summary
, predict
,
tidy
(broom
package),
logLik
.
Dunning AJ (2006). "A model for immunological correlates of protection." Statistics in Medicine, 25(9), 1485-1497. https://doi.org/10.1002/sim.2282.
library(sclr) fit1 <- sclr(status ~ logHI, one_titre_data) summary(fit1)
library(sclr) fit1 <- sclr(status ~ logHI, one_titre_data) summary(fit1)
Computing engine behind sclr
.
sclr_fit( y, x, tol = 10^(-7), algorithm = c("newton-raphson", "gradient-ascent"), nr_iter = 2000, ga_iter = 2000, n_conv = 3, conventional_names = FALSE, seed = NULL )
sclr_fit( y, x, tol = 10^(-7), algorithm = c("newton-raphson", "gradient-ascent"), nr_iter = 2000, ga_iter = 2000, n_conv = 3, conventional_names = FALSE, seed = NULL )
y |
A vector of observations. |
x |
A design matrix. |
tol |
Tolerance. |
algorithm |
Algorithms to run. "newton-raphson" or "gradient-ascent". If a character vector, the algorithms will be applied in the order they are present in the vector. |
nr_iter |
Maximum allowed iterations for Newton-Raphson. |
ga_iter |
Maximum allowed iterations for gradient ascent. |
n_conv |
Number of times the algorithm has to converge (to work around local maxima). |
conventional_names |
If |
seed |
Seed for the algorithms. |
The likelihood maximisation can use the Newton-Raphson or the gradient ascent algorithms.
Allows variation of all parameters and the creation of an arbitrary number of covariates.
sclr_ideal_data( n = 1000, theta = 0, beta_0 = -5, covariate_list = list(logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2)), outcome_name = "status", seed = NULL, attach_true_vals = FALSE, attach_seed = FALSE )
sclr_ideal_data( n = 1000, theta = 0, beta_0 = -5, covariate_list = list(logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2)), outcome_name = "status", seed = NULL, attach_true_vals = FALSE, attach_seed = FALSE )
n |
Number of observations. |
theta |
Baseline risk parameter on the logit scale. |
beta_0 |
Intercept of the linear part. |
covariate_list |
A list in the form of |
outcome_name |
Name to give to the outcome |
seed |
Seed to set. If |
attach_true_vals , attach_seed
|
Whether to attach additional attributes. |
A tibble
.
# One titre one_titre <- sclr_ideal_data( covariate_list = list( logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2) ) ) sclr(status ~ logHI, one_titre) # Verify # Two titres two_titre <- sclr_ideal_data( covariate_list = list( logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2), logNI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 1) ) ) sclr(status ~ logHI + logNI, two_titre) # Verify
# One titre one_titre <- sclr_ideal_data( covariate_list = list( logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2) ) ) sclr(status ~ logHI, one_titre) # Verify # Two titres two_titre <- sclr_ideal_data( covariate_list = list( logHI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 2), logNI = list(gen_fun = function(n) rnorm(n, 2, 2), true_par = 1) ) ) sclr(status ~ logHI + logNI, two_titre) # Verify
Computes the log-likelihood of the scaled logit model at a given set of
parameter estimates (or the MLE if pars
is not supplied). Either
fit
or x
, y
and pars
need to be supplied.
sclr_log_likelihood(fit = NULL, x = NULL, y = NULL, pars = NULL)
sclr_log_likelihood(fit = NULL, x = NULL, y = NULL, pars = NULL)
fit |
An object returned by |
x |
Model matrix. Will be taken from |
y |
Model response. Will be taken from |
pars |
A named vector of parameter values. Will be taken from |
sclr
object.Summarises the objects returned by sclr
into a tibble
.
## S3 method for class 'sclr' tidy(x, ci_level = 0.95, ...)
## S3 method for class 'sclr' tidy(x, ci_level = 0.95, ...)
x |
An object returned by |
ci_level |
Confidence level for the intervals. |
... |
Not used. Needed to match generic signature. |
A tibble
with one row per model parameter.
Columns:
term |
Name of model parameter. |
estimate |
Point estimate. |
std_error |
Standard error. |
conf_low |
Lower bound of the confidence interval. |
conf_high |
Upper bound of the confidence interval. |
A simulated dataset containing 5000 independent observations on antibody titres and the corresponding infection status. The data was simulated to resemble real influenza infection and haemagglutinin + neuraminidase titre data.
two_titre_data
two_titre_data
A data frame with 5000 observations and 3 variables:
haemagglutinin-inhibiting (HI) titre. True simulated titre on a log scale.
neuraminidase-inhibiting titre. True simulated titre on a log scale.
influenza infection status. 1 - infected. 0 - not infected
The model behind the simulation was
Where
- Inverse logit function
= 0.5
= -7.5
= 2
= 2