Package 'BayesChange'

Title: Bayesian Methods for Change Points Analysis
Description: Perform change points detection on univariate and multivariate time series according to the methods presented by Asael Fabian Martínez and Ramsés H. Mena (2014) <doi:10.1214/14-BA878> and Corradin, Danese and Ongaro (2022) <doi:10.1016/j.ijar.2021.12.019>. It also clusters different types of time dependent data with common change points, see "Model-based clustering of time-dependent observations with common structural changes" (Corradin,Danese,KhudaBukhsh and Ongaro, 2024) <doi:10.48550/arXiv.2410.09552> for details.
Authors: Luca Danese [aut, cre, cph] , Riccardo Corradin [aut], Andrea Ongaro [aut]
Maintainer: Luca Danese <[email protected]>
License: GPL (>= 3)
Version: 2.0.0
Built: 2025-03-11 19:19:25 UTC
Source: https://github.com/lucadanese/bayeschange

Help Index


Clustering time dependent obsevations with common change points.

Description

The clust_cp function cluster observations with common change points. Data can be time series or survival functions.

Usage

clust_cp(
  data,
  n_iterations,
  n_burnin = 0,
  params = list(),
  print_progress = TRUE,
  user_seed = 1234,
  kernel
)

Arguments

data

a vector or a matrix. If a vector the algorithm for univariate time series is used. If a matrix, where rows are the observations and columns are the times, then the algorithm for multivariate time series is used.

n_iterations

number of MCMC iterations.

n_burnin

number of iterations that must be excluded when computing the posterior estimate.

params

a list of parameters:

If the time series is univariate the following must be specified:

  • q probability of a split in the split-merge proposal and acceleration step.

  • B number of orders for the normalization constant.

  • L number of split-merge steps for the proposal step.

  • alpha_SM α\alpha for the split-merge proposal and acceleration step.

  • gamma,a,b,c parameters of the integrated likelihood.

If the time series is multivariate the following must be specified:

  • q probability of a split in the split-merge proposal and acceleration step.

  • B number of orders for the normalization constant.

  • L number of split-merge steps for the proposal step.

  • gamma,k_0,nu_0,phi_0,m_0 parameters of the integrated likelihood.

If data are survival functions:

  • q probability of a split in the split-merge proposal and acceleration step.

  • B number of orders for the normalization constant.

  • L number of split-merge steps for the proposal step.

  • alpha_SM α\alpha for the split-merge proposal and acceleration step.

  • M number of Monte Carlo iterations when computing the likelihood of the survival function.

  • gamma recovery rate fixed constant for each population at each time.

  • alpha α\alpha for the acceptance ration in the split-merge procedure.

  • dt,a0,b0,c0,d0 parameters for the computation of the integrated likelihood of the survival functions.

  • MH_var variance for the Metropolis-Hastings estimation of the proportion of infected at time 0.

  • S0,R0 parameters for the SDE solver.

  • p prior average number of change points for each order.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

kernel

can be "ts" if data are time series or "epi" if data are survival functions.

Value

A ClustCpObj class object containing

  • $data vector or matrix with the data.

  • $n_iterations number of iterations.

  • $n_burnin number of burn-in iterations.

  • $clust a matrix where each row corresponds to the output cluster of the corresponding iteration.

  • $orders a multidimensional array where each slice is a matrix and represent an iteration. The row of each matrix correspond the order associated to the corresponding cluster.

  • $time computational time.

  • $lkl a matrix where each row is the likelihood of each observation computed at the corresponding iteration.

  • $norm_vec a vector containing the normalisation constant computed at the beginning of the algorithm.

  • $rho a vector with the final estimate of the proportion of infected individuals at time 0.

  • $kernel_ts if TRUE data are time series.

  • $kernel_epi if TRUE data are survival function.

  • $univariate_ts TRUE if data is an univariate time series, FALSE if it is a multivariate time series.

References

Corradin, R., Danese, L., KhudaBukhsh, W. R., & Ongaro, A. (2024). Model-based clustering of time-dependent observations with common structural changes. arXiv preprint arXiv:2410.09552.

Examples

## Univariate time series

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                params = list(L = 1, B = 1000, gamma = 0.5), kernel = "ts")

print(out)

## Multivariate time series


data_array <- array(data = NA, dim = c(3,100,5))

data_array[1,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[2,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[3,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

data_array[1,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[2,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[3,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))

data_array[1,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[2,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[3,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_array, n_iterations = 3000, n_burnin = 1000,
                params = list(B = 1000, L = 1, gamma = 0.5, k_0 = 0.25,
                              nu_0 = 5, phi_0 = diag(0.1,3,3),
                              m_0 = rep(0,3)), kernel = "ts")

print(out)

## Epidemiological data



data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
              c(rep(0.55, 25),rep(0.11,25)),
              c(rep(0.50, 25),rep(0.12,25)),
              c(rep(0.52, 10),rep(0.15,40)),
              c(rep(0.53, 10),rep(0.13,40)))

inf_times <- list()

for(i in 1:5){

  inf_times[[i]] <- sim_epi_data(10000, 10, 50, betas[[i]], 1/8)

  vec <- rep(0,50)
  names(vec) <- as.character(1:50)

  for(j in 1:50){
    if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
      vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
    }
  }
  data_mat[i,] <- vec
}

out <- clust_cp(data = data_mat, n_iterations = 100, n_burnin = 10,
                params = list(M = 100, L = 1, B = 1000), kernel = "epi")

print(out)

Clustering Epidemiological survival functions with common changes in time

Description

Clustering Epidemiological survival functions with common changes in time

Usage

clust_cp_epi(
  data,
  n_iterations,
  M,
  B,
  L,
  gamma = 1/8,
  alpha = 1,
  q = 0.1,
  dt = 0.1,
  a0 = 4,
  b0 = 10,
  c0 = 1,
  d0 = 1,
  MH_var = 0.01,
  S0 = 1,
  R0 = 0,
  p = 0.003,
  print_progress = TRUE,
  user_seed = 1234L
)

Arguments

data

a matrix where each entry is the number of infected for a population (row) at a specific discrete time (column).

n_iterations

Second value

M

number of Monte Carlo iterations when computing the likelihood of the survival function.

B

number of orders for the normalisation constant.

L

number of split-merge steps for the proposal step.

gamma

recovery rate fixed constant for each population at each time.

alpha

α\alpha for the acceptance ration in the split-merge procedure.

q

probability of performing a split when updating the single order for the proposal procedure.

dt, a0, b0, c0, d0

parameters for the computation of the integrated likelihood of the survival functions.

MH_var

variance for the Metropolis-Hastings estimation of the proportion of infected at time 0.

S0, R0

parameters for the SDE solver.

p

prior average number of change points for each order.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

Function clust_cp_epi returns a list containing the following components:

  • $clust a matrix where each row corresponds to the output cluster of the corresponding iteration.

  • $orders a multidimensional matrix where each slice is a matrix with the orders associated to the output cluster of that iteration.

  • time computational time in seconds.

  • $llik a matrix containing the log-likelihood of each population at each iteration.

  • $rho traceplot for the proportion of infected individuals at time 0.

Examples

data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
              c(rep(0.55, 25),rep(0.11,25)),
              c(rep(0.50, 25),rep(0.12,25)),
              c(rep(0.52, 10),rep(0.15,40)),
              c(rep(0.53, 10),rep(0.13,40)))

 inf_times <- list()

 for(i in 1:5){

   inf_times[[i]] <- sim_epi_data(10000, 10, 50, betas[[i]], 1/8)

   vec <- rep(0,50)
   names(vec) <- as.character(1:50)

   for(j in 1:50){
     if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
       vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
     }
   }
   data_mat[i,] <- vec
 }

 out <- clust_cp_epi(data = data_mat, n_iterations = 3000, M = 250, B = 1000, L = 1)

Clustering multivariate times series with common changes in time

Description

Clustering multivariate times series with common changes in time

Usage

clust_cp_multi(
  data,
  n_iterations,
  B,
  L,
  gamma,
  k_0,
  nu_0,
  phi_0,
  m_0,
  q = 0.5,
  alpha_SM = 0.1,
  print_progress = TRUE,
  user_seed = 1234L
)

Arguments

data

a multidimensional matrix where each element is a matrix whose rows are the observations and columns the dimensions.

n_iterations

number of MCMC iterations.

B

number of orders for the normalisation constant.

L

number of split-merge steps for the proposal step.

gamma, k_0, nu_0, phi_0, m_0

parameters of the integrated likelihood.

q

probability of a split in the split-merge proposal and acceleration step.

alpha_SM

α\alpha for the split-merge proposal and acceleration step.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

Function clust_cp_multi returns a list containing the following components:

  • $clust a matrix where each row corresponds to the output cluster of the corresponding iteration.

  • $orders a multidimensional array where each slice is a matrix and represent an iteration. The row of each matrix correspond the order associated to the corresponding cluster.

  • time computational time in seconds.

  • $norm_vec a vector containing the normalisation constant computed at the beginning of the algorithm.

Examples

data_array <- array(data = NA, dim = c(3,100,5))

data_array[1,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[2,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[3,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

data_array[1,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[2,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[3,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))

data_array[1,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[2,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[3,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp_multi(data = data_array, n_iterations = 3000, B = 1000, L = 1,
                        gamma = 0.1, k_0 = 0.25, nu_0 = 5, phi_0 = diag(0.1,3,3), m_0 = rep(0,3))

Clustering univariate times series with common changes in time

Description

Clustering univariate times series with common changes in time

Usage

clust_cp_uni(
  data,
  n_iterations,
  B,
  L,
  gamma,
  a = 1,
  b = 1,
  c = 1,
  q = 0.5,
  alpha_SM = 0.1,
  print_progress = TRUE,
  user_seed = 1234L
)

Arguments

data

a matrix where each row is an observation and each column corresponds to a discrete time.

n_iterations

number of MCMC iterations.

B

number of orders for the normalisation constant.

L

number of split-merge steps for the proposal step.

gamma, a, b, c

parameters γ\gamma of the integrated likelihood.

q

probability of a split in the split-merge proposal and acceleration step.

alpha_SM

α\alpha for the split-merge proposal and acceleration step.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

Function clust_cp_uni returns a list containing the following components:

  • $clust a matrix where each row corresponds to the output cluster of the corresponding iteration.

  • $orders a multidimensional array where each slice is a matrix and represent an iteration. The row of each matrix correspond the order associated to the corresponding cluster.

  • $time computational time in seconds.

  • $norm_vec a vector containing the normalisation constant computed at the beginning of the algorithm.

Examples

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp_uni(data = data_mat, n_iterations = 5000, B = 1000, L = 1, gamma = 0.5)

ClustCpObj class constructor

Description

A constructor for the ClustCpObj class. The class ClustCpObj contains...

Usage

ClustCpObj(
  data = NULL,
  n_iterations = NULL,
  n_burnin = NULL,
  clust = NULL,
  orders = NULL,
  time = NULL,
  lkl = NULL,
  norm_vec = NULL,
  rho = NULL,
  kernel_ts = NULL,
  kernel_epi = NULL,
  univariate_ts = NULL
)

Arguments

data

a vector or a matrix containing the values of the time series;

n_iterations

number of iterations of the MCMC algorithm;

n_burnin

number of MCMC iterations to exclude in the posterior estimate;

clust

a matrix with the clustering of each iteration.

orders

a matrix where each row corresponds to the output order of the corresponding iteration;

time

computational time in seconds;

lkl

a vector with the likelihood of the final clustering.

norm_vec

a vector with the estimated normalization constant.

rho

a vector with the estimated proportion of infected individuals for each observation.

kernel_ts

if TRUE data are time series.

kernel_epi

if TRUE data are survival functions.

univariate_ts

TRUE/FALSE if time series is univariate or not;


Detect change points on time series.

Description

The detect_cp function detect change points on univariate and multivariate time series.

Usage

detect_cp(
  data,
  n_iterations,
  n_burnin = 0,
  params = list(),
  print_progress = TRUE,
  user_seed = 1234
)

Arguments

data

a vector or a matrix. If a vector the algorithm for univariate time series is used. If a matrix, where rows are the observations and columns are the times, then the algorithm for multivariate time series is used.

n_iterations

number of MCMC iterations.

n_burnin

number of iterations that must be excluded when computing the posterior estimate.

params

a list of parameters:

If the time series is univariate the following must be specified:

  • q probability of performing a split at each iteration.

  • phi parameter ϕ\phi of the integrated likelihood function.

  • a, b, c parameters of the Normal-Gamma prior for μ\mu and λ\lambda.

  • par_theta_c, par_theta_d parameters of the shifted Gamma prior for θ\theta.

If the time series is multivariate the following must be specified:

  • q probability of performing a split at each iteration.

  • k_0, nu_0, phi_0, m_0 parameters for the Normal-Inverse-Wishart prior for (μ,λ)(\mu,\lambda).

  • par_theta_c, par_theta_d parameters for the shifted Gamma prior for θ\theta.

  • prior_var_gamma parameters for the Gamma prior for γ\gamma.

  • print_progress If TRUE (default) print the progress bar.

  • user_seed seed for random distribution generation.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

A DetectCpObj class object containing

  • $data vector or matrix with the data.

  • $n_iterations number of iterations.

  • $n_burnin number of burn-in iterations.

  • $orders matrix where each entries is the assignment of the realization to a block. Rows are the iterations and columns the times.

  • $time computational time.

  • $gammaMCMC traceplot for γ\gamma.

  • $gamma_MCMC_01 a 0/10/1 vector, the nn-th element is equal to 11 if the proposed γ\gamma was accepted, 00 otherwise.

  • $sigma_MCMC traceplot for σ\sigma.

  • $sigma_MCMC_01 a 0/10/1 vector, the nn-th element is equal to 11 if the proposed σ\sigma was accepted, 00 otherwise.

  • $theta_MCMC traceplot for θ\theta.

  • $univariate_ts TRUE if data is an univariate time series, FALSE if it is a multivariate time series.

References

Martínez, A. F., & Mena, R. H. (2014). On a Nonparametric Change Point Detection Model in Markovian Regimes. Bayesian Analysis, 9(4), 823–858. doi:10.1214/14-BA878

Corradin, R., Danese, L., & Ongaro, A. (2022). Bayesian nonparametric change point detection for multivariate time series with missing observations. International Journal of Approximate Reasoning, 143, 26–43. doi:10.1016/j.ijar.2021.12.019

Examples

## Univariate time series

data_vec <- as.numeric(c(rnorm(50,0,0.1), rnorm(50,1,0.25)))


out <- detect_cp(data = data_vec, n_iterations = 2500, n_burnin = 500,
                 params = list(q = 0.25, phi = 0.1, a = 1, b = 1, c = 0.1))

print(out)

## Multivariate time series

data_mat <- matrix(NA, nrow = 3, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))


out <- detect_cp(data = data_mat, n_iterations = 2500, n_burnin = 500,
                 params = list(q = 0.25, k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3), m_0 = rep(0,3),
                               par_theta_c = 2, par_theta_d = 0.2, prior_var_gamma = 0.1))

print(out)

Detect Change Points on multivariate time series

Description

Detect Change Points on multivariate time series

Usage

detect_cp_multi(
  data,
  n_iterations,
  q,
  k_0,
  nu_0,
  phi_0,
  m_0,
  par_theta_c = 1,
  par_theta_d = 1,
  prior_var_gamma = 0.1,
  print_progress = TRUE,
  user_seed = 1234L
)

Arguments

data

a matrix where each row is a component of the time series and the columns correpospond to the times.

n_iterations

number of MCMC iterations.

q

probability of performing a split at each iteration.

k_0, nu_0, phi_0, m_0

parameters for the Normal-Inverse-Wishart prior for (μ,λ)(\mu,\lambda).

par_theta_c, par_theta_d

parameters for the shifted Gamma prior for θ\theta.

prior_var_gamma

parameters for the Gamma prior for γ\gamma.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

Function detect_cp_multi returns a list containing the following components:

  • $orders a matrix where each row corresponds to the output order of the corresponding iteration.

  • time computational time in seconds.

  • $gamma_MCMC traceplot for γ\gamma.

  • $gamma_MCMC_01 a 0/10/1 vector, the nn-th element is equal to 11 if the proposed γ\gamma was accepted, 00 otherwise.

  • $sigma_MCMC traceplot for σ\sigma.

  • $sigma_MCMC_01 a 0/10/1 vector, the nn-th element is equal to 11 if the proposed σ\sigma was accepted, 00 otherwise.

  • $theta_MCMC traceplot for θ\theta.

Examples

data_mat <- matrix(NA, nrow = 3, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

out <- detect_cp_multi(data = data_mat,
                              n_iterations = 2500,
                              q = 0.25,k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3), m_0 = rep(0,3),
                              par_theta_c = 2, par_theta_d = 0.2, prior_var_gamma = 0.1)

Detect Change Points on an univariate time series.

Description

Detect Change Points on an univariate time series.

Usage

detect_cp_uni(
  data,
  n_iterations,
  q,
  phi,
  a,
  b,
  c,
  par_theta_c = 1,
  par_theta_d = 1,
  print_progress = TRUE,
  user_seed = 1234L
)

Arguments

data

vector of observations.

n_iterations

number of MCMC iteration.

q

probability of performing a split at each iterations.

phi

parameter ϕ\phi of the integrated likelihood function.

a, b, c

parameters of the Normal-Gamma prior for μ\mu and λ\lambda.

par_theta_c, par_theta_d

parameters of the shifted Gamma prior for θ\theta.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

Function detect_cp_uni returns a list containing the following components:

  • $orders a matrix where each row corresponds to the output order of the corresponding iteration.

  • time computational time in seconds.

  • $sigma_MCMC traceplot for σ\sigma.

  • $sigma_MCMC_01 a 0/10/1 vector, the nn-th element is equal to 11 if the proposed σ\sigma was accepted, 00 otherwise.

  • $theta_MCMC traceplot for θ\theta.

Examples

data_vec <- as.numeric(c(rnorm(50,0,0.1), rnorm(50,1,0.25)))

out <- detect_cp_uni(data = data_vec,
                            n_iterations = 2500,
                            q = 0.25,
                            phi = 0.1, a = 1, b = 1, c = 0.1)

DetectCpObj class constructor

Description

A constructor for the DetectCpObj class. The class DetectCpObj contains...

Usage

DetectCpObj(
  data = NULL,
  n_iterations = NULL,
  n_burnin = NULL,
  orders = NULL,
  time = NULL,
  gamma_MCMC = NULL,
  gamma_MCMC_01 = NULL,
  sigma_MCMC = NULL,
  sigma_MCMC_01 = NULL,
  theta_MCMC = NULL,
  univariate_ts = NULL
)

Arguments

data

a vector or a matrix containing the values of the time series;

n_iterations

number of iterations of the MCMC algorithm;

n_burnin

number of MCMC iterations to exclude in the posterior estimate;

orders

a matrix where each row corresponds to the output order of the corresponding iteration;

time

computational time in seconds;

gamma_MCMC

traceplot for γ\gamma;

gamma_MCMC_01

a 0/10/1 vector, the nn-th element is equal to 11 if the proposed γ\gamma was accepted, 00 otherwise;

sigma_MCMC

traceplot for σ\sigma;

sigma_MCMC_01

a 0/10/1 vector, the nn-th element is equal to 11 if the proposed σ\sigma was accepted, 00 otherwise;

theta_MCMC

traceplot for θ\theta;

univariate_ts

TRUE/FALSE if time series is univariate or not;


Plot estimated partition

Description

The plot method plots the estimates partition through the salso algorithm, for a ClustCpObj class object.

Usage

## S3 method for class 'ClustCpObj'
plot(
  x,
  y = NULL,
  loss = "VI",
  maxNClusters = 0,
  nRuns = 16,
  maxZealousAttempts = 10,
  ...
)

Arguments

x

an object of class ClustCpObj.

y

parameter of the generic method.

loss

The loss function used to estimate the final partition, it can be "VI", "binder", "omARI", "NVI", "ID", "NID".

maxNClusters

maximum number of clusters in salso procedure.

nRuns

number of runs in salso procedure.

maxZealousAttempts

maximum number of zealous attempts in salso procedure.

...

parameter of the generic method.

Value

The function returns a ggplot object representing the time series or the survival functions colored according to the final partition.

Examples

## Time series

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                params = list(L = 1, B = 1000, gamma = 0.5), kernel = "ts")

plot(out)


## Survival functions

data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
              c(rep(0.55, 25),rep(0.11,25)),
              c(rep(0.50, 25),rep(0.12,25)),
              c(rep(0.52, 10),rep(0.15,40)),
              c(rep(0.53, 10),rep(0.13,40)))

inf_times <- list()

for(i in 1:5){
  inf_times[[i]] <- sim_epi_data(10000, 10, 50, betas[[i]], 1/8)
  vec <- rep(0,50)
  names(vec) <- as.character(1:50)
  for(j in 1:50){
    if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
      vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
    }
  }
  data_mat[i,] <- vec
}

out <- clust_cp(data = data_mat, n_iterations = 100, n_burnin = 10,
                params = list(M = 100, L = 1, B = 100), kernel = "epi")

plot(out)

Plot estimated change points

Description

The plot method plots the estimates change points estimated through the salso algorithm, for a DetectCpObj class object.

Usage

## S3 method for class 'DetectCpObj'
plot(
  x,
  y = NULL,
  plot_freq = FALSE,
  loss = "VI",
  maxNClusters = 0,
  nRuns = 16,
  maxZealousAttempts = 10,
  ...
)

Arguments

x

an object of class DetectCPObj.

y, ...

parameters of the generic method.

plot_freq

if TRUE also the histogram with the empirical frequency of each change point is plotted.

loss

The loss function used to estimate the final partition, it can be "VI", "binder", "omARI", "NVI", "ID", "NID".

maxNClusters

maximum number of clusters in salso procedure.

nRuns

number of runs in salso procedure.

maxZealousAttempts

maximum number of zealous attempts in salso procedure.

Value

The function returns a ggplot object representing the detected change points. If plot_freq = TRUE is plotted also an histogram with the frequency of times that a change point has been detected in the MCMC chain.

Examples

data_mat <- matrix(NA, nrow = 3, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

out <- detect_cp(data = data_mat, n_iterations = 2500, n_burnin = 500,
                 params = list(q = 0.25, k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3),
                               m_0 = rep(0,3), par_theta_c = 2, par_theta_d = 0.2,
                               prior_var_gamma = 0.1))
plot(out)

Estimate the change points of the data

Description

The posterior_estimate method estimates the change points of the data making use of the salso algorithm, for a DetectCPObj class object.

Usage

## S3 method for class 'ClustCpObj'
posterior_estimate(
  object,
  loss = "VI",
  maxNClusters = 0,
  nRuns = 16,
  maxZealousAttempts = 10,
  ...
)

Arguments

object

an object of class ClustCpObj.

loss

The loss function used to estimate the final partition, it can be "VI", "binder", "omARI", "NVI", "ID", "NID".

maxNClusters

maximum number of clusters in salso procedure.

nRuns

number of runs in salso procedure.

maxZealousAttempts

maximum number of zealous attempts in salso procedure.

...

parameter of the generic method.

Details

put details here

Value

The function returns a vector with the cluster assignment of each observation.

References

#' D. B. Dahl, D. J. Johnson, and P. Müller (2022), Search Algorithms and Loss Functions for Bayesian Clustering, Journal of Computational and Graphical Statistics, 31(4), 1189-1201, doi:10.1080/10618600.2022.2069779.

Examples

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                params = list(L = 1, B = 1000, gamma = 0.5), kernel = "ts")

posterior_estimate(out)

Estimate the change points of the data

Description

The posterior_estimate method estimates the change points of the data making use of the salso algorithm, for a DetectCPObj class object.

Usage

## S3 method for class 'DetectCpObj'
posterior_estimate(
  object,
  loss = "VI",
  maxNClusters = 0,
  nRuns = 16,
  maxZealousAttempts = 10,
  ...
)

Arguments

object

an object of class DetectCPObj.

loss

The loss function used to estimate the final partition, it can be "VI", "binder", "omARI", "NVI", "ID", "NID".

maxNClusters

maximum number of clusters in salso procedure.

nRuns

number of runs in salso procedure.

maxZealousAttempts

maximum number of zealous attempts in salso procedure.

...

parameter of the generic method.

Details

put details here

Value

The function returns a vector with the cluster assignment of each observation.

References

D. B. Dahl, D. J. Johnson, and P. Müller (2022), Search Algorithms and Loss Functions for Bayesian Clustering, Journal of Computational and Graphical Statistics, 31(4), 1189-1201, doi:10.1080/10618600.2022.2069779.

Examples

data_vec <- as.numeric(c(rnorm(50,0,0.1), rnorm(50,1,0.25)))


out <- detect_cp(data = data_vec, n_iterations = 2500, n_burnin = 500,
                 params = list(q = 0.25, phi = 0.1, a = 1, b = 1, c = 0.1))

posterior_estimate(out)

ClustCpObj print method

Description

The ClustCpObj method prints which algorithm was run.

Usage

## S3 method for class 'ClustCpObj'
print(x, ...)

Arguments

x

an object of class ClustCpObj.

...

parameter of the generic method.

Examples

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                params = list(L = 1, B = 1000, gamma = 0.5), kernel = "ts")

print(out)

DetectCpObj print method

Description

The DetectCpObj method prints which algorithm was run.

Usage

## S3 method for class 'DetectCpObj'
print(x, ...)

Arguments

x

an object of class DetectCpObj.

...

parameter of the generic method.

Examples

data_mat <- matrix(NA, nrow = 3, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

out <- detect_cp(data = data_mat, n_iterations = 2500, n_burnin = 500,
                params = list(q = 0.25, k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3), m_0 = rep(0,3),
                              par_theta_c = 2, par_theta_d = 0.2, prior_var_gamma = 0.1))
print(out)

Simulate epidemiological data

Description

Simulate epidemiological data

Usage

sim_epi_data(S0, I0, max_time, beta_vec, gamma_0, user_seed = 1234L)

Arguments

S0

number of individuals in the population.

I0

number of infected individuals at time 0.

max_time

maximum observed time.

beta_vec

vector with the infection rate for each discrete time.

gamma_0

the recovery rate. for the population, must be in (0,1)(0,1).

user_seed

seed for random distribution generation.

Value

Function sim_epi_data returns a vector with the simulated infection times.


ClustCpObj summary method

Description

The ClustCpObj method returns a summary of the algorithm.

Usage

## S3 method for class 'ClustCpObj'
summary(object, ...)

Arguments

object

an object of class ClustCpObj.

...

parameter of the generic method.

Examples

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                params = list(L = 1, B = 1000, gamma = 0.5), kernel = "ts")

summary(out)

DetectCpObj summary method

Description

The DetectCpObj method returns a summary of the algorithm.

Usage

## S3 method for class 'DetectCpObj'
summary(object, ...)

Arguments

object

an object of class DetectCpObj;

...

parameter of the generic method.

Examples

data_mat <- matrix(NA, nrow = 3, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

out <- detect_cp(data = data_mat, n_iterations = 2500, n_burnin = 500,
                params = list(q = 0.25, k_0 = 0.25, nu_0 = 4, phi_0 = diag(1,3,3), m_0 = rep(0,3),
                              par_theta_c = 2, par_theta_d = 0.2, prior_var_gamma = 0.1))
summary(out)