mosum

Moving Sum Based Procedures for Changes in the Mean

Package Contents

Functions

mosum(x, G[, G_right, var_est_method, ...])

MOSUM procedure for multiple change point estimation

criticalValue(n, G_left, G_right, alpha)

Computes the asymptotic critical value for the MOSUM test

multiscale_bottomUp(x[, G, threshold, alpha, ...])

Multiscale MOSUM algorithm with bottom-up merging

multiscale_localPrune(x[, G, max_unbalance, ...])

Multiscale MOSUM algorithm with localised pruning

bandwidths_default(n[, d_min, G_min, G_max]) → int

Default choice for the set of multiple bandwidths

testData([model, lengths, means, sds, rand_gen, seed, ...])

Test data with piecewise constant mean

persp3D_multiscaleMosum(x[, mosum_args, threshold, ...])

3D Visualisation of multiscale MOSUM statistics

mosum.mosum(x, G, G_right=float('nan'), var_est_method=['mosum', 'mosum_min', 'mosum_max', 'custom'][0], var_custom=None, boundary_extension=True, threshold=['critical_value', 'custom'][0], alpha=0.1, threshold_custom=float('nan'), criterion=['eta', 'epsilon'][0], eta=0.4, epsilon=0.2, do_confint=False, level=0.05, N_reps=1000)

MOSUM procedure for multiple change point estimation

Computes the MOSUM detector, detects (multiple) change points and estimates their locations.
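The following is a minimal NumPy sketch of a MOSUM statistic, not the package's implementation. It assumes a pooled within-window variance estimate in place of the ‘var_est_method’ options and no boundary extension, so positions without two full windows are left as NaN. The helper name mosum_stat_sketch is hypothetical.

```python
import numpy as np


def mosum_stat_sketch(x, G_left, G_right=None):
    """Illustrative MOSUM statistic (hypothetical helper, not the package code).

    stat[k] compares the left window x[k-G_left:k] with the right window
    x[k:k+G_right]; positions without two full windows stay NaN
    (i.e. no boundary extension).
    """
    x = np.asarray(x, dtype=float)
    if G_right is None:
        G_right = G_left  # symmetric bandwidth
    n = len(x)
    stat = np.full(n, np.nan)
    # Var(mean_right - mean_left) = sigma^2 * (G_left + G_right) / (G_left * G_right)
    scale = np.sqrt(G_left * G_right / (G_left + G_right))
    for k in range(G_left, n - G_right + 1):
        left, right = x[k - G_left:k], x[k:k + G_right]
        # assumed variance estimate: pooled squared deviations of both windows,
        # each centred at its own mean
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        sigma = np.sqrt(rss / (G_left + G_right))
        stat[k] = scale * abs(right.mean() - left.mean()) / sigma
    return stat


# usage: one jump from 0 to 1 at index 100, noise sd 0.3
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(100), np.ones(100)]) + rng.normal(0, 0.3, 200)
stat = mosum_stat_sketch(x, G_left=20)
# the maximum should sit close to the true jump at index 100
print(int(np.nanargmax(stat)))
```

With this indexing, a jump between x[k-1] and x[k] produces a peak of the statistic at k.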

Parameters:
  • x (list) – input data

  • G (int) – bandwidth; should be less than ‘len(x)/2’

  • G_right (int) – right bandwidth; if ‘G_right != G’, the asymmetric bandwidth ‘(G, G_right)’ is used; if ‘max(G, G_right)/min(G, G_right) > 4’, a warning message is generated

  • var_est_method (Str) – how the variance is estimated; possible values are ‘mosum’: both-sided MOSUM variance estimator; ‘mosum_min’: minimum of the sample variance estimates from the left and right summation windows; ‘mosum_max’: maximum of the sample variance estimates from the left and right summation windows; ‘custom’: a vector of length ‘len(x)’ is supplied by the user via ‘var_custom’

  • var_custom (list) – vector (of the same length as ‘x’) containing local estimates of the variance or long run variance; use iff ‘var_est_method = “custom”’

  • boundary_extension (bool) – a logical value indicating whether the boundary values should be filled in with CUSUM values

  • threshold (Str) – indicates which threshold should be used to determine significance. By default, it is chosen from the asymptotic distribution at the given significance level ‘alpha’. Alternatively, a user-defined numerical value can be passed via ‘threshold_custom’.

  • alpha (float) – numeric value for the significance level with ‘0 <= alpha <= 1’; use iff ‘threshold = “critical_value”’

  • threshold_custom (float) – value greater than 0 for the threshold of significance; use iff ‘threshold = “custom”’

  • criterion (Str) – indicates how to determine whether each point ‘k’ at which the MOSUM statistic exceeds the threshold is a change point; possible values are ‘eta’: there is no larger exceedance within an ‘eta*G’ environment of ‘k’; ‘epsilon’: ‘k’ is the maximum of its local exceedance environment, which has size at least ‘epsilon*G’

  • eta (float) – a positive numeric value for the minimal mutual distance of changes, relative to moving sum bandwidth (iff ‘criterion = “eta”’)

  • epsilon (float) – a numeric value in (0,1] for the minimal size of exceeding environments, relative to moving sum bandwidth (iff ‘criterion = “epsilon”’)

  • do_confint (bool) – flag indicating whether to compute the confidence intervals for change points

  • level (float) – use iff ‘do_confint = True’; a numeric value (‘0 <= level <= 1’) such that ‘100(1-level)%’ confidence intervals are generated

  • N_reps (int) – use iff ‘do_confint = True’; number of bootstrap replicates to be generated
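A simplified reading of ‘criterion = "eta"’ as described above: an exceedance ‘k’ is kept only if no larger value of the statistic lies within ‘floor(eta*G)’ points of it. The function eta_criterion_sketch is hypothetical; tie handling and boundary treatment in the package may differ.

```python
import numpy as np


def eta_criterion_sketch(stat, threshold, G, eta=0.4):
    """Hypothetical helper: exceedances that are maximal within floor(eta*G) points."""
    s = np.asarray(stat, dtype=float)
    s = np.where(np.isnan(s), -np.inf, s)  # NaN (boundary) never counts as an exceedance
    h = int(np.floor(eta * G))  # half-width of the eta*G environment
    cpts = []
    for k in np.flatnonzero(s > threshold):
        lo, hi = max(0, k - h), min(len(s), k + h + 1)
        if s[k] >= s[lo:hi].max():  # no larger value nearby
            cpts.append(int(k))
    return np.array(cpts, dtype=int)


stat = [np.nan, 1, 5, 3, 0, 0, 0, 4, 6, 2, np.nan]
print(eta_criterion_sketch(stat, threshold=2, G=5))  # [2 8]
```

Index 3 exceeds the threshold but is dropped because index 2 carries a larger value within two points; the same happens to index 7 relative to index 8.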

Returns:

  • mosum_obj object containing the following attributes:

  • x (list) – input data

  • G_left, G_right (int) – bandwidths

  • var_est_method, var_custom, boundary_extension – input

  • stat (list) – MOSUM statistics

  • rollsums (list) – MOSUM detector

  • var_estimation (list) – local variance estimates

  • threshold, alpha, threshold_custom – input

  • threshold_value (float) – threshold of MOSUM test

  • criterion, eta, epsilon – input

  • cpts (ndarray) – estimated change points

  • cpts_info (DataFrame) – information on change points, including detection bandwidths, asymptotic p-values, scaled jump sizes

  • do_confint (bool) – input

  • ci – confidence intervals

Examples

>>> import mosum
>>> xx = mosum.testData("blocks")["x"]
>>> xx_m = mosum.mosum(xx, G=50, criterion="eta", boundary_extension=True)
>>> xx_m.summary()
>>> xx_m.print()

mosum.criticalValue(n, G_left, G_right, alpha)

Computes the asymptotic critical value for the MOSUM test
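The formula is not spelled out here. The sketch below uses a Gumbel-type form commonly quoted for MOSUM statistics, assumed rather than taken from the package source: a·max T − b is taken to converge to a law with P(≤ z) = exp(−2·exp(−z)), where a and b depend on n/min(G_left, G_right) and on the bandwidth ratio K = min/max. The function critical_value_sketch is hypothetical.

```python
import math


def critical_value_sketch(n, G_left, G_right, alpha):
    """Hypothetical helper: assumed Gumbel-type asymptotic MOSUM critical value."""
    G_min, G_max = min(G_left, G_right), max(G_left, G_right)
    K = G_min / G_max          # bandwidth ratio, 1 for symmetric windows
    r = n / G_min              # assumes n / G_min > 1
    a = math.sqrt(2 * math.log(r))
    b = (2 * math.log(r) + 0.5 * math.log(math.log(r))
         + math.log((K ** 2 + K + 1) / (K + 1)) - 0.5 * math.log(math.pi))
    # Gumbel quantile: solve exp(-2 * exp(-c)) = 1 - alpha for c
    c = -math.log(-math.log(1 - alpha) / 2)
    return (b + c) / a


v = critical_value_sketch(1000, 50, 50, 0.1)
print(v)
```

Under this form the value decreases as ‘alpha’ grows and depends on the two bandwidths only through their minimum and ratio, so swapping G_left and G_right leaves it unchanged.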

mosum.multiscale_bottomUp(x, G=None, threshold=['critical_value', 'custom'][0], alpha=0.1, threshold_function=None, eta=0.4, do_confint=False, level=0.05, N_reps=1000)

Multiscale MOSUM algorithm with bottom-up merging

Parameters:
  • x (list) – input data

  • G (list) – vector of bandwidths; given as either integers less than len(x)/2, or numbers between 0 and 0.5 describing the moving sum bandwidths relative to len(x)

  • threshold (Str) – indicates which threshold should be used to determine significance. By default, it is chosen from the asymptotic distribution at the given significance level ‘alpha’. Alternatively, a user-defined function can be passed via ‘threshold_function’.

  • alpha (float) – numeric value for the significance level with ‘0 <= alpha <= 1’; use iff ‘threshold = “critical_value”’

  • threshold_function (function) – user-defined function for computing the threshold; use iff ‘threshold = “custom”’

  • eta (float) – a positive numeric value for the minimal mutual distance of changes, relative to the moving sum bandwidth

  • do_confint (bool) – flag indicating whether to compute the confidence intervals for change points

  • level (float) – use iff ‘do_confint = True’; a numeric value (‘0 <= level <= 1’) such that ‘100(1-level)%’ confidence intervals are generated

  • N_reps (int) – use iff ‘do_confint = True’; number of bootstrap replicates to be generated
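Bottom-up merging can be sketched as follows, assuming candidates are pooled as (cpt, G) pairs from single-bandwidth runs and visited from the smallest bandwidth upwards; a candidate detected with bandwidth G is kept only if no previously accepted change point lies within eta*G of it. This is an illustrative simplification with a hypothetical function name, not the package's code.

```python
def bottom_up_merge_sketch(candidates, eta=0.4):
    """Hypothetical helper: bottom-up merging of (cpt, G) candidate pairs.

    Candidates are visited from the smallest bandwidth upwards; a candidate found
    with bandwidth G is accepted only if no accepted change point lies strictly
    within eta*G of it.
    """
    accepted = []
    for cpt, G in sorted(candidates, key=lambda c: (c[1], c[0])):
        if all(abs(cpt - a) >= eta * G for a in accepted):
            accepted.append(cpt)
    return sorted(accepted)


# candidates from bandwidths 20, 50 and 100
cands = [(100, 20), (205, 20), (102, 50), (300, 50), (210, 100)]
print(bottom_up_merge_sketch(cands))  # [100, 205, 300]
```

Visiting fine scales first favours the more precise location estimates from small bandwidths, while coarser bandwidths only contribute change points that are not already covered.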

Returns:

  • multiscale_cpts object containing the following attributes:

  • x (list) – input data

  • G (list) – bandwidth vector

  • threshold, alpha, threshold_function, eta – input

  • cpts (ndarray) – estimated change points

  • cpts_info (DataFrame) – information on change points, including detection bandwidths, asymptotic p-values, scaled jump sizes

  • pooled_cpts (ndarray) – change point candidates

  • do_confint (bool) – input

  • ci – confidence intervals

Examples

>>> import mosum
>>> xx = mosum.testData("blocks")["x"]
>>> xx_m = mosum.multiscale_bottomUp(xx, G=[50, 100])
>>> xx_m.summary()
>>> xx_m.print()

mosum.multiscale_localPrune(x, G=None, max_unbalance=4, threshold='critical_value', alpha=0.1, threshold_function=None, criterion='eta', eta=0.4, epsilon=0.2, rule='pval', penalty='log', pen_exp=1.01, do_confint=False, level=0.05, N_reps=1000)

Multiscale MOSUM algorithm with localised pruning

Parameters:
  • x (list) – input data

  • G (list) – vector of bandwidths; given as either integers less than len(x)/2, or numbers between 0 and 0.5 describing the moving sum bandwidths relative to len(x)

  • max_unbalance (float) – a numeric value, at least 1, for the maximal ratio between the maximal and minimal bandwidths to be used for candidate generation

  • threshold (Str) – indicates which threshold should be used to determine significance. By default, it is chosen from the asymptotic distribution at the given significance level ‘alpha’. Alternatively, a user-defined function can be passed via ‘threshold_function’.

  • alpha (float) – numeric value for the significance level with ‘0 <= alpha <= 1’; use iff ‘threshold = “critical_value”’

  • threshold_function (function) – user-defined function for computing the threshold; use iff ‘threshold = “custom”’

  • criterion (Str) – indicates how to determine whether each point ‘k’ at which the MOSUM statistic exceeds the threshold is a change point; possible values are ‘eta’: there is no larger exceedance within an ‘eta*G’ environment of ‘k’; ‘epsilon’: ‘k’ is the maximum of its local exceedance environment, which has size at least ‘epsilon*G’

  • eta (float) – a positive numeric value for the minimal mutual distance of changes, relative to the moving sum bandwidth (iff ‘criterion = “eta”’)

  • epsilon (float) – a numeric value in (0,1] for the minimal size of exceedance environments, relative to the moving sum bandwidth (iff ‘criterion = “epsilon”’)

  • rule (Str) – sorting criterion for change point candidates in the merging step; possible values are ‘pval’: smallest p-value; ‘jump’: largest (rescaled) jump size

  • penalty (Str) – type of penalty term to be used in the Schwarz criterion; possible values are ‘log’: use ‘penalty = log(len(x))**pen_exp’; ‘polynomial’: use ‘penalty = len(x)**pen_exp’

  • pen_exp (float) – penalty exponent

  • do_confint (bool) – flag indicating whether to compute the confidence intervals for change points

  • level (float) – use iff ‘do_confint = True’; a numeric value (‘0 <= level <= 1’) such that ‘100(1-level)%’ confidence intervals are generated

  • N_reps (int) – use iff ‘do_confint = True’; number of bootstrap replicates to be generated
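The two ‘penalty’ options translate directly into code. The Schwarz-type criterion shown alongside, n/2·log(RSS/n) + q·penalty for a fit with q change points, is a standard form assumed here for illustration; how multiscale_localPrune combines it with the ‘rule’ ordering is not reproduced. Both function names are hypothetical.

```python
import math


def penalty_sketch(n, penalty="log", pen_exp=1.01):
    """Penalty term for the 'penalty' and 'pen_exp' options described above."""
    if penalty == "log":
        return math.log(n) ** pen_exp
    if penalty == "polynomial":
        return n ** pen_exp
    raise ValueError("penalty must be 'log' or 'polynomial'")


def schwarz_criterion_sketch(rss, n, q, penalty="log", pen_exp=1.01):
    """Assumed Schwarz-type criterion for q change points and residual sum of squares rss."""
    return n / 2 * math.log(rss / n) + q * penalty_sketch(n, penalty, pen_exp)


# illustrative numbers: adding one change point lowers the RSS from 1500 to 1000
sc_0 = schwarz_criterion_sketch(rss=1500.0, n=1000, q=0)
sc_1 = schwarz_criterion_sketch(rss=1000.0, n=1000, q=1)
print(sc_1 < sc_0)  # True: the gain in fit outweighs one penalty unit
```

A smaller criterion value is preferred, so an extra change point is retained only when it improves the fit term by more than one penalty unit.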

Returns:

  • multiscale_cpts object containing the following attributes:

  • x (list) – input data

  • G (list) – bandwidth vector

  • threshold, alpha, threshold_function, eta – input

  • cpts (ndarray) – estimated change points

  • cpts_info (DataFrame) – information on change points, including detection bandwidths, asymptotic p-values, scaled jump sizes

  • pooled_cpts (ndarray) – change point candidates

  • do_confint (bool) – input

  • ci – confidence intervals

Examples

>>> import mosum
>>> xx = mosum.testData("mix")["x"]
>>> xx_m = mosum.multiscale_localPrune(xx, G=[8, 15, 30, 70])
>>> xx_m.summary()
>>> xx_m.print()

mosum.bandwidths_default(n, d_min=10, G_min=10, G_max=None) → int

Default choice for the set of multiple bandwidths
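How the default set is constructed is not described here. The sketch below shows one common choice for multiscale MOSUM bandwidth grids, a Fibonacci-type sequence, purely as an assumption; the starting value max(G_min, round(2/3*d_min)) and the cap min(n/2, n**(2/3)) are likewise assumed. The function bandwidths_sketch is hypothetical and returns a list of bandwidths.

```python
def bandwidths_sketch(n, d_min=10, G_min=10, G_max=None):
    """Hypothetical Fibonacci-type bandwidth grid (assumed construction)."""
    if G_max is None:
        G_max = min(n / 2, n ** (2 / 3))  # assumed default upper bound
    g_prev = g_curr = max(G_min, round(2 / 3 * d_min))  # assumed starting bandwidth
    grid = [g_curr]
    # each new bandwidth is the sum of the previous two, as in the Fibonacci sequence
    while g_prev + g_curr <= G_max:
        g_prev, g_curr = g_curr, g_prev + g_curr
        grid.append(g_curr)
    return grid


print(bandwidths_sketch(1000))  # [10, 20, 30, 50, 80]
```

A geometric-like spacing keeps the grid short while still covering both short and long segments.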

mosum.testData(model=['custom', 'blocks', 'fms', 'mix', 'stairs10', 'teeth10'][1], lengths=None, means=None, sds=None, rand_gen=np.random.normal, seed=None, rand_gen_args=[0, 1])

Test data with piecewise constant mean

Generate piecewise stationary time series with independent innovations and change points in the mean.

Parameters:
  • model (str) – ‘custom’ or the name of a pre-defined signal (‘blocks’, ‘fms’, ‘mix’, ‘stairs10’, ‘teeth10’)

  • lengths (list) – vector of segment lengths (custom only)

  • means (list) – vector of segment means (custom only)

  • sds (list) – vector of segment standard deviations (custom only)

  • rand_gen (function) – innovation function

  • seed (int) – random seed

  • rand_gen_args (ndarray) – arguments for rand_gen

Returns:

  • x (ndarray) – simulated data series

  • mu (ndarray) – signal

  • sigma (float) – standard deviation

  • cpts (ndarray) – true change points
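A minimal sketch of the generating mechanism described above, assuming Gaussian innovations scaled by the segment standard deviations. piecewise_constant_sketch is a hypothetical helper returning ‘x’, ‘mu’ and ‘cpts’; here ‘cpts’ holds the 0-based index at which each new segment starts, which may differ from the package's convention.

```python
import numpy as np


def piecewise_constant_sketch(lengths, means, sds, seed=None):
    """Hypothetical helper: piecewise constant mean plus independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    mu = np.repeat(np.asarray(means, dtype=float), lengths)    # signal
    scale = np.repeat(np.asarray(sds, dtype=float), lengths)   # per-observation sd
    x = mu + scale * rng.standard_normal(mu.size)              # independent innovations
    cpts = np.cumsum(lengths)[:-1]  # 0-based start index of every segment after the first
    return {"x": x, "mu": mu, "cpts": cpts}


data = piecewise_constant_sketch([100, 100], [0, 1], [1, 1], seed=1)
print(data["cpts"])  # [100]
```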

Examples

>>> import mosum
>>> mosum.testData()
>>> mosum.testData("custom", lengths=[100, 100], means=[0, 1], sds=[1, 1])

mosum.persp3D_multiscaleMosum(x, mosum_args=dict(), threshold=['critical_value', 'custom'][0], alpha=0.1, threshold_function=None, palette=cm.coolwarm, xlab='G', ylab='time', zlab='MOSUM')

3D Visualisation of multiscale MOSUM statistics

Parameters:
  • x (list) – input data

  • mosum_args (dict) – dictionary of keyword arguments to mosum

  • threshold (Str) – indicates which threshold should be used to determine significance. By default, it is chosen from the asymptotic distribution at the given significance level ‘alpha’. Alternatively, a user-defined function can be passed via ‘threshold_function’.

  • alpha (float) – numeric value for the significance level with ‘0 <= alpha <= 1’; use iff ‘threshold = “critical_value”’

  • threshold_function (function) – user-defined function for computing the threshold; use iff ‘threshold = “custom”’

  • palette (matplotlib.colors.LinearSegmentedColormap) – colour palette for plotting, accessible from matplotlib.cm

  • xlab (Str) – x-axis label

  • ylab (Str) – y-axis label

  • zlab (Str) – z-axis label
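The surface visualised here is, in essence, a bandwidth-by-time matrix of MOSUM statistics. The sketch below computes such a matrix with cumulative sums; to stay short it assumes a single global noise level estimated from first differences instead of the local variance estimates configurable via ‘mosum_args’. multiscale_mosum_matrix is a hypothetical name.

```python
import numpy as np


def multiscale_mosum_matrix(x, Gs):
    """Hypothetical helper: rows are bandwidths G, columns are time points.

    Uses one global noise level (estimated from first differences) instead of
    local variance estimates; positions without two full windows stay NaN.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))   # csum[i] = x[:i].sum()
    sigma = np.std(np.diff(x)) / np.sqrt(2)         # Var(x[t+1] - x[t]) = 2 sigma^2 for iid noise
    out = np.full((len(Gs), n), np.nan)
    for i, G in enumerate(Gs):
        k = np.arange(G, n - G + 1)                 # positions with full windows
        left = (csum[k] - csum[k - G]) / G          # mean of x[k-G:k]
        right = (csum[k + G] - csum[k]) / G         # mean of x[k:k+G]
        out[i, k] = np.sqrt(G / 2) * np.abs(right - left) / sigma
    return out


rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(100), np.ones(100)]) + rng.normal(0, 0.2, 200)
M = multiscale_mosum_matrix(x, [10, 20, 40])
print(M.shape)  # (3, 200)
```

Each row can be plotted against time, or the whole matrix passed to a 3D surface plot such as matplotlib's Axes3D.plot_surface.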

Examples

>>> import mosum
>>> xx = mosum.testData("blocks")["x"]
>>> mosum.persp3D_multiscaleMosum(xx)