Skip to content

mstorath/CSSD

Repository files navigation

CSSD — Cubic smoothing splines for discontinuous signals

PyPI Python License: MIT CI MATLAB

Cubic smoothing splines for signals with a-priori unknown discontinuities — joint estimation of the spline and the discontinuity set via dynamic programming, with K-fold cross-validation for the regularisation parameters.

This is a reference implementation for the algorithms described in the paper

M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", Journal of Computational and Graphical Statistics, 2023, [Preprint]

Overview of main functionalities

  1. cssd.m computes a cubic smoothing spline with discontinuities (CSSD) for data (x,y). It is a solution of the following model of a smoothing spline $f$ with a-priori unknown discontinuities $J$

$$\min_{f, J} p \sum_{i=1}^N \left(\frac{y_i - f(x_i)}{\delta_i}\right)^2 + (1-p) \int_{[x_1, x_N] \setminus J} (f''(t))^2 dt + \gamma |J|.$$

where

  • $y_i = g(x_i) + \epsilon_i$ are samples of piecewise smooth function $g$ at data sites $x_1, \ldots, x_N$, and an estimate $\delta_i$ of the standard deviation of the errors $\epsilon_i$
  • the minimum is taken over all possible sets of discontinuities between two data sites $J \subset [x_1, x_N]\setminus {x_1, \ldots, x_N}$ and all functions $f$ that are twice continuously differentiable away from the discontinuities.
  • The model parameter $p \in (0, 1)$ controls the relative weight of the smoothness term (second term) and the data fidelity term.
  • The last term is a penalty for the number of discontinuities $|J|$ weighted by a parameter $\gamma > 0.$
  1. cssd_cv.m automatically determines values for the model parameters $p$ and $\gamma$ based on K-fold cross validation.

Quickstart

Python (Rust core)

pip install cssd
import numpy as np
from cssd import cssd, cssd_cv

x = np.linspace(0, 1, 100)
y = np.sin(4 * np.pi * x) - np.sign(x - 0.3) - np.sign(0.72 - x)

out = cssd(x, y, p=0.999, gamma=8.0)
out.discont      # detected jump locations
out.pp(x)        # evaluate the piecewise spline

cv = cssd_cv(x, y, cv_type="random", cv_arg=5)
cv.p, cv.gamma, cv.fit.discont

The Python package wraps a Rust extension built with PyO3 and maturin; see crates/cssd-core for the algorithm crate and crates/cssd-py for the bindings.

MATLAB (reference implementation, unchanged)

  1. Execute "install_cssd.m" which adds the folder and all subfolders to the Matlab path.
  2. Execute any m-file from the demos folder

Examples

Synthetic data

Synthetic signal

A synthetic signal is sampled at $N = 100$ random data sites $x_i$ and corrupted by zero mean Gaussian noise with standard deviation $0.1.$ The results of the discussed model are shown for $p=0.999$ and different parameters of $\gamma,$ where $\gamma=\infty$ corresponds to classical smoothing splines. The thick lines represent the results of the shown sample realization. The shaded areas depict the $2.5 \%$ to $97.5 \%$ (pointwise) quantiles of $1000$ realizations. The histograms under the plots show the frequency of the detected discontinuity locations over all realizations.

Stock data

Stock

The dots represent the logarithm of the closing prices of the Meta stock from May 18, 2012, to May 19, 2022. The curve represents the CSSD with parameters determined by K-fold CV ($p = 0.4702$, $\gamma = 0.0069$). The dashed vertical lines indicate the discontinuities of the CSSD, and the ticks correspond to the date before the discontinuity.

Geyser data

Geyser

Fitting a CSSD to the Old Faithful data (circles): If the parameter is selected based on K-fold CV we obtain a result without discontinuities which coincides with a classical smoothing spline (solid curve). Keeping the selected $p$-parameter and lowering the $\gamma$ parameter sufficiently gives a two-phase regression curve (dashed curves) with a breakpoint near $x = 3$ (dashed vertical line), and the two curve segments are nearly linear. Both of the above parameter sets yield better CV-scores than a linear model (dotted line).

Reference

M. Storath, A. Weinmann, "Smoothing splines for discontinuous signals", Journal of Computational and Graphical Statistics, 2023

How to cite

If you use this software, please cite the paper above. GitHub's "Cite this repository" button on the repo page reads the version and date-released fields from CITATION.cff and renders BibTeX/APA.

See also

Sibling projects from the same research program on variational methods for signal and image processing:

  • Pottslab — multilabel image segmentation via the Potts / piecewise-constant Mumford-Shah model
  • L1TV — exact L1-TV regularisation of real- or circle-valued signals
  • MumfordShah2D — edge-preserving image restoration via the Mumford-Shah model
  • CircleMedianFilter — fast median filtering for phase or orientation data
  • DCEBE — bolus arrival time estimation for DCE-MRI signals

Related external projects:

License

Released under the MIT License. See LICENSE.

Packages

 
 
 

Contributors