Introduction
The efficient detection of abrupt changes in the statistical behavior of streaming data is a classical and fundamental problem in signal processing and statistics. The abrupt change-point usually corresponds to a triggering event that could have catastrophic consequences if it is not detected in a timely manner. Therefore, the goal is to detect the change as quickly as possible, subject to false alarm constraints. Such problems have been studied under the theoretical framework of sequential (or quickest) change detection [160], [194], [215]. With an increasing availability of high-dimensional streaming data, sequential change detection has become a centerpiece for many real-world applications, ranging from the monitoring of power networks [37], Internet traffic [100], cyber-physical systems [142], sensor networks [164], and social networks [152], [165], to epidemic detection [17], scientific imaging [162], genomic signal processing [179], seismology [7], video surveillance [109], and wireless communications [95].
In various applications, the streaming data is high-dimensional and collected over networks, such as social networks, sensor networks, and cyber-physical systems. For this reason, the scope of the modern sequential change detection problem has been extended far beyond its traditional setting, often challenging the assumptions made by classical methods. These challenges include complex spatial and temporal dependence of the data streams, transient and dynamic changes, high-dimensionality, and structured changes, as explained below. These challenges have fostered new advances in sequential change detection theory and methods in recent years.
Complex data distributions. In modern applications, sequential data could have a complex spatial and temporal dependency, for instance, induced by the network structure [16], [68], [167]. In social networks, dependencies are usually due to interaction and information diffusion [116]: users in the social network have behavior patterns influenced by their past, while at the same time, each user in the network will be influenced by friends and connections. In sensor networks for river contamination monitoring [34], sensor observations tend to be spatially and temporally correlated.
Data dynamics. The statistical behavior of sequential data is often non-stationary, particularly in the post-change regime due to the dynamic behavior of the anomaly that causes the change. For example, after a line outage in a power system, the system’s transient behavior is dominated by the generators’ inertial response, and the post-change statistical behavior can be modeled using a sequence of temporally cascaded transient phases [171].
High-dimensionality. Sequential data in modern applications is usually high-dimensional. For example, in sensor networks, the Long Beach 3D seismic array consists of approximately 5300 seismic sensors that record data continuously for seismic activity detection and analysis. Changes in high-dimensional time series usually exhibit low-dimensional structures in the form of sparsity, low-rankness, and subset structures, which can be exploited to enhance the capability to detect weak signals quickly.
In this tutorial, our aim is to introduce standard methods and fundamental results in sequential change detection, along with recent advances. We also present new dimensions at the intersection of sequential change detection with other areas, as well as a selection of modern applications. We should emphasize that our focus is on sequential change detection, where the goal is to detect the change from sequential data in real-time and as soon as possible. Another important line of related research is offline change detection (e.g., [59], [188]), where the goal is to identify and localize changes in a data sequence in a retrospective manner, which is not our focus here. Prior books and surveys on related topics include, for instance, change detection for dynamic systems [97], sequential analysis [98], [194], sequential change detection [19], [160], [215], Bayesian change detection [201], change detection assuming known pre- and post-change distributions [159] and using likelihood-based approaches [186], as well as time-series change detection [6].
The rest of the survey is organized as follows. In Section II, we present the basic problem setup and classical results. In Section III, we discuss several extensions and generalizations of the classical methods. In Section IV, we discuss new dimensions which intersect with sequential change detection, with some remarks on open questions. In Section V, we present some modern applications of sequential change detection. In Section VI, we make some concluding remarks.
Classical Results
A. Problem Definition
In the sequential change detection problem, also known as the quickest change detection (QCD) problem [131], [160], [215], the aim is to detect a possible change in the data generating distribution of a sequence of observations $X_{1}, X_{2}, \ldots$ as quickly as possible, subject to constraints on false alarms.
To motivate the design of algorithms for sequential change detection, we consider the example of detecting a change in the mean of the data generating distribution. In Fig. 1(a), we plot a sample path of observations that are distributed according to a normal distribution with zero mean and unit variance before the change, and with a shifted mean and unit variance after the change.
To motivate the need for sequential change detection procedures, we plot a sample path with samples distributed according to the pre- and post-change distributions described above.
B. Mathematical Preliminaries
Sequential change detection is closely related to the problem of statistical hypothesis testing, in which observations, whose distribution depends on the hypothesis, are used to decide which of the hypotheses is true. For the special case of binary hypothesis testing, we have two hypotheses, the null hypothesis and the alternative hypothesis. The classic Neyman-Pearson Lemma [136] establishes the form of the optimal test for this problem. In particular, consider the case of a single observation $X$ with density $f_{0}$ under the null hypothesis and $f_{1}$ under the alternative; the optimal test compares the likelihood ratio $f_{1}(X)/f_{0}(X)$ to a threshold.
The goal of sequential change detection is to design a stopping time on the observation sequence at which it is declared that a change has occurred. A stopping time is formally defined as follows:
Definition 1 (Stopping Time):
A stopping time $\tau$ with respect to a random sequence $\{X_{n}\}_{n \geq 1}$ is a random variable taking values in $\{1, 2, \ldots \}$ such that, for each $n$, the event $\{\tau = n\}$ is determined by $X_{1}, \ldots, X_{n}$.
The main results on stopping times that are most useful for sequential change detection problems include Doob’s Optional Stopping Theorem [43] and Wald’s Identity [185].
A quantity that plays an important role in the performance of sequential change detection algorithms is the Kullback-Leibler (KL) divergence between two distributions.
Definition 2 (KL Divergence):
The KL divergence between two pdfs $f_{1}$ and $f_{0}$ is defined as
\begin{equation*} D(f_{1}|| f_{0}) := \int f_{1}(x) \log \frac {f_{1}(x)}{f_{0}(x)}\, dx.\end{equation*}
Note that $D(f_{1}|| f_{0}) \geq 0$, with equality if and only if $f_{1} = f_{0}$ almost everywhere.
Define the log-likelihood ratio for an observation $X$:\begin{equation*} \ell (X) := \log f_{1}(X)/f_{0}(X).\tag{1}\end{equation*}
C. Common Sequential Change Detection Procedures
We now present several commonly used sequential change detection procedures, including the Shewhart chart, CUSUM, and Shiryaev-Roberts procedure, which enjoy certain optimality properties that we will make more precise later in Section II-D. These algorithms can be efficiently implemented in an online setting, which makes them useful in practice. We also briefly discuss some other sequential change detection procedures.
1) Shewhart Chart:
One of the earliest sequential change detection procedures is the Shewhart chart [180], [181], which is widely used in industrial quality control [130]. The Shewhart chart was first introduced for the Gaussian model and is based on comparing the current observation to a threshold. We consider the log-likelihood-based modification and generalization of the standard Shewhart chart, where we compute the log-likelihood ratio based on the current observation (or the current batch of observations) and compare it with a threshold (called the control limit) to decide whether the change has occurred. This use of the log-likelihood ratio is motivated by its property discussed in Section II-B, leading to the stopping rule:\begin{equation*} \tau _{\scriptscriptstyle \text {Sh}}= \inf \left \{{n\geq 1: \ell (X_{n}) > b }\right \},\end{equation*}
where $b$ is the threshold.
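As a concrete illustration, the Shewhart chart can be sketched in a few lines. This is a minimal sketch: the unit-variance Gaussian mean-shift model, the shift size, the change time, and the threshold below are our own illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (our assumption): f0 = N(0, 1), f1 = N(mu1, 1).
mu1 = 2.0

def loglik_ratio(x, mu1=mu1):
    # log f1(x)/f0(x) for unit-variance Gaussians that differ only in mean
    return mu1 * x - mu1**2 / 2

def shewhart(xs, b):
    """Return the first time n (1-indexed) with ell(X_n) > b, or None."""
    for n, x in enumerate(xs, start=1):
        if loglik_ratio(x) > b:
            return n
    return None

# Change occurs after sample 100: pre-change N(0,1), post-change N(2,1).
xs = np.concatenate([rng.normal(0, 1, 100), rng.normal(mu1, 1, 100)])
print("alarm at", shewhart(xs, b=3.0))
```

Because the chart uses only the current observation, its memoryless structure is evident: each sample is tested in isolation.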
2) Cumulative Sum (CUSUM) Procedure:
The CUSUM procedure, first introduced by Page [149], addresses the problem of “information loss” in the Shewhart chart. The CUSUM procedure uses past observations and thus can achieve a significant performance gain, especially when the change is small. Although the CUSUM procedure was developed heuristically, it was later shown in [96], [122], [132], [170] that it has very strong optimality properties, which we will discuss further in Section II-D1.
The CUSUM procedure utilizes the properties of the cumulative log-likelihood ratio sequence:\begin{equation*} S_{n} = \sum _{k=1}^{n} \ell \left ({X_{k}}\right).\end{equation*}
\begin{equation*} \tau _{\scriptscriptstyle \text {C}}= \inf \left \{{n\geq 1: W_{n} = \left ({S_{n} - \min _{0\leq k \leq n } S_{k}}\right) \geq b }\right \}.\tag{2}\end{equation*}
\begin{equation*} W_{n} = \max _{0 \leq k \leq n }\sum _{i=k+1}^{n} \ell (X_{i}) = \max _{1\leq k \leq n+1 }\sum _{i=k}^{n} \ell (X_{i}).\tag{3}\end{equation*}
\begin{equation*} W_{n} = \left ({W_{n-1} + \ell (X_{n})}\right)^{+}, \quad W_{0} = 0,\end{equation*}
where $(x)^{+} := \max \{x, 0\}$.
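The recursion above makes the CUSUM procedure straightforward to implement online with constant memory. The following sketch assumes a unit-variance Gaussian mean-shift model; the shift size, change time, and threshold are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model (our assumption): pre-change N(0,1), post-change N(mu1,1).
mu1 = 1.0

def cusum(xs, b, mu1=mu1):
    """CUSUM via the recursion W_n = (W_{n-1} + ell(X_n))^+, W_0 = 0.

    Returns (stopping time or None, trajectory of W_n)."""
    W, path = 0.0, []
    for n, x in enumerate(xs, start=1):
        ell = mu1 * x - mu1**2 / 2          # Gaussian log-likelihood ratio
        W = max(0.0, W + ell)               # random walk reflected at 0
        path.append(W)
        if W >= b:
            return n, path
    return None, path

# Change occurs after sample 200.
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(mu1, 1, 200)])
tau, path = cusum(xs, b=8.0)
print("alarm at", tau)
```

The statistic stays near zero before the change and climbs with drift $D(f_{1}||f_{0})$ per sample after it, which is why CUSUM detects small changes much faster than the Shewhart chart.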
3) Shiryaev-Roberts Procedure:
The maximum likelihood interpretation of the CUSUM procedure is closely related to another popular algorithm in the literature, called the Shiryaev-Roberts (SR) procedure. In the SR procedure, the maximum in (3) is replaced by a sum, and the log-likelihood ratio is replaced by the likelihood ratio. The detection statistic for the SR procedure is then defined as:\begin{equation*} T_{n} := \sum _{1\leq k \leq n }\prod _{i=k}^{n} e^{\ell (X_{i})},\tag{4}\end{equation*}
\begin{equation*} \tau _{\scriptscriptstyle \text {SR}}= \inf \left \{{n\geq 1: T_{n} \geq b}\right \}.\end{equation*}
\begin{equation*} T_{n} = \left ({1+T_{n-1}}\right) e^{\ell (X_{n})}, \quad T_{0} = 0.\end{equation*}
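A minimal sketch of the SR recursion, under the same illustrative unit-variance Gaussian mean-shift model used above (the parameter values and threshold are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
mu1 = 1.0  # illustrative Gaussian mean-shift model, as before

def shiryaev_roberts(xs, b, mu1=mu1):
    """SR procedure via the recursion T_n = (1 + T_{n-1}) * exp(ell(X_n)), T_0 = 0."""
    T = 0.0
    for n, x in enumerate(xs, start=1):
        ell = mu1 * x - mu1**2 / 2          # Gaussian log-likelihood ratio
        T = (1.0 + T) * np.exp(ell)         # SR recursion on the likelihood-ratio scale
        if T >= b:
            return n
    return None

# Change occurs after sample 200; the threshold b roughly sets the ARL scale.
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(mu1, 1, 200)])
print("alarm at", shiryaev_roberts(xs, b=3000.0))
```

Note that the SR statistic is maintained on the likelihood-ratio scale (not the log scale), so its threshold is typically exponentially large compared to a CUSUM threshold.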
D. Optimality
We now briefly summarize optimality results in the existing literature for the above procedures. We begin by considering the non-Bayesian setting, where we do not assume a prior on the change-point
A fundamental problem in sequential change detection is to optimize the tradeoff between the false-alarm rate and the average detection delay, as illustrated in Section II-A using the example in Fig. 1. Controlling the false-alarm rate is commonly achieved by setting an appropriate threshold on a test statistic such as the one in (2). But the threshold also affects the average detection delay. A larger threshold incurs fewer false alarms but leads to a larger detection delay, and vice versa.
1) Minimax Optimality:
In non-Bayesian settings, the change-point is assumed to be a deterministic and unknown quantity. In this case, the average run length (${\mathsf {ARL}}$) to false alarm, i.e., the expected number of samples taken before a false alarm is raised when no change ever occurs, is used to measure the false alarm performance:\begin{equation*} {\mathsf {ARL}}(\tau)= \mathbb {E}_{\infty }[\tau],\tag{5}\end{equation*}
where $\mathbb {E}_{\infty }$ denotes expectation under the probability measure with no change.
The false alarm rate (${\mathsf {FAR}}$) is then defined as the reciprocal of the ${\mathsf {ARL}}$:\begin{equation*} {\mathsf {FAR}}(\tau) = \frac {1}{ {\mathsf {ARL}}(\tau)}=\frac {1}{ \mathbb {E}_{\infty }[\tau]}.\tag{6}\end{equation*}
The class of procedures with ${\mathsf {FAR}}$ at most $\alpha$ is denoted by\begin{equation*} { \mathcal {D}}_{\alpha }=\left \{{\tau: {\mathsf {FAR}}(\tau) \leq \alpha }\right \}.\tag{7}\end{equation*}
Finding a uniformly powerful test that minimizes the delay over all possible values of the change-point is not possible in general. Therefore, minimax formulations, in which the detection delay is evaluated in the worst case over the change-point, have been considered in the literature.
Lorden considers the supremum of the average detection delay conditioned on the worst possible realizations of the observations before the change. In particular, he defines the worst-case average detection delay (${\mathsf {WADD}}$):\begin{equation*} {\mathsf {WADD}}(\tau) = \underset {n \geq 1}{\mathrm {\sup }}~\mathop {\mathrm {ess\,sup}} ~\mathbb {E}_{n}\left [{(\tau -n)^{+}| X_{1}, {\dots }, X_{n-1}}\right],\tag{8}\end{equation*}
where $\mathbb {E}_{n}$ denotes expectation under the measure with change-point $n$, and $\mathop {\mathrm {ess\,sup}}$ denotes the essential supremum.
Lorden's minimax formulation is then to solve:\begin{equation*} \text {minimize } {\mathsf {WADD}}(\tau) {~\text {subject to }} {\mathsf {FAR}}(\tau) \leq \alpha. \tag{9}\end{equation*}
Although the CUSUM procedure is exactly optimal under Lorden’s formulation, the worst-case conditioning in (8) can be rather pessimistic. Pollak proposed a less pessimistic measure of detection delay, the conditional average detection delay (${\mathsf {CADD}}$):\begin{equation*} {\mathsf {CADD}}(\tau) = \underset {n \geq 1}{\mathrm {\sup }}~\mathbb {E}_{n}\left [{\tau -n| \tau \geq n}\right],\tag{10}\end{equation*}
with the corresponding formulation obtained by replacing ${\mathsf {WADD}}$ with ${\mathsf {CADD}}$ in (9).
In general, it may be challenging to exactly solve the problem in (9) or the corresponding problem defined with ${\mathsf {CADD}}$ in place of ${\mathsf {WADD}}$. Therefore, asymptotic optimality in the regime of small FAR, i.e., as $\alpha \to 0$, is often considered. A procedure $\tau \in { \mathcal {D}}_{\alpha }$ is said to be first-order asymptotically optimal if\begin{equation*} \frac { {\mathsf {CADD}}(\tau)}{\inf _{\tau \in { \mathcal {D}}_{\alpha }} {\mathsf {CADD}}(\tau)} \rightarrow 1, ~~\text {as } \alpha \rightarrow 0;\end{equation*}
second-order asymptotically optimal if, as $\alpha \to 0$,\begin{equation*} { {\mathsf {CADD}}(\tau)}-{\inf _{\tau \in { \mathcal {D}}_{\alpha }} {\mathsf {CADD}}(\tau)} =O(1);\end{equation*}
and third-order asymptotically optimal if, as $\alpha \to 0$,\begin{equation*} { {\mathsf {CADD}}(\tau)}-{\inf _{\tau \in { \mathcal {D}}_{\alpha }} {\mathsf {CADD}}(\tau)} =o(1).\end{equation*}
Analogous definitions apply with ${\mathsf {WADD}}$ in place of ${\mathsf {CADD}}$.
Pollak’s formulation has been studied for i.i.d. data in [154] and [197]. The first-order asymptotic optimality for Lorden’s formulation can also be extended to Pollak’s formulation. To show this, note that Lorden in [122] established a universal lower bound on the delay of any procedure in ${ \mathcal {D}}_{\alpha }$, which we state below in terms of ${\mathsf {CADD}}$.
Theorem 1 (Lower Bound for ${\mathsf {CADD}}$ [96]): As $\alpha \to 0$,\begin{equation*} \inf _{\tau \in { \mathcal {D}}_{\alpha }} {\mathsf {CADD}}(\tau) \geq \frac {|\log \alpha |}{D(f_{1}|| f_{0})} \left ({1+o(1) }\right).\end{equation*}
By setting the threshold as $b = |\log \alpha |$, the CUSUM procedure satisfies ${\mathsf {FAR}}(\tau _{\scriptscriptstyle \text {C}}) \leq \alpha$ and achieves this lower bound asymptotically:\begin{equation*} {\mathsf {CADD}}(\tau _{\scriptscriptstyle \text {C}}) = {\mathsf {WADD}}(\tau _{\scriptscriptstyle \text {C}}) \sim \frac {|\log \alpha |}{D(f_{1} || f_{0})},\end{equation*}
as $\alpha \to 0$; the CUSUM procedure is therefore first-order asymptotically optimal under both Lorden’s and Pollak’s formulations.
The SR procedure is also asymptotically optimal: it was shown in [197] that by setting the threshold appropriately so that ${\mathsf {FAR}}(\tau _{\scriptscriptstyle \text {SR}}) \leq \alpha$,\begin{equation*} {\mathsf {CADD}}(\tau _{\scriptscriptstyle \text {SR}}) = \frac {|\log \alpha |}{D\left ({f_{1}|| f_{0}}\right)} + \xi + o(1),\end{equation*}
as $\alpha \to 0$, where $\xi$ is a constant.
Finally, results in [133], [155], [196] show that the Shewhart chart is optimal for the criterion of maximizing the worst-case probability of detecting the change immediately upon its occurrence, subject to a constraint on the ${\mathsf {FAR}}$. In particular, when the post-change distribution belongs to a parametric family $\{f_{\theta }\}$, the Shewhart chart\begin{equation*} \tau _{\scriptscriptstyle \text {Sh}}= \inf \left \{{n\geq 1~:~\frac {f_{\theta } (X_{n})}{f_{0}(X_{n})} > b}\right \},\end{equation*}
with a suitably chosen threshold $b$, solves the problem\begin{align*}&\text {maximize } \inf _{1\leq n < \infty } \mathbb P_{n}^{\theta } (\tau =n|\tau \geq n){~\text {subject to }} {\mathsf {FAR}}(\tau) \leq \alpha. \tag{11}\end{align*}
In summary, both the CUSUM and SR procedures are asymptotically optimal with respect to Lorden’s formulation and Pollak’s formulation. The FAR decays to zero exponentially in the threshold $b$, while the average detection delay grows linearly in $b$ at rate $1/D(f_{1}|| f_{0})$.
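This tradeoff can also be explored empirically. The sketch below estimates the average detection delay of a CUSUM procedure by Monte Carlo simulation under an assumed unit-variance Gaussian mean-shift model (all parameter values are our own illustrative choices); the estimated delay grows roughly linearly in the threshold $b$, with slope $1/D(f_{1}||f_{0})$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu1 = 1.0   # illustrative unit-variance Gaussian shift; D(f1||f0) = mu1^2/2 = 0.5

def cusum_stop(xs, b):
    # CUSUM recursion; returns the (1-indexed) stopping time
    W = 0.0
    for n, x in enumerate(xs, start=1):
        W = max(0.0, W + mu1 * x - mu1**2 / 2)
        if W >= b:
            return n
    return None

def mean_delay(b, trials=200):
    # change at time 1 (the worst case): every sample is post-change, delay = tau - 1
    delays = []
    for _ in range(trials):
        tau = cusum_stop(rng.normal(mu1, 1, 1_000), b)
        delays.append(tau - 1)
    return float(np.mean(delays))

# Delay grows roughly linearly in b, with slope 1/D(f1||f0) = 2 here.
for b in (4.0, 8.0):
    print(b, mean_delay(b))
```

Doubling the threshold roughly doubles the average delay while (by the exponential FAR decay) squaring the reduction in false alarm rate, which is the tradeoff summarized above.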
(Figure: tradeoff curve between the ${\mathsf {FAR}}$ and the average detection delay.)
Some more optimality results are summarized as follows. Under Pollak’s criterion, it was shown in [197] that the SR algorithm is second-order asymptotically optimal, and that the SRP algorithm (Pollak’s version of the SR algorithm that starts from a quasi-stationary distribution of the SR statistic) is third-order asymptotically optimal (as was also first established in [154]). More importantly, in [197], it was proved that the SR-$r$ procedure, which starts the SR statistic from a specially designed deterministic point $r$, is also third-order asymptotically optimal.
2) Bayesian Optimality:
In the Bayesian setting, it is assumed that the change-point is a random variable $\Gamma$ with prior distribution $\pi _{n} = \mathbb {P}\{\Gamma = n\}$. The average detection delay (${\mathsf {ADD}}$) and the probability of false alarm ($\mathsf {PFA}$) are defined as:\begin{align*} {\mathsf {ADD}}(\tau)=&\mathbb {E}\left [{(\tau - \Gamma)^{+}}\right] = \sum _{n=0}^{\infty } \pi _{n} \mathbb {E}_{n} \left [{(\tau - n)^{+}}\right], \tag{12}\\ \mathsf {PFA}(\tau)=&\mathbb {P}(\tau < \Gamma) = \sum _{n=0}^{\infty } \pi _{n} \mathbb {P}_{n} (\tau < n).\tag{13}\end{align*}
Shiryaev's formulation is to solve:\begin{align*}&\text {minimize } {\mathsf {ADD}}(\tau) {~\text {subject to }} \mathsf {PFA}(\tau) \leq \alpha. \quad (\text {Shiryaev}) \tag{14}\end{align*}
A commonly used prior for the change-point is the geometric distribution with parameter $\rho \in (0,1)$:\begin{equation*} \pi _{n} = \mathbb {P}\{ \Gamma = n \} = \rho (1-\rho)^{n-1}\mathbb I_{\{n \geq 1\}}, \quad \pi _{0} =0,\tag{15}\end{equation*}
where $\mathbb I$ denotes the indicator function.
The detection statistic of the Shiryaev algorithm is the posterior probability that the change has taken place given the observations so far:\begin{equation*} p_{n} = \mathbb {P}\left ({\Gamma \leq n \; | \; X_{1}^{n} }\right),\tag{16}\end{equation*}
where $X_{1}^{n} = (X_{1}, \ldots, X_{n})$.
For the geometric prior in (15), the posterior probability can be updated recursively:\begin{equation*} p_{n+1} = \frac {\tilde {p}_{n} e^{\ell (X_{n+1})}}{ \tilde {p}_{n} e^{\ell (X_{n+1})} + \left ({1-\tilde {p}_{n} }\right)},\tag{17}\end{equation*}
where $\tilde {p}_{n} = p_{n} + (1-p_{n})\rho$ accounts for the possibility that the change occurs at time $n+1$.
The Shiryaev stopping rule declares the change when the posterior probability exceeds a threshold:\begin{equation*} \tau _{\scriptscriptstyle \text {S}}= \inf \left \{{ n\geq 1: p_{n} \geq b_{\alpha } }\right \},\tag{18}\end{equation*}
where the threshold $b_{\alpha }$ is chosen to satisfy the $\mathsf {PFA}$ constraint.
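A sketch of the Shiryaev posterior recursion and stopping rule for a geometric prior, assuming (as is standard for this prior) $\tilde p_{n} = p_{n} + (1-p_{n})\rho$; the Gaussian mean-shift likelihood ratio, the parameter values, and the threshold are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, rho = 1.0, 0.01   # illustrative Gaussian shift model and geometric prior parameter

def shiryaev(xs, b):
    """Shiryaev recursion: prior update p_tilde = p + (1-p)*rho, then Bayes update."""
    p = 0.0
    for n, x in enumerate(xs, start=1):
        lr = np.exp(mu1 * x - mu1**2 / 2)      # likelihood ratio e^{ell(x)}
        pt = p + (1.0 - p) * rho               # change may occur at this step
        p = pt * lr / (pt * lr + (1.0 - pt))   # posterior update with X_n
        if p >= b:
            return n
    return None

# Change occurs after sample 300; stop when the posterior exceeds 0.99.
xs = np.concatenate([rng.normal(0, 1, 300), rng.normal(mu1, 1, 300)])
print("alarm at", shiryaev(xs, b=0.99))
```

Before the change the posterior hovers near zero (the small prior mass injected each step is discounted by the unfavorable likelihood ratios); after the change it climbs rapidly toward one.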
Theorem 2 (Optimal Bayesian Procedure [183], [184]): When the threshold $b_{\alpha }$ is chosen so that the constraint is met with equality, i.e., $\mathsf {PFA}(\tau _{\scriptscriptstyle \text {S}}) = \alpha$, the Shiryaev procedure in (18) solves the problem in (14).
An equivalent form of the Shiryaev statistic can be developed using the idea of the likelihood ratio test. This builds a connection to the earlier SR statistic defined in (4), and it reveals useful insights about the nature of the procedure. Consider the two hypotheses “$\Gamma \leq n$” and “$\Gamma > n$” at each time $n$; the corresponding likelihood ratio statistic $R_{n,\rho }$, which is a monotone transformation of $p_{n}$, satisfies the recursion:\begin{equation*} R_{n+1,\rho } = \frac {1+R_{n,\rho }}{1-\rho } e^{\ell (X_{n+1})}, \quad R_{0,\rho } = 0.\tag{19}\end{equation*}
Note that as $\rho \to 0$, the recursion (19) reduces to that of the SR statistic.
A generalized Shewhart chart is also Bayesian optimal, as shown in [155], in the sense that it minimizes the expected loss for a suitably defined loss function.
3) Evaluating the Performance Metrics:
In the definition of the ${\mathsf {CADD}}$ and ${\mathsf {WADD}}$ metrics, the supremum over the change-point is attained at $n = 1$ for both the CUSUM and SR procedures, since their detection statistics start from zero, which corresponds to the worst case. Therefore,\begin{align*} {\mathsf {CADD}}(\tau _{\scriptscriptstyle \text {C}})=&{\mathsf {WADD}}(\tau _{\scriptscriptstyle \text {C}}) = \mathbb {E}_{1} \left [{ \tau _{\scriptscriptstyle \text {C}}- 1}\right],\\ {\mathsf {CADD}}(\tau _{\scriptscriptstyle \text {SR}})=&{\mathsf {WADD}}(\tau _{\scriptscriptstyle \text {SR}}) = \mathbb {E}_{1} \left [{ \tau _{\scriptscriptstyle \text {SR}}- 1}\right].\end{align*}
E. Other Sequential Change Detection Procedures
1) Mixture and Generalized Likelihood Ratio (GLR) Statistics:
The CUSUM and SR procedures require full knowledge of the pre- and post-change distributions to compute the log-likelihood ratio $\ell (X)$. In many applications, the post-change distribution belongs to a parametric family $\{f_{\theta }: \theta \in \Theta \}$ with unknown parameter $\theta$; let $\ell _{\theta }(X) = \log f_{\theta }(X)/f_{0}(X)$. The GLR approach replaces the unknown parameter by its maximum likelihood estimate, leading to the GLR-CUSUM statistic:\begin{equation*} W_{n}^{\text {G}} = \max _{1\leq k \leq n+1 } \sup _{\theta \in \Theta } \sum _{i=k}^{n} \ell _{\theta } (X_{i}),\tag{20}\end{equation*}
and an alarm is raised when $W_{n}^{\text {G}}$ exceeds a threshold.
The mixture method replaces the supremum over $\theta$ in (20) with an integral against a prior (mixing) density $\omega (\theta)$ on $\Theta$:\begin{equation*} W_{n}^{\text {m}} = \max _{1\leq k \leq n+1 } \log \int _{\Theta }\prod _{i=k}^{n}\frac {f_{\theta } (X_{i})}{f_{0}(X_{i})} \omega (\theta) d\theta,\tag{21}\end{equation*}
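For the unit-variance Gaussian family with unknown post-change mean, the inner supremum in the GLR statistic is available in closed form ($\sup_{\theta} \sum_{i=k}^{n} (\theta X_i - \theta^2/2) = S^2/(2m)$, where $S$ is the sum of the $m$ candidate post-change samples), which makes a windowed GLR sketch easy to write down. The window length, threshold, and data model below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def glr_stat(xs, window=50):
    """Windowed GLR statistic for a unit-variance Gaussian mean shift.

    For f_theta = N(theta, 1) and f_0 = N(0, 1), the supremum over theta of
    the summed log-likelihood ratios equals S^2 / (2 * m), with S the sum of
    the m most recent candidate post-change samples."""
    xs = np.asarray(xs, dtype=float)
    n = len(xs)
    best = 0.0
    for k in range(max(0, n - window), n):   # candidate change just after sample k
        s = xs[k:].sum()
        best = max(best, s * s / (2.0 * (n - k)))
    return best

def glr_detect(xs, b, window=50):
    # scan online; stop the first time the windowed GLR statistic crosses b
    for n in range(1, len(xs) + 1):
        if glr_stat(xs[:n], window) >= b:
            return n
    return None

# Change occurs after sample 150; the post-change mean is unknown to the detector.
xs = np.concatenate([rng.normal(0, 1, 150), rng.normal(1.0, 1, 150)])
print("alarm at", glr_detect(xs, b=10.0))
```

The window bounds the per-step cost, a common practical compromise since the full maximization over all past change-points grows with $n$.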
2) EWMA:
Note that the CUSUM and SR procedures can achieve a significant gain in performance when compared to the Shewhart chart by making use of past observations, i.e., CUSUM and SR have memory. The exponentially weighted moving average (EWMA) chart is another type of sequential change detection procedure that employs past observations. The EWMA detection statistic was originally defined through the recursion $Z_{n} = (1-\lambda) Z_{n-1} + \lambda X_{n}$, with $Z_{0}$ set to the pre-change (target) mean and smoothing parameter $\lambda \in (0, 1]$; an alarm is raised when $Z_{n}$ exceeds a control limit. Smaller values of $\lambda$ place more weight on past observations and are better suited for detecting small changes.
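A sketch of the EWMA chart with the commonly used control limit $L\sqrt{\lambda/(2-\lambda)}$ (the asymptotic standard deviation of $Z_n$ for i.i.d. unit-variance data, scaled by a multiplier $L$); the data model and parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

def ewma(xs, lam=0.1, L=3.0):
    """EWMA chart: Z_n = (1-lam) * Z_{n-1} + lam * X_n, Z_0 = 0 (target mean).

    Alarms when |Z_n| exceeds L times its asymptotic standard deviation
    sqrt(lam / (2 - lam)), assuming unit-variance observations."""
    limit = L * np.sqrt(lam / (2.0 - lam))
    Z = 0.0
    for n, x in enumerate(xs, start=1):
        Z = (1.0 - lam) * Z + lam * x
        if abs(Z) > limit:
            return n
    return None

# Change occurs after sample 200: mean shifts from 0 to 1.
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.0, 1, 200)])
print("alarm at", ewma(xs))
```

The geometric weights give the chart a soft memory: old observations decay at rate $(1-\lambda)$ rather than being either kept (CUSUM/SR) or discarded (Shewhart).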
Generalizations and Extensions
A. General Asymptotic Theory for Non-i.i.d. Data
There has been a considerable amount of effort to generalize the optimality results for sequential change detection to the non-i.i.d. setting. Lai [96] initiated the development of a general minimax asymptotic theory for both Lorden’s and Pollak’s formulations, while Tartakovsky and Veeravalli [206] initiated the development of a general Bayesian asymptotic theory.
1) General Minimax Asymptotic Theory:
Under the minimax setting, Lai in [96] obtained a general lower bound for non-i.i.d. data on the worst-case average detection delay of any procedure in ${ \mathcal {D}}_{\alpha }$, analogous to Theorem 1, under a condition on the convergence of the log-likelihood ratios of the observations.
We now present a generalized CUSUM procedure for non-i.i.d. data. In this setting, conditional distributions are used to compute the likelihood ratios. In the pre- and post-change regimes, the conditional densities of $X_{i}$ given the past $X_{1}^{i-1}$ are denoted by $f_{0,i}(\cdot |X_{1}^{i-1})$ and $f_{1,i}(\cdot |X_{1}^{i-1})$, respectively. Define\begin{equation*} Y_{i} = \log \frac {f_{1,i}\left ({X_{i}|X_{1}^{i-1}}\right)}{f_{0,i}\left ({X_{i}|X_{1}^{i-1}}\right)},\quad \text {and } C_{n} = \max _{1\leq k\leq n+1} \sum _{i=k}^{n} Y_{i}.\end{equation*}
The generalized CUSUM procedure is then given by:\begin{equation*} \tau _{\scriptscriptstyle \text {G}}= \inf \left \{{ n\geq 1: C_{n} \geq b }\right \}.\tag{22}\end{equation*}
The minimax optimality of the generalized CUSUM for non-i.i.d. data was established in [96]. Under some regularity conditions, by setting the threshold $b = |\log \alpha |$ so that ${\mathsf {FAR}}(\tau _{\scriptscriptstyle \text {G}}) \leq \alpha$, and provided that the log-likelihood ratios satisfy the following convergence condition for some constant $I > 0$:\begin{equation*} \underset {m\leq t }{\mathrm {\max }} \frac {1}{t}\sum _{i=n}^{n+m} Y_{i} \to I \quad \text {a.s. } \mathbb {P}_{n}, ~~\text {as } t \to \infty \quad \forall n,\tag{23}\end{equation*}
the generalized CUSUM procedure satisfies\begin{align*} {\mathsf {CADD}}(\tau _{\scriptscriptstyle \text {G}})\sim&{\mathsf {WADD}}(\tau _{\scriptscriptstyle \text {G}}) \sim \underset {\tau \in { \mathcal {D}}_{\alpha }}{\mathrm {\inf }}~{\mathsf {WADD}}(\tau) \\\sim&\underset {\tau \in { \mathcal {D}}_{\alpha }}{\mathrm {\inf }}~{\mathsf {CADD}}(\tau) \sim \frac {|\log \alpha |}{I},\tag{24}\end{align*}
as $\alpha \to 0$, and is thus first-order asymptotically optimal.
2) General Bayesian Asymptotic Theory:
Under the Bayesian setting, when the samples conditioned on the change-point are non-i.i.d., it is generally difficult to find an exact solution to the Shiryaev problem in (14). Tartakovsky and Veeravalli [206] showed that the Shiryaev algorithm is asymptotically optimal as $\alpha \to 0$ for general non-i.i.d. models, under certain conditions on the prior distribution of the change-point and on the log-likelihood ratios, which we summarize below.
Similar to the i.i.d. case, we can define the posterior probability $p_{n}$ that the change has occurred by time $n$, and use the Shiryaev stopping rule as in (18). Suppose that the prior distribution of the change-point $\Gamma$ has an exponential tail with rate\begin{equation*} d = - \lim _{n\to \infty } \frac {\log \mathbb {P} (\Gamma > n)}{n},\end{equation*}
and that the log-likelihood ratios satisfy, for some constant $I > 0$,\begin{equation*} \frac {1}{t}\sum _{i=n}^{n+t} Y_{i} \to I \quad \text {a.s. } \mathbb {P}_{n} ~~\text {as } t \to \infty \quad \forall n.\tag{25}\end{equation*}
Then, with the threshold chosen so that $\mathsf {PFA}(\tau _{\scriptscriptstyle \text {S}}) \leq \alpha$, the Shiryaev procedure is first-order asymptotically optimal as $\alpha \to 0$:\begin{equation*} {\mathsf {ADD}}(\tau _{\scriptscriptstyle \text {S}}) \sim \inf _{\tau: \mathsf {PFA}(\tau)\leq \alpha } {\mathsf {ADD}}(\tau)\sim \frac {|\log \alpha |}{I + d}.\tag{26}\end{equation*}
B. Change-of-Measure for Accurate ARL Approximations
For CUSUM and SR procedures with i.i.d. samples, it may be relatively easy to evaluate their performance metrics (such as the ${\mathsf {ARL}}$ and the ${\mathsf {CADD}}$) using renewal theory or by numerically solving integral equations. For more general detection statistics, such as scan statistics and kernel-based statistics, such exact analysis is typically intractable, and the change-of-measure technique can instead be used to obtain accurate approximations of the ${\mathsf {ARL}}$.
1) Using Change-of-Measure to Analyze the ${\mathsf {ARL}}$:
The main idea here is to relate finding the ${\mathsf {ARL}}$ to evaluating the probability that the detection statistic crosses the threshold within a fixed time horizon under the pre-change measure.
The analysis usually involves two steps. First, we aim to find the probability that the detection statistic exceeds the threshold $b$ at some time within a horizon of length $m$, under the pre-change measure $\mathbb {P}_{\infty }$.
Second, we will relate the above probability to the ${\mathsf {ARL}}$: for a large threshold, the stopping time is approximately exponentially distributed under $\mathbb {P}_{\infty }$, so the ${\mathsf {ARL}}$ is approximately $m$ divided by the boundary crossing probability over the horizon.
2) Example: Analyzing MMD-Based Sequential Change Detection Procedure:
Below, we illustrate the change-of-measure technique by analyzing the non-parametric kernel-based maximum mean discrepancy (MMD) statistic (details can be found in [115]). The kernel MMD, which measures the distance between two arbitrary distributions, is widely adopted in signal processing and machine learning. Given two sets of samples $X = \{x_{i}\}_{i=1}^{n}$ and $Y = \{y_{i}\}_{i=1}^{n}$ and a kernel $k(\cdot, \cdot)$, the (unbiased) MMD statistic is\begin{align*} \text {MMD}(X, Y)=&\frac {1}{n(n-1)} \sum _{i\neq j} \left \{{ k(x_{i}, x_{j}) + k\left ({y_{i}, y_{j}}\right)}\right. \\&\qquad \qquad \qquad \left.{-\,\, k\left ({x_{i}, y_{j}}\right) - k\left ({x_{j}, y_{i}}\right)}\right \}.\end{align*}
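The unbiased MMD estimate above can be computed directly from its definition. This is an illustrative sketch: the RBF kernel, its bandwidth, the sample sizes, and the test distributions are our own choices.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel on scalars; gamma is an illustrative bandwidth choice
    return np.exp(-gamma * (x - y) ** 2)

def mmd_unbiased(X, Y, gamma=1.0):
    """Unbiased MMD estimate from two equal-size 1-D samples, per the definition:
    average over all i != j of k(x_i,x_j) + k(y_i,y_j) - k(x_i,y_j) - k(x_j,y_i)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            total += (rbf(X[i], X[j], gamma) + rbf(Y[i], Y[j], gamma)
                      - rbf(X[i], Y[j], gamma) - rbf(X[j], Y[i], gamma))
    return total / (n * (n - 1))

rng = np.random.default_rng(7)
# Near zero when both samples come from the same distribution...
same = mmd_unbiased(rng.normal(0, 1, 100), rng.normal(0, 1, 100))
# ...and clearly positive when the distributions differ.
diff = mmd_unbiased(rng.normal(0, 1, 100), rng.normal(2, 1, 100))
print(same, diff)
```

Because the estimator is unbiased, it can take slightly negative values under the null; it concentrates near the population MMD as the sample size grows.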
The sequential change detection procedure based on the MMD statistic is then defined as follows [115]. At each time $t$, the most recent block of $B$ observations, $X_{t-B+1}^{t}$, is compared against $N$ blocks of reference (pre-change) data $\tilde X_{1}, \ldots, \tilde X_{N}$, and the resulting MMD statistics are averaged:\begin{equation*} U_{t} = \frac {1}{N} \sum _{i=1}^{N} \text {MMD}\left ({\tilde X_{i}, X_{t-B+1}^{t}}\right).\end{equation*}
Let $Z_{t}'$ denote the statistic $U_{t}$ standardized by its standard deviation under the pre-change measure. The stopping rule is\begin{equation*} \tau _{\scriptscriptstyle \text {M}}= \inf \left \{{t~:~ Z_{t}'> b}\right \}.\end{equation*}
Theorem 3 (${\mathsf {ARL}}$ of MMD-Based Procedure [115]): As $b \to \infty$, the ${\mathsf {ARL}}$ of the procedure $\tau _{\scriptscriptstyle \text {M}}$ is given by\begin{equation*} {\mathsf {ARL}}(\tau _{\scriptscriptstyle \text {M}}) = \frac {e^{b^{2}/2}}{b} \left \{{\frac {2B-1}{\sqrt {2\pi } B(B-1)} \nu \left ({b \sqrt {\frac {2(2B-1)}{B(B-1)}}}\right)}\right \}^{-1}(1+o(1)),\end{equation*}
where $\nu (\cdot)$ is a special function related to the overshoot of a random walk over its boundary, which can be evaluated numerically.
We present the main step of the proof of Theorem 3 to illustrate the change-of-measure technique. First, note that the event $\{\tau _{\scriptscriptstyle \text {M}} \leq m\}$ is the same as the event that $\sup _{2\leq t\leq m}Z_{t}'$ exceeds the threshold $b$. Its probability under the pre-change measure can be computed by an exponential change of measure:\begin{align*}&\mathbb P_{\infty }\left \{{\sup _{2\leq t\leq m}Z_{t}' \geq b}\right \} \\&\;= \mathbb E_{\infty }\left \{{ \frac {\sum _{t=2}^{m} e^{\xi _{t}}}{\sum _{s=2}^{m}e^{\xi _{s}}}\mathbb I_{\left \{{ \sup _{2\leq t\leq m}Z_{t}' \geq b }\right \}} }\right \} \\&\;= e^{-b^{2}/2}\sum _{t=2}^{m} \mathbb E_{t} \left \{{R_{t} e^{- \left [{\xi _{t} -b^{2}/2+ \log M_{t}}\right] }\mathbb I_{\left \{{\xi _{t} - b^{2}/2+ \log M_{t} \geq 0}\right \}}}\right \},\end{align*}
where $\xi _{t}$ is the exponential tilting exponent at time $t$, and $M_{t}$ and $R_{t}$ are quantities arising from the localization step; each expectation under the tilted measure $\mathbb {P}_{t}$ can then be evaluated asymptotically.
The numerical example in Fig. 3 demonstrates that the threshold obtained from the theoretical ${\mathsf {ARL}}$ approximation in Theorem 3 closely matches the threshold obtained by simulation.
C. Non-Stationary and Multiple Changes
In various modern applications, for instance, line outage detection in power systems [171] and stochastic power supply control in data centers [173], the change is not stationary. There can be a sequence of multiple changes: one followed by another. Below, we review some recent advances in sequential detection of dynamic changes.
1) Sequential Change Detection Under Transient Dynamics:
In classical sequential change detection formulations [19], [160], [194], [215], the statistical behavior of the observations is characterized by one pre-change distribution and one post-change distribution (known or unknown). In other words, the statistical behavior after the change is stationary. This assumption may be too restrictive for many practical applications with more involved statistical behavior after the change-point.
An example of a problem where the observations are non-stationary after the change is sequential change detection under transient dynamics, which was studied in [171], [173], [174], [251]. Specifically, the pre-change distribution does not change to a persistent post-change distribution instantaneously, but only after several transient phases, each of which is associated with a distinct data generating distribution. The goal is to detect the change as quickly as possible, either during the transient phases or during the persistent phase. This problem is fundamentally different from detecting a transient change (see, e.g., [51], [52], [64]), where the system goes back to the pre-change mode after a single transient phase, and the goal is to detect the change within the transient phase. The problem is also related to sequential change detection in the presence of a nuisance change, where the presence of the nuisance change can be modeled as a transient phase; however, an alarm should be raised only if the critical change occurs [103].
Two algorithms were proposed and investigated in [171], [251] for the minimax setting, the dynamic-CUSUM (D-CUSUM) and the weighted dynamic-CUSUM (WD-CUSUM), where the change-point and the transient durations are assumed to be unknown and deterministic. The basic idea is to construct a generalized likelihood ratio based algorithm that takes the supremum over the unknown change-point and the durations of the transient phases. It was shown in [171], [251] that the D-CUSUM and WD-CUSUM test statistics can be updated recursively, and thus are computationally efficient. In [251], it was demonstrated that both algorithms are adaptive to the unknown transient dynamics, even though the durations of the transient phases are unknown and are not used in the implementation of the algorithms. Moreover, both the D-CUSUM (under certain conditions) and the WD-CUSUM algorithms were shown to be first-order asymptotically optimal in [251]. The Bayesian setting was investigated in [174], where the change-point and the durations of the transient phases are assumed to be geometrically distributed. The optimal test was constructed, and a computationally efficient alternative test based on thresholding the posterior probability that the change has occurred was also proposed.
2) Sequential Detection of Moving Anomaly:
Existing studies on sequential change detection in networks usually assume that the change is persistent once it affects a node. However, there are scenarios where the change may not necessarily be persistent at a particular node; instead, it is persistent across the network as a whole, e.g., a moving anomaly in a sensor network. In this case, existing approaches using CUSUM statistics from each node, e.g., [55], [66], [126], [255], cannot be applied. Recently, the problem of sequential moving anomaly detection in networks was studied in [175], [176]. Specifically, after an anomaly emerges in the network, one node is affected by the anomaly at each time instant and receives data from a post-change distribution. The anomaly dynamically moves across the network with an unknown trajectory, and the node that it affects changes with time. Two approaches have been proposed to model the trajectory of the anomaly: the hidden Markov model [176], and the worst-case approach [175], which we discuss in the following.
The first approach (hidden Markov model) [176] models the anomaly’s trajectory as a Markov chain, and thus the samples are generated according to a hidden Markov model. The advantage of this model is that it takes into consideration the network’s topology, i.e., that the anomaly only moves from a node to one of its neighbors. In [176], a windowed GLR based algorithm was constructed and was shown to be first-order asymptotically optimal. Alternative algorithms were also designed with performance guarantees, including the dynamic SR procedure, recursive change-point estimation, and a mixture CUSUM algorithm.
The second approach (worst-case approach) [175] assumes that the anomaly’s trajectory is unknown but deterministic and considers the worst-case performance over all possible trajectories. A CUSUM-type procedure was constructed. The main idea is to use the mixture likelihood to construct a test statistic, which is further used to build a procedure of the CUSUM-type. This procedure was shown to be exactly optimal in [175] when the sensors are homogeneous. This idea has been further generalized to solve the sequential moving anomaly detection problem with heterogeneous sensors and has been shown to be first-order asymptotically optimal [172].
3) Multiple Change Detection:
A related line of research is multiple change detection in the offline setting, which aims to estimate multiple change-points from observations in a retrospective study. Various methods have been proposed to estimate the number and locations of change-points, including a hierarchical clustering based method [125], binary segmentation type methods [9], [40], [41], [60], [61], [220], (penalized) least-squares methods [23], [106], [107], [108], [240], the Schwarz criterion [239], kernel-based algorithms [8], [69], and so on. Another line of work aims to reduce the computational complexity of multiple change detection methods, such as [71], [89], [169]. We refer to [210] for a recent review of multiple change detection. Some offline multiple change detection algorithms can motivate the development of their online versions.
4) Decentralized and Asynchronous Change Detection in Networks:
When the information for detection is distributed across a network of sensors, detection problems fall under the umbrella of distributed (or decentralized) detection [31], [212], [214], [216]. In the decentralized setting, each sensor sends messages to the fusion center based on the observations it has received so far. The fusion center may provide feedback to sensors and make the final decision. The problem of decentralized sequential change detection in distributed sensor systems was introduced in [217], considering the observation model where all sensors are affected by the change at the same time. There have been a number of papers on the topic since then, see, e.g., [127], [205], [207]. A more recent (and practical) perspective is that the change may affect sensors with delay, i.e., different sensors may observe the change at different times, which we will present in the following.
In the case of multiple data streams, the change may happen asynchronously across sensors. To detect the first onset of the change, it was proposed in [66] to monitor each data stream with a local CUSUM procedure and raise a global alarm as soon as any local procedure raises an alarm. The sum of the local CUSUM statistics was considered in [126] and shown to be asymptotically optimal. The problem where the change propagates from one sensor to the next with known Markov dynamics was studied in [164], and an asymptotically optimal test was developed. A recent procedure proposed in [231] finds an optimal combination of local data streams accounting for their delays in being affected by the change, which can boost the signal-to-noise ratio and reduce the detection delay, especially when the signal is weak.
In [255], the problem of sequentially detecting a significant change (i.e., a change that affects at least a prescribed number of the data streams) was studied.
D. Robust Sequential Change Detection
Many classical procedures (for instance, CUSUM and SR) require exact knowledge of the pre- and post-change distributions. However, in real-world scenarios, the actual data distributions may be complex and different from what we have assumed. There can be adversarial attacks that significantly perturb the data distributions. This can lead to performance degradation of the optimal procedures. How to make the procedures more robust in the presence of model mismatch is the topic of robust sequential change detection.
1) Robustness to Model Uncertainties:
There have been many efforts to make detection procedures more robust to model uncertainties. One approach is to model the pre- and post-change distributions as members of a parametric family with unknown parameters lying in uncertainty sets, and then form a GLR-based test as discussed earlier in Section II-E. Another approach to developing good tests under model uncertainty is to adopt minimax robustness as the criterion, as is done in the seminal work of Huber on robust hypothesis testing [82], [83]. The solution to the robust hypothesis testing problem usually relies on finding the least favorable distributions (LFDs) within the uncertainty classes, with the likelihood ratio of the LFDs used in constructing the robust tests. It can be shown that LFDs exist for uncertainty classes satisfying a certain joint stochastic boundedness (JSB) condition [218]. The problem of minimax robust sequential change detection was explored in [213], in which an exactly optimal solution was obtained for uncertainty classes satisfying the JSB condition under a generalized Lorden criterion. An extension of this result to asymptotic minimax robust sequential change detection was studied in [129], where a weaker notion of stochastic boundedness is introduced.
A robust CUSUM algorithm is developed in [27] by making a connection to convex optimization, which is particularly useful for the high-dimensional setting and leads to a tractable formulation. For instance, assuming the covariance matrix lies in an uncertainty set centered around a nominal value, the problem of finding LFDs can be cast as solving a semidefinite program and can be solved efficiently.
2) Robustness to Adversarial Attacks:
The problem of sequential change detection in sensor networks in the presence of adversarial attacks [102] was investigated in [20], [56]. Under Byzantine attacks, an adversary may modify observations arbitrarily to delay the detection of a change and increase the false alarm rate. In [20], it is assumed that the change affects all but one compromised sensor, and the detection strategy is to raise a global alarm only when at least two local CUSUM statistics exceed the threshold. In [56], a more general setting was investigated, where an unknown subset of sensors can be compromised. Sequential detection strategies were designed by waiting until
E. Data-Efficient Sequential Change Detection
There is usually a cost associated with making observations in practical engineering applications, e.g., the power consumption in sensor networks. An extension of Shiryaev’s formulation (Section II-D2) was investigated in [11] by including an additional constraint on the average number of observations taken before the change. The cost of observations after the change is included in the detection delay. Specifically, whether to take an observation at time
F. High-Dimensional Streaming Data
High-dimensional data usually have low-dimensional structures, such as sparsity and low-rankness, which can be leveraged to achieve improved detection performance and computational efficiency. Meanwhile, missing data is very common for high-dimensional streaming data. In this section, we review recent advances in these directions.
1) Sparse Change in Multiple Data Streams:
For multiple independent streams of data, a mixture procedure was developed in [236] to monitor parallel streams for a change-point that affects only a (usually sparse) subset of them. Both the affected subset and the post-change distribution are unknown. The mixture model hypothesizes that each sensor is affected by the change with a small probability $\varrho$, leading to a detection statistic built from the log-likelihood of the mixture, \begin{equation*} \sum _{n=1}^{N} \log \left [{1-\varrho + \varrho\, f_{1}\left ({X_{t}^{(n)}}\right)/f_{0}\left ({X_{t}^{(n)}}\right)}\right].\end{equation*}
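The mixture statistic in the displayed equation is straightforward to compute; a minimal sketch follows, taking as input the per-stream log-likelihood ratios (the value $\varrho = 0.1$ is an illustrative default, not a recommendation from [236]).

```python
import numpy as np

def mixture_statistic(llr, rho=0.1):
    """Mixture statistic of the displayed equation: sum over the N streams
    of log(1 - rho + rho * exp(llr_n)), where
    llr_n = log(f1(x^(n)) / f0(x^(n))) for stream n.

    Each term is ~0 when llr_n is small (an unaffected stream) and
    ~llr_n + log(rho) when llr_n is large, so the statistic adaptively
    down-weights streams showing no evidence of change."""
    llr = np.asarray(llr, dtype=float)
    return float(np.sum(np.log(1.0 - rho + rho * np.exp(llr))))
```

For numerical stability with very large log-likelihood ratios, one would replace the direct `exp` with a log-sum-exp style computation.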
2) Subspace Change Detection:
In many applications, a change in the covariance structure of high-dimensional data can be represented as a low-rank change. For instance, in seismic signal detection [232], a similar waveform is observed at a subset of sensors after the change. Such a change can be modeled as a shift of the covariance matrix from the identity to a "spiked" covariance model [88]. The subspace-CUSUM procedure was developed in [232], in which the unknown subspace in the post-change spiked model is estimated sequentially and then used to form the log-likelihood ratio statistic. A CUSUM procedure for detecting a switch in subspace (from a known subspace to another target subspace) was studied in [86].
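The two ingredients of such a procedure, an online subspace estimate and a CUSUM-style recursion on the projection energy, can be sketched as follows. This is a simplified illustration of the idea, not the exact procedure of [232]; Oja's rule here stands in for the sequential subspace estimate, and `drift` is a user-chosen offset between the pre- and post-change means of the squared projection.

```python
import numpy as np

def oja_update(u, x, step=0.01):
    """Oja's rule: a simple online estimate of the leading eigenvector,
    standing in for the sequential subspace estimate used in [232]."""
    u = u + step * float(u @ x) * x
    return u / np.linalg.norm(u)

def subspace_cusum_step(S, u, x, drift):
    """CUSUM-style recursion on the squared projection (u'x)^2, whose mean
    increases once a spike appears along u. `drift` should sit between the
    pre- and post-change means of (u'x)^2 so the statistic drifts down
    before the change and up after it."""
    return max(S + float(u @ x) ** 2 - drift, 0.0)
```

In use, each new sample first updates the statistic and then refines the direction estimate, so the statistic gains power as the subspace estimate converges.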
3) Missing Data:
In high-dimensional time series, it is common that not all entries can be observed at each time, and the missing components handicap conventional approaches. In [234], an approach was proposed that combines subspace tracking with missing-data handling to model the underlying dynamics of the data geometry (submanifold). Specifically, streaming data is used to track a submanifold approximation, to measure deviations from this approximation, and to compute a sequence of statistics of the deviations for detecting when the underlying manifold has changed.
4) Sketching to Conquer High-Dimensionality:
Detecting changes quickly in high-dimensional data requires confronting the challenges posed by the data's dimensionality. Sketching is a commonly used strategy for dimensionality reduction, which linearly projects the high-dimensional data into a small number of sketches. A GLR procedure based on data sketches was studied in [237], with a precise characterization of the performance metrics and of the minimum number of sketches needed to achieve good performance. Multiple types of sketching matrices can be used, such as Gaussian random matrices, expander graphs, and matrices constrained by the network topology. The sketching procedure is relevant to large power networks, where we cannot place a sensor on every node or edge; instead, each sensor measures aggregates of the network state at a few edges or nodes. In [237], the mean-shift detection problem in power networks is studied, where each measurement corresponds to a linear combination of the state at an edge, e.g., real power flow, leading to a sketching matrix determined by the network topology.
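To make the sketching idea concrete, the following is a minimal sketch (assuming standard Gaussian pre-change data and a window-based chi-square-type statistic as a simplified stand-in for the sketched GLR procedure of [237]; the dimensions and the magnitude of the shift are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, w = 500, 20, 50                      # ambient dim, number of sketches, window
A = rng.normal(size=(m, p)) / np.sqrt(m)   # Gaussian sketching matrix

def sketched_mean_stat(X, A):
    """Window statistic for a mean shift computed only from the sketches
    y_t = A x_t. Assumes x_t ~ N(0, I) before the change, so the sketches
    have covariance A A'; the statistic is a GLR for a mean shift in the
    sketch domain over the window."""
    Y = X @ A.T                            # (w, m) sketched observations
    ybar = Y.mean(axis=0)
    Sigma = A @ A.T                        # sketch covariance under H0
    return X.shape[0] * float(ybar @ np.linalg.solve(Sigma, ybar))

X0 = rng.normal(size=(w, p))               # pre-change window
mu = np.zeros(p); mu[:10] = 3.0            # sparse mean shift (hypothetical)
t_pre = sketched_mean_stat(X0, A)
t_post = sketched_mean_stat(X0 + mu, A)
print(t_pre, t_post)
```

A random projection onto an m-dimensional subspace retains roughly a fraction m/p of the shift energy, which is the basic trade-off between the number of sketches and detection power quantified in [237].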
G. Joint Detection and Estimation
It is common that the distribution after the change is unknown. For instance, in industrial process monitoring applications, the production line is in-control and well-calibrated before the change (thus the pre-change distribution is known); after the change, however, an anomaly shifts the operation into an unknown state. It is therefore of interest to incorporate estimates of the possible post-change state into the detection statistic; this problem is related to robust sequential change detection, as discussed in Section III-D1. In other situations, we need to estimate the post-change distribution in retrospect to identify the change. There has been much work establishing the theoretical foundation for joint detection and estimation. For instance, [135] combines Bayesian formulations of estimation and detection and develops an optimal procedure that trades off detection power and estimation quality. In another context, this is also referred to as sequential change diagnosis [46]. Quickest search for the change-point (e.g., quickest search for rare events) has been developed in [78], [79], [193].
H. Spatio-Temporal Change Detection
When modeling discrete event data, point process models [45] are frequently used because they model the times between events directly. In a homogeneous Poisson process, the inter-event times are independent and exponentially distributed; in a Hawkes process, the intervals are dependent, and the intensity depends on the events that occurred in the past [54]. The "autoregressive" nature of Hawkes processes makes them attractive for modeling temporal dependence and causal relationships, with applications including market models [209], earthquake event prediction [144], inferring leadership in e-mail networks [57], and topic models [75]. The multi-dimensional Hawkes process over networks can model highly correlated discrete event data [168] and capture dependence and signal propagation over the network.
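The self-exciting behavior described above can be illustrated with the standard univariate Hawkes conditional intensity under an exponential kernel; the parameter values below are illustrative, not taken from the cited works.

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.2, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process with an
    exponential kernel:
        lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    Each past event excites the process, and the excitation decays at
    rate beta; stationarity requires alpha / beta < 1."""
    past = np.array([ti for ti in events if ti < t])
    if past.size == 0:
        return mu
    return mu + alpha * float(np.sum(np.exp(-beta * (t - past))))
```

A change in the parameters (e.g., a jump in alpha, indicating stronger self-excitation) is exactly the kind of change the cited detection procedures target.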
Detection of changes for point processes has attracted much attention for both single event stream and multiple streams over networks (or over multiple locations). For example, there are works focusing on Poisson processes [76], [179], [246], and some recent work on one-dimensional [124], [153] and multi-dimensional (network) point processes [116], [226]. In particular, [116] studied the change detection for networked streaming event data and constructed GLR type procedures; [226] developed the penalized dynamic programming algorithm to detect coefficient changes in discrete-time high-dimensional self-exciting Poisson processes in an offline setting.
This topic is also related to the multisource quickest detection problem, mostly assuming independence between multiple data streams. For instance, the quickest detection of the minimum of change-points for two independent compound Poisson processes was considered in [21] and optimal Bayesian sequential detection procedures were developed.
I. Change Detection-Isolation-Identification
In addition to detecting the change quickly after it occurs, sometimes we are also interested in identifying the post-change model and/or isolating a subset of nodes within a large network affected by the change. In [47], an asymptotically optimal Bayesian detection-isolation scheme was proposed assuming the post-change model is one of finitely many distinct alternatives. In a series of works, Nikiforov introduced a minimax optimal detection-isolation algorithm for stochastic dynamical systems [137], developed a recursive variant of the algorithm that achieves better computational efficiency [138], and provided an asymptotic lower bound for the mean detection-isolation delay with constraints on the probability of false isolation and the average time before a false alarm [139]. Natural generalizations of CUSUM and SR procedures for detection-isolation problems were discussed in [198]. See [194], [196] for more detailed overviews.
J. Alternative Performance Metrics
Beyond the metrics presented in this survey, many alternative performance metrics have also been considered. For instance, [161] investigated an exponential penalty on the delay rather than a linear penalty (as used in the definition of
New Dimensions
A. Machine Learning and Change Detection
Modern machine learning approaches can be adopted for solving sequential change detection problems, which we will review in this subsection.
1) Density Ratio Estimation:
Instead of estimating the post-change density
2) Anomaly Detection:
Change detection is closely related to anomaly detection, a popular topic in machine learning and data mining for which many techniques have been developed. In particular, a recurrent neural network (RNN) based approach computes the detection statistic (referred to as the anomaly score) in an online fashion and compares it with a threshold for anomaly detection [177]. RNN-based approaches can be beneficial in certain situations, since RNNs are known to capture complex temporal dependencies in multivariate time series. We refer to [30] for a recent survey on deep learning techniques for anomaly detection. Developing mathematical theory for RNN-based sequential change detection remains an open question.
3) Online Learning and Change Detection:
Online implementation is one of the most critical aspects of sequential change detection algorithms in practice. Although many algorithms, such as the CUSUM and SR procedures, enjoy a recursive structure, some sequential detection procedures face a significant hurdle to online implementation due to their non-recursive nature. For instance, the window-limited GLR statistic, although robust in the presence of unknown post-change distributions, is not recursive, since the parameters need to be continuously re-estimated as new samples arrive. To tackle this challenge, inspired by online learning, [26] develops an online mirror descent-based GLR procedure that updates the estimate of the unknown post-change parameter with new data. Another highly cited work [2] develops an online change detection procedure based on Bayesian computing. In recent work, [208] develops a framework for joint sequential change detection and online model fitting, which is particularly suitable for parameterized models; within this framework, a GLR procedure is developed using estimates of the unknown high-dimensional parameter obtained by gradient descent updates.
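The flavor of such an online-learning-based recursion can be sketched as follows for a Gaussian model with unknown post-change mean; this is only an illustration in the spirit of [26], [208], not their exact procedures, and the step size and threshold are illustrative.

```python
import numpy as np

def online_glr_step(S, theta, x, step=0.05):
    """One step of an online-learning-style GLR recursion for
    f0 = N(0, 1) versus f1 = N(theta, 1) with theta unknown.

    The statistic accumulates the log-likelihood ratio evaluated at the
    CURRENT estimate of theta, which is then refined by a stochastic
    gradient step on the log-likelihood (grad = x - theta)."""
    llr = theta * x - 0.5 * theta**2
    S = max(S + llr, 0.0)
    theta = theta + step * (x - theta)
    return S, theta

rng = np.random.default_rng(2)
S, theta, change_time, threshold = 0.0, 0.0, 100, 10.0
alarm_time = None
for t in range(400):
    x = rng.normal(loc=(1.0 if t >= change_time else 0.0))
    S, theta = online_glr_step(S, theta, x)
    if S > threshold:
        alarm_time = t
        break
print("alarm at t =", alarm_time)
```

Every update costs O(1) time and memory, which is exactly the recursive structure that the window-limited GLR lacks.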
4) Tracking Data Dynamics:
Many data streams are dynamic even before the change happens; for instance, in solar flare detection from satellite video streams [233], [234]. To work in real-world scenarios, we need robust methods that can adapt to normal data dynamics without mislabeling them as change-points. A possible strategy is to combine tracking with detection. For instance, [233], [234] developed a procedure to detect sparse changes when the pre-change high-dimensional data is time-varying. The data dynamics are captured by tracking a time-varying manifold using variants of subspace tracking (e.g., the GROUSE [245], PETRELS [39], or MOUSSE [234] algorithms). Another instance is the network Hawkes process model, where the process can be tracked through online learning techniques [67].
5) Active Learning and Change Detection:
For certain applications such as material science and recovering seafloor depth, data acquisition is expensive. Thus, it is desirable to collect data that is most useful in a sequential fashion, which is the theme of active learning (see, e.g., [29], [190]). The combination of active learning and change detection was introduced as active change-point detection (ACPD) problem in [74]. The task is to adaptively determine the next input to detect the change-point in a black-box expensive-to-evaluate function, with as few evaluations as possible. The method utilizes the existing change detection method to compute change scores and a Bayesian optimization method to determine the next input. A CUSUM procedure with an adaptive sampling strategy to detect mean shifts was developed in [120].
6) Detection With Data Privacy:
As data privacy grows in importance in modern applications in social settings, there has been interest in developing private change detection algorithms. Both offline and online change detection methods through the lens of differential privacy have been developed in [44]. A different privacy-aware sequential change detection method was studied in [104], using maximal leakage as the privacy metric, which is a weaker form of privacy than that of [44].
7) Change Detection for Reinforcement Learning:
Reinforcement learning is a major methodology for sequential decision-making in the era of artificial intelligence. How to implement reinforcement learning in a non-stationary, changing environment is still a mostly unexplored area. Recently, there have been attempts to combine sequential change detection and reinforcement learning [147], where change detection algorithms are used to detect transitions of the environment and trigger corresponding adaptation of the reinforcement learning algorithm.
B. Distribution-Free Methods
Distribution-free methods aim to detect changes without explicit distributional assumptions on the data. Such methods, for example the kernel MMD-based method discussed in Section III-B2, are particularly attractive in machine learning due to their flexibility in handling complex data. Kernel-based non-parametric methods have been developed for change detection in both the offline setting [8], [70], [72] and the online setting [115]. MMD statistics have also been used for anomalous sequence detection, for instance in [252], [254]. Besides MMD, other distribution-free methods have been developed for change detection. For instance, dissimilarity measures based on the kernel support vector machine (SVM) were built in [50], and generalized likelihood tests that directly use empirical distributions of data supported on a finite alphabet were constructed in [24], [105], [140], [141].
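The kernel two-sample statistic at the heart of these methods is easy to sketch; the following computes a (biased) squared-MMD estimate with an RBF kernel, the kind of statistic applied window-wise in online kernel change detection (cf. [115]). The unit bandwidth is an illustrative default.

```python
import numpy as np

def rbf_mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy (MMD)
    between samples X and Y under an RBF kernel. It equals the squared
    RKHS distance between the empirical mean embeddings, so it is
    non-negative and zero when X == Y."""
    def gram(A, B):
        # Pairwise squared distances via broadcasting, then the RBF kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth**2))
    return float(gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean())
```

In an online procedure, X would be a reference window of past data and Y a sliding window of recent data, with an alarm raised when the statistic exceeds a calibrated threshold.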
Many other types of distribution-free non-parametric tests for change detection have been developed in various contexts. For instance, the maximal
C. Non-Stationary Multi-Armed Bandits With Changes
The multi-armed bandit is a class of fundamental problems in online learning and sequential decision-making. A learning agent aims to maximize its expected cumulative reward by repeatedly selecting one arm to pull at each time step. Change detection plays a role when the reward distributions may change in a piecewise-stationary fashion at unknown time steps. To handle such dynamic multi-armed bandit problems, various change detection methods have been considered, including the Page-Hinkley test [73], windowed mean-shift detection [243], the CUSUM test [119], and sample-mean-based tests [25]. Usually, the algorithm resets once a change is detected. From a Bayesian perspective, a Thompson sampling strategy equipped with a Bayesian change-point mechanism was considered in [128]. The adversarial multi-armed bandit problem with change-points was also considered in [5].
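The detect-and-reset pattern can be sketched with a per-arm tracker; this is a simplified illustration in the spirit of the CUSUM-based approach of [119], not its exact algorithm, and the drift and threshold values are illustrative.

```python
class CusumResetArm:
    """Per-arm mean tracker with a two-sided CUSUM-style detector on the
    reward residuals; the arm's statistics are reset when a change in the
    reward distribution is flagged, so the bandit algorithm forgets
    pre-change data."""
    def __init__(self, drift=0.05, threshold=3.0):
        self.n, self.mean = 0, 0.0
        self.g_pos = self.g_neg = 0.0
        self.drift, self.threshold = drift, threshold

    def update(self, r):
        """Feed one reward; returns True if a change was flagged."""
        self.n += 1
        self.mean += (r - self.mean) / self.n
        resid = r - self.mean          # residual w.r.t. the running mean
        self.g_pos = max(0.0, self.g_pos + resid - self.drift)
        self.g_neg = max(0.0, self.g_neg - resid - self.drift)
        if max(self.g_pos, self.g_neg) > self.threshold:
            # Change detected: reset all statistics for this arm.
            self.n, self.mean = 0, 0.0
            self.g_pos = self.g_neg = 0.0
            return True
        return False

arm = CusumResetArm()
pre_flags = [arm.update(r) for r in [0.1, -0.1, 0.05] * 10]   # stationary rewards
post_flags = [arm.update(2.0) for _ in range(10)]             # reward mean jumps
print(any(pre_flags), any(post_flags))
```

A full piecewise-stationary bandit algorithm would wrap such trackers inside a UCB- or Thompson-sampling-style arm-selection rule.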
D. Optimization for Change Detection and Estimation
Optimization has become a centerpiece in developing modern machine learning algorithms, and recent advances in convex optimization have enabled solving many large-scale problems. A line of research casts (offline) change detection and estimation (of change locations) as an optimization problem. The benefits of this optimization-based approach typically include computational efficiency (when the optimization problem is convex) and theoretical performance guarantees derived from optimization theory. Below we give some examples.
The univariate change detection for a mean shift using an optimization approach has been studied in [117], and performance guarantees were established by relating the
Multivariate change detection using an optimization approach has also been studied. For instance, a dynamic programming approach was developed for recovering an unknown number of change-points from multivariate autoregressive models [225]. A network binary segmentation method for change detection was proposed in [223], which has been extended for covariance matrix change detection in [222]. Finally, the work [191] combined the filtered derivative with convex optimization methods to estimate change-points for multi-dimensional data.
Modern Applications
Sequential change detection has traditionally been used in industrial process monitoring, which was probably the original motivation for developing change detection procedures in the early days. The wide adoption of change detection in industrial quality engineering and manufacturing initiated the field of statistical process control (SPC) (see, e.g., [143], [182]). Recently, many more modern applications of sequential change detection have emerged, and we present a selection of them here.
A. Smart Grids
Sequential change detection methodology has recently been successfully applied to line outage detection in power transmission systems. In modern smart grids, high-speed synchronized voltage phase angle measurements are taken by phasor measurement units (PMUs). Based on PMU measurements, a linearized incremental small-signal power system model was developed in [37]. Once a line outage occurs, the covariance matrix of the incremental phase angles changes; by monitoring this covariance, line outages can be detected and identified using sequential change detection algorithms. In [171], the transient dynamics of the power system following a line outage are further incorporated, and the D-CUSUM algorithm was developed to account for the dynamic nature of the line outage (see Section III-C1 for more details).
There have been other works on sequential change detection for smart grids. The generalized local likelihood ratio test was applied to voltage quality monitoring [114], photovoltaic systems [36], attack detection in multi-agent reputation systems [113], wide-area monitoring [112], and cyber-attack detection in discrete-time linear dynamical systems [92], [93]. Decentralized detection with level-triggered sampling was considered in [241]. In [77], a general stochastic graphical framework for modeling the bus measurements was proposed, and data-adaptive data-acquisition and decision-making processes were designed for the quickest search and localization of anomalies in power grids.
B. Cybersecurity
Cybersecurity has become a critical problem with the development of wireless communication, networking, and the Internet of Things. It is of practical importance to detect attacks and intrusions in real-time from network streaming data, e.g., denial-of-service attacks, worm-based attacks, port-scanning, and man-in-the-middle attacks. The sequential change detection approach is a natural fit since the attacks usually change network traffic distribution. In [203], multi-channel generalizations of the CUSUM procedure and non-parametric tests were proposed. In [204], adaptive sequential methods were proposed for early detection of subtle network attacks, utilizing data from multiple layers of the network protocol. In [202], a multi-cyclic detection procedure based on the SR procedure was proposed. In [199], score-based CUSUM and SR procedures were exploited for network anomaly detection, and a hybrid detection system was proposed. The application to cybersecurity was also discussed in books [194], [196], and recent reviews [84], [85].
C. Sensor Networks
Sensor networks collecting sequential data have been widely used for geophysical, environmental, traffic, and Internet traffic monitoring applications, which we will briefly summarize in this subsection.
Seismology is experiencing rapid growth in the quantity of data. Earthquake detection aims to identify seismic events in continuous data, a fundamental operation in seismology [242]. Modern ultra-dense seismic sensor arrays have produced massive amounts of continuous data for seismic studies, much of which is publicly available through IRIS [1]. Traditionally, network seismology treated seismic signals individually, one sensor at a time, and declared an earthquake upon multiple impulsive arrivals consistent with a source within the Earth [87]. Recently, with advances in sensor technology that bring densely sampled data, high-performance computing, and high-speed communication, network-based detection can exploit correlations between sensors to extract coherent signals. This enhances the systematic detection of weak and unusual events that currently go undetected by individual sensors. Detecting such weak events is crucial for earthquake prediction [145], [219], oil field exploration, volcano monitoring, and deeper earth studies [80]. Towards this goal, a subspace-CUSUM procedure was developed in [232] for network-based detection by exploiting the low-rank subspace structure induced by waveform similarity.
Sensor networks have also been deployed to monitor drinking water safety from the water tower to private residences. Sequential change detection using residual chlorine concentration measurements from the sensor network was developed in [65]. Methods have also been developed for monitoring river contamination [34], [35], which specifically account for the spatio-temporal correlation in observations along the sensor network due to water dynamics.
Sequential monitoring of traffic flow using traffic sensors has been considered in [166], and a distributed, online, sequential algorithm for detecting multiple faults in a sensor network was presented therein. Recently, Hawkes processes models for correlated traffic anomalies using data collected by inductive-loop traffic detectors were developed in [250].
D. Wireless Communications
Sequential change detection has been used in wireless communications, including online user activity detection in multi-user direct-sequence/code-division multiple-access (DS-CDMA) environments [146], and detecting "spectrum opportunities" in the cognitive radio setting by identifying channel occupancy and idle periods from primary users' activities [95], [235]. More recently, [81] established a change detection framework for low probability of detection (LPD) communication, where a transmitter, Alice, wants to hide her transmission to a receiver, Bob, from an adversary, Willie; three sequential tests, the Shewhart, CUSUM, and SR procedures, were considered to model Willie's detection process.
E. Video Processing and Computer Vision
Change detection is one of the most commonly encountered low-level tasks in computer vision and video processing [163], and many such problems are essentially sequential. A plethora of practical algorithms have been developed to date; for instance, scene change detection [110], street-view change detection [3], and change detection in video sequences [211]. In [48], a pixel-based weightless neural network (WNN) method was developed to detect changes in the field of view of a camera. In [118], multiple images from reference and mission passes of a scene of interest were used to improve detection performance. There are still many open questions regarding how to leverage the power of statistical sequential change detection for computer vision and video processing. We present an example of solar flare detection from video sequences in Fig. 4, which has been considered in several works along this line including [234].
Solar flare detection with the mixture procedure as considered in [234]; the first minor solar flare at
F. Social Networks
The widespread use of social networks and the broad availability of information networks (e.g., Twitter, Facebook, blogs) lead to a large amount of user-generated data [91], which is valuable for studying many social phenomena. One important task is to detect change-points in streaming social network data [90], which may represent the collective anticipation of or response to external events or system "shocks" [152]. Detecting such changes can provide a better understanding of the patterns of social life. In other cases, early detection of change-points can predict or even prevent social stress due to disease or international threats, for instance by detecting self-exciting changes (modeled by network Hawkes processes) in social networks [116]. A related topic is distributed hypothesis testing in social networks: Reference [101] showed the exponential convergence rate of a Bayesian scheme for updating nodal beliefs (distribution estimates) in the social learning setting.
G. Epidemiology
Sequential change detection can play an important role in public health and disease surveillance, where early detection of epidemics is a central problem. In [17], [18], Baron cast the early detection of epidemics as a Bayesian sequential change detection problem and proposed an asymptotically pointwise optimal stopping rule, which is computationally efficient for the complicated prior distributions arising in epidemiology. In [244], a modified CUSUM procedure was proposed for the susceptible-infected-recovered (SIR) epidemic model to detect a change-point in the infection rate parameter. Moreover, change detection has been incorporated into studies of intervention effectiveness, based on the premise that the underlying epidemiological model may change over time due to interventions. Evaluating the effectiveness of intervention measures requires detecting the underlying change-points, which has become even more important in the COVID-19 era [249]. Such works include [151], [228], which estimate change-points in time series to assess the effectiveness of interventions such as lock-downs and mask usage; in [49], the problem of detecting a growth rate change in the COVID-19 spread in Germany was studied, with results further incorporated into forecasting. Many open questions remain regarding effective sequential change detection procedures for early detection of infectious diseases.
Conclusion
Our goal in this survey was to provide a glimpse of the past and recent advances in sequential change detection, and its application in various domains. We have covered different types of sequential change detection procedures, both theoretically optimal and practical. We also discussed how the intersection of sequential change detection with other areas has created interesting new directions for research.
ACKNOWLEDGMENT
The authors are grateful to the Guest Editor and the anonymous reviewers for their helpful comments.