Stochastic approximation has been a very active research area for the last 60 years (see e.g., [1], [2]). The pattern for a stochastic approximation algorithm is provided by the recursion \theta_{n}=\theta_{n-1}+\gamma_{n}Y_{n}, where \theta_{n} is typically a {\BBR}^{d}-valued sequence of parameters, Y_{n} is a sequence of random observations, and \gamma_{n} is a deterministic sequence of step sizes. An archetypal example of such algorithms is provided by stochastic gradient algorithms. These are characterized by the fact that Y_{n}=-\nabla g(\theta_{n-1})+\xi_{n} where \nabla g is the gradient of a function g to be minimized, and where (\xi_{n})_{n\geq 0} is a noise sequence corrupting the observations.
In the traditional setting, the sensing and processing capabilities needed for the implementation of a stochastic approximation algorithm are centralized at a single machine. Alternatively, distributed versions of these algorithms, where the updates are performed by a network of communicating nodes (or agents), have recently aroused a great deal of interest. Applications include decentralized estimation, control, optimization, and parallel computing.
In this paper, we consider a network composed of N nodes (sensors, robots, computing units, and so on). Node i generates a {\BBR}^{d}-valued stochastic process (\theta_{n,i})_{n\geq 1} through a two-step iterative algorithm consisting of a local step and a so-called gossip step. At time n:
[{\tt Local\;step}] Node i generates a temporary iterate {\tilde \theta}_{n,i} given by {\tilde \theta}_{n,i}=\theta_{n-1,i}+\gamma_{n}\, Y_{n,i},\eqno{\hbox{(1)}} where \gamma_{n} is a deterministic positive step size and where the {\BBR}^{d}-valued random process (Y_{n,i})_{n\geq 1} represents the observations made by agent i.
[{\tt Gossip\;step}] Node i observes the values {\tilde \theta}_{n,j} of some other nodes j and computes the weighted average \theta_{n,i}=\sum_{j=1}^{N}w_{n}(i,j)\,{\tilde \theta}_{n,j},\eqno{\hbox{(2)}} where the w_{n}(i,j)s are scalar nonnegative random coefficients such that \sum_{j=1}^{N}w_{n}(i,j)=1 for any i. The sequence of random matrices W_{n}:=[w_{n}(i,j)]_{i,j=1}^{N} represents the time-varying communication network between the nodes.
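The two-step scheme above can be sketched numerically. The following is a minimal illustration, not the paper's setting: the observation Y_{n,i} is taken as a noisy gradient of a hypothetical local quadratic cost with minimizer c_i, and the gossip matrix is the (trivially row-stochastic) full averaging matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 2                                       # number of nodes, parameter dimension
C = np.arange(N * d, dtype=float).reshape(N, d)   # hypothetical local minimizers c_i

def local_observation(theta_i, i):
    # Placeholder for Y_{n,i}: noisy gradient of the local cost |theta - c_i|^2 / 2
    return -(theta_i - C[i]) + 0.1 * rng.standard_normal(d)

def gossip_matrix():
    # Placeholder for W_n: full averaging, a trivially row-stochastic choice
    return np.full((N, N), 1.0 / N)

theta = np.zeros((N, d))
for n in range(1, 2001):
    gamma = 1.0 / n
    # Local step (1): each node moves along its own observation
    theta_tilde = theta + gamma * np.vstack([local_observation(theta[i], i) for i in range(N)])
    # Gossip step (2): each node replaces its iterate by a weighted average
    theta = gossip_matrix() @ theta_tilde

disagreement = np.linalg.norm(theta - theta.mean(axis=0))
```

With full averaging the consensus is exact at every step, and the common value approaches the minimizer of the sum of the local costs, here the mean of the c_i.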
Contributions. This paper studies a distributed stochastic approximation algorithm in the context of random row-stochastic gossip matrices W_{n}.
Under the assumption that the algorithm is stable, we prove convergence of the algorithm to the sought consensus. Convergence of the estimates to a consensus is also established in the case where the frequency of information exchange between the nodes tends to zero at some controlled rate. In practice, this means that the matrices W_{n} become more and more likely to be equal to the identity as n\to\infty. The benefits of this possibility in terms of the power devoted to communications are obvious.
We provide verifiable sufficient conditions for stability.
We establish a central limit theorem (CLT) on the estimates in the case where the W_{n} are doubly stochastic. We show in particular that the node estimates tend to fluctuate synchronously for large n, i.e., the disagreement between the nodes is negligible at the CLT scale. Interestingly, the distributed algorithm under study has the same asymptotic variance as its centralized analog.
We also consider a CLT on the sequences averaged over time as introduced in [3]. We show that averaging always improves the rate of convergence and the asymptotic variance.
Motivations and examples. The algorithm under study is motivated by the emergence of various decentralized network structures such as sensor networks, computer clouds, or wireless ad hoc networks. One of the main application targets is distributed optimization. In this context, one seeks to minimize a sum of local differentiable objective functions F_{i} of the agents: {\tt Minimize}\;\sum_{i=1}^{N}F_{i}(\theta).\eqno{\hbox{(3)}} Function F_{i} is supposed to be unknown to any other agent j\ne i. In this context, the distributed algorithm (1)–(2) reduces to a distributed stochastic gradient algorithm by letting Y_{n,i}=-\nabla_{\theta}F_{i}(\theta_{n-1,i})+\xi_{n,i}, where \nabla_{\theta} is the gradient w.r.t. \theta and \xi_{n,i} represents some possible random perturbation at time n.
In a machine learning context, F_{i} is typically the risk function of a classifier indexed by \theta and evaluated on a local training set at agent i [4]. In a wireless ad hoc network, F_{i} represents some (negative) performance measure of a transmission such as the Shannon capacity, and the aim is typically to search for a relevant resource allocation vector \theta (see [5] for more details). As a third example, an application framework for statistical estimation is provided in Section V. In that case, it is assumed that node i receives an i.i.d. time series (X_{n,i})_{n}, the joint probability density function of (X_{n,1},\ldots,X_{n,N}) being f_{\ast}({\mbi x}). The system designer assumes that this density belongs to a parametric family \{f(\theta,{\mbi x})\}_{\theta} where f(\theta,{\mbi x})=\prod_{i=1}^{N}f_{i}(\theta,x_{i}). Then, a well-known contrast for the estimation of \theta is given by the Kullback–Leibler divergence D(f_{\ast}\,\Vert\, f(\theta,\cdot)) [6]. Finding a minimizer boils down to the minimization of (3) by setting F_{i}(\theta)=D(f_{i,\ast}\,\Vert\, f_{i}(\theta,\cdot)) where f_{i,\ast} is the ith marginal of f_{\ast}. Then, algorithm (1)–(2) coincides with a distributed online maximum likelihood (ML) estimator by setting Y_{n,i}=\nabla_{\theta}\log f_{i}(\theta_{n-1,i}, X_{n,i}). Under some regularity conditions, it can easily be checked that Y_{n,i}=-\nabla_{\theta}F_{i}(\theta_{n-1,i})+\xi_{n,i} where (\xi_{n,i})_{n} is a martingale increment sequence.
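As a toy instance of the ML example (and not of the general model of Section V), assume each node observes X_{n,i} ~ N(\theta_{\star},1), so that f_{i}(\theta,x) is the N(\theta,1) density and the score is \nabla_{\theta}\log f_{i}(\theta,x)=x-\theta. Combined with pairwise gossip on an assumed complete communication graph, the distributed online ML estimator reads:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
theta_true = 2.5            # unknown common parameter theta_star
theta = np.zeros(N)         # one scalar estimate per node

def pairwise_gossip():
    # Pairwise scheme: a uniformly chosen pair of nodes averages its iterates
    # (complete communication graph assumed for simplicity)
    i, j = rng.choice(N, size=2, replace=False)
    e = np.zeros(N); e[i], e[j] = 1.0, -1.0
    return np.eye(N) - 0.5 * np.outer(e, e)

for n in range(1, 5001):
    gamma = 1.0 / n
    X = theta_true + rng.standard_normal(N)   # X_{n,i} ~ N(theta_true, 1)
    # Local step: Y_{n,i} = grad_theta log f_i(theta_{n-1,i}, X_{n,i}) = X - theta
    theta_tilde = theta + gamma * (X - theta)
    # Gossip step
    theta = pairwise_gossip() @ theta_tilde
```

In this Gaussian toy model F_{i}(\theta)=(\theta-\theta_{\star})^{2}/2, so Y_{n,i}=-\nabla_{\theta}F_{i}(\theta_{n-1,i})+\xi_{n,i} with a martingale-increment noise, and all node estimates settle near \theta_{\star}.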
Position w.r.t. existing works. There is a rich literature on distributed estimation and optimization algorithms, see [7]–[13] as a nonexhaustive list. Among the first gossip algorithms are those considered in the treatise [14] and in [15], as well as in [16], the latter reference dealing with the case of a constant step size. The case where the gossip matrices are random and the observations are noiseless is considered in [17]. Nedic et al. [11] solve a constrained optimization problem, also using noiseless estimates. The contributions [10] and [13] consider the framework of linear regression models.
In this paper, the random gossip matrices W_{n} are assumed to be row stochastic, i.e., W_{n}{\bf 1}={\bf 1} where {\bf 1} is the vector whose components equal one, and column stochastic in the mean, i.e., {\bf 1}^{T}{\BBE}[W_{n}]={\bf 1}^{T}. Observe that the row stochasticity constraint W_{n}{\bf 1}={\bf 1} is local, since it simply requires that each agent makes a weighted sum of the estimates of its neighbors with weights summing to one. Alternatively, the column stochasticity constraint {\bf 1}^{T}W_{n}={\bf 1}^{T} which is assumed in many contributions (see e.g., [18], [11], [19], [20]) requires a coordination at the network level (nodes must coordinate their weights). This constraint is not satisfied by a large class of gossip algorithms. As an example, the well-known broadcast gossip matrices (see Section II-B) are only column stochastic in the mean. As opposed to the aforementioned papers, it is worth noting that some works such as [16], [12], [5] get rid of the column-stochasticity condition. As a matter of fact, assumption {\bf 1}^{T}{\BBE}[W_{n}]={\bf 1}^{T} is even relaxed in [16]. Nevertheless, considering for instance Problem (3), this comes at the price of losing the convergence to the sought minima.
In many contributions (see e.g., [16], [8], or [10]), the gossip step is performed before the local step, contrary to what is done in this paper. The general techniques used in this paper to establish the convergence toward a consensus, the stability and the fluctuations of the estimates can be adapted without major difficulty to that situation.
In [19], projected stochastic (sub)gradient algorithms are considered in the case where the matrices (W_{n})_{n} are doubly stochastic. Such results have later been extended by [5] to the case of nonconvex optimization, also relaxing the doubly stochastic assumption. It is worth noting that such works explicitly or implicitly rely on a projection step onto a compact convex set. In many scenarios (such as unconstrained optimization), the estimate is not naturally confined to a known compact set; in that case, introducing an artificial projection step is known to modify the limit points of the algorithm. By contrast, this paper addresses unprojected stochastic approximation algorithms. In this context, stability turns out to be a crucial issue, which we address here; note that stability issues are for the most part not considered in [16]. Finally, unlike previous works such as [19] or [5], we also address the issue of the convergence rate and characterize the asymptotic fluctuations of the estimation error.
From a methodological viewpoint, our analysis does not rely on convex optimization tools (such as in, e.g., [18], [11], [19]) and does not rely on perturbed differential inclusions as in [5]. The almost sure convergence result is obtained following an approach inspired by [21] and [22] (other works such as [16] consider weak convergence approaches). The stability result is obtained by introducing a Lyapunov function and by jointly controlling the moments of this Lyapunov function and the second-order moments of the disagreements between local estimates. Finally, the study of the asymptotic fluctuations of the estimates is based on recent results of [23] and is partly inspired by the works of [24].
This paper is organized as follows. In Section II, we state and comment on our basic assumptions. The convergence of the algorithm is studied in Section III. The second-order behavior of the algorithm is described in Section IV. An application to distributed estimation is described in Section V, along with some numerical simulations. The appendix is devoted to the proofs.
SECTION II.
Model and the Basic Assumptions
Let us start by writing the distributed algorithm described in the previous section in a more compact form. Define the {\BBR}^{dN}-valued random vectors {\mmb\theta}_{n} and {\mbi Y}_{n} by {\mmb\theta}_{n}:=(\theta_{n,1}^{T},\ldots,\theta_{n,N}^{T})^{T} and {\mbi Y}_{n}:=(Y_{n,1}^{T},\ldots, Y_{n,N}^{T})^{T}, where A^{T} denotes the transpose of the matrix A. The algorithm reduces to {\mmb\theta}_{n}=(W_{n}\otimes I_{d})\left({\mmb\theta}_{n-1}+\gamma_{n}{\mbi Y}_{n}\right),\eqno{\hbox{(4)}} where \otimes denotes the Kronecker product and I_{d} is the d\times d identity matrix. We shall also study the averaged sequence \bar{\mmb\theta}_{n}:={{1}\over{n}}\sum_{k=1}^{n}{\mmb\theta}_{k}.\eqno{\hbox{(5)}}
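The equivalence between the per-node form (1)–(2) and the stacked form (4) can be checked directly; the sketch below builds a random row-stochastic W and compares both computations for one iteration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, gamma = 3, 2, 0.1
W = rng.random((N, N))
W /= W.sum(axis=1, keepdims=True)        # make W row stochastic: W 1 = 1
theta_prev = rng.standard_normal((N, d)) # rows are theta_{n-1,i}
Y = rng.standard_normal((N, d))          # rows are Y_{n,i}

# Stacked form (4): theta_n = (W kron I_d)(theta_{n-1} + gamma Y_n),
# with theta stacked as (theta_1^T, ..., theta_N^T)^T (row-major reshape)
stacked = np.kron(W, np.eye(d)) @ (theta_prev + gamma * Y).reshape(N * d)

# Per-node form (1)-(2): local step, then weighted averages over the nodes
theta_tilde = theta_prev + gamma * Y
per_node = (W @ theta_tilde).reshape(N * d)
```

Both computations coincide because (W \otimes I_d) acts blockwise on the stacked vector exactly as the weighted average (2) acts on the rows.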
Note that we always assume {\BBE}\vert{\mmb\theta}_{0}\vert^{2}<\infty throughout the paper, where \vert\,.\,\vert represents the Euclidean norm.
A. Observation and Network Models
Let \left(\mu_{\mmb\theta}\right)_{{\mmb\theta}\in{\BBR}^{dN}} be a family of probability measures on {\BBR}^{dN} endowed with its Borel \sigma-field {\cal B}({\BBR}^{dN}) such that for any A\in{\cal B}({\BBR}^{dN}), the map {\mmb\theta}\mapsto\mu_{\mmb\theta}(A) is measurable from ({\BBR}^{dN},{\cal B}({\BBR}^{dN})) to ([{0,1}],{\cal B}([{0,1}])), where {\cal B}([{0,1}]) denotes the Borel \sigma-field on [{0,1}].
We consider the case when the random process ({\mbi Y}_{n},W_{n})_{n\geq 1} is adapted to a filtered probability space \left(\Omega,{\cal A},{\BBP},({\cal F}_{n})_{n\geq 0}\right) and satisfies the following assumption.
Assumption 1
a) (W_{n})_{n\geq 1} is a sequence of N\times N random matrices with nonnegative elements such that each W_{n} is row stochastic, i.e., W_{n}{\bf 1}={\bf 1}, and {\BBE}(W_{n}) is column stochastic, i.e., {\bf 1}^{T}{\BBE}(W_{n})={\bf 1}^{T}.
b) For any positive measurable functions f, g, and any n\geq 0, {\BBE}[f(W_{n+1})g({\mbi Y}_{n+1})\vert{\cal F}_{n}]={\BBE}[f(W_{n+1})]\, \int g({\mbi y})\mu_{\mmb\theta_{n}}(d{\mbi y}).\eqno{\hbox{(6)}}
c) The sequence (W_{n})_{n\geq 1} is identically distributed and the spectral norm \rho of the matrix {\BBE}(W_{1}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{1}) satisfies \rho<1.
Assumptions 1a and 1c capture the properties of the gossiping scheme within the network. Following the work of [17], random gossip is assumed in this paper. Assumption 1a has been commented on in Section I. The spectral norm condition in Assumption 1c is a connectivity condition on the underlying network graph which will be discussed in more detail in Section II-B. Assumption 1b implies that 1) the random variables (r.v.) W_{n} and {\mbi Y}_{n} are conditionally independent given the past, 2) the r.v. (W_{n})_{n\geq 1} are independent, and 3) the conditional distribution of {\mbi Y}_{n+1} given the past is \mu_{{\mmb\theta}_{n}}. This assumption is quite usual in the framework of stochastic approximation and is sometimes referred to as a Robbins–Monro setting. As a particular case, this assumption holds if {\mbi Y}_{n+1} has the form {\mbi Y}_{n+1}=g({\mmb\theta}_{n})+\xi_{n+1} where (\xi_{n})_{n} is an i.i.d. sequence.
It is also assumed that the step-size sequence (\gamma_{n})_{n\geq 1} in the stochastic approximation scheme (1) satisfies the following conditions which are rather usual in the framework of stochastic approximation algorithms [2].
Assumption 2
The deterministic sequence (\gamma_{n})_{n\geq 1} is positive and such that \sum_{n}\gamma_{n}=\infty and \sum_{n}\gamma_{n}^{2}<\infty.
B. Illustration: Some Examples of Gossip Schemes
We describe three standard gossip schemes, the so-called pairwise, broadcast, and dropout schemes. The reader may refer to [25] for a more complete picture and for more general gossip strategies. The network of agents is represented as an undirected graph ({\ssr E},{\ssr V}) where {\ssr E} is the set of edges and {\ssr V} is the set of N vertices.
1. Pairwise Gossip
This example can be found in [17] on average consensus (see also [5]).
At time n, two connected nodes, say i and j, wake up, independently from the past. Nodes i and j compute the weighted average \theta_{n,i}=\theta_{n,j}=0.5{\tilde \theta}_{n,i}+0.5{\tilde \theta}_{n,j}, while the other nodes k\notin\{i,j\} do not gossip: \theta_{n,k}={\tilde \theta}_{n,k}. In this example, given that the edge \{i,j\} wakes up, W_{n} is equal to I_{N}-(e_{i}-e_{j})(e_{i}-e_{j})^{T}/2 where e_{k} denotes the kth vector of the canonical basis in {\BBR}^{N}; the matrices (W_{n})_{n\geq 1} are i.i.d. and doubly stochastic. Assumption 1a is obviously satisfied. Conditions for Assumption 1c can be found in [17]: the spectral norm \rho of the matrix {\BBE}(W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}) is in [0,1) if and only if the weighted graph ({\ssr E},{\ssr V},{\ssr W}) is connected, where the edge \{i,j\} is weighted by the probability that nodes i and j communicate.
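As a sanity check of the connectivity condition, the spectral norm of Assumption 1c can be computed exactly for pairwise gossip on a small assumed graph, here a 6-node ring with uniform edge activation:

```python
import numpy as np

N = 6
edges = [(i, (i + 1) % N) for i in range(N)]   # ring graph, hence connected
K = np.eye(N) - np.ones((N, N)) / N            # K = I_N - 11^T/N

def pairwise_W(i, j):
    # Gossip matrix when edge {i, j} wakes up
    e = np.zeros(N); e[i], e[j] = 1.0, -1.0
    return np.eye(N) - 0.5 * np.outer(e, e)

# rho = spectral norm of E[W^T (I - 11^T/N) W], edges activated uniformly
M = sum(pairwise_W(i, j).T @ K @ pairwise_W(i, j) for i, j in edges) / len(edges)
rho = np.linalg.norm(M, 2)
```

For this ring one finds \rho = 11/12 < 1, consistent with the graph being connected.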
2. Broadcast Gossip
This example is adapted from the broadcast scheme in [26]. At time n, a node i wakes up at random with uniform probability and broadcasts its temporary update {\tilde \theta}_{n,i} to all its neighbors {\cal N}_{i}. Any neighbor j computes the weighted average \theta_{n,j}=\beta{\tilde \theta}_{n,i}+(1-\beta){\tilde \theta}_{n,j}. On the other hand, the nodes k which do not belong to the neighborhood of i (including i itself) set \theta_{n,k}={\tilde \theta}_{n,k}. Note that, as opposed to the pairwise scheme, the transmitter node i does not expect any feedback from its neighbors. Then, given that i wakes up, the (k,\ell)th component of W_{n} is given by w_{n}(k,\ell)=\cases{1 & if $k\notin{\cal N}_{i}$ and $k=\ell$,\cr \beta & if $k\in{\cal N}_{i}$ and $\ell=i$,\cr 1-\beta & if $k\in{\cal N}_{i}$ and $k=\ell$,\cr 0 & otherwise.} This matrix W_{n} is not doubly stochastic but {\bf 1}^{T}{\BBE}(W_{n})={\bf 1}^{T} (see for instance [26]). Thus, the matrices (W_{n})_{n\geq 1} are i.i.d. and satisfy Assumption 1a. Here again, it can be shown that the spectral norm \rho of {\BBE}(W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}) is in [0,1) if and only if ({\ssr E},{\ssr V}) is a connected graph (see [26]).
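The structure of the broadcast matrices can be checked numerically; the snippet below uses an assumed 5-node ring and \beta = 0.5, and verifies that every realization is row stochastic while only the mean is column stochastic.

```python
import numpy as np

N, beta = 5, 0.5
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}  # ring graph

def broadcast_W(i):
    # Node i broadcasts: each neighbor j mixes beta*tilde_i + (1-beta)*tilde_j;
    # all other nodes (including i itself) keep their temporary iterate
    W = np.eye(N)
    for j in neighbors[i]:
        W[j, j] = 1.0 - beta
        W[j, i] = beta
    return W

Ws = [broadcast_W(i) for i in range(N)]
EW = sum(Ws) / N   # the broadcasting node is uniform over the N nodes
row_ok = all(np.allclose(W.sum(axis=1), 1.0) for W in Ws)          # Assumption 1a
col_ok_mean = np.allclose(EW.sum(axis=0), 1.0)                     # mean column stochastic
col_ok_each = all(np.allclose(W.sum(axis=0), 1.0) for W in Ws)     # fails: not doubly stochastic
```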
3. Network Dropouts
In this simple example, the network is subjected from time to time to a dropout: consider any sequence of gossip matrices W_{n} satisfying Assumptions 1a and 1c, and put W^{\prime}_{n}=B_{n}W_{n}+(1-B_{n}) I_{N} where B_{n} is a sequence of i.i.d. Bernoulli random variables independent of the W_{n}. The network whose gossip matrices are the W^{\prime}_{n} incurs a dropout at the moments where B_{n}=0. At these moments, the nodes locally update their estimates and skip the gossip step. It is easy to show that the sequence W^{\prime}_{n} satisfies Assumptions 1a and 1c.
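A quick numerical check of the dropout construction, assuming the base matrices are pairwise gossip on the complete graph with 4 nodes and a communication probability p = 0.3: conditioning on B_{n} gives {\BBE}(W'^{T}KW') = p\,{\BBE}(W^{T}KW) + (1-p)K with K = I_{N}-{\bf 1}{\bf 1}^{T}/N, so the dropped-out network still mixes.

```python
import numpy as np
from itertools import combinations

N, p = 4, 0.3                       # p = P(B_n = 1), the communication probability
K = np.eye(N) - np.ones((N, N)) / N

def pairwise_W(i, j):
    e = np.zeros(N); e[i], e[j] = 1.0, -1.0
    return np.eye(N) - 0.5 * np.outer(e, e)

pairs = list(combinations(range(N), 2))   # complete graph, uniform edge activation
E_WKW = sum(pairwise_W(i, j).T @ K @ pairwise_W(i, j) for i, j in pairs) / len(pairs)
rho = np.linalg.norm(E_WKW, 2)

# W'_n = B_n W_n + (1 - B_n) I_N, hence E[W'^T K W'] = p E[W^T K W] + (1 - p) K
M = p * E_WKW + (1.0 - p) * K
rho_prime = np.linalg.norm(M, 2)
```

In this symmetric example mixing is simply slowed down, \rho' = p\rho + (1-p) < 1, so Assumption 1c is preserved.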
SECTION III.
Convergence Results
In this section, we address the asymptotic behavior as n\to\infty of the algorithm (4) and of its averaged version (5). To that end, we write {\mmb\theta}_{n} as the sum of a vector in the consensus space and a disagreement vector. Let J:=({\bf 1}{\bf 1}^{T}/N)\otimes I_{d},\qquad{J_{\bot}}:=I_{dN}-J,\eqno{\hbox{(7)}} be respectively the projector onto the consensus subspace \left\{{\bf 1}\otimes\theta:\theta\in{\BBR}^{d}\right\} and the projector onto its orthogonal complement. For any vector {\mbi x}\in{\BBR}^{dN}, define the vector of {\BBR}^{d} \langle{\mbi x}\rangle:={{1}\over{N}}({{\bf 1}^{T}\otimes I_{d}}){\mbi x},\eqno{\hbox{(8)}} so that J{\mbi x}={\bf 1}\otimes\langle{\mbi x}\rangle. Note that \langle{\mbi x}\rangle=(x_{1}+\cdots+x_{N})/N when we write {\mbi x}=(x_{1}^{T},\ldots,x_{N}^{T})^{T} with x_{i}\in{\BBR}^{d}. Set {\mbi x}_{\bot}:={J_{\bot}}{\mbi x}\eqno{\hbox{(9)}} so that {\mbi x}={\bf 1}\otimes\langle{\mbi x}\rangle+{\mbi x}_{\bot}. We will refer to {\mmb\theta}_{\bot,n}:={J_{\bot}}{\mmb\theta}_{n} as the disagreement vector.
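The decomposition can be verified numerically; the sketch below builds J and J_{\bot} from (7)-(9) and checks {\mbi x}={\bf 1}\otimes\langle{\mbi x}\rangle+{\mbi x}_{\bot} with the two parts orthogonal.

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 3, 2
J = np.kron(np.ones((N, N)) / N, np.eye(d))     # projector onto the consensus space
J_perp = np.eye(N * d) - J                      # projector onto its orthogonal complement

x = rng.standard_normal(N * d)
avg = (np.kron(np.ones(N), np.eye(d)) @ x) / N  # <x> = (1/N)(1^T kron I_d) x
consensus_part = np.kron(np.ones(N), avg)       # 1 kron <x>
x_perp = J_perp @ x                             # disagreement part
```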
The convergence results rely on the following equations: under Assumption 1a, it holds that \eqalignno{\langle{\mmb\theta}_{n}\rangle=&\,\langle{\mmb\theta}_{n-1}\rangle+\gamma_{n}\langle(W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle,&{\hbox{(10)}}\cr \gamma_{n+1}^{-1}{\mmb\theta}_{\bot,n}=&\,{{\gamma_{n}}\over{\gamma_{n+1}}}{J_{\bot}}(W_{n}\otimes I_{d})\left(\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1}+{J_{\bot}}{\mbi Y}_{n}\right).&{\hbox{(11)}}} We first address the almost sure convergence of the sequence ({\mmb\theta}_{n})_{n\geq 1}: 1) by showing that the nonhomogeneous controlled Markov chain (\gamma_{n+1}^{-1}{\mmb\theta}_{\bot,n})_{n} is stable enough so that ({\mmb\theta}_{\bot,n})_{n} converges almost surely to zero, and 2) by applying results on the convergence of stochastic approximation algorithms with state-dependent noise in order to identify the limit points of the sequence (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1}. These results are stated in Theorem 1 (and Theorem 2 in the case of a vanishing communication rate): we prove that all agents eventually reach an agreement on the value of their estimate, i.e., the limit points of ({\mmb\theta}_{n})_{n\geq 1} (resp. (\bar{\mmb\theta}_{n})_{n\geq 1}) given by (4) (resp. (5)) are of the form {\bf 1}\otimes\theta_{\star}.
It is known that convergence of stochastic approximation algorithms to an attractive set is established provided that the sequence remains in a compact set with probability one and visits the domain of attraction of this set infinitely often with probability one. Our convergence result is stated under assumptions implying this recurrence property whenever the sequence remains almost surely in a compact set. Our convergence results are therefore derived under a boundedness assumption; sufficient conditions for this boundedness condition to be satisfied are then provided in Theorem 3.
All these convergence results are obtained under conditions on the state-dependent noise sequence in the stochastic approximation scheme (10). Roughly speaking, these conditions assume 1) that there exist a Lyapunov function and an attractive set associated with the mean field of the ordinary differential equation underlying (10), and 2) some regularity in {\mmb\theta} of the probability distributions (\mu_{\mmb\theta})_{{\mmb\theta}\in{\BBR}^{dN}}. The exact assumptions are stated hereafter.
A. Assumptions on the Distributions \mu_{\mmb\theta}
Define the function h:{\BBR}^{d}\to{\BBR}^{d} by h(\theta):=\int\langle{\mbi y}\rangle\,\mu_{{\bf 1}\otimes\theta}(d{\mbi y}).\eqno{\hbox{(12)}} We shall refer to h as the mean field. The key ingredient to prove the convergence of a stochastic approximation procedure is the existence of a Lyapunov function V for the mean field h, i.e., a function V:{\BBR}^{d}\to{\BBR}^{+} such that \nabla V^{T}\;h\leq 0. Precisely, the following is assumed.
Assumption 3
There exists a function V:{\BBR}^{d}\to{\BBR}^{+} such that
a) V is continuously differentiable.
b) For any \theta\in{\BBR}^{d}, \nabla V(\theta)^{T}h(\theta)\leq 0, where h is given by (12).
c) For any M>0, the level set \{\theta\in{\BBR}^{d}: V(\theta)\leq M\} is compact.
d) The set {\cal L}:=\{\theta\in{\BBR}^{d}:\nabla V(\theta)^{T}h(\theta)=0\} is nonempty and there exists M_{0} such that {\cal L}\subseteq\{V\leq M_{0}\}.
e) The function h given by (12) is continuous on {\BBR}^{d}.
f) V({\cal L}):=\{V(\theta)\,:\theta\in{\cal L}\} has an empty interior.
Observe that Assumptions 3d and 3f are trivially satisfied when {\cal L} is finite.
When h is a gradient field, i.e., h=-\nabla g, a natural candidate for the Lyapunov function is V=g. In this case, {\cal L}=\{\nabla g=0\}; when g is d-times differentiable, Sard's theorem implies that g(\{\nabla g=0\}) has an empty interior. If g is strictly convex and attains its minimum at some point \theta_{\star}, the function \theta\mapsto\vert\theta-\theta_{\star}\vert^{2} is also a Lyapunov function. In this case, {\cal L}=\{\theta_{\star}\}.
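For a concrete check of the gradient-field case, take an assumed strictly convex quadratic g(\theta)=(\theta-\theta_{\star})^{T}A(\theta-\theta_{\star})/2 with A positive definite; then h=-\nabla g and V(\theta)=\vert\theta-\theta_{\star}\vert^{2} satisfies \nabla V^{T}h\leq 0 with equality only at \theta_{\star}:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
A = rng.standard_normal((d, d))
A = A @ A.T + np.eye(d)            # positive definite, so g is strictly convex
theta_star = rng.standard_normal(d)

def h(theta):
    # Mean field h = -grad g for g(theta) = (theta - theta*)^T A (theta - theta*)/2
    return -A @ (theta - theta_star)

def grad_V(theta):
    # V(theta) = |theta - theta*|^2, so L = {theta : grad V(theta)^T h(theta) = 0} = {theta*}
    return 2.0 * (theta - theta_star)

samples = 5.0 * rng.standard_normal((1000, d))
descent = np.array([grad_V(t) @ h(t) for t in samples])   # = -2 (t - t*)^T A (t - t*)
```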
Assumption 4
For any M>0,
a) \sup_{\vert{\mmb\theta}\vert\leq M}\int\left\vert{\mbi y}\right\vert^{2}\,\mu_{\mmb\theta}(d{\mbi y})<\infty.
b) There exists a constant C_{M} such that for any \vert{\mmb\theta}\vert\leq M, \left\vert\int\langle{\mbi y}\rangle\,\mu_{\mmb\theta}(d{\mbi y})-\int\langle{\mbi y}\rangle\,\mu_{{\bf 1}\otimes\langle{\mmb\theta}\rangle}(d{\mbi y})\right\vert\leq C_{M}\vert{\mmb\theta}_{\bot}\vert.\eqno{\hbox{(13)}}
Condition (13) is a regularity condition on the conditional distribution of \langle{\mbi Y}_{n+1}\rangle given the past.
B. Almost Sure Convergence of the Distributed Algorithm
Define {\ssr d}(\theta,A):=\inf\{\vert\theta-\varphi\vert\,:\varphi\in A\} for any \theta\in{\BBR}^{d} and A\subset{\BBR}^{d}.
Theorem 1
Let us consider Assumptions 1–4. Assume in addition that \lim_{n}\gamma_{n}/\gamma_{n-1}=1 and {\BBP}\left\{\limsup_{n}\vert{\mmb\theta}_{n}\vert<\infty\right\}=1.\eqno{\hbox{(14)}} Then, with probability 1, \lim_{n\to\infty}{\ssr d}(\langle{\mmb\theta}_{n}\rangle,{\cal L})=0,\qquad\lim_{n}{\mmb\theta}_{\bot,n}=0,\eqno{\hbox{(15)}} where {\cal L} is given by Assumption 3. Moreover, with probability one, (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1} converges to a connected component of {\cal L}.
Theorem 1 is proved in Appendix B. Theorem 1 shows that when the stability condition (14) holds true, the vector of iterates {\mmb\theta}_{n} given by (4) converges almost surely to the consensus space as n\to\infty so that the network asymptotically achieves consensus. Moreover, this consensus belongs to the attractive set of the Lyapunov function.
Since V is continuous, Theorem 1 implies that with probability 1 (w.p.1), the sequence \{V(\langle{\mmb\theta}_{n}\rangle)\}_{n\geq 0} converges to a (random) point \upsilon_{\star}\in V({\cal L}). This can be used to show that (\langle{\mmb\theta}_{n}\rangle)_{n\geq 0} converges to a connected component of \{\theta\in{\cal L}:V(\theta)=\upsilon_{\star}\}. In general, this does not imply that (\langle{\mmb\theta}_{n}\rangle)_{n\geq 0} converges w.p.1 to some (random point) \theta_{\star}\in{\cal L}. Note nevertheless that this holds true w.p.1 when {\cal L} is finite.
Along any trajectory where ({\mmb\theta}_{n})_{n\geq 0} converges to {\bf 1}\otimes\theta_{\star} for some \theta_{\star}\in{\cal L}, Cesàro's lemma implies that the averaged sequence (\bar{\mmb\theta}_{n})_{n\geq 0} also converges to {\bf 1}\otimes\theta_{\star}. Therefore, the averaged sequence (5) and the original sequence (4) have the same limiting value, if any.
C. Case of a Vanishing Communication Rate
Theorem 1 still holds true when the r.v. (W_{n})_{n\geq 1} are not identically distributed. An interesting example is when {\BBP}\left\{W_{n}=I_{N}\right\}\to 1 as n\to\infty. From a communication point of view, this means that exchanges of information between agents become rare as n\to\infty. This context is especially interesting in the case of wireless networks, where it is often required to limit as much as possible the amount of communication between the nodes.
In such cases, Assumption 1c no longer holds. We prove a convergence result for the algorithms (4) and (5) when the spectral norm of the matrix {\BBE}(W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}) and the step-size sequence (\gamma_{n})_{n\geq 1} satisfy the following assumption.
Assumption 5
\sum_{n}\gamma_{n}=\infty and there exists \alpha>1/2 such that \eqalignno{&\lim_{n\to\infty}n^{\alpha}\gamma_{n}=0,\qquad\lim_{n\to\infty}n^{1+\alpha}\gamma_{n}=+\infty,&{\hbox{(16)}}\cr &\liminf_{n\to\infty}{{1-\rho_{n}}\over{n^{\alpha}\gamma_{n}}}>0,&{\hbox{(17)}}} where \rho_{n} is the spectral norm of the matrix {\BBE}(W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}).
Note that under Assumption 5, \lim_{n}n (1-\rho_{n})=+\infty. A typical framework where this assumption is useful is the following. Let (B_{n})_{n} be a sequence of independent Bernoulli r.v. with {\BBP}(B_{n}=1)=p_{n}, where the probabilities p_{n} decrease in such a way that \liminf_{n}p_{n}/(n^{\alpha}\gamma_{n})>0, and replace the matrices W_{n} described by Assumption 1 with B_{n}W_{n}+(1-B_{n})I_{N}. Here, p_{n} represents the probability that a communication between the nodes takes place at time n.
We also have \sum_{n}\gamma_{n}^{2}<\infty so that the step-size sequence (\gamma_{n})_{n\geq 1} satisfies the standard conditions for stochastic approximation scheme to converge.
An example of sequences (\gamma_{n})_{n\geq 1}, (\rho_{n})_{n\geq 1} satisfying Assumption 5 is given by 1-\rho_{n}=a/n^{\eta} and \gamma_{n}=\gamma_{0}/n^{\xi} with \eta, \xi such that 0\leq\eta<\xi-1/2\leq 1/2. In particular, \xi\in (1/2,1] and \eta\in [0,1/2).
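These sequences can be checked numerically. The sketch below takes the assumed values \eta = 0.2, \xi = 0.8 (so \eta < \xi - 1/2) and tests \alpha = 0.55, which lies in the admissible range (\max(1/2,\xi-1),\,\xi-\eta]:

```python
import numpy as np

# Assumption 5 with 1 - rho_n = a / n^eta and gamma_n = g0 / n^xi,
# eta = 0.2, xi = 0.8 (so 0 <= eta < xi - 1/2 <= 1/2); test alpha = 0.55
a, g0, eta, xi, alpha = 1.0, 1.0, 0.2, 0.8, 0.55
n = np.arange(1, 10**6 + 1, dtype=float)
gamma = g0 / n**xi
one_minus_rho = a / n**eta

t1 = n**alpha * gamma                     # (16), first limit: must tend to 0
t2 = n**(1.0 + alpha) * gamma             # (16), second limit: must tend to +infinity
t3 = one_minus_rho / (n**alpha * gamma)   # (17): liminf must be positive
```

Here t1 = n^{-0.25}, t2 = n^{0.75} and t3 = n^{0.05}, so (16)-(17) hold for this choice of \alpha.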
When the r.v. (W_{n})_{n\geq 1} are i.i.d., the spectral norm \rho_{n} is equal to \rho for any n, and (17) implies \rho<1: one is back to Assumption 1c. From this point of view, Assumption 5 is weaker than Assumption 1c. Nevertheless, stronger constraints than Assumption 1c are needed on the step size (\gamma_{n})_{n\geq 1}.
When replacing Assumption 1c with Assumption 5, we have the following theorem.
Theorem 2
The statement of Theorem 1 remains valid under Assumptions 1a, 1b, and 2–5 and (14).
Theorem 2 is proved in Appendix B.
D. Stability
In this section, we provide sufficient conditions implying (14). These conditions are stated in the case of a vanishing communication rate but remain valid when Assumption 5 is replaced with Assumption 1c. The proof of Theorem 3 is given in Appendix C.
Theorem 3
Let us consider Assumptions 1a, 1b, 2, 3a–3e, and 5. Assume in addition that
ST1. \nabla V is Lipschitz on {\BBR}^{d}.
ST2. There exists a constant C such that for any {\mmb\theta}\in{\BBR}^{dN}, \eqalignno{&\int\left\vert{\mbi y}\right\vert^{2}\,\mu_{\mmb\theta}(d{\mbi y})\leq C\left(1+V(\langle{\mmb\theta}\rangle)+\vert{\mmb\theta}_{\bot}\vert^{2}\right),\cr &\left\vert\int\langle{\mbi y}\rangle\,\mu_{\mmb\theta}(d{\mbi y})-\int\langle{\mbi y}\rangle\,\mu_{{\bf 1}\otimes\langle{\mmb\theta}\rangle}(d{\mbi y})\right\vert\leq C\vert{\mmb\theta}_{\bot}\vert.}
Then, {\BBP}\left\{\limsup_{n}\vert{\mmb\theta}_{n}\vert<\infty\right\}=1, i.e., the stability condition (14) holds.
It is proved in Appendix C that under the assumptions of Theorem 3, a stronger result holds (see Lemma 5): the sequence ({\mmb\theta}_{\bot,n})_{n\geq 1} converges to zero with probability 1 and (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1} is stable in the sense that \sup_{n}V(\langle{\mmb\theta}_{n}\rangle)<\infty.
Note that the Lipschitz assumption on the gradient \nabla V combined with Assumption ST2 implies that h grows at most linearly as \vert\theta\vert\to\infty.
The stability condition (14) could also be enforced by modifying the algorithm (4) with a truncation step. Truncation on a fixed compact set of {\BBR}^{dN} is easy to implement and natural when constraints on the system are available a priori; nevertheless, it becomes impractical and questionable in many situations of interest where a compact set containing the limiting set {\cal L} is not known a priori. Another stability strategy consists of truncations on randomly varying compact sets [27]; the derivation of conditions implying the stability of Algorithm (4) without modifying its limiting set under such an approach is beyond the scope of this paper and is left to the interested reader.
SECTION IV.
Convergence Rates
In this section, we derive the convergence rate in L^{2} of the disagreement sequence ({\mmb\theta}_{\bot,n})_{n} defined by {\mmb\theta}_{\bot,n}:={J_{\bot}}{\mmb\theta}_{n} [see (7) and (9)]. We also derive central limit theorems for the sequences ({\mmb\theta}_{n})_{n} and (\bar{\mmb\theta}_{n})_{n}: we show that averaging always improves the convergence rate and the asymptotic variance.
A. Convergence Rate of the Disagreement Vector {\mmb\theta}_{\bot,n}
Whereas Theorem 1 states that \lim_{n}{\mmb\theta}_{\bot,n}=0 almost surely, Theorem 4 provides information on the convergence rate: {\mmb\theta}_{\bot,n} tends to zero in L^{2} at rate 1/\gamma_{n}. For a positive deterministic sequence (a_{n})_{n\geq 1}, {\cal O}(a_{n}) stands for a deterministic {\BBR}^{\ell}-valued sequence (x_{n})_{n\geq 1} such that \sup_{n}a_{n}^{-1}\vert x_{n}\vert<\infty. The proof of Theorem 4 is given in Appendix D.
Theorem 4
Let us consider Assumptions 1, 2, and 4a. For any M>0, \gamma_{n}^{-2}{\BBE}\left(\vert{\mmb\theta}_{\bot,n}\vert^{2}{\bf 1}_{\sup_{k\leq n-1}\vert\mmb\theta_{k}\vert\leq M}\right)\leq{{\rho C}\over{(1-\sqrt\rho)^{2}}}+{\cal O}\left(\rho^{n}\gamma_{n}^{-2}\right)\eqno{\hbox{(18)}} where \rho is given by Assumption 1c and where C:=\limsup_{n\to\infty}{\BBE}(\vert{\mbi Y}_{\!\!\!\bot,n}\vert^{2}{\bf 1}_{\sup_{k\leq n-1}\vert\mmb\theta_{k}\vert\leq M}) is finite.
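As a sanity check, the bound (18) can be observed numerically on a toy instance of the algorithm. The sketch below is ours and hypothetical, not from the paper: N=10 scalar agents, a fixed doubly stochastic matrix W=(1-\beta)I+\beta{\bf 1}{\bf 1}^{T}/N, pure-noise observations Y_{n,i}\sim{\cal N}(0,1), and \gamma_{n}=n^{-0.7}. The Monte-Carlo estimate of \gamma_{n}^{-2}{\BBE}\vert{\mmb\theta}_{\bot,n}\vert^{2} settles near a finite constant instead of growing:

```python
import numpy as np

rng = np.random.default_rng(0)
N, runs, n_iter, beta = 10, 200, 300, 0.5   # agents, Monte-Carlo runs, iterations, mixing weight

theta = np.zeros((runs, N))                 # d = 1; every run starts at consensus
ratios = []
for n in range(1, n_iter + 1):
    gamma = n ** -0.7
    theta = theta + gamma * rng.standard_normal((runs, N))                 # local step: pure noise
    theta = (1 - beta) * theta + beta * theta.mean(axis=1, keepdims=True)  # gossip with fixed W
    perp = theta - theta.mean(axis=1, keepdims=True)                       # disagreement J_perp theta
    ratios.append((perp ** 2).sum(axis=1).mean() / gamma ** 2)             # estimate of the LHS of (18)

# ratios stabilizes near a finite constant, consistent with (18)
```

For this toy model the contraction coefficient of Assumption 1c is \rho=(1-\beta)^{2}=0.25 and C=N-1, so the ratio should stay below \rho C/(1-\sqrt\rho)^{2}=9.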
B. Central Limit Theorems
We derive central limit theorems for the sequences ({\mmb\theta}_{n})_{n} and (\bar{\mmb\theta}_{n})_{n} converging to a point {\bf 1}\otimes\theta_{\star} for some \theta_{\star}\in{\cal L}. To that goal, we restrict our attention to the case when the matrices (W_{n})_{n} are doubly stochastic, i.e., {\bf 1}^{T}W_{n}={\bf 1}^{T}. The general case is far more technical and beyond the scope of this paper. We also assume that the point \theta_{\star} and the r.v. {\mbi Y} satisfy
Assumption 6
\theta_{\star}\in{\cal L}.
The mean field h:{\BBR}^{d}\to{\BBR}^{d} given by (12) is twice continuously differentiable in a neighborhood of \theta_{\star}.
\nabla h(\theta_{\star}) is a Hurwitz matrix, i.e., the largest real part of its eigenvalues is {-}L for some L>0.
Assumption 7
There exist \delta>0 and \tau>0 such that \sup_{\vert{\mmb\theta}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\int\vert\langle{\mbi y}\rangle\vert^{2+\tau}\mu_{\mmb\theta}(d{\mbi y})<\infty.
The functions {\mmb\theta}\mapsto\int\langle{\mbi y}\rangle\langle{\mbi y}\rangle^{T}\mu_{\mmb\theta}(d{\mbi y}) and {\mmb\theta}\mapsto\int\langle{\mbi y}\rangle\mu_{\mmb\theta}(d{\mbi y}) are continuous in a neighborhood of {\bf 1}\otimes\theta_{\star}.
We finally strengthen the assumptions on the step-size sequence (\gamma_{n})_{n\geq 0}. In the sequel, notations x_{n}=o(y_{n}) and x_{n}\sim y_{n} stand for x_{n}/y_{n}\to 0 and x_{n}/y_{n}\to 1, respectively.
Assumption 8
(\gamma_{n})_{n} is a positive deterministic sequence such that either \log (\gamma_{k}/\gamma_{k+1})=o(\gamma_{k}), or \log (\gamma_{k}/\gamma_{k+1})\sim\gamma_{k}/\gamma_{\star} for some \gamma_{\star}>1/(2L).
\sum_{n}\gamma_{n}=\infty and \sum_{n}\gamma_{n}^{2}<\infty.
\lim_{n}n\gamma_{n}=+\infty and \eqalignno{&\lim_{n}{{1}\over{\sqrt{n}}}\sum_{k=1}^{n}\gamma_{k}^{-1/2}\;\left\vert 1-{{\gamma_{k}}\over{\gamma_{k+1}}}\right\vert=0\cr &\qquad\quad\lim_{n}{{1}\over{\sqrt{n}}}\sum_{k=1}^{n}\gamma_{k}=0.}
The step size \gamma_{n}\sim\gamma_{\star}/n^{\xi} satisfies Assumptions 8a and 8b for any 1/2<\xi\leq 1 since \log (\gamma_{k}/\gamma_{k+1})\sim\xi/k. Similarly, if \gamma_{n}\sim\gamma_{\star}/n, Assumption 8a holds provided that \gamma_{\star}>1/(2L). Observe that when the sequence (\gamma_{n})_{n} is ultimately nonincreasing, the condition \lim_{n}n\gamma_{n}=+\infty implies \lim_{n}\sqrt{n}^{-1}\sum_{k=1}^{n}\gamma_{k}^{-1/2}\;\left\vert 1-(\gamma_{k}/\gamma_{k+1})\right\vert=0 (see e.g., [21, Th. 26, Ch. 4]). Set
\displaylines{\qquad\Upsilon:=\int\langle{\mbi y}\rangle\langle{\mbi y}\rangle^{T}\;\mu_{{\bf 1}\otimes\theta_{\star}}(d{\mbi y})\hfill\cr \hfill-\left(\int\langle{\mbi y}\rangle\;\mu_{{\bf 1}\otimes\theta_{\star}}(d{\mbi y})\right)\left(\int\langle{\mbi y}\rangle\;\mu_{{\bf 1}\otimes\theta_{\star}}(d{\mbi y})\right)^{T}.}
Theorem 5
Let us consider Assumptions 1, 4, 6, 7, 8a, and 8b. Assume in addition that {\bf 1}^{T}W_{n}={\bf 1}^{T} w.p.1. Then, under the conditional probability {\BBP}(\cdot\vert\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}), the sequence of r.v. (\gamma_{n}^{-1/2} ({\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}))_{n\geq 0} converges in distribution to {\bf 1}\otimes Z, where Z is a centered Gaussian vector with covariance matrix \Sigma, the solution of the Lyapunov equation \nabla h(\theta_{\star})\Sigma+\Sigma\nabla h(\theta_{\star})^{T}=-\Upsilon if \log (\gamma_{k}/\gamma_{k+1})=o(\gamma_{k}), and of \left(I+2\gamma_{\star}\nabla h(\theta_{\star})\right)\Sigma+\Sigma\left(I+2\gamma_{\star}\nabla h(\theta_{\star})^{T}\right)=-\Upsilon if \log (\gamma_{k}/\gamma_{k+1})\sim\gamma_{k}/\gamma_{\star}.
The proof of Theorem 5 is postponed to Appendix E. The asymptotic variance can be compared to the asymptotic variance in a centralized algorithm: formally, such an algorithm is obtained by setting W_{n}={\bf 1}{\bf 1}^{T}/N\otimes I_{d}. Interestingly, the distributed algorithm under study has the same asymptotic variance as its centralized analogue.
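For a given \nabla h(\theta_{\star}) and \Upsilon, the Lyapunov equation of Theorem 5 is routinely solved numerically. Below is a minimal sketch (the matrices A and Ups are illustrative stand-ins for \nabla h(\theta_{\star}) and \Upsilon, not values from the paper) using the vectorization identity A\Sigma+\Sigma A^{T}=-\Upsilon\Leftrightarrow (A\otimes I+I\otimes A)\,{\rm vec}(\Sigma)=-{\rm vec}(\Upsilon):

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A S + S A^T = -Q via the vec trick.

    With row-major flattening, vec(A S) = kron(A, I) vec(S) and
    vec(S A^T) = kron(I, A) vec(S); the system is invertible when A is Hurwitz.
    """
    d = A.shape[0]
    I = np.eye(d)
    K = np.kron(A, I) + np.kron(I, A)
    S = np.linalg.solve(K, -Q.flatten()).reshape(d, d)
    return 0.5 * (S + S.T)   # symmetrize against round-off

# Illustrative Hurwitz mean-field Jacobian and noise covariance (not from the paper)
A = np.array([[-2.0, 1.0], [0.0, -1.0]])
Ups = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma = solve_lyapunov(A, Ups)
residual = A @ Sigma + Sigma @ A.T + Ups   # vanishes up to round-off
```

Since A is Hurwitz and Ups is positive definite, the resulting Sigma is symmetric positive definite, as required of an asymptotic covariance.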
Theorem 5 shows that when \gamma_{n}\sim\gamma_{\star}/n^{\alpha} for some \alpha\in (1/2, 1], the rate in the CLT is {\cal O}(1/n^{\alpha/2}). Therefore, the maximal rate of convergence is achieved with \gamma_{n}\sim\gamma_{\star}/n, in which case the rate is {\cal O}(1/\sqrt{n}). Unfortunately, achieving this rate requires choosing \gamma_{\star} as a function of \nabla h(\theta_{\star}) (through the upper bound L, see Assumption 8a), and in practice \nabla h(\theta_{\star}) is unknown. We will show in Theorem 6 that the optimal rate {\cal O}(1/\sqrt{n}) can be reached by applying the averaged procedure (5) with \gamma_{n}\sim\gamma_{\star}/n^{\alpha} for any \alpha\in (1/2, 1).
A second question is the scaling of the observations in the local step. Observe that during each local step of the algorithm [see (1)], each agent can use a common invertible matrix gain \Gamma and update the temporary iterate {\tilde \theta}_{n,i} as {\tilde \theta}_{n,i}=\theta_{n-1,i}+\gamma_{n}\,\Gamma Y_{n,i}.\eqno{\hbox{(19)}} It is readily seen that the new mean field {\tilde h}:\theta\mapsto\int\langle(\Gamma\otimes I_{N}){\mbi y}\rangle\mu_{{\bf 1}\otimes\theta}(d{\mbi y}) is equal to \Gamma h, and Assumptions 3 and 4 remain valid with ({\mbi Y}, h, V) replaced by ((\Gamma\otimes I_{N}){\mbi Y},\Gamma h,\Gamma^{-1}V). Therefore, introducing a gain matrix \Gamma does not change the limiting points of the algorithm (4) [and thus (5)], but it changes the asymptotic variance. In the case of the optimal rate in Theorem 5 [i.e., \gamma_{n}\sim\gamma_{\star}/n for some \gamma_{\star}>1/(2 L)], it can be proved along the same lines as in [23] (see also [1, Proposition 4, Ch. 3, Part I]) that the optimal choice of the gain matrix is \Gamma_{\star}=-\gamma_{\star}^{-1}\nabla h(\theta_{\star})^{-1}. By optimal, we mean that, when weighting the observations by \Gamma_{\star} as in (19), the asymptotic covariance matrix \Sigma_{\star} obtained through Theorem 5 is smaller than the limiting covariance \Sigma_{\Gamma} associated with any other gain matrix \Gamma, i.e., \Sigma_{\Gamma}-\Sigma_{\star} is nonnegative. Moreover, \Sigma_{\star} is equal to \gamma_{\star}^{-1}\;\nabla h(\theta_{\star})^{-1}\Upsilon\nabla h(\theta_{\star})^{-T}. Otherwise stated, (\sqrt{n}\;(\langle{\mmb\theta}_{n}\rangle-\theta_{\star}))_{n\geq 0} converges in distribution to a centered Gaussian vector with covariance matrix \nabla h(\theta_{\star})^{-1}\Upsilon\nabla h(\theta_{\star})^{-T}.
In practice, \nabla h(\theta_{\star}) is unknown and such a choice of gain matrix cannot be plugged into the algorithm (4). Fortunately, Theorem 6 shows that this optimal variance can be reached by the averaged sequence (\bar{\mmb\theta}_{n})_{n}.
Note that these two major features of averaging algorithms for stochastic approximation (optimal convergence rate and optimal limiting covariance matrix) have been pointed out by [3] (see also [28]) in the case of centralized algorithms.
Theorem 6
Let (\gamma_{n})_{n} be a deterministic positive sequence such that \log (\gamma_{k}/\gamma_{k+1})=o(\gamma_{k}). Let us consider Assumptions 1, 4, 6, 7, and 8b–8c. Assume in addition that {\bf 1}^{T}W_{n}={\bf 1}^{T} w.p.1. Then, under the conditional probability {\BBP}(\cdot\vert\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}), the sequence of r.v. (\sqrt{n}\;(\bar{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}))_{n\geq 0} converges in distribution to {\bf 1}\otimes\bar Z, where \bar Z is a centered Gaussian vector with covariance matrix \nabla h(\theta_{\star})^{-1}\;\Upsilon\nabla h(\theta_{\star})^{-T}.
The proof of Theorem 6 is postponed to Appendix F.
SECTION V.
Application Framework
A. Distributed Estimation
To illustrate the results, we describe in this section a distributed parameter estimation algorithm which converges to a limit point of the centralized ML estimator. Assume that node i receives at time n the {\BBR}^{m_{i}}-valued component X_{n,i} of the i.i.d. random process {\mbi X}_{n}=(X_{n,1}^{T},\ldots, X_{n,N}^{T})^{T}\in{\BBR}^{\sum m_{i}}, where {\mbi X}_{1} has the unknown density f_{\ast}(x) with respect to the Lebesgue measure. The system designer considers that the density of {\mbi X}_{1} belongs to a family \{f(\theta,{\mbi x})\}_{\theta\in{\BBR}^{d}}. When f(\theta,{\mbi x}) satisfies some regularity and smoothness conditions, the limit points of the sequences {\hat \theta}_{n} that maximize the log-likelihood function L_{n}(\theta)=\sum_{k=1}^{n}\log f(\theta,{\mbi X}_{k}) are minimizers of the Kullback–Leibler divergence D(f_{\ast}\,\Vert\, f(\theta,\cdot)) [6]. Our aim is to design a distributed and iterative algorithm that exhibits the same asymptotic behavior in the case where f(\theta,{\mbi x}) is of the form f(\theta,{\mbi x})=\prod_{i=1}^{N}f_{i}(\theta, x_{i}), where {\mbi x}=(x_{1}^{T},\ldots, x_{N}^{T})^{T} is partitioned similarly to {\mbi X}_{1}. To that purpose, Algorithm (4) is implemented with the increments Y_{n+1,i}=\nabla_{\theta}\log f_{i}\left(\theta_{n,i}, X_{n+1,i}\right), where \nabla_{\theta} is the gradient with respect to \theta. In some sense, \log f_{i}(\theta_{n,i}, X_{n+1,i}) is a local log-likelihood function that is updated by node i at time n+1 by a gradient approach. Writing {\mmb\theta}=(\theta_{1}^{T},\ldots,\theta_{N}^{T})^{T}, the distribution \mu_{\mmb\theta} introduced in Section II-A is defined by the identity
\displaylines{\qquad\qquad\int g({\mbi y})\mu_{\mmb\theta}(d{\mbi y})=\int g\!\big ((\nabla_{\theta}\log f_{1}(\theta_{1},x_{1})^{T},\ldots\hfill\cr \hfill\ldots,\nabla_{\theta}\log f_{N}(\theta_{N}, x_{N})^{T})^{T}\big)\;f_{\ast}({\mbi x})\, d{\mbi x}}
for every measurable function g:{\BBR}^{Nd}\to{\BBR}_{+}. The associated mean field given by (12) is
h(\theta)={{1}\over{N}}\int\nabla_{\theta}\log f(\theta,{\mbi x})\,f_{\ast}({\mbi x})\, d{\mbi x}.
Since h(\theta)=-N^{-1}\nabla_{\theta}D(f_{\ast}\,\Vert\, f(\theta,\cdot)) (assuming \nabla_{\theta} and \int can be interchanged), our algorithm is of a gradient type with V(\theta)=D(f_{\ast}\,\Vert\, f(\theta,\cdot)) as the natural Lyapunov function. Under the assumptions of Theorems 1 or 2, the \theta_{n,i}, i=1,\ldots,N, reach a consensus and converge to {\cal L}=\{\theta\,:\,\nabla V(\theta)=0\}. Here, we note that under some weak extra assumptions on the “noise” of the algorithm, it is possible to show that unstable points such as local maxima or saddle points of V(\theta) are avoided (see for instance [29]–[31]). Consequently, the first-order behavior of the distributed algorithm is identical to that of the centralized ML algorithm. We now consider the second-order behavior of these algorithms, restricting ourselves to the case where f_{\ast}({\mbi x})=\prod_{i=1}^{N}f_{i}(\theta_{\star}, x_{i}) for some \theta_{\star}\in{\BBR}^{d}. Under some conditions on f_{\ast}, it is well known that any consistent sequence {\hat \theta}_{n} of estimates provided by the centralized ML algorithm satisfies \sqrt{n}({\hat \theta}_{n}-\theta_{\star})\buildrel{\cal D}\over{\longrightarrow}{\cal N}(0, F(\theta_{\star})^{-1}), where \buildrel{\cal D}\over{\longrightarrow} stands for convergence in distribution, {\cal N}(0,\Sigma) represents the centered Gaussian distribution with covariance \Sigma, and
\displaylines{F(\theta_{\star})\hfill\cr \hfill=\sum_{i=1}^{N}\int\nabla_{\theta}\log f_{i}(\theta_{\star}, x_{i})\,\nabla_{\theta}\log f_{i}(\theta_{\star}, x_{i})^{T}\, f_{i}(\theta_{\star}, x_{i})\, dx_{i}}
is the Fisher information matrix of f(\theta_{\star},\cdot) [6, Ch. 6]. We now turn to the distributed algorithm and, to that end, apply Theorems 5 and 6. The matrices \nabla h(\theta_{\star}) and \Upsilon found in the statements of these theorems coincide in our case with {-}N^{-1}F(\theta_{\star}) and N^{-2}F(\theta_{\star}), respectively (same value of \Upsilon for both theorems). Starting with the averaged case, Theorem 6 shows that on the set \{\lim_{n}{\mmb\theta}_{n}={\bf 1}\otimes\theta_{\star}\}, the averaged sequence \bar{\mmb\theta}_{n} satisfies \sqrt{n}(\bar{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star})\buildrel{\cal D}\over{\longrightarrow}{\bf 1}\otimes Z where Z\sim{\cal N}(0, F(\theta_{\star})^{-1}). This implies that the averaged algorithm is asymptotically efficient, similarly to the centralized ML algorithm. Let us now consider the nonaveraged algorithm. In order to make a fair comparison with the centralized ML algorithm, we restrict the use of Theorem 5 to the case where \gamma_{n} has the form \gamma_{n}=\gamma_{\star}/n. In that case, Assumption 8 is verified when \gamma_{\star}>N/(2\lambda_{\min}(F(\theta_{\star}))), where \lambda_{\min}(F(\theta_{\star})) is the smallest eigenvalue of F(\theta_{\star}). Theorem 5 shows that on the set \{\lim_{n}\mmb\theta_{n}={\bf 1}\otimes\theta_{\star}\}, the sequence of estimates {\mmb\theta}_{n} satisfies \sqrt{n}({\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star})\buildrel{\cal D}\over{\longrightarrow}{\bf 1}\otimes Z, where Z\sim{\cal N}(0,\Sigma) and \Sigma is the solution of the matrix equation \Sigma (2 N^{-1}\gamma_{\star}F(\theta_{\star})-I_{d})+(2 N^{-1}\gamma_{\star}F(\theta_{\star})-I_{d})\Sigma=2\gamma_{\star}^{2}N^{-2}F(\theta_{\star}).
Solving this equation, we obtain \Sigma=\gamma_{\star}^{2}N^{-2}F(\theta_{\star}) (2\gamma_{\star}N^{-1}F(\theta_{\star})-I_{d})^{-1}. Notice that \Sigma-F(\theta_{\star})^{-1}{=}F(\theta_{\star})^{-1}(2\gamma_{\star}N^{-1}F(\theta_{\star})-I_{d})^{-1}(\gamma_{\star}N^{-1}F(\theta_{\star})-I_{d})^{2}>0, which quantifies the departure from asymptotic efficiency of the nonaveraged algorithm.
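The computation above is easy to verify numerically. In the sketch below, F is a small synthetic SPD matrix standing in for F(\theta_{\star}) (illustrative values only, not from the paper): we check that \Sigma=\gamma_{\star}^{2}N^{-2}F(2\gamma_{\star}N^{-1}F-I_{d})^{-1} solves the displayed matrix equation and that \Sigma-F(\theta_{\star})^{-1} is nonnegative definite once \gamma_{\star}>N/(2\lambda_{\min}(F)):

```python
import numpy as np

N = 40
F = np.array([[2.0, 0.3], [0.3, 1.0]])            # stand-in for the Fisher matrix F(theta_*)
lam_min = np.linalg.eigvalsh(F).min()
gamma = 1.5 * N / (2 * lam_min)                   # gamma_* > N / (2 lambda_min(F))

I = np.eye(2)
B = 2 * gamma / N * F - I                         # the matrix 2 N^{-1} gamma_* F - I_d
Sigma = gamma**2 / N**2 * F @ np.linalg.inv(B)    # F and B commute, so Sigma is symmetric
Sigma = 0.5 * (Sigma + Sigma.T)                   # symmetrize against round-off

lhs = Sigma @ B + B @ Sigma                       # matrix equation from Theorem 5
rhs = 2 * gamma**2 / N**2 * F
gap = 0.5 * ((Sigma - np.linalg.inv(F)) + (Sigma - np.linalg.inv(F)).T)
```

Here gap quantifies the departure from asymptotic efficiency of the nonaveraged algorithm; its eigenvalues are nonnegative, in agreement with the expression for \Sigma-F(\theta_{\star})^{-1} above.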
B. Application to Source Localization
The distributed algorithm described above is used here to localize a source by a collection of N=40 sensors. The unknown location of the source in the plane is represented by a parameter \theta_{\star}\in{\BBR}^{2}. The sensors are located in the square [{0, 50}]\times [{0, 50}] as shown by Fig. 1, and they receive scalar-valued signals from the source (m_{i}=1 for all i). It is assumed that the density of {\mbi X}_{1}\in{\BBR}^{N} is f_{\star}({\mbi x})=\prod_{i=1}^{N}f_{i}(\theta_{\star}, x_{i}), where f_{i}(\theta_{\star},\cdot)={\cal N}(1000/\vert\theta_{\star}-r_{i}\vert^{2},10^{-2}) and r_{i}\in{\BBR}^{2} is the location of Node i. The fitted model is f(\theta,{\mbi x})=\prod_{i=1}^{N}f_{i}(\theta, x_{i}) with f_{i}(\theta,\cdot)={\cal N}(1000/\vert\theta-r_{i}\vert^{2}, 10^{-2}) (see [32] for a similar model). The model for the matrices W_{n} is the pairwise gossip model described in Section II-B. The step sequence \gamma_{n} is set to 10^{-3}/n^{0.7}. Note that in practice, choosing a step size that achieves the sought tradeoff between a short transient phase and good asymptotic accuracy is known to be sensitive to the statistical model of interest. Finally, the initial value {\mmb\theta}_{0}\in{\BBR}^{2N} is chosen at random under the uniform distribution on the square [{0, 50}]\times [{0,50}].
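A schematic implementation of this experiment is sketched below. It is scaled down for illustration and deviates from the paper's setup: N=10 sensors placed deterministically on a circle around the source, 2000 iterations, and initialization near \theta_{\star}; the signal model and the step size are as described above, everything else is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2, n_iter = 10, 1e-2, 2000
theta_star = np.array([25.0, 25.0])                      # unknown source location
angles = 2 * np.pi * np.arange(N) / N
sensors = theta_star + 15.0 * np.c_[np.cos(angles), np.sin(angles)]  # node positions r_i

def score(theta, r, x):
    """Gradient of log f_i(theta, x) for f_i(theta, .) = N(1000/|theta - r|^2, sigma2)."""
    d2 = np.sum((theta - r) ** 2)
    grad_mean = -2000.0 * (theta - r) / d2 ** 2          # gradient of theta -> 1000/|theta - r|^2
    return (x - 1000.0 / d2) / sigma2 * grad_mean

def gossip_pair(theta):
    """Pairwise gossip: two random nodes average their iterates (a doubly stochastic W_n)."""
    i, j = rng.choice(N, size=2, replace=False)
    theta[[i, j]] = 0.5 * (theta[i] + theta[j])
    return theta

theta = theta_star + rng.uniform(-1.0, 1.0, size=(N, 2)) # init near theta_* (CLT regime)
for n in range(1, n_iter + 1):
    gamma = 1e-3 / n ** 0.7
    for i in range(N):
        x_i = rng.normal(1000.0 / np.sum((theta_star - sensors[i]) ** 2), np.sqrt(sigma2))
        theta[i] = theta[i] + gamma * score(theta[i], sensors[i], x_i)   # local step (1)
    theta = gossip_pair(theta)                           # gossip step (2)

disagreement = np.linalg.norm(theta - theta.mean(axis=0))
error = np.linalg.norm(theta.mean(axis=0) - theta_star)
```

After the run, disagreement is small (the iterates have nearly reached consensus) while error remains in the vicinity of the initialization, the slow drift toward \theta_{\star} being governed by the decreasing steps; note also that each pairwise gossip step preserves the network average, consistent with the doubly stochastic case analyzed above.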
The convergence of the distributed algorithm to the consensus subspace is illustrated in Fig. 2. Fig. 3 represents the empirical distribution of the normalized estimation error \gamma_{n}^{-1/2}(\langle{\mmb\theta}_{n}\rangle-\theta_{\star}) after n=50\,000 iterations, based on 180 Monte-Carlo runs of the trajectory \bar{\mmb\theta}_{n} initialized in the vicinity of \theta_{\star}. The empirical distribution is consistent with the asymptotic Gaussian distribution given by Theorem 5.
Appendix
For a positive deterministic sequence (a_{n})_{n\geq 1}, the notation x_{n}=o(a_{n}) refers to a deterministic {\BBR}^{\ell}-valued sequence (x_{n})_{n\geq 1} such that \lim_{n\to\infty}a_{n}^{-1}\vert x_{n}\vert=0. For p>0, we denote the L^{p}-norm of a random vector X by \Vert X\Vert_{p}:={\BBE}(\vert X\vert^{p})^{1/p}. The notation X_{n}=o_{L^{p}}(a_{n}) refers to a {\BBR}^{\ell}-valued r.v. (X_{n})_{n\geq 1} such that \lim_{n\to\infty}a_{n}^{-1}\Vert X_{n}\Vert_{p}=0, while X_{n}={\cal O}_{L^{p}}(a_{n}) refers to a {\BBR}^{\ell}-valued r.v. (X_{n})_{n\geq 1} such that \limsup_{n}a_{n}^{-1}\Vert X_{n}\Vert_{p}<\infty. Finally, X_{n}={\cal O}_{w.p.1.}(a_{n}) stands for any {\BBR}^{\ell}-valued r.v. (X_{n})_{n\geq 1} such that \limsup_{n}a_{n}^{-1}\vert X_{n}\vert is finite almost surely.
SECTION B.
Proof of Theorems 1 and 2
We give the proof of Theorem 2; the proof of Theorem 1 follows the same lines and details are omitted. We first prove the almost sure convergence to zero of ({\mmb\theta}_{\bot,n})_{n\geq 1}. The assumption {\BBP}\left\{\limsup_{n}\vert{\mmb\theta}_{n}\vert<\infty\right\}=1 implies {\BBP}\left\{\bigcup_{M\in\BBZ_{+}}\{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M\}\right\}=1, and we only have to prove that for any M>0, with probability 1, \lim_{n}{\mmb\theta}_{\bot,n}{\bf 1}_{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M}=0. To that goal, we write for any \delta>0, m\geq 1,
\eqalignno{&{\BBP}\big\{\sup_{n\geq m}\vert{\mmb\theta}_{\bot,n}\vert{\bf 1}_{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M}\geq\delta\big\}\cr &\quad\leq{{1}\over{\delta^{2}}}{\BBE}\left(\sup_{n\geq m}\vert{\mmb\theta}_{\bot,n}\vert^{2}{\bf 1}_{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M}\right)\cr &\quad\leq{{1}\over{\delta^{2}}}\sum_{n\geq m}n^{-2\alpha}\sup_{n}{\BBE}\left(n^{2\alpha}\vert{\mmb\theta}_{\bot,n}\vert^{2}{\bf 1}_{\sup_{k\leq n-1}\vert{\mmb\theta}_{k}\vert\leq M}\right).}
Lemma 1 and Assumption 5 imply that ({\mmb\theta}_{\bot,n})_{n\geq 1} converges to zero w.p.1. on the set \{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M\}.
Lemma 1
Let us consider Assumptions 1a, 1b, 2, 4a, and 5. Then, for any M>0, \sup_{n}n^{2\alpha}{\BBE}\left(\vert{\mmb\theta}_{\bot,n}\vert^{2}\,{\bf 1}_{\sup_{k\leq n-1}\vert{\mmb\theta}_{k}\vert\leq M}\right)<\infty.
Proof
Fix M>0. Recalling that (A\otimes B) (C\otimes D)=(AC)\otimes (BD), let {\cal W}_{n}=(W_{n}^{T}\otimes I_{d}){J_{\bot}}(W_{n}\otimes I_{d})=(W_{n}^{T}(I-N^{-1}{\bf 1}{\bf 1}^{T}) W_{n})\otimes I_{d}. Since {\mmb\theta}_{\bot,n}={J_{\bot}}(W_{n}\otimes I_{d}) ({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n}), we have by Assumptions 1a and 1b
\eqalignno{&{\BBE}\big [\vert{\mmb\theta}_{\bot,n}\vert^{2}\vert{\cal F}_{n-1}\big]\cr &\quad={\BBE}\left[({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n})^{T}{\cal W}_{n}({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n})\,\vert{\cal F}_{n-1}\right]\cr &\quad\leq\rho_{n}{\BBE}\left[\vert{\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n}\vert^{2}\,\vert{\cal F}_{n-1}\right]\cr &\quad\leq\rho_{n}\bigg (\vert{\mmb\theta}_{\bot,n-1}\vert^{2}+\gamma_{n}^{2}\int\vert{\mbi y}\vert^{2}\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})\cr &\qquad\quad+2\gamma_{n}\vert{\mmb\theta}_{\bot,n-1}\vert\big (\int\vert{\mbi y}\vert^{2}\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})\big)^{1/2}\bigg).}
By Assumption 4a, \sup_{n}\int\vert{\mbi y}\vert^{2}\mu_{{\mmb\theta}_{n-1}}(d{\mbi y}){\bf 1}_{\sup_{k\leq n}\vert{\mmb\theta}_{k}\vert\leq M}<\infty. This implies that there exists a constant C>0 such that
{\BBE}\left[\vert{\mmb\theta}_{\bot,n}\vert^{2}\vert{\cal F}_{n-1}\right]\leq\rho_{n}\vert{\mmb\theta}_{\bot,n-1}\vert^{2}+\gamma_{n}^{2}C+2\gamma_{n}\vert{\mmb\theta}_{\bot,n-1}\vert\sqrt{C}.
Therefore,
\eqalignno{&{\BBE}\big [\vert{\mmb\theta}_{\bot,n}\vert^{2}{\bf 1}_{\sup_{k\leq n-1}\vert\mmb\theta_{k}\vert\leq M}\big]\cr &\quad\leq\rho_{n}{\BBE}\left[\vert{\mmb\theta}_{\bot,n-1}\vert^{2}{\bf 1}_{\sup_{k\leq n-2}\vert\mmb\theta_{k}\vert\leq M}\right]+\gamma_{n}^{2}C\cr &\quad+2\gamma_{n}\left(C\,{\BBE}\left[\vert{\mmb\theta}_{\bot,n-1}\vert^{2}{\bf 1}_{\sup_{k\leq n-2}\vert\mmb\theta_{k}\vert\leq M}\right]\right)^{1/2}.}
The proof now follows the same lines as in the proof of [33, Lemma 1, Eq. (17)] [see also Lemma 3, (22)]. \hfill\blacksquare
Now, the study of the whole vector {\mmb\theta}_{n} is reduced to the analysis of its projection J{\mmb\theta}_{n}={\bf 1}\otimes\langle{\mmb\theta}_{n}\rangle onto the consensus space. We now focus on the average \langle{\mmb\theta}_{n}\rangle. The convergence of the sequence (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1} is a direct consequence of Lemma 2 along with [22, Ths. 2.2. and 2.3.].
Lemma 2
Under Assumptions 1a, 1b, 2, 4, 5, and (14), it holds that
\langle{\mmb\theta}_{n}\rangle=\langle{\mmb\theta}_{n-1}\rangle+\gamma_{n}h(\langle{\mmb\theta}_{n-1}\rangle)+\gamma_{n}\zeta_{n}
with \sup_{n}\vert\sum_{k=1}^{n}\gamma_{k}\zeta_{k}\vert<\infty with probability 1. Then, \lim_{n}{\ssr d}(\langle{\mmb\theta}_{n}\rangle,{\cal L})=0 with probability 1.
Proof
Equations (4) and (8) along with Assumption 1a yield
\langle{\mmb\theta}_{n}\rangle=\langle{\mmb\theta}_{n-1}\rangle+\gamma_{n}\langle{\mbi Z}_{n}\rangle,\eqno{\hbox{(20)}}
where {\mbi Z}_{n}:=(W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1}), upon noting that under Assumption 1a, (W_{n}\otimes I_{d}) J=J. We write \langle{\mbi Z}_{n}\rangle=h(\langle{\mmb\theta}_{n-1}\rangle)+e_{n}+\xi_{n}, where
\eqalignno{e_{n}:=&\,\langle (W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle-\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})\cr \xi_{n}:=&\,\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})-\int\langle{\mbi y}\rangle\mu_{{\bf 1}\otimes\langle{\mmb\theta}_{n-1}\rangle}(d{\mbi y}).}
By Assumption 4b and the inequality 2 a b\leq a^{2}+b^{2}, for any M>0 there exists a constant C such that
\displaylines{{\BBE}\left\vert{\bf 1}_{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M}\sum_{n\geq 1}\gamma_{n}\xi_{n}\right\vert\hfill\cr \hfill\leq C \left(\sum_{n\geq 1}\gamma_{n}^{2}+\sum_{n\geq1}{\BBE}\left(\left\vert{\mmb\theta}_{\bot,n-1}\right\vert^{2}{\bf 1}_{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M}\right)\right).\quad{\hbox{(21)}}}
The RHS in (21) is finite under Assumption 2 and Lemma 1, thus implying that \sum_{n\geq 1}\gamma_{n}\xi_{n} converges w.p.1. on the set \{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M\} for any M>0, and therefore w.p.1. since {\BBP}\left\{\sup_{n}\vert{\mmb\theta}_{n}\vert<\infty\right\}=1.
Since {\BBE}\left[e_{n}\,\vert{\cal F}_{n-1}\right]=0, the sequence \left(S_{n}:=\sum_{k=1}^{n}\gamma_{k}e_{k}{\bf 1}_{\sup_{\ell\leq k-1}\vert\mmb\theta_{\ell}\vert\leq M}\right)_{n\geq 1} is a martingale. We prove that it converges almost surely by estimating its second-order moment. For any k\geq 1,
\eqalignno{{\BBE}\left[\vert S_{k}\vert^{2}\right]\leq &\,\sum_{n\geq 1}\gamma_{n}^{2}\,{\BBE}\left[\left\vert e_{n}\right\vert^{2}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right]\cr \leq &\,\sum_{n\geq1}\gamma_{n}^{2}\,{\BBE}\left[({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})^{T}P_{n}({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1}){\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right]}
where we set P_{n}:=N^{-2}W_{n}^{T}{\bf 1}{\bf 1}^{T}W_{n}\otimes I_{d}. Note that P_{n} is independent of Y_{n} conditionally on {\cal F}_{n-1}. Since W_{n} is a stochastic matrix, its spectral norm is bounded uniformly in n. Therefore, there exists a constant C>0 such that
\eqalignno{{\BBE}\left[\vert S_{n}\vert^{2}\right]\leq&\, C\sum_{n\geq 1}\gamma_{n}^{2}{\BBE}\left[\left\vert{\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1}\right\vert^{2}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right]\cr \leq&\, 2C\sum_{n\geq 1}\gamma_{n}^{2}\,{\BBE}\left[\vert{\mbi Y}_{n}\vert^{2}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right]\cr &+2C\sum_{n\geq 1}{\BBE}\left[\vert{\mmb\theta}_{\bot,n-1}\vert^{2}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right].}
By Assumption 4a, \sup_{n}{\BBE}\left[\vert{\mbi Y}_{n}\vert^{2}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\right]<\infty. By Lemma 1 and Assumption 2, it follows that \sup_{n}{\BBE}\left[\vert S_{n}\vert^{2}\right] is finite, thus implying that the martingale (S_{n})_{n\geq 1} converges almost surely to a r.v. which is finite w.p.1. (see e.g., [34, Corollary 2.2.]).
We now consider the last term \sum_{k}\gamma_{k}e_{k}\left(1-{\bf 1}_{\sup_{\ell\leq k-1}\vert\mmb\theta_{\ell}\vert\leq M}\right). On the set \{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M\}, this sum is null. This concludes the proof since {\BBP}\left\{\sup_{n}\vert{\mmb\theta}_{n}\vert<\infty\right\}=1. \hfill\blacksquare
SECTION C.
Proof of Theorem 3
Our stability result relies on preliminary technical lemmas, Lemmas 3 and 4. Theorem 3 is a consequence of Lemma 5: it is established that \lim_{n}{\mmb\theta}_{\bot,n}=0 with probability 1, which implies that {\BBP}\left\{\limsup_{n}\vert{\mmb\theta}_{\bot,n}\vert<\infty\right\}=1. It is also established that {\BBP}\left\{\limsup_{n}\vert\langle{\mmb\theta}_{n}\rangle\vert<\infty\right\}=1.
Lemma 3
Let (\gamma_{n})_{n\geq 0} and (\rho_{n})_{n\geq 0} be respectively a positive and a [{0,1}]-valued sequence such that \sum_{n}\gamma_{n}^{2}<\infty, and let u_{n}, v_{n} be two real sequences such that for n\geq n_{0}, \eqalignno{u_{n}\leq &\,\rho_{n}u_{n-1}+\gamma_{n}M\sqrt{u_{n-1}}(1+u_{n-1}+v_{n-1})^{1/2}\cr &+\gamma_{n}^{2}M\left(1+u_{n-1}+v_{n-1}\right),&\hbox{(22)}\cr v_{n}\leq &\, v_{n-1}+M u_{n-1}+\gamma_{n}M\sqrt{u_{n-1}}\left(1+u_{n-1}+v_{n-1}\right)^{1/2}\cr &+\gamma_{n}^{2}M(1+u_{n-1}+v_{n-1}).&{\hbox{(23)}}} Then, i) \sup_{n}v_{n}<\infty, and ii) \limsup_{n}\phi_{n}u_{n}<\infty for any positive sequence (\phi_{n})_{n\geq 0} such that \eqalignno{&\limsup_{n}\left(\gamma_{n}\sqrt{\phi_{n}}+{{\phi_{n-1}}\over{\phi_{n}}}\right)<\infty,\cr &\liminf_{n}(\gamma_{n}\sqrt{\phi_{n}})^{-1}\left({{\phi_{n-1}}\over{\phi_{n}}}-\rho_{n}\right)>0, &{\hbox{(24)}}\cr &\sum_{n}\phi_{n}^{-1}<\infty.&{\hbox{(25)}}}
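Before turning to the proof, the conclusions of Lemma 3 can be sanity-checked numerically. The sketch below (all constants are toy choices of ours, not taken from the text) iterates (22) and (23) at equality with \rho_{n}=1/2, M=1/2, \gamma_{n}=n^{-\alpha}/2 and \phi_{n}=n^{2\alpha} for \alpha=0.7; then \gamma_{n}\sqrt{\phi_{n}} is constant, \phi_{n-1}/\phi_{n}\to 1>\rho_{n} and \sum_{n}\phi_{n}^{-1}<\infty, so (24) and (25) hold.

```python
import math

# Numerical illustration of Lemma 3 (a sketch; all constants below are
# toy choices, not taken from the text). We iterate (22)-(23) at
# equality with rho_n = 1/2, M = 1/2, gamma_n = n^(-alpha)/2 and
# phi_n = n^(2*alpha), alpha = 0.7, so conditions (24)-(25) hold.
alpha, M, N = 0.7, 0.5, 5000
u, v = 1.0, 1.0
sup_v, sup_phi_u = v, u
for n in range(1, N + 1):
    g, rho = 0.5 * n ** (-alpha), 0.5
    cross = math.sqrt(u) * math.sqrt(1.0 + u + v)
    # right-hand sides of (22) and (23), evaluated with the old (u, v)
    u, v = (rho * u + g * M * cross + g * g * M * (1.0 + u + v),
            v + M * u + g * M * cross + g * g * M * (1.0 + u + v))
    sup_v = max(sup_v, v)
    sup_phi_u = max(sup_phi_u, n ** (2 * alpha) * u)
# Conclusions i) and ii) of the lemma: both suprema stay bounded.
print(sup_v < 100.0, sup_phi_u < 100.0)
```

Both suprema stabilize after a short transient, consistent with conclusions i) and ii).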
Proof
Set {\tilde \gamma}_{n}=(1+M)\gamma_{n}. Define two sequences (a_{n},b_{n})_{n\geq n_{0}} such that a_{n_{0}}=b_{n_{0}}=\max (u_{n_{0}},v_{n_{0}}) and for each n\geq n_{0}+1: \eqalignno{a_{n}=&\,\rho_{n}a_{n-1}+{\tilde \gamma}_{n}\sqrt{a_{n-1}}\, (1+a_{n-1}+b_{n-1})^{1/2}\cr &+{\tilde \gamma}_{n}^{2}(1+a_{n-1}+b_{n-1}), &{\hbox{(26)}}\cr b_{n}=&\, b_{n-1}+M a_{n-1}+{\tilde \gamma}_{n}\sqrt{a_{n-1}}(1+a_{n-1}+b_{n-1})^{1/2}\cr &+{\tilde \gamma}_{n}^{2}(1+a_{n-1}+b_{n-1}).&{\hbox{(27)}}} It is straightforward to show by induction that u_{n}\leq a_{n} and v_{n}\leq b_{n} for any n\geq n_{0}. In addition, b_{n}=b_{n-1}+a_{n}+(M-\rho_{n})a_{n-1}. Thus, for n\geq n_{0}+1, b_{n}=a_{n}+\sum_{k=n_{0}}^{n-1}(M+1-\rho_{k+1}) a_{k}. Define A_{n}:=(M+1)\sum_{k=n_{0}}^{n}a_{k}, n\geq n_{0}. The above equality implies that a_{n}\leq b_{n}\leq A_{n}. As a consequence, (26) implies \displaylines{a_{n}\leq\rho_{n}a_{n-1}+{\tilde \gamma}_{n}\sqrt{a_{n-1}}\, (1+2\,A_{n-1})^{1/2}\hfill\cr \hfill+{\tilde \gamma}_{n}^{2}(1+2A_{n-1}).\quad{\hbox{(28)}}}
As (A_{n})_{n\geq n_{0}} is a positive increasing sequence, for any n\geq n_{0}+1, \displaylines{{{a_{n}}\over{A_{n}}}\leq\rho_{n}{{a_{n-1}}\over{A_{n-1}}}+{\tilde \gamma}_{n}\sqrt{{a_{n-1}}\over{A_{n-1}}}\,\left({{1}\over{A_{n_{0}}}}+2\right)^{1/2}\hfill\cr \hfill+{\tilde \gamma}_{n}^{2}\left({{1}\over{A_{n_{0}}}}+2\right).\quad{\hbox{(29)}}}
Define L^{2}:=1/A_{n_{0}}+2 and c_{n}:=\phi_{n}{a_{n}}/{A_{n}}. By (29), for any n\geq n_{0}+1, c_{n}\leq\rho_{n}{{\phi_{n}}\over{\phi_{n-1}}}c_{n-1}+L{\tilde \gamma}_{n}\sqrt{c_{n-1}\phi_{n}}\sqrt{{\phi_{n}}\over{\phi_{n-1}}}+L^{2}\;{\tilde \gamma}_{n}^{2}\phi_{n},\eqno{\hbox{(30)}} and under assumption (24), there exist n_{1}\geq n_{0} and a constant \xi>0 such that for any n\geq n_{1}, \displaylines{\sqrt{{\phi_{n-1}}\over{\phi_{n}}}L\xi\left\{1+\xi L{\tilde \gamma}_{n}\sqrt{\phi_{n-1}}\right\}\hfill\cr \hfill\leq\left({{\phi_{n-1}}\over{\phi_{n}}}-\rho_{n}\right)\left({\tilde \gamma}_{n}\sqrt{\phi_{n}}\right)^{-1}.\quad{\hbox{(31)}}} Define A:=\max\left({{1}\over{\xi}},{{1}\over{\xi^{2}}},c_{n_{1}}\right).\eqno{\hbox{(32)}} We prove by induction on n that c_{n}\leq A for any n\geq n_{1}. The claim holds true for n=n_{1} by definition of A. Assume that c_{n-1}\leq A for some n-1\geq n_{1}. Using (30) and (32), for n\geq n_{1}+1, {{c_{n}}\over{A}}\leq\rho_{n}{{\phi_{n}}\over{\phi_{n-1}}}+{{L}\over{\sqrt{A}}}{\tilde \gamma}_{n}\sqrt{\phi_{n}}\sqrt{{\phi_{n}}\over{\phi_{n-1}}}+{{L^{2}}\over{A}}\;{\tilde \gamma}_{n}^{2}\phi_{n}. By (31), the RHS is no larger than one, so that c_{n}\leq A. This proves that (c_{n})_{n\geq n_{0}} is a bounded sequence.
We prove that (A_{n})_{n\geq n_{0}} is a bounded sequence. Using the fact that \sup_{n\geq n_{1}}\rho_{n}\leq 1, that (A_{n})_{n\geq n_{0}} is increasing, and (28), it holds for n\geq n_{1}+1 that \eqalignno{A_{n}=&\,A_{n-1}+a_{n}\cr \leq&\, A_{n-1}+a_{n-1}+{\tilde \gamma}_{n}\sqrt{a_{n-1}}\,\sqrt{A_{n-1}}\,L+{\tilde \gamma}_{n}^{2}L^{2}A_{n-1}\cr \leq &\,\left(1+c_{n-1}\phi_{n-1}^{-1}+L{\tilde \gamma}_{n}\phi_{n-1}^{-1/2}\sqrt{c_{n-1}}+{\tilde \gamma}_{n}^{2}L^{2}\right) A_{n-1}.} Finally, since \sup_{n\geq n_{1}}c_{n}\leq A and (1+t)\leq\exp (t), there exists C>0 such that for any n\geq n_{1}+1, A_{n}\leq\exp\left(C\{\phi_{n-1}^{-1}+{\tilde \gamma}_{n}^{2}\}\right) A_{n-1} (note that under (24), \limsup_{n}{\tilde \gamma}_{n}\sqrt{\phi_{n}}<\infty and \phi_{n-1}/\phi_{n} is bounded, so that {\tilde \gamma}_{n}\phi_{n-1}^{-1/2}={\cal O}(\phi_{n-1}^{-1})). Since \sum_{n}\{\phi_{n-1}^{-1}+{\tilde \gamma}_{n}^{2}\}<\infty by assumption, (A_{n})_{n\geq n_{0}} is therefore bounded.
The proof of the lemma is concluded upon noting that v_{n}\leq b_{n}\leq A_{n} and \phi_{n}u_{n}\leq\phi_{n}a_{n}=c_{n}A_{n}, which is bounded.
\hfill\blacksquare
Lemma 4
Let V:{\BBR}^{d}\to{\BBR}^{+} be a differentiable function such that \nabla V is Lipschitz on {\BBR}^{d}. There exist constants C, C^{\prime} such that for any \theta\in{\BBR}^{d}, \vert\nabla V(\theta)\vert^{2}\leq C V(\theta), and for any \theta, \theta^{\prime}\in{\BBR}^{d}, V(\theta^{\prime})\leq V(\theta)+\nabla V(\theta)^{T}(\theta^{\prime}-\theta)+C^{\prime}\vert\theta^{\prime}-\theta\vert^{2}.\eqno{\hbox{(35)}}
Proof
Given any \theta, \theta^{\prime}\in{\BBR}^{d}, we have \displaylines{V(\theta^{\prime})=V(\theta)+\nabla V(\theta)^{T}(\theta^{\prime}-\theta)\hfill\cr \hfill+\int_{0}^{1}\left(\nabla V(\theta+t(\theta^{\prime}-\theta))-\nabla V(\theta)\right)^{T}(\theta^{\prime}-\theta)\, dt.} This implies (35) since \nabla V is Lipschitz. Then, applying (35) with \theta^{\prime}=\theta-\mu\nabla V(\theta) where \mu>0 and recalling that V is nonnegative, we also have 0\leq V(\theta)-\mu(1-\mu C^{\prime})\vert\nabla V(\theta)\vert^{2}. Choosing \mu small enough, we thus get the result. \hfill\blacksquare
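Both conclusions of Lemma 4 can be checked on a concrete V. The sketch below (a toy choice of ours, not taken from the text) uses V(\theta)=\vert\theta\vert^{2}, whose gradient 2\theta is Lipschitz with constant 2; then (35) holds with C^{\prime}=1 and \vert\nabla V\vert^{2}=4V, i.e., C=4.

```python
import numpy as np

# Sanity check of Lemma 4 on V(theta) = |theta|^2 (a toy example):
# grad V = 2*theta is 2-Lipschitz, so (35) holds with C' = 1, and
# |grad V|^2 = 4|theta|^2 = 4 V, so C = 4 works.
rng = np.random.default_rng(0)
C, Cp = 4.0, 1.0
ok_grad, ok_descent = True, True
for _ in range(1000):
    th, thp = rng.normal(size=3), rng.normal(size=3)
    V, Vp, g = th @ th, thp @ thp, 2.0 * th
    # |grad V(theta)|^2 <= C V(theta)
    ok_grad &= g @ g <= C * V + 1e-12
    # (35): V(theta') <= V(theta) + grad V(theta)^T (theta'-theta)
    #        + C' |theta'-theta|^2
    d = thp - th
    ok_descent &= Vp <= V + g @ d + Cp * (d @ d) + 1e-12
print(bool(ok_grad), bool(ok_descent))
```

For this quadratic V, (35) actually holds with equality, which the check confirms up to rounding.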
Lemma 5 (Agreement and Stability)
Let us consider Assumptions 1a, 1b, 2, 3a, 3b, and 5. Assume in addition ST1 and ST2. Then,
\sum_{n\geq 1}{\BBE}\left\vert{\mmb\theta}_{\bot,n}\right\vert^{2}<\infty and ({\mmb\theta}_{\bot,n})_{n\geq 1} converges to zero w.p.1.
\sup_{n\geq 1}{\BBE}V(\langle{\mmb\theta}_{n}\rangle)<\infty and \sup_{n}{\BBE}\left[\vert{\mbi Y}_{n}\vert^{2}\right]<\infty,
where \langle{\mbi x}\rangle and {\mbi x}_{\bot} are given by (8) and (9).
Proof
Define u_{n}:={\BBE}\left[\vert{\mmb\theta}_{\bot,n}\vert^{2}\right] and v_{n}:={\BBE}\left[V(\langle{\mmb\theta}_{n}\rangle)\right]. We prove that there exist a constant M>0 and an integer n_{0} such that for any n\geq n_{0}, inequalities (22) and (23) are satisfied. The proof is then concluded by applying Lemma 3, upon noting that under Assumption 2, the rate \phi_{n}=n^{2\alpha} satisfies conditions (24) and (25).
Proof of (22)
As W_{n}{\bf 1}={\bf 1}, we have {J_{\bot}}(W_{n}\otimes I_{d})={J_{\bot}}(W_{n}\otimes I_{d}){J_{\bot}}. As a consequence, {\mmb\theta}_{\bot,n}={J_{\bot}}(W_{n}\otimes I_{d})({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n}). Expanding the squared Euclidean norm of the latter vector yields \vert{\mmb\theta}_{\bot,n}\vert^{2}=({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n})^{T}(\{W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}\}\otimes I_{d})({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n}). Taking the conditional expectation of both sides with respect to the r.v. W_{n} and using Assumption 1b, \BBE[\vert{\mmb\theta}_{\bot,n}\vert^{2}\,\vert{\cal F}_{n-1},{\mbi Y}_{n}]\leq\rho_{n}\vert{\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{n}\vert^{2}. Under Assumption 5, \lim_{n}n (1-\rho_{n})=+\infty; hence, there exists n_{0} such that \rho_{n}<1 for any n\geq n_{0}. We obtain \displaylines{{\BBE}[\vert{\mmb\theta}_{\bot,n}\vert^{2}]\leq\rho_{n}{\BBE}[\vert{\mmb\theta}_{\bot,n-1}\vert^{2}]+2\gamma_{n}{\BBE}[\vert{\mmb\theta}_{\bot,n-1}\vert\,\vert{\mbi Y}_{n}\vert]\hfill\cr \hfill+\gamma_{n}^{2}{\BBE}[\vert{\mbi Y}_{n}\vert^{2}],}
for any n\geq n_{0}. By the Cauchy–Schwarz inequality, {\BBE}[\vert{\mmb\theta}_{\bot,n-1}\vert\,\vert{\mbi Y}_{n}\vert]\leq\sqrt{u_{n-1}}({\BBE}[\vert{\mbi Y}_{n}\vert^{2}])^{1/2}. Thus, u_{n}\leq\rho_{n}u_{n-1}+2\gamma_{n}\sqrt{u_{n-1}}({\BBE}[\vert{\mbi Y}_{n}\vert^{2}])^{1/2}+\gamma_{n}^{2}{\BBE}[\vert{\mbi Y}_{n}\vert^{2}]. By Assumption ST2, we have the estimate {\BBE}[\vert{\mbi Y}_{n}\vert^{2}]\leq C_{1}\left(1+v_{n-1}+u_{n-1}\right). This proves (22) for any constant M larger than 1+C_{1}.
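The mean-square contraction used at the start of this step, {\BBE}[W_{n}^{T}(I_{N}-{\bf 1}{\bf 1}^{T}/N)W_{n}]\preceq\rho_{n}(I_{N}-{\bf 1}{\bf 1}^{T}/N) with \rho_{n}<1, can be checked on a concrete gossip model. The sketch below uses uniform pairwise averaging on N=4 nodes (our toy example; the text only assumes 1a and 1b) and recovers the contraction coefficient \rho=2/3.

```python
import itertools
import numpy as np

# Gossip contraction on the disagreement subspace: for uniformly chosen
# pairwise-averaging matrices W (a standard example, not the paper's
# general model), E[W^T (I - 11^T/N) W] has spectral radius rho < 1.
N = 4
J_perp = np.eye(N) - np.ones((N, N)) / N
pairs = list(itertools.combinations(range(N), 2))
M = np.zeros((N, N))
for i, j in pairs:                      # uniform pair selection
    W = np.eye(N)
    W[[i, j], :] = 0.0                  # rows i and j average each other
    W[i, i] = W[i, j] = W[j, i] = W[j, j] = 0.5
    M += W.T @ J_perp @ W / len(pairs)  # E[W^T (I - 11^T/N) W]
# M annihilates the consensus direction 1; its largest eigenvalue is the
# contraction factor on the orthogonal complement of 1.
rho = max(np.linalg.eigvalsh(M))
print(0.0 < rho < 1.0)
```

For this model one can verify analytically that {\BBE}[W^{T}J_{\bot}W]=(2/3)I-{\bf 1}{\bf 1}^{T}/6, so the computed \rho equals 2/3 exactly.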
Proof of (23)
Lemma 4 is applied with \theta\leftarrow\langle{\mmb\theta}_{n-1}\rangle and \theta^{\prime}\leftarrow\langle{\mmb\theta}_{n}\rangle. We have to evaluate the difference \langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle. By (4), \langle{\mmb\theta}_{n}\rangle=({{{\bf 1}^{T}W_{n}}\over{N}}\otimes I_{d})\left({\mmb\theta}_{n-1}+\gamma_{n}{\mbi Y}_{n}\right). Therefore, \eqalignno{\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle=&\,\left({{{\bf 1}^{T}W_{n}-{\bf 1}^{T}}\over{N}}\otimes I_{d}\right){\mmb\theta}_{n-1}\cr &+\left({{{\bf 1}^{T}W_{n}}\over{N}}\otimes I_{d}\right)\gamma_{n}{\mbi Y}_{n}\cr =&\,\left({{{\bf 1}^{T}W_{n}-{\bf 1}^{T}}\over{N}}\otimes I_{d}\right){\mmb\theta}_{\bot,n-1}\cr &+\left({{{\bf 1}^{T}W_{n}}\over{N}}\otimes I_{d}\right)\gamma_{n}{\mbi Y}_{n},&{\hbox{(36)}}} where the second equality is due to the fact that W_{n} is row-stochastic. Under Assumption 1a, {\BBE}(W_{n}) is doubly stochastic. Thus, using Assumption 1b, {\BBE}[\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert{\cal F}_{n-1}]=\gamma_{n}\int\langle{\mbi y}\rangle\;\mu_{{\mmb\theta}_{n-1}}(d{\mbi y}).\eqno{\hbox{(37)}}
Taking the conditional expectation in (35) and using (37), there exists C^{\prime} such that for any n, \eqalignno{{\BBE}[V(\langle{\mmb\theta}_{n}\rangle)\vert{\cal F}_{n-1}]\leq &\,V(\langle{\mmb\theta}_{n-1}\rangle)\cr &+\gamma_{n}\nabla V(\langle{\mmb\theta}_{n-1}\rangle)^{T}\int\langle{\mbi y}\rangle\;\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})\cr &+C^{\prime}{\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}\vert{\cal F}_{n-1}].} By Condition 3b, the quantity {-}\nabla V(\langle{\mmb\theta}_{n-1}\rangle)^{T}h(\langle{\mmb\theta}_{n-1}\rangle) is nonnegative; therefore, \eqalignno{&{\BBE}[V(\langle{\mmb\theta}_{n}\rangle)\vert{\cal F}_{n-1}]\leq V(\langle{\mmb\theta}_{n-1}\rangle)\cr &\quad+\gamma_{n}\nabla V(\langle{\mmb\theta}_{n-1}\rangle)^{T}\left(\int\langle{\mbi y}\rangle\;\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})-h(\langle{\mmb\theta}_{n-1}\rangle)\right)\cr &\quad+C^{\prime}{\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}\vert{\cal F}_{n-1}].} Using successively Condition ST2 and Lemma 4, we have the estimate \eqalignno{&\nabla V(\langle{\mmb\theta}_{n-1}\rangle)^{T}\left(\int\langle{\mbi y}\rangle\;\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})-h(\langle{\mmb\theta}_{n-1}\rangle)\right)\cr &\quad\leq\vert\nabla V(\langle{\mmb\theta}_{n-1}\rangle)\vert\,C_{2}\vert{\mmb\theta}_{\bot,n-1}\vert\cr &\quad\leq\sqrt{C}C_{2}\sqrt{V(\langle{\mmb\theta}_{n-1}\rangle)}\,\vert{\mmb\theta}_{\bot,n-1}\vert.}
By the Cauchy–Schwarz inequality, the expectation of the above quantity is no larger than \sqrt{C}C_{2}\sqrt{u_{n-1}v_{n-1}}. We obtain \displaylines{v_{n}\leq v_{n-1}+\gamma_{n}\sqrt{C}C_{2}\sqrt{u_{n-1}(1+u_{n-1}+v_{n-1})}\hfill\cr \hfill+C^{\prime}{\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}],\quad{\hbox{(38)}}} where we used the fact that u_{n-1}\geq 0. We now need an estimate of {\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}]. Using Minkowski's inequality on (36), \eqalignno{&{\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}]^{1/2}\cr &\quad\leq{\BBE}\left[\left\vert\left({{{\bf 1}^{T}W_{n}-{\bf 1}^{T}}\over{N}}\otimes I_{d}\right){\mmb\theta}_{\bot,n-1}\right\vert^{2}\right]^{1/2}\cr &\qquad+{\BBE}\left[\left\vert\left({{{\bf 1}^{T}W_{n}}\over{N}}\otimes I_{d}\right)\gamma_{n}{\mbi Y}_{n}\right\vert^{2}\right]^{1/2}.&{\hbox{(39)}}} Consider the first term on the RHS of the above inequality. Remark that \displaylines{{\BBE}[(W_{n}^{T}{\bf 1}-{\bf 1})({\bf 1}^{T}W_{n}-{\bf 1}^{T})\vert{\cal F}_{n-1}]\hfill\cr \hfill={\BBE}[W_{n}^{T}{\bf 1}{\bf 1}^{T}W_{n}]-{\bf 1}{\bf 1}^{T},}
where we used Assumption 1b along with the fact that {\BBE}(W_{n}) is doubly stochastic (see Condition 1a). Upon noting that the entries of W_{n} lie in [{0,1}] (a consequence of Assumption 1a), the spectral norm of {\BBE}[W_{n}^{T}{\bf 1}{\bf 1}^{T}W_{n}]-{\bf 1}{\bf 1}^{T} is bounded. Thus, there exists a constant C^{\prime} such that {\BBE}\left[\left\vert\left({{{\bf 1}^{T}W_{n}-{\bf 1}^{T}}\over{N}}\otimes I_{d}\right){\mmb\theta}_{\bot,n-1}\right\vert^{2}\right]\leq C^{\prime}u_{n-1}. By similar arguments, there exists a constant C^{\prime\prime} such that \eqalignno{{\BBE}\left[\left\vert\left({{{\bf 1}^{T}W_{n}}\over{N}}\otimes I_{d}\right)\gamma_{n}{\mbi Y}_{n}\right\vert^{2}\right]\leq &\, C^{\prime\prime}\gamma_{n}^{2}\,{\BBE}\vert{\mbi Y}_{n}\vert^{2}\cr \leq &\, C_{2}C^{\prime\prime}\gamma_{n}^{2}\,\left(1+u_{n-1}+v_{n-1}\right),} where we used Assumption ST2. Putting this together with (39), \eqalignno{&{\BBE}[\vert\langle{\mmb\theta}_{n}\rangle-\langle{\mmb\theta}_{n-1}\rangle\vert^{2}]\cr &\quad\leq (\sqrt{C^{\prime}}\sqrt{u_{n-1}}+\gamma_{n}\sqrt{C_{2}C^{\prime\prime}}\,\sqrt{1+u_{n-1}+v_{n-1}})^{2}\cr &\quad\leq C\left(u_{n-1}+\gamma_{n}^{2}\,(1+u_{n-1}+v_{n-1})\right.\cr &\quad\left.+\gamma_{n}\sqrt{u_{n-1}(1+u_{n-1}+v_{n-1})}\right),}
where C>0 is some constant chosen large enough. Plugging the above inequality into (38), \eqalignno{v_{n}\leq&\, v_{n-1}+(C^{\prime}C) u_{n-1}\cr &+(\sqrt{C}C_{2}+C^{\prime}C)\gamma_{n}\sqrt{u_{n-1}(1+u_{n-1}+v_{n-1})}\cr &+C^{\prime}C\gamma_{n}^{2}\,(1+u_{n-1}+v_{n-1}).} This proves that (23) holds for any M chosen large enough.
Proof of \sup_{n}{\BBE}\left[\vert{\mbi Y}_{n}\vert^{2}\right]<\infty
By Assumptions 1b and ST2, \displaylines{{\BBE}\left[\left\vert{\mbi Y}_{n}\right\vert^{2}\right]={\BBE}\left[{\BBE}_{{\mmb\theta}_{n-1}}\left[\left\vert{\mbi Y}\right\vert^{2}\right]\right]\hfill\cr \hfill\leq C_{2}\left(1+{\BBE}\left[V(\langle{\mmb\theta}_{n-1}\rangle)\right]+{\BBE}\left[\left\vert{\mmb\theta}_{\bot,n-1}\right\vert^{2}\right]\right).\quad{\hbox{(40)}}} The proof follows since \sup_{n}{\BBE}\left[V(\langle{\mmb\theta}_{n}\rangle)\right]<\infty and {\BBE}\left[\left\vert{\mmb\theta}_{\bot,n}\right\vert^{2}\right]\leq\sum_{n}{\BBE}\left[\left\vert{\mmb\theta}_{\bot,n}\right\vert^{2}\right]<\infty. \hfill\blacksquare
Lemma 6
Let us consider Assumptions 1a, 1b, 2, 3a–3e, and 5. Assume in addition ST1 and ST2. Then, {\BBP}\left\{\limsup_{n}\vert\langle{\mmb\theta}_{n}\rangle\vert<\infty\right\}=1.
Proof
The sequence (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1} satisfies (10). The proof is an application of [22, Th. 2.2]: in order to apply this theorem, we only have to prove that, with probability 1, (i) the sequence (\langle{\mmb\theta}_{n}\rangle)_{n\geq 1} is infinitely often in a level set \{V\leq M\}, i.e., {\BBP}\left\{\liminf_{n}V(\langle{\mmb\theta}_{n}\rangle)<\infty\right\}=1, and (ii) \sum_{n}\gamma_{n}\left(\langle(W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle-h(\langle{\mmb\theta}_{n-1}\rangle)\right)<\infty. For the recurrence property, we have by Fatou's lemma \eqalignno{{\BBE}\left(\liminf_{n}V(\langle{\mmb\theta}_{n}\rangle)\right)\leq&\,\liminf_{n}{\BBE}\left(V(\langle{\mmb\theta}_{n}\rangle)\right)\cr \leq&\,\sup_{n}{\BBE}\left(V(\langle{\mmb\theta}_{n}\rangle)\right).} By Lemma 5, the RHS is finite, thus showing that {\BBP}\left\{\liminf_{n}V(\langle{\mmb\theta}_{n}\rangle)<\infty\right\}=1. For the second property, we write \langle(W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle-h(\langle{\mmb\theta}_{n-1}\rangle)=e_{n}+\xi_{n-1} where \eqalignno{e_{n}:=&\,\langle (W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle-\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y}),\cr \xi_{n-1}:=&\,\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})-\int\langle{\mbi y}\rangle\mu_{{\bf 1}\otimes\langle{\mmb\theta}_{n-1}\rangle}(d{\mbi y}).}
By Assumption ST2 and the inequality 2 a b\leq a^{2}+b^{2}, there exists a constant C such that {\BBE}\left\vert\sum_{n\geq 1}\gamma_{n}\xi_{n-1}\right\vert\leq C \left(\sum_{n\geq 1}\gamma_{n}^{2}+\sum_{n\geq1}{\BBE}\left\vert{\mmb\theta}_{\bot,n-1}\right\vert^{2}\right).\eqno{\hbox{(41)}} The RHS in (41) is finite under Assumption 2 and Lemma 5, thus implying that \sum_{n\geq 1}\gamma_{n}\xi_{n-1} converges w.p.1. Since {\BBE}\left[e_{n}\,\vert{\cal F}_{n-1}\right]=0, the sequence \left(S_{n}:=\sum_{k=1}^{n}\gamma_{k}e_{k}\right)_{n\geq 1} is a martingale. We prove that it converges almost surely by estimating its second-order moment. For any k\geq 1, \eqalignno{&{\BBE}\left[\vert S_{k}\vert^{2}\right]\leq \sum_{n\geq 1}\gamma_{n}^{2}\,{\BBE}\left[\left\vert e_{n}\right\vert^{2}\right]\cr &\quad\leq\sum_{n\geq 1}\gamma_{n}^{2}\,{\BBE}\left[({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})^{T}P_{n}({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\right]}
where we set P_{n}:=N^{-2}W_{n}^{T}{\bf 1}{\bf 1}^{T}W_{n}\otimes I_{d}. Note that P_{n} is independent of {\mbi Y}_{n} conditionally on {\cal F}_{n-1}. Since W_{n} is a stochastic matrix, its spectral norm is bounded uniformly in n. Therefore, there exists a constant C>0 such that \eqalignno{{\BBE}\left[\vert S_{n}\vert^{2}\right]\leq &\,C\sum_{n\geq1}\gamma_{n}^{2}\,{\BBE}\left[\left\vert{\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1}\right\vert^{2}\right]\cr \leq &\,2C\sum_{n\geq1}\gamma_{n}^{2}\,{\BBE}\left[\vert{\mbi Y}_{n}\vert^{2}\right]+2C\sum_{n\geq 1}{\BBE}\left[\vert{\mmb\theta}_{\bot,n-1}\vert^{2}\right].} By Lemma 5 and Assumption 2, it follows that \sup_{n}{\BBE}\left[\vert S_{n}\vert^{2}\right] is finite, thus implying that the martingale (S_{n})_{n\geq 1} converges almost surely to a r.v. which is finite w.p.1 (see e.g., [34, Corollary 2.2]). This concludes the proof. \hfill\blacksquare
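The martingale step above can be illustrated numerically: with square-summable steps, S_{n}=\sum_{k\leq n}\gamma_{k}e_{k} has orthogonal increments, so \sup_{n}{\BBE}\vert S_{n}\vert^{2}=\sum_{k}\gamma_{k}^{2}{\BBE}\vert e_{k}\vert^{2}<\infty. The sketch below uses i.i.d. standard Gaussian increments and \gamma_{k}=k^{-0.7} (illustrative choices of ours, not the paper's model).

```python
import numpy as np

rng = np.random.default_rng(1)
N, paths = 2000, 4000
gamma = np.arange(1, N + 1) ** -0.7      # square-summable step sizes
# S_n = sum_{k<=n} gamma_k e_k with E[e_k | F_{k-1}] = 0: an L^2
# martingale, simulated over many independent paths.
e = rng.normal(size=(paths, N))
S = (gamma * e).cumsum(axis=1)
# Orthogonal increments give sup_n E|S_n|^2 = sum_k gamma_k^2, finite.
theo = (gamma ** 2).sum()
emp = S[:, -1].var()
print(abs(emp - theo) < 0.5)
```

The empirical variance of S_{N} matches \sum_{k}\gamma_{k}^{2} up to Monte Carlo error, consistent with the uniform second-moment bound that drives the almost-sure convergence.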
SECTION D.
Proof of Theorem 4
Set V_{n}:=(I_{N}-{\bf 1}{\bf 1}^{T}/N) W_{n} and, for any 1\leq k\leq n, \Phi_{n,k}:=(V_{n}\otimes I_{d})(V_{n-1}\otimes I_{d})\cdots(V_{k}\otimes I_{d}).\eqno{\hbox{(42)}} Note that by Assumptions 1b–1c, \eqalignno{\Vert\Phi_{n,k}X\Vert_{2}^{2}=&\,{\BBE}[X^{T}\Phi_{n-1,k}^{T}(V_{n}^{T}V_{n}\otimes I_{d})\Phi_{n-1,k}X]\cr =&\,{\BBE}[X^{T}\Phi_{n-1,k}^{T}{\BBE}(V_{n}^{T}V_{n}\otimes I_{d})\Phi_{n-1,k}X]\cr \leq &\,\rho {\BBE}[X^{T}\Phi_{n-1,k}^{T}\Phi_{n-1,k}X]=\rho\Vert\Phi_{n-1,k}X\Vert_{2}^{2}.&{\hbox{(43)}}} From (4) and since {J_{\bot}}(W_{n}\otimes I_{d})={J_{\bot}}(W_{n}\otimes I_{d}){J_{\bot}}=(V_{n}\otimes I_{d}){J_{\bot}} by Assumption 1a, it holds for any n\geq 1 that {\mmb\theta}_{\bot,n}=(V_{n}\otimes I_{d})({\mmb\theta}_{\bot,n-1}+\gamma_{n}{\mbi Y}_{\bot,n}). By induction, {\mmb\theta}_{\bot,n}=\sum_{k=1}^{n}\gamma_{k}\Phi_{n,k}{\mbi Y}_{\bot,k}+\Phi_{n,1}\mmb\theta_{\bot,0}\eqno{\hbox{(44)}}
where \Phi_{n,k} is defined by (42). By (43) and Assumption 1c, the second term in the RHS of (44) is a {\cal O}_{L^{2}}(\rho^{n/2}). We now consider the first term in the RHS of (44). Using Minkowski's inequality and (43), \eqalignno{&\Vert\sum_{k=1}^{n}\gamma_{k}\Phi_{n,k}{\mbi Y}_{\bot,k}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\Vert_{2}\cr &\quad\leq\sum_{k=1}^{n}\gamma_{k}\Vert\Phi_{n,k}{\mbi Y}_{\bot,k}{\bf 1}_{\sup_{\ell\leq n-1}\vert\mmb\theta_{\ell}\vert\leq M}\Vert_{2}\cr &\quad\leq\sum_{k=1}^{n}\gamma_{k}\sqrt{\rho}^{n-k+1}\Vert{\mbi Y}_{\bot,k}{\bf 1}_{\sup_{\ell\leq k-1}\vert\mmb\theta_{\ell}\vert\leq M}\Vert_{2}.} By [35, Result 178, p. 38], the RHS is upper bounded by \limsup_{n\to\infty}\Vert{\mbi Y}_{\bot,n}{\bf 1}_{\vert\mmb\theta_{n-1}\vert\leq M}\Vert_{2}\,\rho(1-\sqrt{\rho})^{-1}. Under Assumption 4a, this upper bound is finite (the proof follows the same lines as the proof of Lemma 2 and is omitted). This concludes the proof.
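The series bound invoked from [35, Result 178] is of the following standard Abelian form (our paraphrase): for \rho\in(0,1) and a convergent sequence (b_{n}), \lim_{n}\sum_{k=1}^{n}b_{k}\rho^{n-k}=(\lim_{n}b_{n})/(1-\rho). A quick numeric sketch with toy values of our own choosing:

```python
# Abelian-type bound behind [35, Result 178] (our paraphrase): for
# rho in (0,1) and b_k -> b, sum_{k<=n} b_k rho^(n-k) -> b / (1 - rho).
rho, n = 0.5, 2000
b = [2.0 + 1.0 / k for k in range(1, n + 1)]   # b_k -> 2
# S_n should approach 2 / (1 - 0.5) = 4 for large n.
S = sum(bk * rho ** (n - k) for k, bk in enumerate(b, start=1))
print(abs(S - 4.0) < 0.01)
```

Only the last few summands contribute appreciably, which is exactly why the geometric weighting turns a boundedness assumption on the Y-terms into a uniform bound on the whole sum.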
SECTION E.
Proof of Theorem 5
Assumption 2 implies that \lim_{n}\rho^{n/2}\gamma_{n}^{-2}=0. Upon noting that {\BBP}\left\{\bigcup_{M}\{\sup_{n}\vert{\mmb\theta}_{n}\vert\leq M\}\,\Big\vert\lim_{q}\mmb\theta_{q}={\bf 1}\otimes\theta_{\star}\right\}=1, Theorem 4 implies that the sequence of r.v. (\gamma_{n}^{-1/2}{\mmb\theta}_{\bot,n})_{n} converges in probability to zero under the conditional probability {\BBP}\left\{\cdot\,\vert\lim_{q}\mmb\theta_{q}={\bf 1}\otimes\theta_{\star}\right\}. Since {\mmb\theta}_{n}={\bf 1}\otimes\langle{\mmb\theta}_{n}\rangle+{\mmb\theta}_{\bot,n}, it remains to prove that the sequence of r.v. (\gamma_{n}^{-1/2}(\langle{\mmb\theta}_{n}\rangle-\theta_{\star}))_{n\geq 0} converges in distribution to Z under the conditional distribution given the event \{\lim_{q}\mmb\theta_{q}={\bf 1}\otimes\theta_{\star}\}. To this end, we write \langle{\mmb\theta}_{n}\rangle=\langle{\mmb\theta}_{n-1}\rangle+\gamma_{n}h\left(\langle{\mmb\theta}_{n-1}\rangle\right)+\gamma_{n}e_{n}+\gamma_{n}\xi_{n}
where \xi_{n}:=\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})-\int\langle{\mbi y}\rangle\mu_{{\bf 1}\otimes\langle{\mmb\theta}_{n-1}\rangle}(d{\mbi y}) and \displaylines{e_{n}:=\langle (W_{n}\otimes I_{d})({\mbi Y}_{n}+\gamma_{n}^{-1}{\mmb\theta}_{\bot,n-1})\rangle\hfill\cr \hfill-\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y})=\langle{\mbi Y}_{n}\rangle-\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n-1}}(d{\mbi y}),} since {\bf 1}^{T}W_{n}={\bf 1}^{T}. We then check Conditions C1 to C4 of [23, Th. 1] (see also [24, Th. 1]). Under Assumptions 6 and 8a, Conditions C1 and C4 of [23, Th. 1] are satisfied. We now prove C2b: there exists a constant C such that \eqalignno{&{\BBE}\left[\vert e_{n+1}\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\quad\leq C {\BBE}\left[\left\vert\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\right\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\qquad+C {\BBE}\left[\vert\langle{\mbi Y}_{n+1}\rangle\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\quad\leq 2C\sup_{\vert{\mmb\theta}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\int\vert\langle{\mbi y}\rangle\vert^{2+\tau}\mu_{\mmb\theta}(d{\mbi y}),}
\eqalignno{&{\BBE}\left[\vert e_{n+1}\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\quad\leq C {\BBE}\left[\vert\int\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\qquad+C {\BBE}\left[\vert\langle{\mbi Y}_{n+1}\rangle\vert^{2+\tau}{\bf 1}_{\vert{\mmb\theta}_{n}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\right]\cr &\quad\leq 2C\sup_{\vert{\mmb\theta}-{\bf 1}\otimes\theta_{\star}\vert\leq\delta}\int\vert\langle{\mbi y}\rangle\vert^{2+\tau}\mu_{\mmb\theta}(d{\mbi y})} and the RHS is finite under Assumption 7. For C2c, we have \displaylines{{\BBE}\left[e_{n+1}e_{n+1}^{T}\vert{\cal F}_{n}\right]\hfill\cr \hfill=\!\left\{\!\int\!\!\langle{\mbi y}\rangle\langle{\mbi y}\rangle^{T}\!\mu_{{\mmb\theta}_{n}}\!(d{\mbi y})\!-\!\left(\int\!\!\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\!\right)\left(\int\!\!\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\!\right)^{\!\!T}\!\right\}\!.}View Source
\displaylines{{\BBE}\left[e_{n+1}e_{n+1}^{T}\vert{\cal F}_{n}\right]\hfill\cr \hfill=\!\left\{\!\int\!\!\langle{\mbi y}\rangle\langle{\mbi y}\rangle^{T}\!\mu_{{\mmb\theta}_{n}}\!(d{\mbi y})\!-\!\left(\int\!\!\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\!\right)\left(\int\!\!\langle{\mbi y}\rangle\mu_{{\mmb\theta}_{n}}(d{\mbi y})\!\right)^{\!\!T}\!\right\}\!.} By Assumption 7, this term converges w.p.1 to \Upsilon on the set \{\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}\}. This concludes the proof of C2.
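To build intuition for the conditional covariance in C2c, the following self-contained Python sketch works in a hypothetical toy model (not the paper's general setting): each of N agents draws an i.i.d. Gaussian observation, so the martingale increment e is simply the network average \langle y\rangle, and its variance, the scalar analogue of \Upsilon, should be close to \sigma^{2}/N.

```python
import random

random.seed(1)

N = 4            # number of agents (assumed for this toy model)
sigma = 1.0      # per-agent observation noise std dev (assumed)
trials = 200000

# e_{n+1} is the network average of the observations minus its conditional
# mean; with i.i.d. N(0, sigma^2) observations that mean is 0, so e = <y>.
e_samples = [sum(random.gauss(0.0, sigma) for _ in range(N)) / N
             for _ in range(trials)]

mean = sum(e_samples) / trials
var = sum((e - mean) ** 2 for e in e_samples) / trials
print(f"empirical Var(e) = {var:.4f}, predicted sigma^2/N = {sigma**2 / N:.4f}")
```

The 1/N factor is the averaging gain of the network: the noise entering the averaged recursion is N times less variable than each agent's own observation noise.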
We now consider condition C3 of [23] with r_{n}=\xi_{n}+e_{n}{\bf 1}_{\vert{\mmb\theta}_{n-1}-{\bf 1}\otimes\theta_{\star}\vert>\delta}: we prove that for any M>0, \gamma_{n}^{-1/2}r_{n}{\bf 1}_{\sup_{k}\vert{\mmb\theta}_{k}\vert\leq M}{\bf 1}_{\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}}={\cal O}_{w.p.1}(1)\,o_{L^{1}}(1). By Assumption 4b, there exists a constant C such that \displaylines{\gamma_{n}^{-1/2}{\BBE}\left[\vert\xi_{n}\vert{\bf 1}_{\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}}{\bf 1}_{\sup_{k}\vert{\mmb\theta}_{k}\vert\leq M}\right]\hfill\cr \hfill\leq C\left(\gamma_{n}^{-1}{\BBE}\left[\vert{\mmb\theta}_{\bot,n}\vert^{2}{\bf 1}_{\sup_{k}\vert{\mmb\theta}_{k}\vert\leq M}\right]\right)^{1/2}} and the RHS tends to zero as n\to\infty by Theorem 4. On the set \{\lim_{n}{\mmb\theta}_{n}={\bf 1}\otimes\theta_{\star}\}, the r.v. e_{n}{\bf 1}_{\vert{\mmb\theta}_{n-1}-{\bf 1}\otimes\theta_{\star}\vert>\delta} is null for all n large enough. This concludes the proof of condition C3 of [23], and the proof of Theorem 5.
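As a numerical sanity check on the normalization in Theorem 5, here is a minimal Python sketch of the two-step algorithm (1)-(2), assuming a hypothetical linear mean field h(\theta)=-(\theta-\theta_{\star}), i.i.d. Gaussian observation noise, and a fixed doubly stochastic gossip matrix W; none of these specific choices come from the paper. With \gamma_{n}=1/n, the disagreement component {\mmb\theta}_{\bot,n} is of smaller order than the fluctuation of the network average, which is what makes the CLT for \langle{\mmb\theta}_{n}\rangle the dominant effect.

```python
import random

random.seed(0)

N = 4                      # number of agents (assumed)
theta_star = 1.0           # hypothetical target: zero of h(t) = -(t - theta_star)
sigma = 0.5                # observation noise std dev (assumed)

# Fixed doubly stochastic gossip matrix W: ring topology with self-loops.
W = [[0.5 if i == j else (0.25 if abs(i - j) in (1, N - 1) else 0.0)
      for j in range(N)] for i in range(N)]

theta = [0.0] * N
n_iter = 20000
for n in range(1, n_iter + 1):
    gamma = 1.0 / n        # step size gamma_n = 1/n
    # Local step (1): temporary iterate from the noisy mean field.
    tmp = [theta[i] + gamma * (-(theta[i] - theta_star) + random.gauss(0.0, sigma))
           for i in range(N)]
    # Gossip step (2): weighted average of the temporary iterates.
    theta = [sum(W[i][j] * tmp[j] for j in range(N)) for i in range(N)]

avg = sum(theta) / N                             # network average <theta_n>
disagreement = max(abs(t - avg) for t in theta)  # size of theta_perp,n
print(f"average = {avg:.4f}, disagreement = {disagreement:.2e}")
```

In this toy run the disagreement settles at the order of \gamma_{n}, while the error of the average is of the larger order \sqrt{\gamma_{n}}, consistent with Theorem 4 feeding into Theorem 5.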
SECTION F.
Proof of Theorem 6
We preface the proof with a preliminary result, established in [23, Th. 2] (see also [21] for a similar result obtained under stronger assumptions).
Theorem 7
Let (\gamma_{n})_{n} be a deterministic positive sequence satisfying \log (\gamma_{k}/\gamma_{k+1})=o(\gamma_{k}) as well as Assumptions 8b and 8c. Consider the random sequence (u_{n})_{n} given by u_{n+1}=u_{n}+\gamma_{n+1}h(u_{n})+\gamma_{n+1}e_{n+1}+\gamma_{n+1}\xi_{n+1},\quad u_{0}\in{\BBR}^{d}, where
AVER1. u_{\star} is a zero of the mean field: h(u_{\star})=0. The mean field h:{\BBR}^{d}\to{\BBR}^{d} is twice continuously differentiable (in a neighborhood of u_{\star}) and \nabla h(u_{\star}) is a Hurwitz matrix.
AVER2.
(e_{n})_{n\geq 1} is an {\cal F}_{n}-adapted martingale-increment sequence.
For any M>0, there exists \tau>0 s.t. \sup_{k}{\BBE}\left[\vert e_{k}\vert^{2+\tau}{\bf 1}_{\sup_{\ell\leq k-1}\vert u_{\ell}-u_{\star}\vert\leq M}\right]<\infty.
There exists a positive definite (random) matrix U_{\star} such that on the set \{\lim_{q}u_{q}=u_{\star}\}, \lim_{k}{\BBE}\left[e_{k}e_{k}^{T}\vert{\cal F}_{k-1}\right]=U_{\star} almost surely.
AVER3. (\xi_{n})_{n\geq 1} is an {\cal F}_{n}-adapted sequence s.t.
\gamma_{n}^{-1/2} \vert\xi_{n}\vert{\bf 1}_{\lim_{q}u_{q}=u_{\star}}{\bf 1}_{\sup_{n}\vert u_{n}\vert\leq M}={\cal O}_{w.p.1}(1){\cal O}_{L^{2}}(1) for any M>0.
n^{-1/2}\sum_{k=0}^{n}\xi_{k+1}{\bf 1}_{\lim_{q}u_{q}=u_{\star}} converges to zero in probability.
Then, for any t\in{\BBR}^{d}, \displaylines{\qquad\lim_{n}{\BBE}\left[{\bf 1}_{\lim_{q}u_{q}=u_{\star}}\;\exp\left(i\sqrt{n}\;t^{T}\left({{1}\over{n}}\sum_{k=1}^{n}u_{k}-u_{\star}\right)\right)\right]\hfill\cr \hfill={\BBE}\left[{\bf 1}_{\lim_{q}u_{q}=u_{\star}}\;\exp\left(-{{1}\over{2}}t^{T}\nabla h(u_{\star})^{-1}\;U_{\star}\;\nabla h(u_{\star})^{-T}t\right)\right].}
Proof of Theorem 6
By Theorem 4 and Assumption 8c, n^{-1/2}\sum_{k=1}^{n}{\mmb\theta}_{\bot,k}{\bf 1}_{\sup_{\ell}\vert{\mmb\theta}_{\ell}\vert\leq M} converges in L^{2} to zero for any M>0. Since {\mmb\theta}_{n}={\mmb\theta}_{\bot,n}+{\bf 1}\otimes\langle{\mmb\theta}_{n}\rangle, we now prove a CLT for the averaged sequence n^{-1}\sum_{k=1}^{n}\langle{\mmb\theta}_{k}\rangle. To this end, we check Assumptions AVER 1 to AVER 3 of Theorem 7 with u_{n}=\langle{\mmb\theta}_{n}\rangle and e_{n}, \xi_{n} defined as in the proof of Theorem 5. AVER 1 and AVER 2 can be proved along the same lines as in the proof of Theorem 5; details are omitted. Finally, by Assumption 4b and Theorem 4, {\BBE}\left[\vert\xi_{n}\vert^{2}{\bf 1}_{\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}}{\bf 1}_{\sup_{m\leq n-1}\vert{\mmb\theta}_{m}\vert\leq M}\right]={\cal O}(\gamma_{n}^{2}); and \displaylines{\ell^{-1/2}\;\sum_{n=1}^{\ell}{\BBE}\left[\vert\xi_{n}\vert{\bf 1}_{\lim_{k}{\mmb\theta}_{k}={\bf 1}\otimes\theta_{\star}}{\bf 1}_{\sup_{m\leq n-1}\vert{\mmb\theta}_{m}\vert\leq M}\right]\hfill\cr \hfill\leq C\;\ell^{-1/2}\;\sum_{n=1}^{\ell}\gamma_{n}.} The RHS tends to zero under Assumption 8c, thus showing AVER 3.
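The averaging phenomenon behind Theorem 7 can be illustrated with a short Python sketch. The recursion below uses a hypothetical scalar mean field h(u)=-(u-u_{\star}) with i.i.d. Gaussian martingale increments e_{n} and \xi_{n}\equiv 0 (choices made for illustration only), and a slowly decaying step \gamma_{n}=n^{-0.7}; the running average n^{-1}\sum_{k\leq n}u_{k} then concentrates around u_{\star} at the \sqrt{n} rate, faster than the raw iterate.

```python
import random

random.seed(2)

u_star = 2.0        # zero of the assumed mean field h(u) = -(u - u_star)
sigma = 0.5         # std dev of the martingale increments e_n (assumed)

u = 0.0
running_sum = 0.0
n_iter = 50000
for n in range(1, n_iter + 1):
    gamma = n ** -0.7               # step size decaying slower than 1/n
    e = random.gauss(0.0, sigma)    # martingale-increment noise (cf. AVER2)
    u = u + gamma * (-(u - u_star)) + gamma * e
    running_sum += u

u_bar = running_sum / n_iter        # average of the iterates
print(f"last iterate = {u:.4f}, averaged iterate = {u_bar:.4f}")
```

In this linear toy case \nabla h(u_{\star})=-1, so the limiting covariance in Theorem 7 reduces to the variance of e_{n}, and the averaged estimate attains it without any tuning of the step-size constant.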