Introduction
Collaborative in-network processing is a major tenet of wireless sensor networking, and has received much attention from the signal processing, control, and information theory communities during the past decade [1]. Early research in this area considered applications such as detection, classification, tracking, and pursuit [2]–[5]. By exploiting local computation resources at each node, it is possible to reduce the amount of data that needs to be transmitted out of the network, thereby saving bandwidth and energy, extending the network lifetime, and reducing latency.
In addition to having on-board sensing and processing capabilities, the archetypal sensor network node is battery powered and uses a wireless radio to communicate with the rest of the network. Since each wireless transmission consumes bandwidth and, on common platforms, also consumes considerably more energy than processing data locally [6], [7], reducing the amount of data transmitted can significantly prolong battery life. In applications where the phenomenon being sensed varies slowly in space, the measurements at nearby sensors will be highly correlated. In-network processing can compress the data to avoid wasting transmissions on redundant information. In other applications, rather than collecting data from each node, the goal of the system may be to compute a function of the data such as estimating parameters, fitting a model, or detecting an event. In-network processing can be used to carry out the computation within the network so that, instead of transmitting raw data to a fusion center, only the results of the computation are transmitted to the end user. In many situations, in-network computation leads to considerable energy savings over the centralized approach [8], [9].
Many previous approaches to in-network processing assume that the network can provide specialized routing services. For example, some schemes require the existence of a cyclic route through the network that passes through every node precisely one time [9]–[11]. Others are based on forming a spanning tree rooted at the fusion center or information sink, and then aggregating data up the tree [8], [12], [13]. Although using a fixed routing scheme is intuitive, there are many drawbacks to this approach in wireless networking scenarios. Aggregating data towards a fusion center at the root of a tree can cause a bottleneck in communications near the root and creates a single point of failure. Moreover, wireless links are unreliable, and in dynamic environments, a significant amount of undesirable overhead traffic may be generated just to establish and maintain routes.
A. Gossip Algorithms for In-Network Processing
This paper presents an overview of gossip algorithms and issues related to their use for in-network processing in wireless sensor networks. Gossip algorithms have been widely studied in the computer science community for information dissemination and search [14]–[16]. More recently, they have been developed and studied for information processing in sensor networks. They have the attractive property that no specialized routing is required. Each node begins with a subset of the data in the network. At each iteration, information is exchanged between a subset of nodes, and then this information is processed by the receiving nodes to compute a local update.
Gossip algorithms for in-network processing have primarily been studied as solutions to consensus problems, which capture the situation where a network of agents must achieve a consistent opinion through local information exchanges with their neighbors. Early work includes that of Tsitsiklis et al. [17], [18]. Consensus problems have arisen in numerous applications, including load balancing [19]; alignment, flocking, and multiagent collaboration [20], [21]; vehicle formation [22]; tracking and data fusion [23]; and distributed inference [24].
The canonical example of a gossip algorithm for information aggregation is a randomized protocol for distributed averaging. The problem setup is such that each node in an
Gossip algorithms can be classified as being randomized or deterministic. The scheme described above is randomized and asynchronous, since at each iteration a random pair of nodes is active. In deterministic, synchronous gossip algorithms, at each iteration, node
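As a concrete illustration, the following is a minimal sketch of asynchronous randomized pairwise gossip for distributed averaging; the ring topology, initial values, and iteration count are illustrative choices, not a prescription from the literature.

```python
import random

def pairwise_gossip(x, neighbors, iterations, seed=0):
    """Asynchronous randomized pairwise gossip: at each iteration a random
    node wakes up, picks a random neighbor, and the pair replaces both of
    their values with the pairwise average."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iterations):
        i = rng.randrange(len(x))
        j = rng.choice(neighbors[i])
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg
    return x

# Ring of 8 nodes; pairwise averaging preserves the sum (hence the average).
neighbors = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
x0 = [float(i) for i in range(8)]
x_final = pairwise_gossip(x0, neighbors, iterations=2000)
```

Because each update replaces two values by their common average, the network sum is invariant, and all states contract toward the global average.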
B. Paper Outline
Our overview of gossip algorithms begins on the theoretical side and progresses towards sensor network applications. Each gossip iteration requires wireless transmission and thus consumes valuable bandwidth and energy resources. Section II discusses techniques for bounding rates of convergence for gossip, and thus the number of transmissions required. Because standard pairwise gossip converges slowly on wireless network topologies, a large body of work has focused on developing faster gossip algorithms for wireless networks, and this work is also described. When transmitting over a wireless channel, one must also consider issues such as noise and coding. Section III discusses the effects of finite transmission rates and quantization on convergence of gossip algorithms. Finally, Section IV illustrates how gossip algorithms can be applied to accomplish distributed signal processing tasks such as distributed estimation and compression.
Rates of Convergence and Faster Gossip
Gossip algorithms are iterative, and the number of wireless messages transmitted is proportional to the number of iterations executed. Thus, it is important to characterize the rate of convergence of gossip and to understand what factors influence these rates. This section surveys convergence results, describing the connection between the rate of convergence and the underlying network topology, and then describes developments that have been made in the area of fast gossip algorithms for wireless sensor networks.
A. Analysis of Gossip Algorithms
In pairwise gossip, only two nodes exchange information at each iteration. More generally, a subset of nodes may average their information. Many of the gossip algorithms that we will be interested in can be described by an equation of the form

x(t+1) = W(t)\, x(t) \eqno{(1)}
x_{i}(t+1) = \frac{1}{|S|} \sum_{j \in S} x_{j}(t), \qquad i \in S \eqno{(2)}
It is therefore easy to see that all such matrices will have the following properties:

\vec{1}^{T} W(t) = \vec{1}^{T}, \qquad W(t)\vec{1} = \vec{1}. \eqno{(3)}
We are now ready to understand the evolution of the estimate vector:

x(t+1) = W(t)\, x(t) = \prod_{k=0}^{t} W(k)\, x(0). \eqno{(4)}
\prod_{k=0}^{t} W(k) \rightarrow \frac{1}{n}\vec{1}\vec{1}^{T}. \eqno{(5)}
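The properties in (3) and the convergence of the matrix product in (5) are easy to check numerically. The sketch below builds single-step pairwise-averaging matrices W(t), verifies that each is doubly stochastic, and watches the running product approach the averaging matrix; the complete-graph pair selection and iteration count are illustrative choices.

```python
import numpy as np

def pair_averaging_matrix(n, i, j):
    """W for one pairwise-averaging step: entries i and j are replaced by
    their average; all other entries are untouched."""
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return np.eye(n) - 0.5 * np.outer(e, e)

rng = np.random.default_rng(0)
n = 6
P = np.eye(n)  # running product of the W(k), as in (4)
for _ in range(500):
    i, j = rng.choice(n, size=2, replace=False)
    W = pair_averaging_matrix(n, i, j)
    # Property (3): each W(t) is doubly stochastic.
    assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)
    P = W @ P
# P now approximates the averaging matrix (1/n) * ones * ones^T, as in (5).
```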
B. Expected Behavior
We start by looking at the expected evolution of the random vector:

\mathbb{E}\, x(t+1) = \mathbb{E}\left(\prod_{k=0}^{t} W(k)\right) x(0) = (\mathbb{E} W)^{t+1} x(0) \eqno{(6)}
C. Convergence Rate
The problem with the expectation analysis is that it gives no estimate of the rate of convergence, a key parameter for applications. Since the algorithms are randomized, we need to specify what we mean by convergence. One notion that yields clean theoretical results defines convergence as the first time at which the normalized error is small with high probability, controlling both the error and the probability with a single parameter $\epsilon$.
Definition 1 ($\epsilon$-averaging time $T_{\rm ave}(\epsilon)$): Given $\epsilon > 0$, the $\epsilon$-averaging time is

T_{\rm ave}(\epsilon) = \sup_{x(0)} \inf_{t = 0,1,2,\ldots} \left\{ t : \mathbb{P}\left( \frac{\left\| x(t) - x_{\rm ave}\vec{1} \right\|}{\left\| x(0) \right\|} \geq \epsilon \right) \leq \epsilon \right\}. \eqno{(7)}
The key technical theorem used in the analysis of gossip algorithms is the following connection between the averaging time and the second largest eigenvalue of the expected averaging matrix $\mathbb{E} W$.
Theorem 1
For any gossip algorithm that uses set-averaging matrices and converges in expectation, the averaging time is bounded by

T_{\rm ave}(\epsilon, \mathbb{E} W) \leq \frac{3 \log \epsilon^{-1}}{\log\left(\frac{1}{\lambda_{2}(\mathbb{E} W)}\right)} \leq \frac{3 \log \epsilon^{-1}}{1 - \lambda_{2}(\mathbb{E} W)}. \eqno{(8)}
The topology of the network influences the convergence time of the gossip algorithm, and this theorem quantifies that influence precisely; the matrix
This was first analyzed for the complete graph and uniform pairwise gossiping [15], [25], [30]. For this case it was shown that
If the network topology is fixed, one can ask which selection of pairwise gossiping probabilities maximizes the convergence rate (i.e., maximizes the spectral gap). This problem is equivalent to designing a Markov chain which approaches stationarity optimally fast and, interestingly, it can be formulated as a semidefinite program which can be solved efficiently [25], [26], [33]. Unfortunately, for random geometric graphs (RGGs) and grids, which are the relevant topologies for large wireless ad hoc and sensor networks, even the optimized version of pairwise gossip is extremely wasteful in terms of communication requirements. For example, for a grid topology, the number of required messages scales like
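To make Theorem 1 concrete, the sketch below computes $\mathbb{E} W$ for uniform pairwise gossip on a small ring, extracts $\lambda_2$, and evaluates the bound (8); the cycle topology and $n = 20$ are illustrative choices.

```python
import numpy as np

def expected_W_ring(n):
    """E[W] for uniform pairwise gossip on an n-cycle: each edge is active
    with probability 1/n per iteration, and an active edge (i, j) applies
    W = I - (1/2)(e_i - e_j)(e_i - e_j)^T."""
    EW = np.eye(n)
    for i in range(n):
        j = (i + 1) % n
        e = np.zeros(n)
        e[i], e[j] = 1.0, -1.0
        EW -= (1.0 / n) * 0.5 * np.outer(e, e)
    return EW

n = 20
EW = expected_W_ring(n)
eigs = np.sort(np.abs(np.linalg.eigvalsh(EW)))[::-1]
lam2 = eigs[1]                                    # second largest eigenvalue
eps = 0.01
T_bound = 3 * np.log(1 / eps) / np.log(1 / lam2)  # upper bound (8)
```

On the cycle, $\mathbb{E}W = I - L/(2n)$ for the ring Laplacian $L$, so $\lambda_2 = 1 - (1 - \cos(2\pi/n))/n$; the spectral gap shrinks rapidly with $n$, which is exactly the slow-convergence phenomenon discussed above.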
D. Faster Gossip Algorithms
Pairwise gossip converges very slowly on grids and random geometric graphs because of its diffusive nature. Information from nodes is essentially performing random walks, and, as is well known, a random walk on the 2-D lattice has to perform
Li and Dai [36] recently proposed location-aided distributed averaging (LADA), a scheme that uses partial location information and Markov chain lifting to create fast gossip algorithms. Lifting of gossip algorithms is based on the seminal work of Diaconis et al. [37] and Chen et al. [38] on lifting Markov chain samplers to accelerate convergence rates. The basic idea is to lift the original chain to one with additional states; in the context of gossiping, this corresponds to replicating each node and associating all replicas of a node with the original. LADA creates one replica of a node for each neighbor, and associates the policy a node follows upon receiving a message from that neighbor with the corresponding lifted state. In this manner, LADA suppresses the diffusive behavior of reversible Markov chains that makes pairwise randomized gossip slow. The cluster-based LADA algorithm performs slightly better than geographic gossip, requiring
Just as algorithms based on lifting incorporate additional memory at each node (by way of additional states in the lifted Markov chain), another collection of algorithms seek to accelerate gossip computations by having nodes remember a few previous state values and incorporate these values into the updates at each iteration. These memory-based schemes can be viewed as predicting the trajectory as seen by each node, and using this prediction to accelerate convergence. The schemes are closely related to shift-register methods studied in numerical analysis to accelerate linear system solvers. The challenge of this approach is to design local predictors that provide speedups without creating instabilities. Empirical evidence that such schemes can accelerate convergence rates is shown in [43], and numerical methods for designing linear prediction filters are presented in [44] and [45]. Recent work of Oreshkin et al. [46] shows that improvements in convergence rate on par with geographic gossip are achieved by a deterministic, synchronous gossip algorithm using only one extra tap of memory at each node. Extending these theoretical results to asynchronous gossip algorithms remains an open area of research.
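As a toy illustration of the memory idea (not the algorithm of [46] itself), the sketch below compares plain synchronous averaging $x(t+1) = Wx(t)$ on a ring against a one-tap shift-register variant $x(t+1) = \omega W x(t) + (1-\omega)x(t-1)$; the mixing parameter $\omega$ and the matrix $W = I - \epsilon L$ are hand-picked, not optimized values.

```python
import numpy as np

def ring_W(n, eps=0.4):
    """Synchronous averaging matrix W = I - eps*L for an n-cycle."""
    L = 2.0 * np.eye(n)
    for i in range(n):
        L[i, (i + 1) % n] -= 1.0
        L[i, (i - 1) % n] -= 1.0
    return np.eye(n) - eps * L

def gossip(W, x0, T):
    x = x0.copy()
    for _ in range(T):
        x = W @ x
    return x

def gossip_one_tap(W, x0, T, omega=1.6):
    """One extra memory tap per node: x(t+1) = omega*W*x(t) + (1-omega)*x(t-1).
    The update preserves the network average at every step; omega = 1.6 is a
    hand-picked value, not the optimal setting from the literature."""
    x_prev, x = x0.copy(), W @ x0
    for _ in range(T - 1):
        x, x_prev = omega * (W @ x) + (1 - omega) * x_prev, x
    return x

n = 30
rng = np.random.default_rng(1)
x0 = rng.standard_normal(n)
avg = x0.mean()
err_plain = np.linalg.norm(gossip(ring_W(n), x0, 300) - avg)
err_memory = np.linalg.norm(gossip_one_tap(ring_W(n), x0, 300) - avg)
```

The memory term over-relaxes the slow eigenmodes of $W$, shrinking the effective spectral radius and hence the number of iterations needed for a given accuracy.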
The geographic gossip algorithm uses location information to route packets on long paths in the network. One natural extension of the algorithm is to allow all the nodes on the routed path to be averaged jointly. This can be easily performed by aggregating the sum and the hop length while routing. As long as the information about the average can be routed back on the same path, all the intermediate nodes can replace their estimates with the updated value. This modified algorithm is called geographic gossip with path averaging. It was recently shown [47] that this algorithm converges much faster, requiring only
A related distributed algorithm was introduced by Savas et al. [48], using multiple random walks that merge in the network. The proposed algorithm does not require any location information and uses the minimal number of messages
Finally, we note the recent development of schemes that exploit the broadcast nature of wireless communications in order to accelerate gossip rates of convergence [49]–[51], either by having all neighbors that overhear a transmission execute a local update, or by having nodes eavesdrop on their neighbors' communication and then using this information to strategically select which neighbor to gossip with next. The next section discusses issues arising when gossiping specifically over wireless networks.
Rate Limitations in Gossip Algorithms
Rate limitations are relevant due to the bandwidth restrictions and the power limitations of nodes. Finite transmission rates imply that nodes learn of their neighbors' states with finite precision; if the distortion is measured by the mean squared error, then it is well established that the operational distortion rate function is exponentially decaying with the number of bits [52], which implies that the precision doubles for each additional bit of representation. For example, in an additive white Gaussian noise (AWGN) channel with path loss inversely proportional to the distance squared
Before summarizing the key findings of selected literature on the subject of average consensus under communication constraints, we explain why some papers care about this issue and some do not.
A. Are Rate Constraints Significant?
In most sensor network architectures today, the overhead of packet headers and reliable communication is so great that using a few bytes to encode the gossip state variables exchanged adds negligible cost while providing precision that is, for practical purposes, infinite. Moreover, we can ignore bit errors in transmissions, which very rarely go undetected thanks to cyclic redundancy check bits. It is natural to ask: why should one bother studying rate constraints at all?
One should bother because existing sensor network modems are optimized to transmit long messages, infrequently, to nearby neighbors, in order to promote spatial bandwidth reuse, and were not designed with decentralized iterative computation in mind. Transmission rates are calculated by amortizing the overhead of establishing the link over the duration of very long transmission sessions.
Optimally encoding for computation in general (and for gossiping in particular) is an open problem; very few works have treated communication for computation in an information-theoretic sense (see, e.g., [53] and [54]), and consensus gossiping is nearly absent from the landscape of network information theory. This is not an accident. Broken into parts, consensus gossip contains the elements of complex classical problems in information theory, such as multiterminal source coding, the two-way channel, the feedback channel, multiple access with correlated sources, and the relay channel [55]; this is a frightening collection of open questions. However, as the number of possible applications of consensus gossip primitives expands, designing source and channel encoders to solve precisely this class of problems more efficiently, even though perhaps not optimally, is a worthy task. The desired features are the ability to exchange a few correlated bits frequently, and possibly in an optimal order, and to exchange with nodes that are (at least occasionally) very far away, to promote rapid diffusion. Such forms of communication are very important in sensor networks and network control.
Even if fundamental limits are hard to derive, there are several heuristics that have been applied to the problem to yield some achievable bound. Numerous papers have studied the effects of intermittent or lossy links in the context of gossip algorithms [independent identically distributed (i.i.d.) and correlated models, symmetric and asymmetric] [56]–[65]. In these models, lossy links correspond to masking some edges from the topology at each iteration, and, as we have seen above, the topology directly affects the convergence rate. Interestingly, a common thread running through all of the work in this area is that, so long as the network remains connected on average, convergence of gossip algorithms is not affected by lossy or intermittent links, and convergence speeds degrade gracefully.
Another aspect that has been widely studied is that of source coding for average consensus and is the one that we consider next in Section III-B. It is fair to say that, particularly in wireless networks, the problem of channel coding is essentially open, as we will discuss in Section III-C.
B. Quantized Consensus
Quantization maps the state variable exchanged onto a finite alphabet of symbols. If node $j$ spends $R_{t,j}$ bits at iteration $t$, the total number of bits used by the network over the course of the algorithm is

R_{\rm tot}^{\infty} = \sum_{t=1}^{\infty} R_{t,{\rm tot}} = \sum_{t=1}^{\infty} \sum_{j=1}^{n} R_{t,j}. \eqno{(9)}
The first simple question is: for a fixed uniform quantizer with step-size $\Delta$

\hat{x}_{j}(t) = {\rm uni}_{\Delta}\left(x_{j}(t)\right) = \arg\min_{q \in {\cal Q}} \left| x_{j}(t) - q \right|
Quantized consensus over a random geometric graph with
Kashyap et al. [66] first considered a fixed code quantized consensus algorithm, which preserves the network average at every iteration. In their paper, the authors draw an analogy between quantization and load balancing among processors, which naturally comes with an integer constraint since the total number of tasks is finite and divisible only by integers (see, e.g., [19], [67], and [68]). Distributed policies to attain a balance among loads were previously proposed in [69] and [70]. Assuming that the average can be written as
To overcome the fact that not all nodes end up having the same quantized value, a simple variant on the quantized consensus problem that guarantees almost sure convergence to a unique consensus point was proposed concurrently in [74] and [75]. The basic idea is to dither the state variables by adding a uniform random variable
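The effect of dithering is easy to check numerically: adding uniform dither spanning one quantization bin before a uniform quantizer makes the quantized output unbiased, so averages are preserved in expectation. The step size and sample count below are illustrative choices.

```python
import random

def uni_quantize(x, delta):
    """Uniform quantizer with step delta: round to the nearest lattice point."""
    return delta * round(x / delta)

def dithered_quantize(x, delta, rng):
    """Add uniform dither over one quantization bin before quantizing; this
    makes the quantizer output unbiased: E[Q(x + u)] = x."""
    u = rng.uniform(-delta / 2.0, delta / 2.0)
    return uni_quantize(x + u, delta)

rng = random.Random(0)
delta = 0.5
x = 0.37  # an arbitrary state value, deliberately off the lattice
samples = [dithered_quantize(x, delta, rng) for _ in range(200000)]
mean_q = sum(samples) / len(samples)
# mean_q is close to x, even though every sample is a coarse lattice point.
```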
Carli et al. [77] noticed that the issue of quantizing for consensus averaging has analogies with the problem of stabilizing a system using quantized feedback [78], which amounts to partitioning the state space into sets whose points can be mapped to an identical feedback control signal. Hence, the authors resorted to control-theoretic tools to infer effective strategies for quantization. In particular, instead of using a static mapping, they model the quantizer as a dynamical system with internal state

\xi_{i}(t) = \left(\hat{x}_{-1,i}(t), f_{i}(t)\right). \eqno{(10)}
\hat{x}_{i}(t) = \hat{x}_{-1,i}(t+1) = \hat{x}_{-1,i}(t) + f_{i}(t)\, q_{i}(t) \eqno{(11)}

q_{i}(t) = {\rm uni}_{\Delta}\left(\frac{x_{i}(t) - \hat{x}_{-1,i}(t)}{f_{i}(t)}\right) \eqno{(12)}
f_{i}(t+1) = \begin{cases} k_{\rm in}\, f_{i}(t), & \text{if } \left|q_{i}(t)\right| < 1 \\ k_{\rm out}\, f_{i}(t), & \text{if } \left|q_{i}(t)\right| = 1 \end{cases} \eqno{(13)}
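A toy instantiation of the zoom-in/zoom-out quantizer (11)-(13) is sketched below, with the symbol restricted to $q \in \{-1, 0, 1\}$ and hand-picked constants $k_{\rm in} = 1/2$, $k_{\rm out} = 2$; it tracks a slowly drifting input, shrinking the scale $f$ while the quantizer is unsaturated and growing it on saturation.

```python
def zoom_quantizer_track(signal, f0=1.0, k_in=0.5, k_out=2.0):
    """Sketch of the adaptive quantizer (11)-(13): the decoder state xhat
    follows the input using a ternary symbol q in {-1, 0, 1} and a scale f
    that shrinks (zoom in) while unsaturated and grows (zoom out) when the
    quantizer saturates.  f0, k_in, k_out are illustrative constants."""
    xhat, f = 0.0, f0
    errors = []
    for x in signal:
        q = max(-1, min(1, round((x - xhat) / f)))   # saturating quantizer (12)
        xhat = xhat + f * q                          # decoder update (11)
        f = k_in * f if abs(q) < 1 else k_out * f    # zoom rule (13)
        errors.append(abs(x - xhat))
    return errors

# Track a slowly drifting input; the error settles to the order of the scale f.
signal = [0.8 + 0.001 * t for t in range(400)]
errors = zoom_quantizer_track(signal)
```

After a short transient, the scale hovers just above the per-step drift of the input, so the tracking error stays within a few quantization steps despite the one-symbol-per-iteration budget.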
\hat{x}_{i}(t) = \xi_{i}(t+1) = \xi_{i}(t) + q_{i}(t) \eqno{(14)}

q_{i}(t) = \log_{\delta}\left(x_{i}(t) - \xi_{i}(t)\right) \eqno{(15)}
q_{i}(t) = {\rm sign}\left(x_{i}(t) - \xi_{i}(t)\right)\left(\frac{1+\delta}{1-\delta}\right)^{\ell_{i}(t)}, where $\ell_{i}(t)$ is the integer satisfying

\frac{1}{1+\delta} \leq \left| x_{i}(t) - \xi_{i}(t) \right| \left(\frac{1+\delta}{1-\delta}\right)^{-\ell_{i}(t)} \leq \frac{1}{1-\delta}. \eqno{(16)}
The vast signal processing literature on sampling and quantization can obviously be applied to the consensus problem as well to find heuristics. It is not hard to recognize that the quantizers analyzed in [77] are equivalent to predictive quantizers. Noting that the states are both temporally and spatially correlated, it is clear that encoding using side information available at both transmitter and receiver can yield improved performance at lower cost; this is the premise of the work in [79] and [80], which analyzed a more general class of quantizers. They can be captured in a framework similar to that of [77] by adding an auxiliary state variable
Recent work has begun to investigate information-theoretic performance bounds for gossip. These bounds characterize the rate-distortion tradeoff either i) as a function of the underlying network topology assuming that each link has a finite capacity [83], or ii) as a function of the rate of the information source providing new measurements to each sensor [84].
C. Wireless Channel Coding for Average Consensus
Quantization provides a source code, but equally important is the channel code that is paired with it. First, the separation of source and channel coding is not optimal in wireless networks in general. Second, and more intuitively, in a wireless network there is a range of rates that can be achieved across different nodes under different traffic conditions. The two key elements that determine what communications can take place are scheduling and channel coding. Theoretically, there is no fixed-range communication; any range can be reached, albeit with lower capacity. Likewise, there is no such thing as a collision; rather, there is a tradeoff among the rates at which multiple users can simultaneously access the channel.
The computational codes proposed in [85] aim to strike a near-optimal tradeoff for each gossip iteration by utilizing the additive-noise multiple-access channel as a tool for directly computing the average of the neighborhood. The idea advocated by the authors echoes their previous work [54]: nodes send lattice codes that, when added through the channel, result in a lattice point that encodes a specific algebraic sum of the inputs. Owing to the algebraic structure of the channel codes and the linearity of the channel, each recipient directly decodes the linear combination of the neighbors' states, which provides a new estimate of the network average when added to the local state. The only drawbacks of this approach are that 1) it requires channel state information at the transmitter, and 2) only one recipient can be targeted at a time. The scenario considered is closer to that in [84], since a stream of data needs to be averaged, and a finite round is dedicated to each input. The key result proven is that the number of rounds of gossip grows as
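The essence of computing over the channel can be conveyed with a toy scalar example (far simpler than the lattice computation codes of [85]): each node transmits a point of the scaled integer lattice, the channel physically adds the transmissions plus noise, and the receiver decodes the nearest lattice point, recovering the algebraic sum exactly whenever the noise magnitude stays below half a lattice step. All values below are illustrative.

```python
import numpy as np

# Each node rounds its state to the scaled integer lattice delta*Z.  Because
# the lattice is closed under addition, the sum of codewords is itself a
# lattice point, so nearest-point decoding of the noisy channel output
# recovers the exact algebraic sum whenever |noise| < delta/2.
delta = 1.0
states = np.array([0.3, -1.2, 2.4])
codewords = delta * np.round(states / delta)      # lattice encoding
noise = 0.3                                       # a fixed in-range noise realization
received = codewords.sum() + noise                # what the additive channel delivers
decoded_sum = delta * np.round(received / delta)  # nearest-lattice-point decoding
```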
Sensor Network Applications of Gossip
This section illustrates how gossip algorithms can be applied to solve representative problems in wireless sensor networks. Of course, gossip algorithms are not suited for all distributed signal processing tasks. They have proven useful, so far, for problems that involve computing functions that are linear combinations of data or statistics at each node. Two straightforward applications arise from distributed inference and distributed detection. When sensors make conditionally independent observations, the log-likelihood function conditioned on a hypothesis
Below we consider three additional example applications. Section IV-A describes a gossip algorithm for distributed linear parameter estimation that uses stochastic approximation to overcome quantization noise effects. Sections IV-B and IV-C illustrate how gossip can be used for distributed source localization and distributed compression, respectively. We also note that gossip algorithms have recently been applied to problems in camera networks for distributed pose estimation [89], [90].
A. Robust Gossip for Distributed Linear Parameter Estimation
The present section focuses on robust gossiping for distributed linear parameter estimation of a vector of parameters with low-dimensional observations at each sensor. We describe the common assumptions on sensing, the network topology, and the gossiping protocols. Although we focus on estimation, the formulation is quite general and applies to many inference problems, including distributed detection and distributed localization.
1. Sensing/Observation Model
Let

z_{i}(t) = H_{i}\theta + w_{i}(t) \eqno{(17)}
\sum_{i=1}^{n}H_{i}^{T}H_{i}\eqno{\hbox{(18)}}
An equivalent formulation of the estimation problem in the setting considered above comes from the distributed least mean square (LMS) adaptive filtering framework [93]–[95]. The objective here is slightly different. While we are interested in consistent estimates of the entire parameter at each sensor, the LMS formulations require the network, in a distributed way, to adapt to the environment and produce a desired response at each sensor; the observability issue is not of primary importance. A generic framework for distributed estimation, both in the static parameter case and when the parameter is nonstationary, is addressed in [96]. An important aspect of algorithm design in these cases is the choice of the intersensor weight sequence for fusing data or estimates. In the static parameter case, where the objective is to drive all the sensors to the true parameter value, the weight sequence necessarily decays over time to overcome the accumulation of observation noise and other forms of noise, whereas, in the dynamic parameter estimation case, the weight sequence must remain bounded away from zero so that the algorithm retains its tracking ability. We direct the reader to the recent paper [97] for a discussion along these lines. In the dynamic case, we also point to the significant literature on distributed Kalman filtering (see, e.g., [98]–[102] and the references therein), where the objective is not consensus seeking among the local estimates but, in general, optimizing fusion strategies to minimize the mean squared error at each sensor.
It is important to note here that average consensus is a specific case of a distributed parameter estimation model, where each sensor initially takes a single measurement, and sensing of the field thereafter is not required for the duration of the gossip algorithm. Several distributed inference protocols (for example, [24], [103], and [104]) are based on this approach, where either the sensors take a single snapshot of the field at the start and then initiate distributed consensus protocols (or more generally distributed optimization, as in [104]) to fuse the initial estimates, or the observation rate of the sensors is assumed to be much slower than the intersensor communication rate, thus permitting a separation of the two time scales.
2. Distributed Linear Parameter Estimation
We now briefly discuss distributed parameter estimation in the linear observation model (17). Starting from an initial deterministic estimate of the parameters (the initial states may be random, we assume deterministic for notational simplicity),
3. Stochastic Approximation Algorithm
Let

x_{i}(t+1) = x_{i}(t) - \alpha(t)\left[ b \sum_{j \in {\cal N}_{i}(t)} \left( x_{i}(t) - Q\left(x_{j}(t) + \nu_{ij}(t)\right) \right) - H_{i}^{T}\left(z_{i}(t) - H_{i} x_{i}(t)\right) \right]. \eqno{(19)}
\alpha(t) \geq 0, \qquad \sum_{t} \alpha(t) = \infty, \qquad \sum_{t} \alpha^{2}(t) < \infty. \eqno{(20)}
The following result from [92] characterizes the desired statistical properties of the distributed parameter estimation algorithm just described. The flavor of these results is common to other stochastic approximation algorithms [105]. First, we have a law-of-large-numbers-like result that guarantees that the estimates at each node converge to the true parameter value:

\mathbb{P}\left(\lim_{t \rightarrow \infty} x_{i}(t) = \theta,\ \forall i\right) = 1. \eqno{(21)}
\alpha(t)={a\over t+1}\eqno{\hbox{(22)}}
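A minimal simulation of the consensus-plus-innovations update (19) with step size of the form (22) is sketched below for a 2-D parameter observed through rank-one matrices $H_i$; quantization and link noise are omitted, and the ring topology, gains, and noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])        # true parameter
n = 4
# Rank-one sensing matrices: no single sensor can observe theta alone,
# but sum_i H_i^T H_i is invertible, so the network as a whole can.
H = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
     np.array([[1.0, 1.0]]), np.array([[1.0, -1.0]])]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # ring topology
b = 0.5                               # consensus gain (illustrative)

x = [np.zeros(2) for _ in range(n)]
for t in range(20000):
    alpha = 1.0 / (t + 1)             # decaying step size; satisfies (20)
    z = [Hi @ theta + 0.1 * rng.standard_normal(1) for Hi in H]
    x_new = []
    for i in range(n):
        consensus = sum(x[i] - x[j] for j in neighbors[i])
        innovation = H[i].T @ (z[i] - H[i] @ x[i])
        x_new.append(x[i] - alpha * (b * consensus - innovation))
    x = x_new
worst_error = max(np.linalg.norm(xi - theta) for xi in x)
```

Despite each sensor seeing only a one-dimensional projection of the parameter, the combination of local innovations and neighbor agreement drives every local estimate toward the true value.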
\sqrt{t}\left(x(t)-\vec{1}\otimes\theta\right)
Performance analysis of the algorithm for an example network is illustrated in Fig. 4. An example network of 45 nodes is used [see Fig. 4(a)], with the sensing matrices normalized so that

G = \sum_{i} H_{i}^{T} H_{i} = I = G^{-1}. \eqno{(23)}
Fig. 4. Illustration of distributed linear parameter estimation. (a) Example network deployment of 45 nodes. (b) Convergence of normalized estimation error at each sensor.
It is interesting to note that, although the individual sensors suffer from low-rank observations of the true parameter, by collaborating, each of them can reconstruct the true parameter value. The asymptotic normality shows that the estimation error at each sensor decays as
As noted before, the observation model need not be linear for distributed parameter estimation. In [92], a large class of nonlinear observation models was considered, and a notion of distributed nonlinear observability, called separably estimable observation models, was introduced. Under the separably estimable condition, there exist local transforms under which the updates can be made linear. However, such a state transformation induces different time scales on the consensus potential and the innovation update, giving the algorithm a mixed time-scale behavior (see [92] and [107] for details). This mixed time-scale behavior and the effect of biased perturbations lead to the inapplicability of standard stochastic approximation techniques.
B. Source Localization
A canonical problem, encompassing many of the challenges that commonly arise in wireless sensor network applications, is that of estimating the location of an energy-emitting source [1]. Patwari et al. [108] present an excellent overview of the many approaches that have been developed for this problem. The aim in this section is to illustrate how gossip algorithms can be used for source localization using received signal strength (RSS) measurements.
Let

f_{i} = \frac{\alpha}{\left\| y_{i} - \theta \right\|^{\beta}} + w_{i} \eqno{(24)}
An alternative approach, using gossip algorithms [112], forms a location estimate

\widehat{\theta} = \frac{\sum_{i=1}^{n} y_{i} K(f_{i})}{\sum_{i=1}^{n} K(f_{i})} \eqno{(25)}
\widehat{\theta}_{1} = \frac{\sum_{i=1}^{n} y_{i}\, 1_{\left\{\left\|y_{i}-\theta\right\| \leq \gamma^{-1/\beta}\right\}}}{\sum_{i=1}^{n} 1_{\left\{\left\|y_{i}-\theta\right\| \leq \gamma^{-1/\beta}\right\}}} \eqno{(26)}
Note that (25) is a ratio of linear functions of the measurements at each node. To compute (25), we run two parallel instances of gossip over the network, one each for the numerator and the denominator. If each node initializes
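The sketch below illustrates this construction for the thresholded estimator (26): each node initializes one gossip instance with $y_i K(f_i)$ and another with $K(f_i)$, and after gossiping every node forms the same ratio estimate. The sensor deployment, noiseless RSS model, kernel threshold, and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
theta = np.array([0.5, 0.5])                  # true source location
Y = rng.random((n, 2))                        # sensor positions in the unit square
# Noiseless RSS with alpha = 1, beta = 2; K is the threshold kernel of (26),
# selecting sensors within distance gamma^(-1/beta) = 0.5 of the source.
f = 1.0 / np.maximum(np.sum((Y - theta) ** 2, axis=1), 1e-6)
K = (f >= 4.0).astype(float)
num = Y * K[:, None]                          # per-node numerator  y_i * K(f_i)
den = K.copy()                                # per-node denominator K(f_i)

for _ in range(5000):                         # pairwise gossip on both instances
    i, j = rng.choice(n, size=2, replace=False)
    num[i] = num[j] = (num[i] + num[j]) / 2.0
    den[i] = den[j] = (den[i] + den[j]) / 2.0

theta_hat = num[0] / den[0]                   # node 0's copy of the ratio (25)
```

Because gossip drives both instances to the corresponding network averages, the ratio at every node converges to the centroid of the sensors that detect the source, which estimates the source location.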
C. Distributed Compression and Field Estimation
Extracting information in an energy-efficient and communication-efficient manner is a fundamental challenge in wireless sensor network systems. In many cases, users are interested in gathering data to see an “image” of activity or sensed values over the entire region. Let
An alternative approach is based on linear transform coding, gossip algorithms, and compressive sensing. It has been observed that many natural signals are compressible under some linear transformation. That is, although
To formally capture the notion of compressibility using ideas from the theory of nonlinear approximation [118], we reorder the coefficients in order of decreasing magnitude

\left|\theta_{(1)}\right| \geq \left|\theta_{(2)}\right| \geq \left|\theta_{(3)}\right| \geq \cdots \geq \left|\theta_{(n)}\right| \eqno{(27)}
\frac{1}{n}\left\| f - f^{(m)} \right\|^{2} \leq C m^{-2\alpha} \eqno{(28)}
Example illustrating compression of a smooth signal. (a) Original smooth signal which is sampled at 500 random node locations, and nodes are connected as in a random geometric graph. (b) The
Observe that each coefficient
We can avoid this issue by making use of the recent theory of compressive sensing [120]–[122], which says that one can recover sparse signals from a small collection of random linear combinations of the measurements. In the present setting, to implement the gathering of

\min_{\theta} \left\| \bar{x} - A T^{T} \theta \right\|^{2} + \tau \sum_{i=1}^{n} \left|\theta_{i}\right| \eqno{(29)}
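A minimal sketch of the reconstruction step (29) is given below, with $T = I$ (the signal is taken to be sparse in the identity basis for simplicity) and the optimization solved by iterative soft thresholding (ISTA); the dimensions, sparsity, and regularization weight $\tau$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 200, 60, 5                  # field size, measurements, sparsity
theta_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
theta_true[support] = 3.0             # a k-sparse coefficient vector
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_bar = A @ theta_true                # random projections gathered by gossip

def ista(A, y, tau, steps=3000):
    """Iterative soft thresholding for min ||y - A t||^2 + tau * ||t||_1."""
    s = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # step within 1/Lipschitz
    t = np.zeros(A.shape[1])
    for _ in range(steps):
        g = t + 2.0 * s * (A.T @ (y - A @ t))      # gradient step on the fit term
        t = np.sign(g) * np.maximum(np.abs(g) - s * tau, 0.0)  # shrinkage
    return t

theta_hat = ista(A, x_bar, tau=0.2)
```

Even though only 60 random projections of the 200 coefficients are collected, the $\ell_1$ penalty recovers the sparse support, up to a small shrinkage bias proportional to $\tau$.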
Conclusion and Future Directions
Because of their simplicity and robustness, gossip algorithms are an attractive approach to distributed in-network processing in wireless sensor networks, and this paper surveyed recent results in this area. A major concern in sensor networks revolves around conserving limited bandwidth and energy resources, and in the context of iterative gossip algorithms, this is directly related to the rate of convergence. One thread of the discussion covered fast gossiping in wireless network topologies. Another thread focused on understanding and designing for the effects of wireless transmission, including source and channel coding. Finally, we have illustrated how gossip algorithms can be used for a diverse range of tasks, including estimation and compression.
Currently, this research is branching into a number of directions. One area of active research is investigating gossip algorithms that go beyond computing linear functions and averages. Just as the average can be viewed as the minimizer of a quadratic cost function, researchers are studying what other classes of functions can be optimized within the gossip framework [127]. A related direction is investigating the connections between gossip algorithms and message-passing algorithms for distributed inference and information fusion, such as belief propagation [88], [128]. While it is clear that computing pairwise averages is similar to the sum–product algorithm for computing marginals of distributions, there is no explicit connection between these families of distributed algorithms. It would be interesting to demonstrate that pairwise gossip and its generalizations correspond to messages of the sum–product (or max–product) algorithm for an appropriate Markov random field. Such potentials would guarantee convergence (which is not guaranteed in general iterative message passing) and further establish explicit convergence and message scheduling results.
Another interesting research direction involves understanding the effects of intermittent links and dynamic topologies, and in particular the effects of node mobility. Early work [129] has analyzed i.i.d. mobility models and shown that mobility can greatly benefit convergence under some conditions. Generalizing to more realistic mobility models seems to be a very interesting research direction that would also be relevant in practice, since gossip algorithms are most useful in such dynamic environments.
Gossip algorithms are certainly relevant in other applications that arise in social networks and the interaction of mobile devices with social networks. Distributed inference and information fusion in such dynamic networked environments is certainly going to pose substantial challenges for future research.