
Distributed Split Computing System in Cooperative Internet of Things (IoT)

Publisher: IEEE
Open Access

Abstract:

The split computing approach, where the head and tail models are respectively distributed between the IoT device and cloud, suffers from high network latency, especially when the cloud is located far from the IoT device. To mitigate this problem, we introduce a distributed split computing system (DSCS) in which an IoT device (called the split computing requester) broadcasts a split computing request to its neighboring IoT devices. After receiving the request, the neighboring IoT devices (i.e., requestees) distributively determine whether or not to accept the split computing request by taking into account the unnecessary energy consumption and computation time. To minimize energy consumption while maintaining a specified probability of on-time computing completion, we develop a constrained stochastic game model. Then, a best-response dynamics-based algorithm is used to obtain the Nash equilibrium. The evaluation results demonstrate that the DSCS can reduce energy consumption by more than 20% compared to a probabilistic acceptance scheme, where the IoT devices accept a split computing request based on a predefined probability, while providing a high on-time computing completion probability.
Published in: IEEE Access ( Volume: 11)
Page(s): 77669 - 77678
Date of Publication: 25 July 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Deep neural networks (DNNs) are currently the most frequently used machine learning approach in intelligent mobile applications and have grown more popular owing to their accurate and reliable inference capability [1]. Meanwhile, despite the recent improvements in the computing capabilities of IoT devices, their performance falls far short of that of cloud computing. Thus, an IoT device conducting inference for the entire DNN model cannot achieve sufficiently low latency. In addition, the battery capacities of IoT devices are severely limited, especially for inference with high complexity. Therefore, there is increasing interest in the split computing approach [2], [3]. In this approach, the DNN is split into two subnetworks (i.e., the head and tail models), and the head and tail models are deployed on the IoT device and cloud, respectively. The IoT device first conducts inference of the head model to obtain intermediate data (i.e., the output of the head model). It then sends this intermediate data to the cloud. Using the intermediate data as input, the cloud processes the tail model. However, this split computing approach suffers from high network latency between the IoT device and cloud, especially when the cloud is located far from the IoT device [4], [5].

To mitigate this problem, we introduce a distributed split computing system (DSCS); here, an IoT device simply determines whether to use the split computing approach, and the splitting point, by considering its available computing power as well as the computing deadline. If the IoT device decides to use the split computing approach, it conducts inference of the head model based on the splitting point. Then, the IoT device (the split computing requester) broadcasts a split computing request, which includes the splitting point, intermediate data, computing latency of the head model, and computing deadline, to its neighboring IoT devices. After receiving the request, the neighboring IoT devices (i.e., requestees) distributively determine whether or not to accept the split computing request by taking into account the unnecessary energy consumption and computation time. Because the total number of IoT devices (i.e., requestees) accepting the split computing request affects the amount of energy consumed and the probability of completing the computations on time, each IoT device should consider the actions of its neighboring IoT devices. In this context, we formulate a constrained stochastic game model and utilize a best-response dynamics-based algorithm to obtain the multi-policy constrained Nash equilibrium that minimizes energy consumption while maintaining the desired on-time computing completion probability. The evaluation results show that the DSCS can reduce energy consumption by more than 20% compared to a probabilistic acceptance scheme, where the IoT devices (i.e., requestees) accept a split computing request based on a predefined probability, while providing a high on-time computing completion probability. Moreover, the best-response dynamics-based algorithm is found to converge to the Nash equilibrium within a few iterations.

The main contributions of this study are as follows: 1) the proposed system is a pioneering effort in which the actions of the split computing requestees are decided distributively to optimize the performance of the split computing system; 2) the optimal policy of the requestees regarding acceptance of the computing request can be obtained in a few iterations, indicating that the proposed algorithm can be implemented in actual systems without significant signaling cost; 3) we present and scrutinize the evaluation results under various conditions to provide guidance for constructing a DSCS.

The remainder of this manuscript is structured as follows. The related works are detailed in Section II, and the proposed DSCS is described in Section III. The stochastic game model development is detailed in Section IV. The evaluations are reviewed in Section V, and the final conclusions are summarized in Section VI.

SECTION II.

Related Work

Many reported studies have investigated the possibility of lowering task completion times in split computing environments [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].

Kang et al. [6] created an automated two-step model splitting technique. In the first step, performance prediction models are created for each layer. In the second step, the splitting point is dynamically determined from the prediction models by considering the importance of performance metrics. Li et al. [7] suggested a model splitting framework that considers early exit and allows the inference task to be undertaken at an appropriate intermediary layer; they decided the exit and splitting points concurrently to maximize the accuracy of inference while ensuring that the task completion time remained below a specified threshold. Laskaridis et al. [8] presented a system that continually monitors the resources of the edge cloud and mobile device to decide the splitting point by considering application requirements. Krouka et al. [9] introduced a technique involving pruning and compression before splitting the DNN model to minimize energy consumption by the mobile device while assuring correctness of inference. Yan et al. [10] jointly optimized placing and splitting of the model to minimize energy consumption and reduce task completion time while accounting for the network dynamics. Eshratifar et al. [11] determined several optimal splitting points by converting the presented model into a well-known one to exploit existing algorithms. Zhou et al. [12] suggested a strategy to minimize task completion time by pruning the model and compressing the intermediate data. He et al. [13] exploited a queuing model for task completion time to formulate a joint optimization problem regarding the splitting point and resource allocation, which can be divided into subproblems; moreover, they designed a heuristic algorithm to solve the subproblems sequentially. Tang et al. [14] designed an algorithm that uses the structural characteristics of the model splitting problem to obtain its solution in polynomial time. 
Wang and Zhang [15] designed a split computing architecture that exploits the error-tolerant characteristics of the intermediate data to reduce the communication overhead; in this architecture, the controller decides if retransmission is needed depending on the error rate. Wang et al. [16] proposed a multiple-splitting-points decision system that determines several optimal splitting points in real time with low signaling overhead. Matsubara et al. [17] suggested a supervised compression method that discretizes the intermediate data to avoid high communication overhead. Ahn et al. [18] introduced a system in which the DNN model is partitioned and deployed between the IoT device and cloud to improve inference accuracy and reduce task completion time. In [19] and [20], to minimize system energy consumption, the authors developed a distributed DNN computing system that orchestrates cooperative inference among multiple IoT devices by considering the available computing power and network conditions of the IoT devices. Zhang et al. [21] introduced a collaborative and adaptive inference system that can handle various types of DNN models and optimize the tradeoff between computation and synchronization. In [22] and [23], the authors introduced methods that dynamically detect the best splitting point for a given DNN based on the communication channel state, batch size, and multiclass categorization. In [24], the authors proposed a novel framework for split computing that exploits a round-robin schedule to select a device and the Hungarian optimization algorithm to assign a layer to the device.

However, no existing work optimizes the split computing performance from the perspective of the requestees in a distributed manner.

SECTION III.

Distributed Split Computing System

Figure 1 shows the proposed DSCS, in which an IoT device (i.e., requester) generates a computing task periodically. For each computing task, the IoT device checks its available computing power and the task deadline. If the available power is sufficient to complete the task within the deadline, the IoT device performs inference for the entire DNN model. Otherwise, the IoT device decides the splitting point according to its available computing power. Specifically, the entire DNN model is split at the m th splitting point, where m is obtained as \lceil \alpha C \rceil, \alpha is a scale factor, and C is the available computing power of the IoT device. Note that the scale factor can be set to \frac{N_{L}}{C^{\max}}, where N_{L} and C^{\max} denote the total number of layers and the maximum computing power of the IoT device, respectively. After deciding the splitting point, the IoT device conducts inference for the head model with the m th splitting point to obtain the intermediate data (i.e., the result of the head model). Then, the IoT device broadcasts the split computing request message to its neighboring IoT devices using LoRa [25]. This split computing request message includes the 1) splitting point, 2) intermediate data, 3) computing latency of the head model, and 4) task deadline.
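As an illustration, the splitting-point rule above can be sketched in a few lines (the function name and argument names are ours, not from the paper):

```python
import math

def choose_splitting_point(available_power, max_power, num_layers):
    """Pick the splitting point m = ceil(alpha * C), with the scale factor
    alpha set to N_L / C^max as suggested in the text. Hypothetical helper."""
    alpha = num_layers / max_power          # alpha = N_L / C^max
    return math.ceil(alpha * available_power)

# A device with 4 of 8 power units and a 16-layer DNN splits at layer 8.
```

With this choice of alpha, a device at full power splits at the last layer (i.e., it runs the whole model as the head), while a weaker device splits earlier and offloads more of the tail.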

FIGURE 1. System model.

After receiving the split computing request, the neighboring IoT devices (i.e., requestees) distributively determine whether to accept the request by considering the unnecessary energy consumption and computation time. If an IoT device determines not to accept the request, it does nothing. Otherwise, it conducts inference of the tail model with the m th splitting point. At this point, only the neighboring IoT devices that can complete the task within the deadline accept the request. Therefore, even though the IoT device (i.e., requester) does not consider the computing power of the neighboring IoT devices (i.e., requestees), on-time computing completion can be achieved. After completing inference, the IoT device returns the outcome of the tail model to the requester.

As the number of IoT devices accepting the request increases, the probability that at least one IoT device can complete the computations within the deadline (i.e., the on-time computing completion probability) also increases. However, excessive duplicate acceptances increase the total amount of energy consumed. This means that each IoT device should consider the actions of the others to achieve a tradeoff between unnecessary energy consumption and the on-time computing completion probability. Accordingly, we develop a constrained stochastic game model to minimize the energy consumed while ensuring that the on-time computing completion probability remains above a specified threshold; this model is explained in the following section.

SECTION IV.

Constrained Stochastic Game

In this section, we present the development of a constrained stochastic game model [27], [28] to accomplish a distributed implementation of the split computing service. In the game, there are N_{I} players (i.e., N_{I} requestees), and each is characterized by a five-tuple: 1) local state space; 2) local action space; 3) transition probability; 4) cost function; 5) constraint function. Table 1 summarizes the important notations.

TABLE 1 Summary of Notations

A. State Space

Let {\mathbf {S_{i}}} be a finite local state space of player i (i.e., IoT device i or requestee i ). Then, the global state space {\mathbf {S}} can be represented as \prod _{i} {\mathbf {S_{i}}} , where \prod is the Cartesian product. {\mathbf {S_{i}}} can be defined as \begin{equation*} {\mathbf {S}}_{i} = {\mathbf {P}}_{i} \times {\mathbf {C}}_{i} \times {\mathbf {H}}_{i} \times {\mathbf {E}}_{i}, \tag{1}\end{equation*} where {\mathbf {P_{i}}} and {\mathbf {C_{i}}} are the state spaces of IoT device i for the splitting point and available computing power, respectively. In addition, {\mathbf {H_{i}}} and {\mathbf {E_{i}}} denote the state spaces of IoT device i for the channel gain and energy level, respectively. To denote an element of each local state space of IoT device i , we use italic letters (i.e., P_{i} \in \mathbf {P_{i}} , C_{i} \in \mathbf {C_{i}} , H_{i} \in \mathbf {H_{i}} , and E_{i} \in \mathbf {E_{i}} ).

Without loss of generality, we assume that the entire DNN model consists of M layers. Then, {\mathbf {P_{i}}} can be described as \begin{equation*} {\mathbf {P}}_{i} = \left \{{ {1,2,\ldots,M} }\right \}, \tag{2}\end{equation*} where P_{i} = m denotes the m th splitting point.

When C^{\max } is the maximum computing power of the IoT device, {\mathbf {C_{i}}} can be represented by \begin{equation*} {\mathbf {C_{i}}} = \left \{{ {u_{C},2u_{C}, \ldots,C^{\max } } }\right \}, \tag{3}\end{equation*} where C_{i} and u_{C} are the computing power and unit computing power, respectively.

Since the channel is quantized to Q levels [29], {\mathbf {H_{i}}} can be represented by \begin{equation*} {\mathbf {H_{i}}} = \left \{{{h_{1}},{h_{2}},\ldots,{h_{Q}}}\right \}, \tag{4}\end{equation*} where H_{i} = h_{q} represents the q th quantized channel gain.

When E^{\max } denotes the maximum energy level of the IoT device, {\mathbf {E_{i}}} can be represented by \begin{equation*} {\mathbf {E_{i}}} = \left \{{ {0,1,2,\ldots,E^{\max } } }\right \}, \tag{5}\end{equation*} where E_{i} is the energy level of the IoT device.

B. Action Space

Let {\mathbf {A_{i}}} be a finite local action space of player i . The global action space {\mathbf {A}} is denoted by \prod _{i} {\mathbf {A_{i}}} . Here, the IoT device can accept or reject the computing request. Therefore, {\mathbf {A_{i}}} can be represented by \begin{equation*} {\mathbf {A}}_{i} = \left \{{ {0,1} }\right \}. \tag{6}\end{equation*} Here, for A_{i} \in {\mathbf {A_{i}}} , {A_{i}} = 1 denotes that IoT device i accepts the computing request and {A_{i}} = 0 means that it does not.

C. Transition Probability

Let P[S'_{i} |S_{i},A_{i}] be the transition probability from the current state S_{i} to the next state S'_{i} when IoT device i performs the chosen action A_{i} . The energy of IoT device i decreases when it accepts the request. In addition, the energy consumed by IoT device i varies depending on the splitting point. That is, the transition of E_{i} is influenced by P_{i} and A_{i} , whereas the other states change independently. Thus, the transition probability from the current state S_{i} = [P_{i},C_{i},H_{i},E_{i}] to the next state S'_{i} = [P'_{i},C'_{i},H'_{i},E'_{i}] can be represented as \begin{equation*} P[S'_{i} |S_{i},A_{i}] = P[P'_{i} |P_{i}] \times P[C'_{i} |C_{i}] \times P[H'_{i} |H_{i}] \times P[E'_{i} |E_{i},P_{i},A_{i}]. \tag{7}\end{equation*}

According to the source condition, the amount of harvested energy changes. Therefore, it can be assumed that the energy k harvested during a decision epoch by IoT device i follows a Poisson distribution with mean \lambda _{E} , denoted P_{E} (\lambda _{E}, k) [30]. That is, the energy level of IoT device i can increase by k with probability P_{E} (\lambda _{E},k) when it does not accept the computing request (i.e., A_{i}=0 ) and its current energy level is less than the maximum energy level E^{\max } . Note that IoT device i cannot harvest more energy if its current energy level is at the maximum. In summary, the corresponding probabilities can be expressed as \begin{align*} P[E'_{i} |E_{i} \ne E^{\max },P_{i},A_{i} = 0] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{8}\\ P[E'_{i} |E_{i} = E^{\max },P_{i},A_{i} = 0] &= \begin{cases} \displaystyle 1, &\text {if } E'_{i} = E_{i} \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{9}\end{align*}

When IoT device i accepts the split computing request (i.e., A_{i} = 1 ), it consumes f_{E}(P_{i}) units of energy, where f_{E}(P_{i}) is a function that returns the energy consumption for the tail model with splitting point P_{i} . If IoT device i has insufficient energy for the tail model (i.e., E_{i} < f_{E}(P_{i}) ), it does not consume any energy even though it accepts the split computing request. In addition, its energy can increase by k with probability P_{E} (\lambda _{E},k) . Therefore, P[E'_{i} |E_{i} \ge f_{E}(P_{i}),P_{i},A_{i} = 1] and P[E'_{i} |E_{i} < f_{E}(P_{i}),P_{i},A_{i} = 1] can be represented as \begin{align*} P[E'_{i} |E_{i} \ge f_{E}(P_{i}),P_{i},A_{i} = 1] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k - f_{E}(P_{i}) \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{10}\\ P[E'_{i} |E_{i} < f_{E}(P_{i}),P_{i},A_{i} = 1] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{11}\end{align*} respectively.
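The energy-level transitions of Eqs. (8)-(11) can be sketched as follows. The helper names (`poisson_pmf`, `energy_transition`, `f_e_p`) are hypothetical, and boundary clipping at the maximum energy level during harvesting is not modeled beyond what the equations state:

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    # P_E(lambda_E, k): probability of harvesting k energy units in one epoch
    return (lam ** k) * exp(-lam) / factorial(k)

def energy_transition(e, e_next, f_e_p, accept, lam_e, e_max):
    """Transition probability P[E' | E, P_i, A_i] following Eqs. (8)-(11),
    where f_e_p = f_E(P_i) is the energy needed for the tail model."""
    if accept == 0:
        if e == e_max:                        # Eq. (9): full battery, no change
            return 1.0 if e_next == e else 0.0
        k = e_next - e                        # Eq. (8): harvest k units
        return poisson_pmf(lam_e, k) if k >= 0 else 0.0
    if e >= f_e_p:                            # Eq. (10): pay f_E(P_i), harvest k
        k = e_next - e + f_e_p
    else:                                     # Eq. (11): too little energy to run
        k = e_next - e
    return poisson_pmf(lam_e, k) if k >= 0 else 0.0
```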

It is assumed that the available computing power of the requesting IoT device follows a discrete uniform distribution [31]. Then, since the requesting IoT device decides the splitting point according to its available computing power, the transition probability of P_{i} can be defined as \begin{align*} P[P'_{i} |P_{i}] = \begin{cases} \displaystyle \frac {1}{M}, &\text {if } P'_{i} \in \left \{{ {1, 2, \ldots, M} }\right \} \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{12}\end{align*}

The available computing power of IoT device i (i.e., C_{i} ) is assumed to follow a Poisson distribution with mean \lambda _{C} , denoted P_{C}(\lambda _{C},k) ; hence, P[C'_{i}|C_{i}] can be defined as follows: \begin{align*} P[C'_{i}|C_{i}]=\begin{cases} \displaystyle P_{C}(\lambda _{C},k), &\text {if } C'_{i}=k\\ \displaystyle 0,&\text {otherwise}. \end{cases} \tag{13}\end{align*}

Since the channel gain is quantized to Q levels [29] and the channel follows Rayleigh fading with average channel gain \rho , P[H'_{i} |H_{i}] is defined as follows [32]: \begin{align*} P[H'_{i} |H_{i}] = \begin{cases} \displaystyle F(h_{q+1})-F(h_{q}), &\text {if } h_{q} \leq H'_{i} < h_{q+1} \\ \displaystyle 0, &\text {otherwise}, \end{cases} \tag{14}\end{align*} where F(x)=1-e^{-x/\rho } .
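Under the stated Rayleigh-fading model, the transition row of Eq. (14) can be sketched as below. This is a minimal illustration with assumptions of ours: `thresholds` holds the bin edges h_1 < h_2 < ... < h_Q, and the topmost bin is treated as extending to infinity:

```python
from math import exp

def channel_transition_row(thresholds, rho):
    """Eq. (14): the next channel state lands in bin q with probability
    F(h_{q+1}) - F(h_q), F(x) = 1 - exp(-x / rho), independently of the
    current state, so every row of the transition matrix is identical."""
    F = lambda x: 1.0 - exp(-x / rho)
    edges = list(thresholds) + [float("inf")]          # implicit upper edge
    return [F(edges[q + 1]) - F(edges[q]) for q in range(len(thresholds))]
```

With the lowest edge at 0, the row sums to 1, and lower-gain bins are more probable for small rho, as expected for a Rayleigh-distributed gain.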

D. Cost Function

The energy consumption of the IoT device is used as the cost function r\left ({{S_{i},A_{i} } }\right) , which is defined as \begin{equation*} r\left ({{S_{i},A_{i} } }\right) = A_{i} f_{E}(P_{i}). \tag{15}\end{equation*}

E. Constraint Function

To provide high on-time computing completion probability, the constraint function c\left ({{S_{i},A_{i} } }\right) is defined. First, the computing completion time is derived as follows.

Because the size of the DNN model output is generally much smaller than that of the intermediate data [33], the latency for transmitting the result from IoT device i to the split computing requester can be neglected. Then, the computing completion time can be calculated as the sum of the computing latency L^{H} of the head model, the latency L_{i}^{D} for transmitting the intermediate data from the split computing requester to IoT device i , and the computing latency L_{i}^{T} of the tail model at IoT device i . Therefore, the computing completion time L can be represented by \begin{equation*} L = L^{H} + L_{i}^{D} + L_{i}^{T}. \tag{16}\end{equation*}

Note that the computing latency L^{H} of the head model is included in the split computing request message. Meanwhile, the transmission latency L_{i}^{D} of the intermediate data from the requester to IoT device i can be calculated as \begin{equation*} L_{i}^{D} = \frac {{f_{D}(P_{i})}}{T_{i}^{R} }, \tag{17}\end{equation*} where f_{D}(P_{i}) is a function that returns the intermediate data size for the splitting point P_{i} . In addition, T_{i}^{R} denotes the effective transmission rate between the requester and IoT device i , which is obtained from the spreading factor chosen according to \gamma , as in [34] and [35]. Here, \gamma is given by \begin{equation*} \gamma = \frac {{P_{T} \left |{ {H_{i} } }\right |^{2} }}{\sigma ^{2} }, \tag{18}\end{equation*} where P_{T} and \sigma ^{2} represent the transmission power and noise power, respectively.

The computing latency L_{i}^{T} of the tail model at IoT device i can be calculated as \begin{equation*} L_{i}^{T} = \frac {{f_{F}(P_{i})}}{C_{i} }, \tag{19}\end{equation*} where f_{F}(P_{i}) is a function that returns the number of floating-point operations (FLOPs) of the tail model for the splitting point P_{i} .
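Putting Eqs. (16)-(19) together, the completion time can be computed as in the following sketch; the argument names are hypothetical stand-ins for L^{H} , f_{D}(P_{i}) , T_{i}^{R} , f_{F}(P_{i}) , and C_{i} :

```python
def completion_time(l_head, data_size, rate, flops_tail, comp_power):
    """Total completion time L = L^H + L_i^D + L_i^T of Eq. (16)."""
    l_data = data_size / rate          # Eq. (17): intermediate-data transfer
    l_tail = flops_tail / comp_power   # Eq. (19): tail-model inference
    return l_head + l_data + l_tail
```

For example, with a 1 s head latency, 10 units of intermediate data at rate 5, and a 20-FLOP tail on 4 power units, L = 1 + 2 + 5 = 8.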

The probability \lambda _{i} that IoT device i can complete the computations within the deadline D can be calculated as \begin{equation*} \lambda _{i} = P\left [{ {A_{i} = 1} }\right] \cdot P\left [{ {L^{H} + L_{i}^{D} + L_{i}^{T} < D} }\right], \tag{20}\end{equation*} where P\left [{ {A_{i} = 1} }\right] is the probability that IoT device i accepts the computing request under its current policy. Meanwhile, the probability \lambda _{A} that at least one IoT device can complete the computing within the deadline is calculated as \lambda _{A} = 1 - \prod _{i} {\left ({{1 - \lambda _{i} } }\right)} , which serves as c\left ({{S_{i},A_{i} } }\right) .
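A minimal sketch of Eq. (20) and the resulting \lambda _{A} , assuming the per-device acceptance and within-deadline probabilities are given as lists (the function and argument names are ours):

```python
def on_time_probability(accept_probs, finish_probs):
    """lambda_A = 1 - prod_i (1 - lambda_i), where lambda_i = P[A_i = 1] *
    P[L^H + L_i^D + L_i^T < D] as in Eq. (20)."""
    miss_all = 1.0
    for p_accept, p_finish in zip(accept_probs, finish_probs):
        lam_i = p_accept * p_finish        # Eq. (20)
        miss_all *= 1.0 - lam_i            # probability that device i misses
    return 1.0 - miss_all
```

For instance, two devices that always accept but each finish on time with probability 0.5 yield \lambda _{A} = 1 - 0.25 = 0.75, illustrating why duplicate acceptances raise the completion probability at the cost of extra energy.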

F. Optimization Formulation

Let \pi _{i} and \pi be a stationary policy of IoT device i and the stationary multi-policy of all players, respectively. Then, the long-term average energy consumption of IoT device i can be defined as \begin{equation*} \zeta _{E} \left ({\pi }\right) = \lim _{T \to \infty } \frac {1}{T}\sum _{t = 1}^{T} {E_{\pi} \left [{ {r\left ({{S^{t},A^{t} } }\right)} }\right]}, \tag{21}\end{equation*} where S^{t} and A^{t} are the global state and action at time t , respectively.

Meanwhile, IoT device i tries to maintain its on-time computing completion probability above a certain level. Thus, the constraint can be represented as \begin{equation*} \zeta _{C} \left ({\pi }\right) = \lim _{T \to \infty } \frac {1}{T}\sum _{t = 1}^{T} {E_{\pi} \left [{ {c\left ({{S^{t},A^{t} } }\right)} }\right]} \ge \theta _{D}, \tag{22}\end{equation*} where \theta _{D} is the target on-time computing completion probability.

The multi-policy \pi ^{*} = \left ({{\pi _{i}^{*},\pi _{ - i}^{*} } }\right) is the constrained Nash equilibrium if \zeta _{E} \left ({{\left ({{\pi _{i}^{*},\pi _{ - i}^{*} } }\right)} }\right) \le \zeta _{E} \left ({{\left ({{\pi _{i},\pi _{ - i}^{*} } }\right)} }\right) for each IoT device among any \pi _{i} such that \left ({{\pi _{i},\pi _{ - i}^{*} } }\right) is feasible [36]. To achieve the multi-policy constrained Nash equilibrium, the best response policy \pi _{i}^{*} of IoT device i given any stationary policies \pi _{-i} of the other IoT devices can be defined by \zeta _{E} \left ({{\left ({{\pi _{i}^{*},\pi _{ - i} } }\right)} }\right) \le \zeta _{E} \left ({{\left ({{\pi _{i},\pi _{ - i} } }\right)} }\right) .

By formulating a linear programming (LP) problem, we can obtain the best response policy of IoT device i . Let \phi _{i,\pi _{ - i} } \left ({{S_{i},A_{i} } }\right) denote the stationary probability of local state S_{i} and action A_{i} of IoT device i when the stationary policies \pi _{ - i} of the other IoT devices are given. Then, the solution \phi ^{*} _{i,\pi _{ - i} } \left ({{S_{i},A_{i} } }\right) of the LP problem corresponds to the best response policy of the constrained stochastic game.

To minimize the average energy consumption of IoT device i , the objective function can be expressed as \begin{equation*} \min _{\phi _{i,\pi _{ - i}}} \sum _{S} {\sum _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)r({S_{i}},{A_{i}})} }. \tag{23}\end{equation*}

Since the average on-time computing completion probability should be maintained above the target \theta _{D} , we have \begin{equation*} \sum _{S} {\sum _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)c({S_{i}},{A_{i}})} } \ge {\theta _{D}}. \tag{24}\end{equation*}

From the Chapman-Kolmogorov equation [37], we have \begin{align*} \sum _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S'_{i},{A_{i}}} }\right)} = \sum _{S} {\sum _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)P[{S'_{i}}|{S_{i}},{A_{i}}]} }. \tag{25}\end{align*}

The fundamental properties of probability are imposed by \begin{equation*} \sum _{S} {\sum _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)} } = 1 \tag{26}\end{equation*} and \begin{equation*} {\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right) \ge 0. \tag{27}\end{equation*}

If a feasible solution exists for the LP problem, the stationary best response policy of IoT device i is given by \begin{equation*} {\pi ^{*} _{i}}\left ({{S_{i},{A_{i}}} }\right) = \frac {{\phi ^{*} _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)} {\sum _{A'_{i} \in \mathbf {A_{i}}}{\phi ^{*} _{i,{\pi _{ - i}}}}\left ({{S_{i},{A'_{i}}}}\right)}. \tag{28}\end{equation*}
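Given a solved occupation measure, the policy extraction of Eq. (28) can be sketched as follows; the LP solve itself is omitted, and the container shape (a dict mapping (state, action) pairs to stationary probabilities) is our assumption:

```python
def policy_from_occupation(phi, actions=(0, 1)):
    """Recover the stationary policy of Eq. (28):
    pi*(a | s) = phi*(s, a) / sum_{a'} phi*(s, a').
    States with zero mass fall back to a uniform choice."""
    states = {s for (s, _) in phi}
    policy = {}
    for s in states:
        total = sum(phi.get((s, a), 0.0) for a in actions)
        for a in actions:
            policy[(s, a)] = (phi.get((s, a), 0.0) / total
                              if total > 0 else 1.0 / len(actions))
    return policy
```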

To obtain the optimal policies of the IoT devices, we design a best response dynamics-based algorithm (see Algorithm 1). First, each IoT device generates its policy randomly (line 1 in Algorithm 1). Next, each IoT device i calculates the probability \lambda _{i} that it can complete the computing within the deadline using (20) and transmits this probability to the other IoT devices (lines 4-5 in Algorithm 1). After receiving the probabilities from all the other IoT devices, each IoT device calculates the probability \lambda _{A} that at least one IoT device can complete the computing within the deadline. Each IoT device then solves the LP problem to obtain the optimal policy \pi _{i}^{*} (line 7 in Algorithm 1). Once the policies of all the IoT devices converge, the algorithm terminates.

Algorithm 1 Best Response Dynamics-Based Algorithm

1: Generate the random policy \pi _{i} for \forall i .
2: repeat
3:   for each IoT device i do
4:     Calculate the probability \lambda _{i}
5:     Transmit the probability \lambda _{i} to the other IoT devices
6:     Calculate the probability \lambda _{A}
7:     Solve the LP problem to obtain the optimal policy \pi _{i}^{*}
8:   end for
9: until all policies of the IoT devices have converged

The LP problem is generally solved with low complexity [38]. For example, Vaidya’s algorithm [38], which is one of the traditional methods of solving the LP problem, has a polynomial complexity of \mathcal {O}(\Omega ({\mathbf {S_{i}}})\cdot \Omega ({\mathbf {A_{i}}})^{3}) , where \mathcal {O}(\cdot) and \Omega (\cdot) represent the big O notation and number of elements in the given set, respectively. Moreover, since the stationary policies can be attained from a small number of iterations (as shown in Section V-A), we conclude that Algorithm 1 has a manageable overhead and can hence be utilized practically.
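The loop of Algorithm 1 can be sketched as below. The interfaces are hypothetical: each device object carries a mutable `policy` and reports its \lambda _{i} , while the per-device LP solve (lines 6-7) is abstracted into a caller-supplied `solve_lp` callback:

```python
def best_response_dynamics(devices, solve_lp, max_iters=50, tol=1e-6):
    """Skeleton of Algorithm 1: broadcast lambda_i, best-respond via the LP,
    and stop once every policy change falls below `tol`. Returns the number
    of iterations used."""
    for it in range(1, max_iters + 1):
        # lines 4-5: each device computes and shares its lambda_i
        lambdas = [d.completion_probability() for d in devices]
        converged = True
        for d in devices:
            new_policy = solve_lp(d, lambdas)   # lines 6-7: best response
            if abs(new_policy - d.policy) > tol:
                converged = False
            d.policy = new_policy
        if converged:                           # line 9: fixed point reached
            return it
    return max_iters
```

In this sketch a policy is reduced to a scalar acceptance probability for readability; a full implementation would compare the per-state policies of Eq. (28) instead.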

SECTION V.

Evaluation Results

To evaluate the performance of the proposed system, we built a simulation program in MATLAB. We simulated a cooperative IoT-based system with N_{I}=4\sim 8 IoT devices. We assumed that the initial energies of all IoT devices were fully charged and that the target on-time computing completion probability was \theta _{D}=0.9 . For the effective transmission rate, we assumed that the transmission rate is zero when \gamma is below a threshold \gamma _{th} , i.e., \gamma < \gamma _{th} . Additionally, we set P_{T}=1 and \sigma ^{2}=1 to observe the effects of the parameters. To obtain a realistic evaluation setup, we measured the inference latency of VGG16 several times on a Raspberry Pi 4B according to the splitting points. For example, the average inference latency using the whole model over all test data is 5.1 s. Note that, since we measured the inference latency without any other background applications, the measured latency can be considered the inference latency when the IoT device has the maximum computing power. In addition, it is assumed that the inference latency is inversely proportional to the available computing power. Meanwhile, the energy consumption of an IoT device (i.e., requestee) is assumed to be proportional to the number of floating-point operations (FLOPs) in each layer, which can be measured with the python-papi package [39]. The energy consumption for the layer with the smallest FLOPs is normalized to 1. The other default parameter settings are shown in Table 2.

TABLE 2. System Parameters
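The latency and energy assumptions above can be sketched as follows. Only the 5.1 s whole-model measurement comes from the text; the computing-power and FLOPs numbers are illustrative.

```python
# Sketch of the latency/energy model used in the setup. The 5.1 s whole-model
# latency is the measurement quoted in the text; every other number here is
# illustrative.

FULL_MODEL_LATENCY_MAX_POWER = 5.1   # sec, VGG16 on a Raspberry Pi 4B

def inference_latency(latency_at_max, available_power, max_power):
    # latency is assumed inversely proportional to available computing power
    return latency_at_max * max_power / available_power

def normalized_layer_energy(layer_flops):
    # per-layer energy is assumed proportional to its FLOPs, normalized so
    # the smallest-FLOPs layer costs 1 energy unit
    smallest = min(layer_flops)
    return [f / smallest for f in layer_flops]

inference_latency(FULL_MODEL_LATENCY_MAX_POWER, available_power=2, max_power=4)
# -> 10.2: half the computing power doubles the latency
normalized_layer_energy([3e6, 1.5e6, 6e6])   # -> [2.0, 1.0, 4.0]
```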

For the performance evaluation, we compare the proposed system with five schemes: 1) ALWAYS, where the IoT devices always accept the computing request; 2) CE-BASED [29], where the IoT devices perform channel estimation with imperfect channel-state information; 3) RAND, where the IoT devices randomly accept the computing request; 4) P-BASED, where the IoT devices accept a computing request with probability p_{A} , set to 0.7; and 5) CoEdge [19], where the IoT devices compute evenly split workloads. For a fair comparison, the IoT devices in all schemes operate on the same channel. The average energy consumption and average on-time computing completion probability are used as the performance metrics.
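The baseline acceptance rules can be summarized in a short sketch. The channel-estimate interface and the gain threshold below are illustrative assumptions, not the parameters used in [29].

```python
import random

# Hedged sketch of the baseline acceptance rules compared in the text; the
# channel-estimate interface and gain threshold are illustrative.

def accept(scheme, rng, p_a=0.7, est_gain=None, gain_th=1.0):
    if scheme == "ALWAYS":
        return True                        # always accept the request
    if scheme == "RAND":
        return rng.random() < 0.5          # uniform random acceptance
    if scheme == "P-BASED":
        return rng.random() < p_a          # fixed probability p_A = 0.7
    if scheme == "CE-BASED":
        # accept when the (imperfect) channel estimate exceeds a threshold
        return est_gain is not None and est_gain > gain_th
    raise ValueError(f"unknown scheme: {scheme}")
```

CoEdge is omitted here because it splits one workload across all devices rather than making a per-device accept/reject decision.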

A. Convergence to Nash Equilibrium

Figure 2 describes the process of convergence to the Nash equilibrium. Herein, the initial probabilities of accepting the computing requests of IoT devices 1, 2, 3, and 4 are set to 0.9, 0.5, 0.1, and 0.8, respectively. In addition, the computing power of IoT device 1 is set to 4, whereas those of IoT devices 2, 3, and 4 are all set to 5. As shown in Figure 2, the policies of the IoT devices converge to the Nash equilibrium after only 3 iterations. This indicates that each IoT device needs to transmit its completion probability \lambda _{i} to the other devices only 2 times (see line 5 in Algorithm 1).

FIGURE 2. Converging process to Nash equilibrium.

From Figure 2, it is evident that the action of an IoT device depends on those of the other IoT devices. For example, in this result, IoT device 1 has lower computing power than the other devices; hence, its acceptance of the computing request contributes little to increasing the on-time computing completion probability. In other words, acceptance of the computing request by IoT device 1 can be regarded as unnecessary energy consumption. In this situation, IoT device 1 accepts the request with low probability. Recognizing this, the other IoT devices accept the computing request with greater probability.
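The interaction described above can be illustrated with a toy version of best-response dynamics. This sketch deliberately ignores per-device energy costs and the full LP of Algorithm 1: each device simply picks the smallest acceptance probability that keeps the joint completion probability at the target, given the others' current policies. All numbers are made up.

```python
# Illustrative best-response dynamics (a toy version, not the paper's exact
# algorithm): each device repeatedly picks the smallest acceptance probability
# that keeps the joint on-time completion probability at theta_d, given the
# other devices' current policies. lam[i] is device i's standalone on-time
# completion probability; all values are illustrative.

def best_response_dynamics(lam, theta_d, iters=20):
    p = [1.0] * len(lam)                       # start by always accepting
    for _ in range(iters):
        for i in range(len(lam)):
            # failure probability contributed by everyone except device i
            others_fail = 1.0
            for j in range(len(lam)):
                if j != i:
                    others_fail *= 1.0 - p[j] * lam[j]
            # need 1 - others_fail * (1 - p_i * lam_i) >= theta_d
            if others_fail > 0.0:
                residual = 1.0 - (1.0 - theta_d) / others_fail
            else:
                residual = 0.0                 # target already met by others
            p[i] = min(1.0, max(0.0, residual / lam[i]))
    return p

# the dynamics settle on a profile that meets the 0.9 target
policy = best_response_dynamics(lam=[0.5, 0.9, 0.9, 0.9], theta_d=0.9)
```

Even in this stripped-down form, the weakest device backs off because the stronger devices can meet the target without it, mirroring the behavior of IoT device 1 in Figure 2.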

B. Effect of N_{I}

Figures 3(a) and (b) show the effects of the number of IoT devices N_{I} on the average energy consumption \zeta _{E} and average on-time computing completion probability \zeta _{C} , respectively. The figures show that the DSCS minimizes the average energy consumption while maintaining the desired average on-time computing completion probability (i.e., 0.9). This is because the IoT devices in the DSCS decide whether to accept the computing request by considering their operating environments. For example, if an IoT device has high available computing power and channel gain, it accepts the computing request because it can probably complete the computation within the deadline, so the energy spent is not wasted. On the contrary, if an IoT device has low available computing power and channel gain, it does not accept the computing request, thereby avoiding unnecessary energy consumption.

FIGURE 3. Effect of the number of IoT devices N_{I} .

From Figure 3(a), it is observed that the average energy consumption of the DSCS decreases as the number of IoT devices increases. This is because the IoT devices in the DSCS consider their neighboring devices when deciding whether to accept the computing request. Specifically, when numerous IoT devices exist, a given device can expect the task to be completed within the deadline with high probability even if it does not accept the request itself. Therefore, each IoT device accepts the computing request with a lower probability to reduce its energy consumption. Meanwhile, since the comparison schemes (except CoEdge) do not consider the number of IoT devices N_{I} , their average energy consumption is constant regardless of N_{I} . Note that, since all IoT devices in CoEdge compute evenly split workloads, its average energy consumption decreases as N_{I} increases.
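The intuition that more devices allow each one to back off follows from a simple probability calculation. The per-device value q = 0.3 below is illustrative, not taken from the simulation.

```python
# Why more devices let each one back off: if each of N devices completes on
# time with probability q, the chance that at least one finishes in time is
# 1 - (1 - q)^N. q = 0.3 is an illustrative per-device value.

def on_time_prob(q, n):
    return 1.0 - (1.0 - q) ** n

for n in (4, 6, 8):
    print(n, round(on_time_prob(0.3, n), 3))
# rises from 0.76 (N=4) to 0.942 (N=8)
```

Since the at-least-one probability grows with N_{I} for fixed q, each device can lower its own acceptance probability while the system still meets \theta _{D} .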

C. Effect of \theta_{D}

Figure 4 shows the effect of the target on-time computing completion probability \theta _{D} on the average energy consumption \zeta _{E} . From the figure, it is seen that \zeta _{E} of the DSCS increases as \theta _{D} increases, whereas \zeta _{E} of the other schemes remains constant regardless of \theta _{D} . This can be explained as follows. As the number of IoT devices accepting the computing request increases, the probability that at least one of them completes the computation within the deadline also increases. The IoT devices in the DSCS recognize this fact and accept computing requests more aggressively when a higher \theta _{D} is given. However, the comparison schemes do not consider the target on-time computing completion probability, so they do not change their policies.

FIGURE 4. Effect of the target on-time computing completion probability \theta _{D} on the average energy consumption \zeta _{E} .

D. Effect of \lambda _{E}

Figures 5(a) and (b) demonstrate the effects of the average number of harvested energy units \lambda _{E} on the average energy consumption \zeta _{E} and average on-time computing completion probability \zeta _{C} , respectively. Interestingly, from Figure 5(a), it is seen that the average energy consumption \zeta _{E} of all schemes except CE-BASED increases logarithmically with \lambda _{E} . This can be explained as follows. If an IoT device does not have sufficient energy E (i.e., E < f_{E}(P) ), it cannot infer the tail model and therefore consumes no energy. The probability of this situation occurring decreases as \lambda _{E} increases. Thus, as \lambda _{E} increases, the average on-time computing completion probabilities \zeta _{C} of all the comparison schemes also increase, as shown in Figure 5(b). However, the DSCS does not consume additional energy to raise the average on-time computing completion probability beyond a certain level (i.e., 0.9 herein); thus, its average on-time computing completion probability remains unchanged irrespective of \lambda _{E} , as shown in Figure 5(b).
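The shortfall argument can be made concrete under an assumed harvesting distribution. The Poisson model and the required-energy value below are illustrative choices for this sketch, not the paper's harvesting model.

```python
from math import exp, factorial

# Sketch of the shortfall argument, assuming (for illustration) that the
# harvested energy E in a slot is Poisson with mean lam_e: the probability of
# lacking the e_req units needed to infer the tail model shrinks as lam_e grows.

def shortfall_prob(lam_e, e_req):
    # P(E < e_req) for E ~ Poisson(lam_e)
    return sum(exp(-lam_e) * lam_e ** k / factorial(k) for k in range(e_req))

shortfall_prob(2, 3)   # ~0.677: shortfall is common at low harvest rates
shortfall_prob(6, 3)   # ~0.062: rarely short of energy when lam_e is high
```

A shrinking shortfall probability means accepted requests are actually executed more often, which is why both the energy consumed and \zeta _{C} of the baselines rise with \lambda _{E} .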

FIGURE 5. Effect of \lambda _{E} .

E. Effect of \rho

Figure 6 demonstrates the effect of the average channel gain \rho on the average energy consumption \zeta _{E} . As observed in the figure, \zeta _{E} of the DSCS decreases as \rho increases. This is because the intermediate data can be delivered with lower latency when \rho is higher, which implies a higher probability of completing the computation within the deadline. Each IoT device in the DSCS recognizes this condition and accepts the computing request with a lower probability to reduce its energy consumption. However, since the schemes other than CE-BASED do not recognize this condition, their average energy consumption remains unchanged irrespective of \rho .
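The link between channel gain and delivery latency follows from the effective-rate assumption stated in the setup. The Shannon-like rate form, the bandwidth, and the data size below are illustrative assumptions for this sketch.

```python
from math import log2

# Hedged sketch of the effective-rate assumption: the rate is zero below the
# SNR threshold gamma_th and Shannon-like above it. Bandwidth, gamma_th, and
# the data size are illustrative values, not the paper's parameters.

def effective_rate(gamma, gamma_th=1.0, bandwidth=1.0):
    if gamma < gamma_th:
        return 0.0                       # link unusable below the threshold
    return bandwidth * log2(1.0 + gamma)

def tx_latency(bits, gamma):
    rate = effective_rate(gamma)
    return float("inf") if rate == 0.0 else bits / rate

tx_latency(4.0, gamma=3.0)   # -> 2.0 (rate log2(4) = 2)
tx_latency(4.0, gamma=0.5)   # -> inf (below the threshold)
```

Higher \rho shifts \gamma upward, so the intermediate data spends less time in transit and more of the deadline budget remains for inference.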

FIGURE 6. Effect of the average channel gain \rho on the average energy consumption \zeta _{E} .

Meanwhile, in CE-BASED, the decision to accept the computing request is based on the channel condition (i.e., the IoT devices accept the request when the channel gain is high). Thus, its average energy consumption \zeta _{E} increases as the average channel gain increases.

F. Effect of \lambda _{C}

Figure 7 demonstrates the effect of the average computing power \lambda _{C} on the average on-time computing completion probability \zeta _{C} . Intuitively, an IoT device with higher computing power can complete inference of the given tail model in a shorter time. Therefore, as shown in Figure 7, the average on-time computing completion probabilities \zeta _{C} of the comparison schemes increase with the average computing power \lambda _{C} . However, in the DSCS, if the IoT devices have more computing power, they reduce their acceptance probabilities for the computing request to reduce their energy consumption. Thus, the average on-time computing completion probability of the DSCS is maintained at a specific level (i.e., 0.9).

FIGURE 7. Effect of the average computing power \lambda _{C} on the average on-time computing completion probability \zeta _{C} .

G. Effect of D

Figure 8 demonstrates the effect of the deadline D on the average energy consumption \zeta _{E} . When the deadline is longer, the desired on-time computing completion probability can be achieved even if the IoT devices accept the computing request with a lower probability. The IoT devices in the DSCS are aware of this fact and can thus reduce their energy consumption when D is large, as demonstrated in Figure 8. However, the comparison schemes do not alter their policies according to the deadline D , so their energy consumption remains constant.

FIGURE 8. Effect of the deadline D on the average energy consumption \zeta _{E} .

SECTION VI.

Conclusion

In this work, we introduced a distributed split computing system (DSCS) wherein the IoT devices (i.e., requestees) distributively determine whether to accept a split computing request from a specific IoT device by considering the unnecessary energy consumption and computation completion time. For performance optimization, a constrained stochastic game model was developed, and a multipolicy constrained Nash equilibrium was attained using a best-response dynamics-based algorithm. The evaluation results show that the DSCS significantly reduces the amount of energy consumed by the IoT devices while providing high on-time computing completion probability. Moreover, the IoT devices in the DSCS adaptively adjust their actions by considering their neighbors’ actions and operating environments. In the future, we plan to expand the proposed system to also account for multitask learning.

    References

    1.
    V. Sze, Y.-H. Chen, T.-J. Yang and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey", Proc. IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
    2.
    Y. Matsubara, M. Levorato and F. Restuccia, "Split computing and early exiting for deep learning applications: Survey and research challenges", ACM Comput. Surv., vol. 55, no. 5, pp. 1-30, May 2023.
    3.
    Y. Matsubara and M. Levorato, "Split computing for complex object detectors: Challenges and preliminary results", Proc. 4th Int. Workshop Embedded Mobile Deep Learn., pp. 7-12, Sep. 2020.
    4.
    H. Ko, J. Lee and S. Pack, "Spatial and temporal computation offloading decision algorithm in edge cloud-enabled heterogeneous networks", IEEE Access, vol. 6, pp. 18920-18932, 2018.
    5.
    X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, "Convergence of edge computing and deep learning: A comprehensive survey", IEEE Commun. Surveys Tuts., vol. 22, no. 2, pp. 869-904, 2nd Quart. 2020.
    6.
    Y. Kang, "Neurosurgeon: Collaborative intelligence between the cloud and mobile edge", Proc. ASPLOS, pp. 1-15, Apr. 2017.
    7.
    E. Li, Z. Zhou and X. Chen, "Edge intelligence: On-demand deep learning model co-inference with device-edge synergy", Proc. Workshop Mobile Edge Commun., pp. 31-36, Aug. 2018.
    8.
    S. Laskaridis, S. I. Venieris, M. Almeida, I. Leontiadis and N. D. Lane, "SPINN: Synergistic progressive inference of neural networks over device and cloud", Proc. 26th Annu. Int. Conf. Mobile Comput. Netw., pp. 1-15, Sep. 2020.
    9.
    M. Krouka, A. Elgabli, C. B. Issaid and M. Bennis, "Energy-efficient model compression and splitting for collaborative inference over time-varying channels", Proc. IEEE 32nd Annu. Int. Symp. Pers. Indoor Mobile Radio Commun. (PIMRC), pp. 1173-1178, Sep. 2021.
    10.
    J. Yan, S. Bi and Y.-J.-A. Zhang, "Optimal model placement and online model splitting for device-edge co-inference", IEEE Trans. Wireless Commun., vol. 21, no. 10, pp. 8354-8367, Oct. 2022.
    11.
    A. E. Eshratifar, M. S. Abrishami and M. Pedram, "JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services", IEEE Trans. Mobile Comput., vol. 20, no. 2, pp. 565-576, Feb. 2021.
    12.
    H. Zhou, "BBNet: A novel convolutional neural network structure in edge-cloud collaborative inference", Sensors, vol. 21, no. 13, pp. 1-16, Jun. 2021.
    13.
    W. He, S. Guo, S. Guo, X. Qiu and F. Qi, "Joint DNN partition deployment and resource allocation for delay-sensitive deep learning inference in IoT", IEEE Internet Things J., vol. 7, no. 10, pp. 9241-9254, Oct. 2020.
    14.
    X. Tang, X. Chen, L. Zeng, S. Yu and L. Chen, "Joint multiuser DNN partitioning and computational resource allocation for collaborative edge intelligence", IEEE Internet Things J., vol. 8, no. 12, pp. 9511-9522, Jun. 2021.
    15.
    S. Wang and X. Zhang, "NeuroMessenger: Towards error tolerant distributed machine learning over edge networks", Proc. IEEE INFOCOM Conf. Comput. Commun., pp. 2058-2067, May 2022.
    16.
    S. Wang, X. Zhang, H. Uchiyama and H. Matsuda, "HiveMind: Towards cellular native machine learning model splitting", IEEE J. Sel. Areas Commun., vol. 40, no. 2, pp. 626-640, Feb. 2022.
    17.
    Y. Matsubara, R. Yang, M. Levorato and S. Mandt, "SC2 benchmark: Supervised compression for split computing", arXiv:2203.08875, 2022.
    18.
    H. Ahn, M. Lee, C.-H. Hong and B. Varghese, "ScissionLite: Accelerating distributed deep neural networks using transfer layer", arXiv:2105.02019, 2021.
    19.
    L. Zeng, X. Chen, Z. Zhou, L. Yang and J. Zhang, "CoEdge: Cooperative DNN inference with adaptive workload partitioning over heterogeneous edge devices", IEEE/ACM Trans. Netw., vol. 29, no. 2, pp. 595-608, Apr. 2021.
    20.
    E. Samikwa, A. D. Maio and T. Braun, "Adaptive early exit of computation for energy-efficient and low-latency machine learning over IoT networks", Proc. IEEE 19th Annu. Consum. Commun. Netw. Conf. (CCNC), pp. 200-206, Jan. 2022.
    21.
    S. Zhang, S. Zhang, Z. Qian, J. Wu, Y. Jin and S. Lu, "DeepSlicing: Collaborative and adaptive CNN inference with low latency", IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 9, pp. 2175-2187, Sep. 2021.
    22.
    A. Bakhtiarnia, N. Miloševic, Q. Zhang, D. Bajovic and A. Iosifidis, "Dynamic split computing for efficient deep EDGE intelligence", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 1-5, Jun. 2023.
    23.
    F. Cunico, L. Capogrosso, F. Setti, D. Carra, F. Fummi and M. Cristani, "I-SPLIT: Deep network interpretability for split computing", Proc. 26th Int. Conf. Pattern Recognit. (ICPR), pp. 2575-2581, Aug. 2022.
    24.
    T. A. Khoa, D.-V. Nguyen, M.-S. Dao and K. Zettsu, "SplitDyn: Federated split neural network for distributed edge AI applications", Proc. IEEE Int. Conf. Big Data (Big Data), pp. 6066-6073, Dec. 2022.
    25.
    LoRa Alliance LoRaWAN L2 1.0.4 Specification, Jul. 2023, [online] Available: https://lora-alliance.org/resource_hub/lorawan-104-specification-package/.
    26.
    A. Lavric and V. Popa, "Internet of Things and LoRa low-power wide-area networks: A survey", Proc. Int. Symp. Signals Circuits Syst. (ISSCS), pp. 1-5, Jul. 2017.
    27.
    E. Altman, K. Avrachenkov, N. Bonneau, M. Debbah, R. El-Azouzi and D. S. Menasche, "Constrained cost-coupled stochastic games with independent state processes", Oper. Res. Lett., vol. 36, no. 2, pp. 160-164, Mar. 2008.
    28.
    H. Ko and S. Pack, "Function-aware resource management framework for serverless edge computing", IEEE Internet Things J., vol. 10, no. 2, pp. 1310-1319, Jan. 2023.
    29.
    R. Xie, Q. Tang, C. Liang, F. R. Yu and T. Huang, "Dynamic computation offloading in IoT fog systems with imperfect channel-state information: A POMDP approach", IEEE Internet Things J., vol. 8, no. 1, pp. 345-356, Jan. 2021.
    30.
    X. Wang, J. Gong, C. Hu, S. Zhou and Z. Niu, "Optimal power allocation on discrete energy harvesting model", EURASIP J. Wireless Commun. Netw., vol. 2015, no. 1, pp. 1-14, Dec. 2015.