Introduction
Deep neural networks (DNNs) are currently the most frequently used machine learning approach in intelligent mobile applications and have grown more popular owing to their accurate and reliable inference capability [1]. However, despite recent improvements in the computing capabilities of IoT devices, their performance still falls far short of that of cloud computing. Thus, an IoT device that conducts inference for the entire DNN model on its own cannot achieve a sufficiently low latency. In addition, the battery capacities of IoT devices are severely limited, which is especially problematic for high-complexity inference. Therefore, there is increasing interest in the split computing approach [2], [3]. In this approach, the DNN is split into two subnetworks (i.e., head and tail models), and the head and tail models are distributed between the IoT device and cloud, respectively. The IoT device first conducts inference of the head model to obtain intermediate data (i.e., output of the head model). It then sends this intermediate data to the cloud. Using the intermediate data as the input, the cloud processes the tail model sequentially. However, this split computing approach suffers from high network latency between the IoT device and cloud, especially when the cloud is located far from the IoT device [4], [5].
To mitigate this problem, we introduce a distributed split computing system (DSCS); here, an IoT device simply determines whether to use the split computing approach and the splitting point by considering its available computing power as well as computing deadline. If the IoT device decides to use the split computing approach, it conducts inference of the head model based on the splitting point. Then, the IoT device (which is the split computing requester) broadcasts a split computing request that includes the splitting point, intermediate data, computing latency of the head model, and computing deadline to its neighboring IoT devices. After receiving the request, the neighboring IoT devices (i.e., requestees) distributively determine whether or not to accept the split computing request by taking into account the unnecessary energy consumption and computation time. Because the total number of IoT devices (i.e., requestees) accepting the split computing request affects the amount of energy consumed and the probability of completing the computations on time, each IoT device should consider the actions of its neighboring IoT devices. In this context, we formulate a constrained stochastic game model and utilize a best-response dynamics-based algorithm to obtain the multipolicy constrained Nash equilibrium with minimized energy consumption while maintaining desirable on-time computing completion probability. The evaluation results show that the DSCS reduces energy consumption by more than 20% compared to a probability-based acceptance scheme, where the IoT devices (i.e., requestees) accept a split computing request based on a predefined probability, while providing high on-time computing completion probability. Moreover, it is found that the best-response dynamics-based algorithm converges to the Nash equilibrium within a few iterations.
The main contributions of this study are as follows: 1) the proposed system is a pioneering effort in which the actions of the split computing requestees are distributively decided to optimize the performance of the split computing system; 2) the optimal policy of the requestees regarding acceptance of the computing request can be obtained in a few iterations, indicating that the proposed algorithm can be implemented in actual systems without significant signaling cost; 3) we show and scrutinize the evaluation results under various conditions to provide guidance for constructing a DSCS.
The remainder of this manuscript is structured as follows. The related works are detailed in Section II, and the proposed DSCS is described in Section III. The stochastic game model development is detailed in Section IV. The evaluations are reviewed in Section V, and the final conclusions are summarized in Section VI.
Related Work
Many reported studies have investigated the possibility of lowering task completion times in split computing environments [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].
Kang et al. [6] created an automated two-step model splitting technique. In the first step, performance prediction models are created for each layer. In the second step, the splitting point is dynamically determined from the prediction models by considering the importance of performance metrics. Li et al. [7] suggested a model splitting framework that considers early exit and allows the inference task to be undertaken at an appropriate intermediary layer; they decided the exit and splitting points concurrently to maximize the accuracy of inference while ensuring that the task completion time remained below a specified threshold. Laskaridis et al. [8] presented a system that continually monitors the resources of the edge cloud and mobile device to decide the splitting point by considering application requirements. Krouka et al. [9] introduced a technique involving pruning and compression before splitting the DNN model to minimize energy consumption by the mobile device while assuring correctness of inference. Yan et al. [10] jointly optimized placing and splitting of the model to minimize energy consumption and reduce task completion time while accounting for the network dynamics. Eshratifar et al. [11] determined several optimal splitting points by converting the presented model into a well-known one to exploit existing algorithms. Zhou et al. [12] suggested a strategy to minimize task completion time by pruning the model and compressing the intermediate data. He et al. [13] exploited a queuing model for task completion time to formulate a joint optimization problem regarding the splitting point and resource allocation, which can be divided into subproblems; moreover, they designed a heuristic algorithm to solve the subproblems sequentially. Tang et al. [14] designed an algorithm that uses the structural characteristics of the model splitting problem to obtain its solution in polynomial time. 
Wang and Zhang [15] designed a split computing architecture that exploits the error-tolerant characteristics of the intermediate data to reduce the communication overhead; in this architecture, the controller decides if retransmission is needed depending on the error rate. Wang et al. [16] proposed a multiple-splitting-points decision system that determines several optimal splitting points in real time with low signaling overhead. Matsubara et al. [17] suggested a supervised compression method that discretizes the intermediate data to avoid high communication overhead. Ahn et al. [18] introduced a system in which the DNN model is partitioned and deployed between the IoT device and cloud to improve inference accuracy and reduce task completion time. In [19] and [20], to minimize system energy consumption, the authors developed a distributed DNN computing system that orchestrates cooperative inference among multiple IoT devices by considering their available computing power and network conditions. Zhang et al. [21] introduced a collaborative and adaptive inference system that can handle various types of DNN models and optimize the tradeoff between computation and synchronization. In [22] and [23], the authors introduced a method that dynamically detects the best splitting point for a given DNN based on the communication channel state, batch size, and multiclass categorization. In [24], the authors proposed a framework for split computing in which a round-robin schedule is used to select a device and the Hungarian optimization algorithm is used to assign a layer to that device.
However, no existing work optimizes the split computing performance from the perspective of the requestees in a distributed manner.
Distributed Split Computing System
Figure 1 shows the proposed DSCS in which an IoT device (i.e., requester) generates the computing task periodically. During the computing task, the IoT device checks its available computing power and task deadline. If the available power is sufficient to complete the task within the deadline, the IoT device performs inference for the entire DNN model. Otherwise, the IoT device decides the splitting point according to its available computing power. Specifically, the entire DNN model is split at the
After receiving the split computing request, the neighboring IoT devices (i.e., requestees) distributively determine whether to accept the split computing request by considering the unnecessary energy consumption and computation time. If the IoT device determines not to accept the request, it does nothing. Otherwise, it conducts inference of the tail model with the
As the number of IoT devices accepting the request increases, the probability that at least one IoT device can complete the computations within the deadline (i.e., on-time computing completion probability) also increases. However, excessive duplicate acceptances increase the total amount of energy consumed. This means that each IoT device should consider the actions of others to achieve a tradeoff between unnecessary energy consumption and on-time computing completion probability. Accordingly, we develop a constrained stochastic game model to minimize the energy consumed while ensuring that the on-time computing completion probability remains above a specified threshold; this model is explained in the following section.
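The tradeoff described above can be illustrated with a small numerical sketch. It rests on simplifying assumptions that are not part of the proposed model: every accepting device completes on time independently with the same probability `q`, and every acceptance costs the same energy `e`. The names `on_time_probability` and `total_energy` are hypothetical.

```python
def on_time_probability(q: float, n: int) -> float:
    """Probability that at least one of n accepting devices finishes on time.

    Assumes (for illustration only) independent, identical devices, each
    finishing on time with probability q.
    """
    return 1.0 - (1.0 - q) ** n


def total_energy(e: float, n: int) -> float:
    """Total energy spent when n devices accept, each spending e units."""
    return e * n


if __name__ == "__main__":
    # The on-time probability saturates while the energy grows linearly.
    for n in range(1, 6):
        print(n, round(on_time_probability(0.5, n), 4), total_energy(2.0, n))
```

Under these assumptions, each additional acceptance adds less to the on-time probability than the previous one while adding the same energy cost, which is why duplicate acceptances become wasteful.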
Constrained Stochastic Game
In this section, we present the development of a constrained stochastic game model [27], [28] to accomplish distributed implementation of the split computing service. In the game,
A. State Space
Let \begin{equation*} {\mathbf {S}}_{\mathbf {i}} = {\mathbf { P}}_{\mathbf {i}} \times {\mathbf {C}}_{\mathbf {i}} \times {\mathbf {H}}_{\mathbf {i}} \times {\mathbf {E}}_{\mathbf {i}}, \tag{1}\end{equation*}
Without loss of generality, we assume that the entire DNN model consists of \begin{equation*} {\mathbf {P}}_{\mathbf {i}} = \left \{{ {1,2,\ldots,M} }\right \}, \tag{2}\end{equation*}
When \begin{equation*} {\mathbf {C}}_{\mathbf {i}} = \left \{{ {u_{C},2u_{C}, \ldots,C^{\max } } }\right \}, \tag{3}\end{equation*}
Since the channel is quantized to \begin{equation*} {\mathbf {H}}_{\mathbf {i}} = \left \{{{h_{1}},{h_{2}},\ldots,{h_{Q}}}\right \}, \tag{4}\end{equation*}
When \begin{equation*} {\mathbf {E}}_{\mathbf {i}} = \left \{{ {0,1,2,\ldots,E^{\max } } }\right \}, \tag{5}\end{equation*}
B. Action Space
Let \begin{equation*} {\mathbf {A}}_{\mathbf {i}} = \left \{{ {0,1} }\right \}. \tag{6}\end{equation*}
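As a rough illustration of Eqs. (1)-(6), the local state space can be enumerated as a Cartesian product of its four components. The numerical sizes below (`M`, `u_C`, `C_max`, `Q`, `E_max`) are arbitrary assumptions chosen only to keep the example small, and the channel levels are represented by symbolic labels.

```python
from itertools import product

# Assumed, illustrative sizes (not taken from the evaluation settings).
M = 4        # number of candidate splitting points
u_C = 1      # unit of computing power
C_max = 3    # maximum computing power
Q = 2        # number of quantized channel levels
E_max = 2    # battery capacity in energy units

P = list(range(1, M + 1))               # Eq. (2): splitting points
C = list(range(u_C, C_max + 1, u_C))    # Eq. (3): computing power levels
H = [f"h{q}" for q in range(1, Q + 1)]  # Eq. (4): channel levels (labels)
E = list(range(0, E_max + 1))           # Eq. (5): energy levels
A = (0, 1)                              # Eq. (6): reject / accept

# Eq. (1): the state space is the product of the four component sets.
S = list(product(P, C, H, E))

if __name__ == "__main__":
    print(len(S), "states,", len(S) * len(A), "state-action pairs")
```

The number of state-action pairs grows multiplicatively with each component, which matters later because the LP has one variable per state-action pair.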
C. Transition Probability
Let \begin{equation*} P[S'_{i} |S_{i},A_{i}] = P[P'_{i} |P_{i}] \times P[C'_{i} |C_{i}] \times P[H'_{i} |H_{i}] \times P[E'_{i} |E_{i},P_{i},A_{i}]. \tag{7}\end{equation*}
The volume of harvested energy varies with the condition of the energy source. Therefore, it can be assumed that the harvested energy \begin{align*} P[E'_{i} |E_{i} \ne E^{\max },P_{i},A_{i} = 0] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{8}\\ P[E'_{i} |E_{i} = E^{\max },P_{i},A_{i} = 0] &= \begin{cases} \displaystyle 1, &\text {if } E'_{i} = E_{i} \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{9}\end{align*}
When the IoT device \begin{align*} P[E'_{i} |E_{i} \ge f_{E} \left ({{P_{i} } }\right),P_{i},A_{i} = 1] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k - f_{E} \left ({{P_{i} } }\right) \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{10}\\ P[E'_{i} |E_{i} < f_{E} \left ({{P_{i} } }\right),P_{i},A_{i} = 1] &= \begin{cases} \displaystyle P_{E}(\lambda _{E},k), &\text {if } E'_{i} = E_{i} + k \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{11}\end{align*}
It is assumed that the available computing power of the requesting IoT device follows a discrete uniform distribution [31]. Then, since the requesting IoT device decides the splitting point according to its available computing power, the transition probability of \begin{align*} P[P'_{i} |P_{i}] = \begin{cases} \displaystyle \frac {1}{M}, &\text {if } P'_{i} \in \left \{{ {1, 2, \ldots, M} }\right \} \\ \displaystyle 0, &\text {otherwise}. \end{cases} \tag{12}\end{align*}
The available computing power of the IoT device \begin{align*} P [C'_{i}|C_{i}]=\begin{cases} \displaystyle P_{C}(\lambda _{C},k), &\text {if } C'_{i}=k\\ \displaystyle 0,&\text {otherwise}. \end{cases} \tag{13}\end{align*}
In the channel gain with \begin{align*} P[H'_{i} |H_{i}] = \begin{cases} \displaystyle F(h_{q+1})-F(h_{q}), &\text {if } h_{q} \leq H'_{i} < h_{q+1} \\ \displaystyle 0, &\text {otherwise}, \end{cases} \tag{14}\end{align*}
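The energy transitions in Eqs. (8)-(11) can be sketched as a function that returns one row of the transition matrix. Two assumptions are made here and are not confirmed by the text above: `P_E(lambda_E, k)` is taken to be a Poisson probability mass function, and any harvested energy that would exceed `E_max` is lumped into the state `E_max`. The function names are hypothetical.

```python
import math


def poisson_pmf(lam: float, k: int) -> float:
    """Poisson probability of harvesting k units (assumed form of P_E)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def energy_row(e: int, a: int, cost: int, lam_e: float, e_max: int) -> dict:
    """Return {next_energy: probability} for energy e and action a.

    cost plays the role of f_E(P_i), the energy needed for the tail model.
    """
    if a == 0 and e == e_max:
        return {e: 1.0}                      # Eq. (9): battery already full
    # Eq. (10): energy is spent only if the device accepts and can afford it.
    # Eqs. (8) and (11): otherwise the stored energy is not reduced.
    base = e - cost if (a == 1 and e >= cost) else e
    row = {}
    for k in range(0, e_max - base):
        row[base + k] = poisson_pmf(lam_e, k)
    # Assumption: overflow beyond the battery capacity is clipped to e_max.
    row[e_max] = max(0.0, 1.0 - sum(row.values()))
    return row


if __name__ == "__main__":
    print(energy_row(e=2, a=1, cost=1, lam_e=1.0, e_max=5))
```

Clipping at `E_max` keeps every row a valid probability distribution, which the LP formulation requires of the transition kernel.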
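Equation (14) assigns to each quantized channel level the probability mass that the distribution function F places on the corresponding interval. The sketch below assumes, purely for illustration, an exponential distribution for the channel gain with a given mean, a lowest threshold of zero, and an open-ended last interval; none of these choices is stated in the text above.

```python
import math


def exp_cdf(x: float, mean: float) -> float:
    """CDF of an exponential distribution (assumed form of F)."""
    return 1.0 - math.exp(-x / mean)


def channel_level_probs(thresholds, mean_gain: float):
    """Return P[h_q <= H' < h_{q+1}] for each level q, as in Eq. (14).

    thresholds must be ascending; the last interval is assumed to extend
    to infinity so that the probabilities sum to one when thresholds[0] == 0.
    """
    edges = list(thresholds) + [math.inf]
    return [
        exp_cdf(edges[q + 1], mean_gain) - exp_cdf(edges[q], mean_gain)
        for q in range(len(thresholds))
    ]


if __name__ == "__main__":
    print(channel_level_probs([0.0, 0.5, 1.0, 2.0], mean_gain=1.0))
```

Because the right-hand side of Eq. (14) does not depend on the current level, the same probability vector forms every row of the channel transition matrix.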
D. Cost Function
The energy consumption of the IoT device is exploited as the cost function \begin{equation*} r\left ({{S_{i},A_{i} } }\right) = A_{i} f_{E} \left ({{P_{i} } }\right). \tag{15}\end{equation*}
E. Constraint Function
To provide high on-time computing completion probability, the constraint function
Because the size of the DNN model output is generally much smaller than that of the intermediate data [33], the latency for result transmission from the IoT device \begin{equation*} L = L^{H} + L_{i}^{D} + L_{i}^{T}. \tag{16}\end{equation*}
Note that the computing latency \begin{equation*} L_{i}^{D} = \frac {{f_{D} \left ({{P_{i} } }\right)}}{T_{i}^{R} }, \tag{17}\end{equation*}
\begin{equation*} \gamma = \frac {{P_{T} \left |{ {H_{i} } }\right |^{2} }}{\sigma ^{2} }, \tag{18}\end{equation*}
The computing latency \begin{equation*} L_{i}^{T} = \frac {{f_{F} \left ({{P_{i} } }\right)}}{C_{i} }, \tag{19}\end{equation*}
The probability \begin{equation*} \lambda _{i} = P\left [{ {A_{i} = 1} }\right] \cdot P\left [{ {L^{H} + L_{i}^{D} + L_{i}^{T} < D} }\right], \tag{20}\end{equation*}
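A minimal sketch of how Eqs. (16)-(20) fit together is given below. It assumes that the transmission rate takes the Shannon form `bandwidth * log2(1 + gamma)`, that `f_D` is the size of the intermediate data, and that `f_F` is the computation required by the tail model; these readings, as well as all parameter names and units, are assumptions made for illustration.

```python
import math


def total_latency(l_head, data_bits, tail_ops, c_i, p_t, h_i, sigma2, bandwidth):
    """Total latency L = L^H + L_i^D + L_i^T of Eq. (16)."""
    gamma = p_t * abs(h_i) ** 2 / sigma2        # Eq. (18): received SNR
    rate = bandwidth * math.log2(1.0 + gamma)   # assumed Shannon-type rate
    l_d = data_bits / rate                      # Eq. (17): transmission latency
    l_t = tail_ops / c_i                        # Eq. (19): tail computing latency
    return l_head + l_d + l_t


def completion_probability(p_accept: float, p_on_time: float) -> float:
    """Eq. (20): the device accepts and then finishes before the deadline."""
    return p_accept * p_on_time


if __name__ == "__main__":
    latency = total_latency(
        l_head=0.1, data_bits=1e6, tail_ops=2e9, c_i=1e9,
        p_t=3.0, h_i=1.0, sigma2=1.0, bandwidth=1e6,
    )
    deadline = 3.0
    print(latency, latency < deadline)
```

In the example, the tail computation dominates the total latency, so a device with low computing power contributes little to the on-time completion probability even under a good channel.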
F. Optimization Formulation
Let \begin{equation*} \zeta _{E} \left ({\pi }\right) = \lim \limits _{T \to \infty } \frac {1}{T}\sum \limits _{t = 1}^{T} {E_{\pi} \left [{ {r\left ({{S^{t},A^{t} } }\right)} }\right]}, \tag{21}\end{equation*}
Meanwhile, IoT device \begin{equation*} \zeta _{C} \left ({\pi }\right) = \lim \limits _{T \to \infty } \frac {1}{T}\sum \limits _{t = 1}^{T} {E_{\pi} \left [{ {c\left ({{S^{t},A^{t} } }\right)} }\right]} \ge \theta _{D}, \tag{22}\end{equation*}
The multi-policy
Based on the LP problem, we can obtain the best response policy of IoT device
To minimize the average energy consumption of the IoT device \begin{equation*} \min \limits _{\phi (S,A)} \sum \limits _{S} {\sum \limits _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)r({S_{i}},{A_{i}})} }. \tag{23}\end{equation*}
Since the average on-time computing completion probability should be maintained above the target on-time computing completion probability \begin{equation*} \sum \limits _{S} {\sum \limits _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)c({S_{i}},{A_{i}})} } \ge {\theta _{D}}. \tag{24}\end{equation*}
From the Chapman-Kolmogorov equation [37], we have \begin{align*} \sum \limits _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S'_{i},{A_{i}}} }\right)} = \sum \limits _{S} {\sum \limits _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)P[{S'_{i}}|{S_{i}},{A_{i}}]} }. \tag{25}\end{align*}
The fundamental properties of probability impose the constraints \begin{equation*} \sum \limits _{S} {\sum \limits _{A} {{\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)} } = 1 \tag{26}\end{equation*}
\begin{equation*} {\phi _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right) \ge 0. \tag{27}\end{equation*}
If a feasible solution exists for the LP problem, then the stationary best response policy of IoT device \begin{equation*} {\pi ^{*} _{i}}\left ({{S_{i},{A_{i}}} }\right) = \frac {{\phi ^{*} _{i,{\pi _{ - i}}}}\left ({{S_{i},{A_{i}}} }\right)} {\sum \limits _{A'_{i} \in {\mathbf {A}}_{\mathbf {i}}}{\phi ^{*} _{i,{\pi _{ - i}}}}\left ({{S_{i},{A'_{i}}}}\right)}. \tag{28}\end{equation*}
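The normalization in Eq. (28) can be sketched as follows, given an LP solution stored as a dictionary that maps (state, action) pairs to occupation measures. The uniform fallback for states whose total occupation measure is zero is an assumption added here, since Eq. (28) is undefined for such states; the function name is hypothetical.

```python
def extract_policy(phi: dict, actions=(0, 1)) -> dict:
    """Turn occupation measures phi[(s, a)] into a policy pi[(s, a)], Eq. (28)."""
    states = {s for (s, _) in phi}
    policy = {}
    for s in states:
        mass = sum(phi.get((s, a), 0.0) for a in actions)
        for a in actions:
            if mass > 0.0:
                policy[(s, a)] = phi.get((s, a), 0.0) / mass
            else:
                # Assumption: a state that is never visited gets a uniform policy.
                policy[(s, a)] = 1.0 / len(actions)
    return policy


if __name__ == "__main__":
    phi = {("s0", 0): 0.1, ("s0", 1): 0.3, ("s1", 0): 0.6, ("s1", 1): 0.0}
    for key, value in sorted(extract_policy(phi).items()):
        print(key, round(value, 3))
```

The resulting policy is randomized in general: in the example, the device accepts with probability 0.75 in state `s0` and never accepts in state `s1`.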
To obtain the optimal policies of the IoT devices, we design a best response dynamics-based algorithm (see Algorithm 1). First, each IoT device generates its policy randomly (line 1 in Algorithm 1). Next, each IoT device
Algorithm 1 Best Response Dynamics-Based Algorithm
Generate the random policy
repeat
for each IoT device
Calculate the probability
Transmit the probability
Calculate the probability
Solve the LP problem to obtain the optimal policy
end for
until the policies of all IoT devices have converged
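The loop structure of Algorithm 1 can be sketched with a deliberately simplified stand-in for the LP step. In this toy version, which is an assumption and not the proposed LP, each device is described by a single acceptance probability `p[i]` and a fixed probability `c[i]` of finishing on time once it accepts, and its best response is the smallest acceptance probability that keeps the overall on-time completion probability at or above the target `theta`, given the current choices of the others.

```python
import math


def toy_best_response(i, p, c, theta):
    """Smallest p[i] meeting the target, given the other devices' policies."""
    others_fail = math.prod(
        1.0 - p[j] * c[j] for j in range(len(p)) if j != i
    )
    if others_fail <= 1.0 - theta or c[i] == 0.0:
        return 0.0  # the target is already met without device i, or i cannot help
    needed = (1.0 - (1.0 - theta) / others_fail) / c[i]
    return min(1.0, needed)


def best_response_dynamics(p, c, theta, tol=1e-9, max_iter=100):
    """Update devices one at a time until no policy changes by more than tol."""
    p = list(p)
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(len(p)):
            new_p = toy_best_response(i, p, c, theta)
            max_change = max(max_change, abs(new_p - p[i]))
            p[i] = new_p
        if max_change < tol:
            return p, iteration
    return p, max_iter


if __name__ == "__main__":
    policies, iterations = best_response_dynamics(
        p=[0.9, 0.5, 0.1, 0.8], c=[0.5, 0.9, 0.9, 0.9], theta=0.9
    )
    on_time = 1.0 - math.prod(1.0 - pi * ci for pi, ci in zip(policies, c))
    print(policies, iterations, on_time)
```

Replacing `toy_best_response` with a routine that solves the LP in Eqs. (23)-(27) and applies Eq. (28) would recover the structure of Algorithm 1; the toy version is only meant to show how each device's update depends on the probabilities reported by the others.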
The LP problem is generally solved with low complexity [38]. For example, Vaidya’s algorithm [38], which is one of the traditional methods of solving the LP problem, has a polynomial complexity of
Evaluation Results
To evaluate the performance of the proposed system, we built a simulation program using MATLAB. We simulated a cooperative IoT-based system with
For the performance evaluation, we compare the proposed system to five schemes: 1) ALWAYS, where the IoT devices always accept the computing request; 2) CE-BASED [29], where the IoT devices perform channel estimation with imperfect channel-state information; 3) RAND, where the IoT devices randomly accept the computing request; 4) P-BASED, where the IoT devices accept a computing request with probability
A. Convergence to Nash Equilibrium
Figure 2 illustrates the process of convergence to the Nash equilibrium. Herein, the initial probabilities of accepting the computing requests of IoT devices 1, 2, 3, and 4 are set to 0.9, 0.5, 0.1, and 0.8, respectively. In addition, the computing power of IoT device 1 is set to 4, whereas those of IoT devices 2, 3, and 4 are all set to 5. As shown in Figure 2, the policies of the IoT devices converge to the Nash equilibrium after only 3 iterations. This indicates that each IoT device needs to transmit its completion probability
From Figure 2, it is evident that the action of an IoT device depends on the actions of the other IoT devices. For example, in this result, IoT device 1 has a lower computing power than the other devices; hence, its acceptance of the computing request is not helpful for increasing the on-time computing completion probability. In other words, the acceptance of the computing request by IoT device 1 can be considered as unnecessary energy consumption. In this situation, IoT device 1 is unlikely to accept the request (i.e., it has a low probability of accepting the computing request). Considering this fact, the other IoT devices may accept the computing request with a greater probability.
B. Effect of N_{I}
Figures 3(a) and (b) show the effects of the number of IoT devices
From Figure 3(a), it is observed that the average energy consumption of the DSCS decreases with increasing numbers of IoT devices. This is because the IoT devices in the DSCS consider their neighboring devices when deciding whether to accept the computing request. Specifically, in a situation where numerous IoT devices exist, from the perspective of a specific device, it is expected that the task can be completed with high probability within the deadline even though that device does not accept the request. Therefore, each IoT device accepts the computing request with a lower probability to reduce the amount of energy consumed. Meanwhile, since the other comparison schemes (except CoEdge) do not consider the number of IoT devices
C. Effect of \theta_{D}
Figure 4 shows the effect of the target on-time computing completion probability
Figure 4. Effect of the target on-time computing completion probability
D. Effect of \lambda _{E}
Figures 5(a) and (b) demonstrate the effects of the average energy units harvested
E. Effect of \rho
Figure 6 demonstrates the effect of the average channel gain
Figure 6. Effect of the average channel gain
Meanwhile, in CE-BASED, the decision to accept the computing request is based on the channel condition (i.e., the IoT devices accept the request when the channel gain is high). Thus, its average energy consumption
F. Effect of \lambda _{C}
Figure 7 demonstrates the effect of the average computing power
Figure 7. Effect of the average computing power
G. Effect of D
Figure 8 demonstrates the effect of the deadline
Conclusion
In this work, we introduced a distributed split computing system (DSCS) wherein the IoT devices (i.e., requestees) distributively determine acceptance of a split computing request from a specific IoT device by considering the unnecessary energy consumption and computation completion time. For performance optimization, a constrained stochastic game model was developed, and a multipolicy constrained Nash equilibrium was attained using a best-response dynamics-based algorithm. The evaluation results show that the DSCS significantly reduces the amount of energy consumed by the IoT devices while providing high on-time computing completion probability. Moreover, the IoT devices in the DSCS adaptively adjust their actions by considering their neighbors' actions and operating environments. In the future, we plan to expand the proposed system to also account for multitask learning.