- Edge Intelligence - Model Partitioning
- Edge Intelligence - Content Caching
- Edge Intelligence - Reinforcement Learning
1. Distributed DNN Inference with Fine-grained Model Partitioning in Mobile Edge Computing Networks
Hui Li*, Xiuhua Li, Qilin Fan, Qiang He, Xiaofei Wang, and Victor C. M. Leung
- We present a fine-grained model partitioning mechanism that supports distributed DNN inference through collaboration between ESs and IoT devices, significantly reducing DNN inference delay under specific delay constraints. We formulate the optimization problem as a Markov Decision Process (MDP) with the objective of maximizing the long-term discounted cumulative reward of distributed DNN inference.
- We propose a novel multi-task learning based A3C approach to search for an appropriate fine-grained model partitioning policy. Specifically, we employ soft parameter sharing to integrate the shared layers of the actor-network and critic-network, and expand the output layer into multiple branches, each determining the fine-grained model partitioning policy for an individual DNN block (a minimal sketch of this branched structure follows the list below). This significantly reduces the action space of the DRL agents, thereby reducing the training time of the proposed approach.
- We evaluate the performance of the proposed approach through extensive experiments conducted on widely-used datasets in MEC networks. Simulation results show that the proposed approach can significantly reduce the total inference delay, edge inference delay and local inference delay.
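The branched actor-critic idea can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical example, not the paper's implementation: the layer sizes, state dimension, and the assumption of discrete candidate partition points per DNN block are all illustrative. It only demonstrates the two mechanisms named above, soft parameter sharing between the actor and critic trunks and one policy branch per DNN block.

```python
# Minimal sketch (assumed PyTorch) of a branched actor-critic with soft
# parameter sharing. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn


class BranchedActorCritic(nn.Module):
    def __init__(self, state_dim, num_blocks, points_per_block, hidden=128):
        super().__init__()
        # Separate trunks for actor and critic; soft sharing ties them below.
        self.actor_trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.critic_trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One policy branch per DNN block: each branch selects a partition
        # point for its own block, so the joint action is never enumerated.
        self.policy_branches = nn.ModuleList(
            [nn.Linear(hidden, points_per_block) for _ in range(num_blocks)]
        )
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state):
        a_feat = self.actor_trunk(state)
        c_feat = self.critic_trunk(state)
        # Per-block categorical logits over candidate partition points.
        logits = [branch(a_feat) for branch in self.policy_branches]
        value = self.value_head(c_feat)
        return logits, value

    def soft_sharing_penalty(self):
        # Soft parameter sharing: penalize the distance between corresponding
        # actor/critic trunk parameters instead of forcing identical weights.
        return sum(
            (pa - pc).pow(2).sum()
            for pa, pc in zip(self.actor_trunk.parameters(),
                              self.critic_trunk.parameters())
        )


# Example: 4 DNN blocks, 6 candidate partition points per block.
net = BranchedActorCritic(state_dim=10, num_blocks=4, points_per_block=6)
logits, value = net(torch.randn(1, 10))
actions = [torch.distributions.Categorical(logits=l).sample() for l in logits]
```

Splitting the policy head into per-block branches keeps the action space growing linearly rather than exponentially in the number of DNN blocks, which is consistent with the stated reduction of the DRL agents' action space and training time.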
2. Collaborative DNNs Inference with Joint Model Partition and Compression in Mobile Edge-Cloud Computing Networks
- We investigate the joint DNN partition and compression in edge-cloud networks to achieve a trade-off between the latency and accuracy of collaborative DNN inference.
- We propose a dual-agent DRL algorithm, CPCDRL, in which the two agents collaborate to autonomously determine the location of the partition point and the corresponding compression rates (see the sketch after this list).
- We conduct extensive simulation experiments to evaluate the performance of the proposed algorithm. The results demonstrate that our proposed algorithm reduces inference latency by 10% to 67% compared to baseline schemes while incurring minimal accuracy loss.
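To make the cooperative decision structure concrete, here is a self-contained toy sketch of how one agent's partition choice and the other's compression choice couple through a shared latency-accuracy reward. The candidate sets, the placeholder random agents, the toy cost models, and the weight LAMBDA are all illustrative assumptions and do not describe CPCDRL's actual agent design.

```python
# Toy sketch of a two-agent partition/compression decision with a shared
# reward. Everything here is an illustrative assumption, not CPCDRL itself.
import random

PARTITION_POINTS = list(range(1, 9))          # candidate split layers
COMPRESSION_RATES = [0.25, 0.5, 0.75, 1.0]    # fraction of features kept
LAMBDA = 5.0                                  # weight on accuracy loss


class RandomAgent:
    """Placeholder for a trained DRL agent; acts uniformly at random."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, *_observation):
        return random.choice(self.actions)


def simulate_step(partition_agent, compression_agent):
    # Agent 1: where to split the DNN between edge and cloud.
    p = partition_agent.act()
    # Agent 2: how strongly to compress the transmitted features,
    # conditioned on the chosen partition point.
    r = compression_agent.act(p)
    # Toy cost models: a deeper split means more edge compute but less data
    # to send; stronger compression lowers transmission delay but hurts accuracy.
    edge_delay = 2.0 * p
    transmit_delay = (10.0 - p) * r
    cloud_delay = 1.0 * (10 - p)
    accuracy_loss = (1.0 - r) * 0.1
    latency = edge_delay + transmit_delay + cloud_delay
    # Shared reward couples the two agents' decisions.
    reward = -(latency + LAMBDA * accuracy_loss)
    return p, r, reward


print(simulate_step(RandomAgent(PARTITION_POINTS), RandomAgent(COMPRESSION_RATES)))
```

Because both agents are scored by the same reward, neither can optimize latency or accuracy in isolation, which is the trade-off the contribution above targets.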